news

Oct 25, 2025 🎉 TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling received an Honourable Mention Award at the Nanyang Speech Technology Forum (NYSF) 2025. Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning was a Best Paper Finalist at APSIPA 2025.
Sep 19, 2025 🎉 Our papers TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling and Metis: A Foundation Speech Generation Model with Masked Generative Pre-training got accepted by NeurIPS 2025!
May 17, 2025 🎉 Our paper Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment got accepted by the ACL 2025 main conference!
Jan 25, 2025 🎉 MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer has been accepted to ICLR 2025!
Oct 28, 2024 🔥 We released code (3k+ stars in one week) and checkpoints of MaskGCT, which has been used in Quwan All Voice.
Sep 20, 2024 🎉 Our paper, SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words, got accepted by NeurIPS 2024!
Aug 25, 2024 🎉 Our papers, Amphion and Emilia, got accepted by IEEE SLT 2024!
Jul 28, 2024 🔥 We released Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation, which contains 101k hours of speech in six languages and features diverse speech with varied speaking styles.
May 15, 2024 🎉 Our paper Factorized Diffusion Models are Natural and Zero-shot Speech Synthesizers, aka NaturalSpeech 3, got accepted by ICML 2024 as an Oral presentation!
Nov 26, 2023 🔥 We released Amphion v0.1, an open-source toolkit for audio, music, and speech generation.
Sep 20, 2023 🎉 My first paper on audio generation and editing, AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models, got accepted by NeurIPS 2023!