| Oct 25, 2025 | 🎉 TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling received an Honourable Mention Award at the Nanyang Speech Technology Forum (NYSF) 2025, and Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning was selected as a Best Paper Finalist at APSIPA 2025. |
| Sep 19, 2025 | 🎉 Our papers TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling and Metis: A Foundation Speech Generation Model with Masked Generative Pre-training were accepted to NeurIPS 2025! |
| May 17, 2025 | 🎉 Our paper Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment was accepted to the ACL 2025 Main Conference! |
| Jan 25, 2025 | 🎉 MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer has been accepted to ICLR 2025! |
| Oct 28, 2024 | 🔥 We released the code (3k+ stars in one week) and checkpoints of MaskGCT, which has been used in Quwan All Voice. |
| Sep 20, 2024 | 🎉 Our paper SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words was accepted to NeurIPS 2024! |
| Aug 25, 2024 | 🎉 Our papers on Amphion and Emilia were accepted to IEEE SLT 2024! |
| Jul 28, 2024 | 🔥 We released Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation, which contains 101k hours of speech in six languages and covers a wide variety of speaking styles. |
| May 15, 2024 | 🎉 Our paper Factorized Diffusion Models are Natural and Zero-shot Speech Synthesizers, aka NaturalSpeech 3, was accepted to ICML 2024 as an oral presentation! |
| Nov 26, 2023 | 🔥 We released Amphion v0.1, an open-source toolkit for audio, music, and speech generation. |
| Sep 20, 2023 | 🎉 My first paper on audio generation and editing, AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models, was accepted to NeurIPS 2023! |