Yuancheng Wang

Yuancheng Wang is a second-year Ph.D. student at the School of Data Science (SDS), the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), supervised by Professor Zhizheng Wu. Before that, he received his B.S. degree from CUHK-Shenzhen. He also collaborates with Xu Tan from Microsoft Research Asia.
His research interests include text-to-speech synthesis, text-to-audio generation, and unified audio representation and generation. He is one of the main contributors to and leaders of the open-source Amphion toolkit. He has developed several advanced TTS models, including NaturalSpeech 3 and MaskGCT.
news
| Date | News |
|---|---|
| Jan 25, 2025 | 🎉 MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer has been accepted to ICLR 2025! |
| Oct 28, 2024 | 🔥 We released the code (2.5k+ stars in one week) and checkpoints of MaskGCT, which has been used in Quwan All Voice. |
| Sep 20, 2024 | 🎉 Our paper, SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words, got accepted by NeurIPS 2024! |
| Aug 25, 2024 | 🎉 Our papers, Amphion and Emilia, got accepted by IEEE SLT 2024! |
| Jul 28, 2024 | 🔥 We released Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation, with 101k hours of speech across six languages and diverse speaking styles. |
| May 15, 2024 | 🎉 Our paper Factorized Diffusion Models are Natural and Zero-shot Speech Synthesizers, aka NaturalSpeech 3, got accepted by ICML 2024 as an oral presentation! |
| Nov 26, 2023 | 🔥 We released Amphion v0.1. |
| Sep 20, 2023 | 🎉 My first paper on audio generation and editing, AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models, got accepted by NeurIPS 2023! |
selected publications
- MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer. ICLR 2025. TL;DR: A fully non-autoregressive, large-scale zero-shot TTS model that eliminates the need for phone-level duration prediction.
- SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words. NeurIPS 2024. TL;DR: We propose a benchmark dataset to evaluate spoken dialogue understanding and generation.
- Amphion: An Open-Source Audio, Music, and Speech Generation Toolkit. IEEE SLT 2024. TL;DR: We develop a unified toolkit for audio, music, and speech generation.
- Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation. IEEE SLT 2024. TL;DR: We collect a 100k-hour in-the-wild speech dataset for speech generation.
internships
| Company | Details |
|---|---|
| ByteDance | Research Intern · Shenzhen, China · 2024.05 - Present · Speech understanding |
| Microsoft Research Asia | Research Intern · Beijing, China · 2022.12 - 2023.06 · Worked on audio generation & editing and large-scale text-to-speech synthesis |