Yuancheng Wang

I’m Yuancheng Wang (王远程), a PhD student at the School of Data Science (SDS), the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), supervised by Prof. Zhizheng Wu. Before that, I received my B.S. degree from CUHK-Shenzhen. My research interests include Multi-modal LLMs, Generative AI for Speech and Audio, Post-Training, and Representation Learning. I am currently a research scientist intern at Meta Superintelligence Labs, working on enhancing the speech capabilities of Llama models. Previously, I interned at Microsoft Research Asia (MSRA) and ByteDance.
I have developed several advanced TTS models, including NaturalSpeech 3 and MaskGCT, and I am one of the main contributors and leaders of the open-source Amphion toolkit. My work has been published at top international AI conferences such as NeurIPS, ICML, ICLR, ACL, and IEEE SLT, with 760+ citations.
I am currently looking for a full-time position; feel free to contact me if you are interested in my experience!
news
May 17, 2025 | 🎉 Our paper Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment got accepted to the ACL 2025 main conference! |
---|---|
Jan 25, 2025 | 🎉 MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer has been accepted to ICLR 2025! |
Oct 28, 2024 | 🔥 We released code (3k+ stars in one week) and checkpoints of MaskGCT, which has been used in Quwan All Voice. |
Sep 20, 2024 | 🎉 Our paper, SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words, got accepted by NeurIPS 2024! |
Aug 25, 2024 | 🎉 Our papers, Amphion and Emilia, got accepted by IEEE SLT 2024! |
Jul 28, 2024 | 🔥 We released Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation, with 101k hours of speech in six languages and diverse speaking styles. |
May 15, 2024 | 🎉 Our paper Factorized Diffusion Models are Natural and Zero-shot Speech Synthesizers (a.k.a. NaturalSpeech 3) got accepted by ICML 2024 as an oral presentation! |
Nov 26, 2023 | 🔥 We released Amphion v0.1! |
Sep 20, 2023 | 🎉 My first paper on audio generation and editing, AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models, got accepted by NeurIPS 2023! |
selected publications
- Preprint | TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling | 2025 | TL;DR: We introduce the Text-aware Diffusion Transformer Speech Codec with a token rate of 6.25 Hz for speech language modeling.
- Preprint | Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | 2025 | TL;DR: We propose a foundation speech generation model with masked generative pre-training.
- ICLR 2025 | MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer | 2025 | TL;DR: A fully non-autoregressive large-scale zero-shot TTS model that eliminates the need for phone-level duration prediction.
- NeurIPS 2024 | SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words | 2024 | TL;DR: We propose a benchmark dataset to evaluate spoken dialogue understanding and generation.
- IEEE SLT 2024 | Amphion: An Open-Source Audio, Music, and Speech Generation Toolkit | 2024 | TL;DR: We develop a unified toolkit for audio, music, and speech generation.
- IEEE SLT 2024 | Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation | 2024 | TL;DR: We collect a 100k-hour in-the-wild speech dataset for speech generation.
Full publication list available at: Google Scholar.
internships
Meta, Superintelligence Labs | Research Scientist Intern · California, USA · 2025.05 - Present · Speech LLM, Speech Tokenizer |
---|---|
ByteDance | Research Intern · Shenzhen, China · 2024.05 - 2025.04 · Speech Understanding, Speech Language Model |
Microsoft Research Asia | Research Intern · Beijing, China · 2022.12 - 2023.06 · Audio Generation, Speech Synthesis |
invited talks
Xmart Youth Forum | Towards Natural and Efficient Speech Synthesis: Perspectives on Modeling, Alignment, and Representation · Online · 2025.06 · I was honored to be invited to give an online talk at the Xmart Youth Forum hosted by the SJTU X-LANCE lab. [Slides] [Video] |
---|---|
NUS Speech and Music AI Workshop | Speech Generation with Masked Generative Modeling · NUS, Singapore · 2025.04 · I was honored to be invited to give a talk at Professor Ye Wang’s lab at the National University of Singapore. |
OpenMMLab Community Open Mic (社区开放麦) | MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer · Online · 2024.12 · I was honored to be invited to give a talk at the joint event organized by OpenMMLab and SpeechHome. [Video] |
SpeechHome AI Tech Salon | NaturalSpeech 3: Speech Disentanglement and Zero-Shot TTS in the Era of Big Data · Online · 2024.03 · I was honored to be invited to give a talk with Zeqian at the SpeechHome (语音之家) AI Tech Salon. |