cv
Basics
Name | Yuancheng Wang |
Label | Ph.D. Student at CUHK(SZ) |
Email | yuanchengwang@link.cuhk.edu.cn |
Phone | (+86) 189-5643-5965 |
Url | https://HeCheng0625.github.io/ |
Summary | A third-year Ph.D. student at CUHK(SZ), interested in speech/audio generation & representation, large language models, and generative AI |
Internship
-
2025.05 - Present Shenzhen, China
Research Scientist Intern
Meta Superintelligence Labs
Enhancing the speech capabilities of Llama models.
- Speech LLM, Speech Tokenizer
-
2024.05 - 2025.04 Shenzhen, China
Research Intern
ByteDance
Built SD-Eval, a benchmark dataset for spoken dialogue understanding; the work was accepted to NeurIPS 2024.
- Speech Understanding
-
2022.12 - 2023.06 Beijing, China
Research Intern
Microsoft Research Asia
Worked on audio generation and editing and on large-scale text-to-speech synthesis.
- Audio Generation & Editing
- Speech Synthesis
Volunteer
-
2024.12 - 2024.12 Macau, China
Education
-
2023.09 - Present Shenzhen, China
-
2019.09 - 2023.06 Shenzhen, China
-
2016.09 - 2019.06 Hefei, Anhui, China
Awards
Publications
-
2025 MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
ICLR 2025
A fully non-autoregressive, large-scale zero-shot TTS model that eliminates the need for phone-level duration prediction.
-
2025 Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
ACL 2025
We propose the INTP dataset and extend preference alignment to enhance the intelligibility and overall quality of TTS systems in challenging scenarios.
-
2024 Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
IEEE SLT 2024
We collect a 100,000-hour in-the-wild speech dataset for speech generation.
-
2024 Amphion: an Open-Source Audio, Music, and Speech Generation Toolkit
IEEE SLT 2024
We develop a unified toolkit for audio, music, and speech generation.
-
2024 SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
NeurIPS 2024
We propose a benchmark dataset to evaluate spoken dialogue understanding and generation.
-
2024 NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
ICML 2024 Oral
A large-scale zero-shot TTS model that achieves quality on par with human recordings.
-
2023 AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
NeurIPS 2023
The first audio editing model that can follow natural language instructions.
Skills
Computer Science & AI | |
Python | |
PyTorch | |
Deep Learning | |
Generative Models |
Languages
Chinese | |
Native speaker |
English | |
Interests
Deep Learning | |
Generative Models | |
Speech Synthesis | |
Speech Language Models | |
Reinforcement Learning |