cv | Yuancheng Wang

Basics

Name	Yuancheng Wang
Label	Ph.D. Student at CUHK(SZ)
Email	yuanchengwang@link.cuhk.edu.cn
Phone	(+86) 189-5643-5965
Url	https://HeCheng0625.github.io/
Summary	A third-year Ph.D. student at CUHK(SZ), interested in speech/audio generation & representation, large language models, and generative AI

Internship

2025.05 - 2025.09

Shenzhen, China
Research Scientist Intern

Meta Superintelligence Labs

Enhancing the speech capabilities of Llama models.
- Speech LLM, Speech Tokenizer
2024.05 - 2025.04

Shenzhen, China
Research Intern

ByteDance

Built a benchmark dataset for spoken dialogue understanding; our work SD-Eval was accepted to NeurIPS 2024.
- Speech Understanding
2022.12 - 2023.06

Beijing, China
Research Intern

Microsoft Research Asia

Worked on audio generation & editing and large-scale text-to-speech synthesis.
- Audio Generation & Editing
- Speech Synthesis

Volunteer

2024.12

Macau, China
Student Volunteer

IEEE Spoken Language Technology Workshop 2024

Education

2023.09 - Present

Shenzhen, China
PhD

The Chinese University of Hong Kong, Shenzhen

Computer Science, Artificial Intelligence, Speech
2019.09 - 2023.06

Shenzhen, China
B.S.

The Chinese University of Hong Kong, Shenzhen

Computer Science
2016.09 - 2019.06

Hefei, Anhui, China
High School

Hefei No.1 High School

Awards

2024

Duan Yongping Outstanding Research Award
2019 - 2023

CUHK(SZ) The Bowen Scholarship
2023

First Prize (Guangdong Province), Mathematics Competition of Chinese College Students

Publications

2025

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

ICLR 2025

A fully non-autoregressive, large-scale zero-shot TTS model that eliminates the need for phone-level duration prediction.
2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment

ACL 2025

We propose the INTP dataset and extend preference alignment to enhance the intelligibility and overall quality of TTS systems in challenging scenarios.
2024

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

IEEE SLT 2024

We collect a 100,000-hour in-the-wild speech dataset for speech generation.
2024

Amphion: An Open-Source Audio, Music, and Speech Generation Toolkit

IEEE SLT 2024

We develop a unified toolkit for audio, music, and speech generation.
2024

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

NeurIPS 2024

We propose a benchmark dataset to evaluate spoken dialogue understanding and generation.
2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

ICML 2024 Oral

A large-scale zero-shot TTS model that achieves quality on par with human recordings.
2023

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models

NeurIPS 2023

The first audio editing model that can follow natural language instructions.

Skills

	Computer Science & AI
	Python
	PyTorch
	Deep Learning
	Generative Models

Languages

	Chinese
	Native speaker

	English

Interests

	Deep Learning
	Generative Models
	Speech Synthesis
	Speech Language Models
	Reinforcement Learning