Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- [ICLR 2025] MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer (2025). TL;DR: A fully non-autoregressive, large-scale zero-shot TTS model that eliminates the need for phone-level duration prediction.
2024
- [NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words (2024). TL;DR: We propose a benchmark dataset to evaluate spoken dialogue understanding and generation.
- [IEEE SLT 2024] Amphion: An Open-Source Audio, Music, and Speech Generation Toolkit (2024). TL;DR: We develop a unified toolkit for audio, music, and speech generation.
- [IEEE SLT 2024] Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation (2024). TL;DR: We collect a 100,000-hour in-the-wild speech dataset for speech generation.
- [Preprint] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds (2024).
- [Preprint]
- [Preprint] RALL-E: Robust Audio Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis (2024).