Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

We present Voicebox, a state-of-the-art speech generative model built upon Meta’s non-autoregressive flow matching model. By learning to solve a text-guided speech infilling task on large-scale data, Voicebox outperforms single-purpose AI models across speech tasks through in-context learning. Voicebox can synthesize spe