🎤 F5-TTS: Vietnamese Text-to-Speech Synthesis

The model was trained for 250,000 steps on approximately 1,000 hours of Vietnamese audio data.

Enter text and upload a voice sample to generate natural-sounding speech. Inference on CPU may take several minutes.
