🔊 #F5TTS: New non-autoregressive #TextToSpeech system
• Uses flow matching with #DiffusionTransformer (#DiT)
• Employs #ConvNeXt for refined text representation
• Introduces Sway Sampling strategy for improved performance & efficiency
• Achieves an inference Real-Time Factor (#RTF) of 0.15, faster than state-of-the-art diffusion-based TTS models
• Trained on a 100K-hour multilingual dataset
• Demonstrates zero-shot ability, code-switching capability, and speed control
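For readers new to the metric: RTF is conventionally the time spent synthesizing divided by the duration of the audio produced, so lower is faster and anything under 1.0 is faster than real time. A minimal sketch of that arithmetic follows; the function name and the sample timings are illustrative assumptions, not taken from the F5-TTS code or benchmarks.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by generated audio duration.

    Hypothetical helper for illustration; not part of the F5-TTS API.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Illustrative numbers only: 1.5 s of compute for 10 s of speech.
rtf = real_time_factor(1.5, 10.0)
print(f"RTF = {rtf:.2f}")
```

Under this definition, an RTF of 0.15 would correspond to roughly 1.5 seconds of compute per 10 seconds of generated speech on the hardware used for the measurement.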
Key features:
📊 Faster training
🌐 Multilingual support
🔄 Seamless code-switching
⏩ Efficient speed control
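A rough sketch of the Sway Sampling idea mentioned above: instead of spacing flow-matching timesteps uniformly, they are warped so more steps fall early in the trajectory. The warp below, u + s·(cos(πu/2) − 1 + u), is the form as recalled from the F5-TTS paper and should be treated as an assumption; the function names and the value s = -1 are illustrative, so check the repository for the actual implementation.

```python
import math


def sway(u: float, s: float = -1.0) -> float:
    """Warp a uniform timestep u in [0, 1] (assumed Sway Sampling form).

    The endpoints 0 and 1 stay fixed; s < 0 pulls interior points toward 0.
    """
    return u + s * (math.cos(math.pi / 2 * u) - 1 + u)


def sway_schedule(n_steps: int, s: float = -1.0) -> list[float]:
    """Return n_steps + 1 warped timesteps from 0 to 1 (hypothetical helper)."""
    return [sway(i / n_steps, s) for i in range(n_steps + 1)]


# Compare uniform spacing (s = 0) with a negative sway coefficient.
print([round(t, 3) for t in sway_schedule(4, s=0.0)])
print([round(t, 3) for t in sway_schedule(4, s=-1.0)])
```

With s = 0 the schedule is unchanged, while a negative s concentrates steps near t = 0.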
Demo, code, and checkpoints available at: https://swivid.github.io/F5-TTS