#OpenSourceShare F5-TTS clones a voice from just 10 seconds of audio. Incredibly powerful.

Project repo: github.com/SWivid/F5-TTS

#F5TTS

🔊 #F5TTS: New non-autoregressive #TextToSpeech system

• Uses flow matching with a #DiffusionTransformer (#DiT)
• Employs #ConvNeXt to refine the text representation
• Introduces Sway Sampling, an inference-time strategy that improves both performance and efficiency
• Achieves a Real-Time Factor (#RTF) of 0.15 (generation time ÷ audio duration, i.e. about 1.5 s to synthesize 10 s of speech), faster than state-of-the-art diffusion-based TTS models
• Trained on a 100K-hour multilingual dataset
• Demonstrates zero-shot ability, code-switching, and speed control
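Sway Sampling only changes where the flow-matching ODE is evaluated at inference time. Here is a minimal plain-Python sketch, assuming the schedule form f_sway(u; s) = u + s * (cos(pi/2 * u) - 1 + u) applied to uniform steps u on [0, 1]. The function names, the Euler solver, and the toy 1-D velocity field standing in for the trained DiT are illustrative assumptions, not code from the F5-TTS repo.

```python
import math
from typing import Callable


def sway_timesteps(nfe: int, s: float = -1.0) -> list[float]:
    """Return nfe + 1 flow-matching time points on [0, 1] warped by Sway Sampling.

    f_sway(u; s) = u + s * (cos(pi/2 * u) - 1 + u) keeps f(0) = 0 and f(1) = 1.
    For s in [-1, 0) the points shift toward t = 0 (the noisy start);
    s in [-1, 2 / (pi - 2)] keeps the schedule monotonic.
    """
    uniform = [i / nfe for i in range(nfe + 1)]
    return [u + s * (math.cos(math.pi / 2 * u) - 1 + u) for u in uniform]


def euler_sample(x0: float, velocity: Callable[[float, float], float], ts: list[float]) -> float:
    """Integrate dx/dt = velocity(x, t) over the given time points with Euler steps."""
    x = x0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x += (t1 - t0) * velocity(x, t0)
    return x


# Usage with a hypothetical toy 1-D field standing in for the trained DiT:
# the straight-line flow from the starting noise value toward a single target.
target = 3.0
ts = sway_timesteps(nfe=32, s=-1.0)
print([round(t, 4) for t in ts[:3]])  # first steps are much smaller than 1/32
print(euler_sample(0.0, lambda x, t: (target - x) / (1.0 - t), ts))  # ≈ 3.0
```

With s = -1 the schedule spends most of its function evaluations early, where the sample is still mostly noise. For this toy straight-line field any schedule reaches the target exactly, so the example only checks the plumbing. The benefit of swaying shows up with a real, imperfect learned velocity field.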

Key features:
📊 Faster training
🌐 Multilingual support
🔄 Seamless code-switching
⏩ Efficient speed control
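Speed control in F5-TTS is based on the total duration of the output. One way to pick that duration is to borrow the reference clip's frames-per-text rate, scale it by the length of the new text, and divide by a speed factor. The sketch below works under stated assumptions: a 24 kHz sample rate, a 256-sample mel hop, and UTF-8 byte length as the text-length measure. The constants and the function names `total_frames` and `frames_to_seconds` are hypothetical, not confirmed values or APIs from the repo.

```python
# Assumed audio front-end constants (not confirmed from the repo).
SAMPLE_RATE = 24_000  # output sample rate in Hz
HOP_LENGTH = 256      # samples per mel frame


def total_frames(ref_seconds: float, ref_text: str, gen_text: str, speed: float = 1.0) -> int:
    """Hypothetical estimate of total mel frames (reference + generated).

    The generated part reuses the reference clip's frames-per-byte rate,
    scaled by the UTF-8 byte length of the new text and divided by `speed`
    (speed > 1 gives fewer frames, i.e. faster speech).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not ref_text:
        raise ValueError("ref_text must be non-empty")
    ref_frames = int(ref_seconds * SAMPLE_RATE / HOP_LENGTH)
    ref_len = len(ref_text.encode("utf-8"))
    gen_len = len(gen_text.encode("utf-8"))
    gen_frames = int(ref_frames / ref_len * gen_len / speed)
    return ref_frames + gen_frames


def frames_to_seconds(frames: int) -> float:
    """Convert a mel frame count back to seconds of audio."""
    return frames * HOP_LENGTH / SAMPLE_RATE


# A 4 s reference with 20 bytes of text; the new text is 40 bytes long.
normal = total_frames(4.0, "a" * 20, "b" * 40)             # 375 + 750 = 1125 frames
fast = total_frames(4.0, "a" * 20, "b" * 40, speed=2.0)    # 375 + 375 = 750 frames
print(frames_to_seconds(normal), frames_to_seconds(fast))  # 12.0 8.0
```

Doubling `speed` halves only the generated portion. The reference frames stay fixed because they are the prompt the model continues from.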

Demo, code, and checkpoints available at: https://swivid.github.io/F5-TTS

#AI #MachineLearning #Speech #NLP


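#OpenSourceShare Shanghai Jiao Tong University in China has open-sourced F5-TTS, a seriously impressive speech generation model.

AI audio and podcasts have been all the rage these past few days, so this could not have come at a better time.

Model features: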

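• Zero-shot voice cloning
• Speed control (based on total duration)
• Control over the emotional expression of the synthesized speech
• Long-text synthesis
• Multilingual synthesis, including Chinese and English
• Trained on 100K hours of data
• Most importantly, the code is MIT-licensed (the pretrained checkpoints, however, are released under CC-BY-NC because of the Emilia training data, so the models themselves are not cleared for commercial use)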

Paper: arxiv.org/abs/2410.06885
Model download: huggingface.co/SWivid/F5-TTS
Demo: huggingface.co/spaces/mrfakename/E2-F5-TTS
Project repo: github.com/SWivid/F5-TTS

#F5TTS