Mastodawn

王永帥🍥 Dec 26, 2024

#OpenSourceSharing F5-TTS clones a human voice from 10 seconds of audio. Seriously impressive.

Project: github.com/SWivid/F5-TTS

michabbb Oct 18, 2024

🔊 #F5TTS: New non-autoregressive #TextToSpeech system

• Uses flow matching with #DiffusionTransformer (#DiT)
• Employs #ConvNeXt for refined text representation
• Introduces Sway Sampling strategy for improved performance & efficiency
• Achieves 0.15 Real-Time Factor (#RTF), faster than state-of-the-art diffusion-based TTS models
• Trained on a 100K-hour multilingual dataset
• Demonstrates zero-shot ability, code-switching capability, and speed control

Key features:
📊 Faster training
🌐 Multilingual support
🔄 Seamless code-switching
⏩ Efficient speed control

Demo, code, and checkpoints available at: https://swivid.github.io/F5-TTS

#AI #MachineLearning #Speech #NLP
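
For readers unfamiliar with the RTF figure quoted in the post above: Real-Time Factor is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. A minimal sketch of that arithmetic follows. The function name and the timing numbers are hypothetical, chosen only to reproduce a 0.15 ratio, and are not measurements from F5-TTS.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF = compute time / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Hypothetical example: 1.5 s of compute to produce 10 s of speech.
rtf = real_time_factor(1.5, 10.0)
print(f"RTF = {rtf:.2f}")
```

At an RTF of 0.15, a system would need roughly 0.15 seconds of compute per second of speech.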


王永帥🍥 Oct 13, 2024

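#OpenSourceSharing Shanghai Jiao Tong University in China has open-sourced F5-TTS, a seriously impressive speech generation model.

AI audio and podcasts happen to be booming right now, so this arrives exactly when it is needed.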

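Model features:

Zero-shot voice cloning
Speed control (based on total duration)
Controllable emotional expression in the synthesized speech
Long-text synthesis
Multilingual synthesis in Chinese and English
Trained on 100,000 hours of data
Most importantly, it supports commercial use
Paper: arxiv.org/abs/2410.06885
Model download: huggingface.co/SWivid/F5-TTS
Demo: huggingface.co/spaces/mrfakename/E2-F5-TTS
Project: github.com/SWivid/F5-TTS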

#F5TTS
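
To illustrate what speed control based on total duration (mentioned in the feature list above) can mean in practice, here is a rough sketch. It assumes, purely for illustration, that speech length scales linearly with character count at the reference clip's pace, and that a speed factor shrinks or stretches the generated part. The function name and parameters are hypothetical and are not taken from the F5-TTS codebase.

```python
def estimate_total_seconds(ref_seconds: float, ref_text: str,
                           gen_text: str, speed: float = 1.0) -> float:
    """Estimate reference + generated audio length in seconds.

    Assumption (illustrative only): the new text is spoken at the same
    seconds-per-character pace as the reference clip, scaled by `speed`.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not ref_text:
        raise ValueError("ref_text must not be empty")
    seconds_per_char = ref_seconds / len(ref_text)
    gen_seconds = seconds_per_char * len(gen_text) / speed
    return ref_seconds + gen_seconds


# Hypothetical example: a 4 s reference clip with a 4-character transcript,
# and 8 characters of new text to synthesize.
print(estimate_total_seconds(4.0, "abcd", "abcdefgh"))             # normal speed
print(estimate_total_seconds(4.0, "abcd", "abcdefgh", speed=2.0))  # twice as fast
```

Under this assumption, asking for a shorter total duration is equivalent to asking for faster speech.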