Mastodawn

王永帥🍥 Nov 26, 2024

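#DailyPick OuteTTS-0.2-500M: an open-source multilingual text-to-speech model driven by audio prompts

「A 500M-parameter open-source speech synthesis model built on Qwen-2.5-0.5B. It uses audio prompting to generate natural, fluent speech, supports English as well as Chinese, Japanese, and Korean, and offers voice cloning.」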

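「Key features and improvements」
- Speech is synthesized from audio prompts
- Significant improvements over the previous version:
  - Higher accuracy and coherence in synthesized speech
  - More natural, fluent speech
  - Larger vocabulary (trained on over 5 billion audio prompt tokens)
  - Better voice cloning
  - New experimental support for Chinese, Japanese, and Korean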

「Technical details」
- Supports bfloat16 and can be optimized with flash attention 2
- Training datasets: Emilia-Dataset, LibriTTS-R, Multilingual LibriSpeech (MLS)

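「Use cases」
- Text-to-speech
- Voice cloning
- Multilingual speech synthesis (primarily English; Chinese, Japanese, and Korean support is experimental)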

Model: huggingface.co/OuteAI/OuteTTS-0.2-500M

#OuteTTS

michabbb Nov 6, 2024

🎯 #OuteTTS introduces a novel approach to text-to-speech synthesis using pure #languagemodeling
🔧 Built on #LLaMa architecture with just 350M parameters, featuring:

- Zero-shot #voicecloning capability
- Integration with #WavTokenizer (75 tokens/sec)
- Local deployment via #llamacpp
- #GGUF format compatibility
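
The 75 tokens/sec figure makes it easy to estimate how many audio tokens a clip needs, or how much audio fits in a given token budget. The sketch below is only that arithmetic; the 4096-token budget in the usage note is a hypothetical number, not a documented limit of the model.

```python
import math

TOKENS_PER_SECOND = 75  # WavTokenizer rate quoted in the post


def audio_tokens(seconds: float) -> int:
    """Audio tokens needed to represent a clip of the given length."""
    return math.ceil(seconds * TOKENS_PER_SECOND)


def max_seconds(token_budget: int) -> float:
    """Seconds of audio that fit in a budget of audio tokens."""
    return token_budget / TOKENS_PER_SECOND
```

For example, `audio_tokens(10)` gives 750, and a hypothetical budget of 4096 audio tokens would cover roughly 54.6 seconds of speech (before subtracting any tokens used by the text prompt).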

🔍 Technical Implementation:

- Audio tokenization process
- CTC forced alignment
- Structured prompt system
- Temperature-adjustable outputs
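
To illustrate what a temperature setting does in a language-model-based generator, here is a generic sketch of temperature-scaled sampling. It is not taken from the OuteTTS code; the function names and the toy logits are assumptions for illustration.

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: List[float], temperature: float,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    if rng is None:
        rng = random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With toy logits such as `[2.0, 1.0, 0.0]`, a temperature of 0.5 concentrates probability on the first token, while 2.0 spreads it out, which is one plausible reason output quality varies with the setting.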

⚠️ Current Limitations:

- Limited vocabulary range
- String-only input support
- Best performance with shorter sentences
- Variable temperature sensitivity

https://github.com/edwko/OuteTTS
https://huggingface.co/OuteAI/OuteTTS-0.1-350M

王永帥🍥 Nov 6, 2024

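#OpenSource Sharing OuteTTS, a TTS project built on a pure language-modeling approach. It has 350M parameters and produces high-quality speech synthesis.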

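- Text-to-speech
- Voice cloning supported
- Adjustable parameters
- Best suited to short sentences; longer text should be split into segments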
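
One way to follow the advice about long text is to split the input at sentence boundaries and synthesize each piece separately. This is a minimal sketch, not part of the OuteTTS interface; `split_into_chunks` and the 200-character default are assumptions chosen for illustration.

```python
import re
from typing import List

# Split after Latin sentence punctuation followed by whitespace,
# or directly after CJK sentence punctuation (no whitespace needed).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to the synthesizer in turn and the resulting audio concatenated; the right chunk size is something to tune by ear.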

Blog: outeai.com/blog/OuteTTS-0.1-350M
Project: github.com/edwko/OuteTTS
Model: huggingface.co/OuteAI/OuteTTS-0.1-350M

#TTS #OuteTTS