🎯 #OuteTTS introduces a novel approach to text-to-speech synthesis using pure #languagemodeling
🔧 Built on the #LLaMA architecture with just 350M parameters, featuring:
Zero-shot #voicecloning capability
Integration with #WavTokenizer (75 tokens/sec)
Local deployment via #llamacpp
#GGUF format compatibility
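A quick back-of-the-envelope sketch of what 75 tokens/sec means for sequence length. Only the 75 tokens/sec rate comes from the post; the function names and the 3000-token budget are hypothetical, picked purely for illustration.

```python
# Rate quoted above for WavTokenizer audio tokens.
TOKENS_PER_SECOND = 75

def audio_tokens_for(seconds: float) -> int:
    """Approximate number of audio tokens needed for `seconds` of speech."""
    return round(seconds * TOKENS_PER_SECOND)

def seconds_for(tokens: int) -> float:
    """Approximate speech duration covered by `tokens` audio tokens."""
    return tokens / TOKENS_PER_SECOND

# Hypothetical budget, not a documented limit of the model.
EXAMPLE_TOKEN_BUDGET = 3000

print(audio_tokens_for(10))               # tokens for 10 s of audio
print(seconds_for(EXAMPLE_TOKEN_BUDGET))  # seconds covered by the example budget
```

Because audio length maps linearly onto token count, longer utterances use up the model's context quickly.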
🔍 Technical Implementation:
Audio tokenization process
CTC forced alignment
Structured prompt system
Temperature-adjustable outputs
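To make "temperature-adjustable" concrete, here is a minimal sketch of generic temperature scaling when sampling the next token. It is the standard technique for language models in general, not code taken from OuteTTS; the function names and the example logits are made up for illustration.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Made-up logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, temperature=0.1)  # near-greedy
hot = softmax_with_temperature(logits, temperature=2.0)   # more varied
print(cold)
print(hot)
```

Lower temperatures concentrate probability on the top token, which gives more stable output. Higher temperatures spread it out, which gives more variation.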
⚠️ Current Limitations:
Limited vocabulary range
String-only input support
Best performance with shorter sentences
Output quality varies with the temperature setting
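Since shorter sentences work best, one possible workaround is to split longer text before synthesis. A minimal sketch, assuming a simple punctuation-based split is good enough; `split_sentences` is a hypothetical helper, not part of OuteTTS, and the synthesis call itself is left out.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

chunks = split_sentences("Hello there. How are you today? Fine!")
for chunk in chunks:
    # Each chunk would be passed to the TTS model separately.
    print(chunk)
```

This naive split will also break on abbreviations such as "Dr.", so real input may need a proper sentence tokenizer.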
https://github.com/edwko/OuteTTS
https://huggingface.co/OuteAI/OuteTTS-0.1-350M