Four open source models exist right now that do something the previous generation struggled with. They do not just generate speech. They clone a voice from a short audio sample and produce output that is genuinely difficult to distinguish from the original speaker.

The gap between open source and commercial TTS has been closing for a while. These four models suggest it has effectively closed for voice cloning specifically.
https://firethering.com/open-source-tts-voice-cloning/

#opensource #tts #ai #trending #texttospeech

4 Open-Source TTS Models That Can Clone Voices and Actually Sound Human

Voice cloning used to mean expensive studio software, proprietary APIs with per-character pricing, or models so heavy they needed server infrastructure just to run. That changed quietly over the last few months. Four open source models exist right now that do something the previous generation struggled with. They do not just generate speech. They clone a voice from a short audio sample and produce output that is genuinely difficult to distinguish from the original speaker. The gap between open source and commercial TTS has been closing for a while. These four models suggest it has effectively closed for voice cloning specifically. Here is what each one actually does and who it is for.

Firethering

Wanted: an app that reads documents aloud
#tts #texttospeech #text_to_speech
possibly #blind #disabled #screenreader

For hands-free studying I'm looking for an app that reads PDFs aloud to me, in a way that doesn't give me a complete acoustic meltdown and doesn't hand my content over to a BigTech company.

I found SherpaTTS, but which program do I use it with on my phone? Or what alternatives are there for a Linux PC/Firefox?

(while I'm at it: #barrierefreiheit (accessibility) should be pushed forward in general!)

All real-time speech translators are crap. I wrote my own. It's crap too, but it's free

I tried everything on the market, spent more on subscriptions than on coffee, and ended up sitting down to write one from scratch. Here's what came out of it.

Tags: AI, Open Source, Voice AI, Real-time translation, Deepgram, Groq, Piper TTS, STT, TTS, LLM, Google Meet, Zoom, Personal experience, Elixir, Rust, macOS, Apple Silicon, Speech-to-Text, Text-to-Speech

I'm on a work call. We're discussing the architecture of a new service. Technically I understand everything: I read English documentation without a dictionary, review code, and chat in Slack just fine. But when I have to open my mouth and say something more complex than "I agree", the circus begins. A pause. I search for words. A colleague has already answered for me. Sound familiar? For me it's familiar to the point of grinding my teeth.

I'm a CTO, and for the last few years I've worked closely with AI integrations. I can build an automated customer-calling system with voice cloning, spin up a fleet of bots to scan Telegram, and put together an architecture that handles thousands of users for pennies. Yet on a call I sound like a foreigner with a phrasebook. God-tier irony.

So there's a simple picture in my head: I speak Russian, the other person hears English. They answer in English, I hear Russian. In real time. Without 10-second pauses. Without subtitles, with an actual voice. With any application: Meet, Zoom, Slack, Discord. I went looking. And that's where it started.

https://habr.com/ru/articles/1019458/

#realtime_communications #translations #speechtotext #texttospeech #deepgram #groq #elixir #rust #open_source #voice_ai

https://handy.computer/ - Talk to any text field with Handy. #TextToSpeech #OpenSource
Handy

Handy is a cross-platform, open-source speech-to-text application for your computer.


LLM SPEECH TECH SEES SHIFTS

AI researchers are working to fix accent issues in new AI speech technology. This could mean better voices for many languages by 2025.

#AISpeechTech, #LLM, #TextToSpeech, #AccentLeak, #LanguageAI

https://newsletter.tf/ai-speech-tech-fix-accent-problems-languages/

New AI speech tech aims to fix accent problems in multiple languages by 2025

NewsletterTF

New AI speech systems can have accent problems when speaking different languages. This is like trying to speak two languages at once and mixing them up.

#AISpeechTech, #LLM, #TextToSpeech, #AccentLeak, #LanguageAI
https://newsletter.tf/ai-speech-tech-fix-accent-problems-languages/


Mistral releases Voxtral TTS, an open-weight speech synthesis model across nine languages

Mistral AI has released Voxtral TTS, an open-weight text-to-speech model with multilingual support and instant voice cloning. Available via API, Le Chat, and Hugging Face.

https://yoota.it/en/mistral-releases-voxtral-tts-an-open-weight-speech-synthesis-model-across-nine-languages/

Mistral Voxtral TTS: The Open-Weight Voice Model That Just Beat ElevenLabs (Full Guide 2026)

Mistral just released Voxtral TTS — an open-weight 4B text-to-speech model with 90ms latency, zero-shot voice cloning from 2 seconds of audio, and human evaluation scores that o...

https://wowhow.cloud/blogs/mistral-voxtral-tts-open-source-beats-elevenlabs-2026

#wowhow #mistral #voxtral #texttospeech

Mistral Voxtral TTS is a free, open-weight 4B TTS model that beats ElevenLabs on naturalness. Self-host it, clone any voice in 2 seconds. Full guide 2026.

For any fans of retro speech synthesis: a legend has spoken at last. Well, at least a legend for me. In the early 2000s, Wirtualna Polska, one of the leading web portals, which offered free mailboxes, daily news, TV schedules and many other things that made the Internet attractive at the time, released its own speech synthesizer dubbed, very creatively, Syntezator Mowy WP. Its main purpose was to read out messages and contact status changes in the portal's instant messaging app, WP Kontakt, later Spik, pronounced like the English word "speak". It was an exe running in the system tray, waiting to be sent text. It wasn't compatible with any SAPI version and, other than WP Kontakt, it was only ever used in various instant messaging software, mostly through third-party plugins. It is based on Festival, but the voice base belonged to WP.

Years passed. The instant messaging scene in Poland, once rich in local products, was taken over by international solutions such as Facebook Messenger and WhatsApp, and so development of Spik and the accompanying TTS stopped. The installers for both the male and female voices are hard to come by. The male one is still hosted by several general-purpose websites that carry installers for all sorts of things, but I still pulled it from one of the old copies of the original website that the Internet Archive has to offer. The link for the female voice is dead there, and the only lead I got by Googling the original installer's file name is someone's web folder on a Polish file hosting website. Sadly, that one has expired too, so unless someone has a local copy, it may very well be lost media.

The original software can be installed but won't speak. Thanks to a friend tinkering a little, it started speaking through all sorts of modern things. The attached recording is the male voice reading huge numbers; it could actually go up to heptillions.

#RetroTech #Accessibility #Blind #TextToSpeech

silero-tts v5 can now ask questions in Russian

We recently wrote about an update to our public speech synthesis, silero-tts. Last time we significantly improved speed and quality and added support for homographs. This time we want to treat you to a special feature that, in most cases, does not work reliably even in synthesis models that need 3-4 orders of magnitude more compute and modern server GPUs (our synthesis runs even on weak CPUs). As you may have guessed, this feature is asking questions.

https://habr.com/ru/articles/1015942/

#silero #синтез_речи #tts #texttospeech #нейросети #синтезатор_речи #русский_язык #ударение #омографы #вопросы
