Giga Launches Realtime Hallucination Correction

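Giga has announced a technique that cuts hallucinations in voice AI agents to under 1% in real time. It exploits the fact that an LLM generates text much faster than that text can be spoken: while the audio plays, a separate reasoning model checks the output for hallucinations. If a hallucination is detected, the text is replaced before it is spoken; if it has already been spoken, the agent immediately issues a spoken correction. The approach preserves the natural flow of a voice conversation while substantially improving accuracy, which makes AI voice services more trustworthy.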

https://giga.ai/hallucinations

#hallucination #voiceagent #llm #realtimedetection #tts

Real-Time Hallucination Correction at Zero Latency Cost | Giga

Giga Research: voice agents that catch and correct hallucinations in real time, with zero added latency. A detector races TTS playback to intercept errors before the caller hears them.
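
The race between the detector and playback can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not Giga's implementation: the names `Chunk` and `plan_playback`, the fixed per-word speaking time, the one-verdict-per-`detector_latency` schedule, and the "Correction:" wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    is_hallucinated: bool   # stand-in for the verifier's verdict (hypothetical)
    fixed_text: str = ""    # replacement the verifier would propose


def plan_playback(chunks, seconds_per_word=0.4, detector_latency=1.0):
    """Return the utterances a caller would hear, in order.

    Assumptions: all text exists at t=0 (generation is much faster than
    speech), the verifier checks chunks one after another, and the verdict
    for chunk i is ready at (i + 1) * detector_latency seconds.
    """
    heard = []
    clock = 0.0  # time at which the next utterance starts playing
    for index, chunk in enumerate(chunks):
        verdict_ready_at = (index + 1) * detector_latency
        if chunk.is_hallucinated and verdict_ready_at <= clock:
            # Verdict won the race: swap the text before it is spoken.
            utterances = [chunk.fixed_text]
        elif chunk.is_hallucinated:
            # Playback won the race: speak the chunk, then correct it.
            utterances = [chunk.text, "Correction: " + chunk.fixed_text]
        else:
            utterances = [chunk.text]
        for utterance in utterances:
            heard.append(utterance)
            clock += len(utterance.split()) * seconds_per_word
    return heard
```

With a fast detector the second chunk in a two-chunk reply is swapped silently, because its verdict arrives while the first chunk is still playing; with a slow detector the caller hears the original chunk followed by a correction.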

Grok TTS: X's Latest TTS Model Sets a New Baseline

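xAI's newly released Grok TTS is being rated as the best text-to-speech model currently on the market. It handles complex utterances and multilingual code-switching naturally, and it makes building a realtime voice agent very simple, at a low price. However, voice cloning is limited to the US region, and the dashboard lacks fine-grained voice filtering. Overall, it is a notable new technology in voice AI and suits a wide range of applications.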

https://techstackups.com/articles/grok-tts-xai-text-to-speech-model/

#texttospeech #tts #voiceagent #multilingual #xai

xAI's Grok TTS offers expressive speech tags, a realtime voice agent API, and pricing that undercuts ElevenLabs by 12x. Here's what it actually does.

Can an Open-Source TTS Model Replace a Human Narrator? Testing OmniVoice on Russian

Hi, Habr! My name is Danil Muzafarov, and I work as a DS engineer at Raft. In this article I test OmniVoice, an open-source TTS model that is getting a lot of attention right now, and check how well it handles Russian-language business scenarios: numbers, dates, full names, abbreviations, mixed Russian-English text, and long-form narration.

https://habr.com/ru/companies/raft/articles/1031560/

#Texttospeech #TTS

Introduction. Previous articles on TTS: https://habr.com/ru/companies/raft/articles/991844/ https://habr.com/ru/companies/raft/articles/1023206/ Just a few years ago, speech synthesis in business was often...

Habr

Some improvements to the concatenation; prosody is still missing.

Here is a well-known phrase by SCP 079.

The audio contains the same phrase, first performed by Dr. Sbaitso TTS and then by the Godot reimplementation.

#TTS #DrSbaitso #VoiceSynthesis #TextToSpeech #079 #SCP079 #SCP #Godot

Good morning, Fediverse,
Can anyone recommend a text-to-speech tool, that is, a tool for converting text into #Audio? Preferably one that does not send the text to American AIs. A European option would be great; ideal would be one that can be installed locally and does not leak any data.
Feel free to share.
#KI #TTS #AI

A Voice Agent Is Not a Chatbot with a Phone: 40 Hours Saved and $100 Burned on Bots

I once burned about $100 on a voice agent in roughly a day. Not on a big launch. Not on a huge contact base. Not on a clever ad campaign. Just on a small pool of cold contacts, where the agent occasionally reached voicemail, IVRs, secretaries, and other bots. At some point, two not-very-smart voice processes could spend quite a while politely saying something like this to each other:

https://habr.com/ru/articles/1031148/

#голосовые_агенты #voice_agents #LLM #Twilio #ElevenLabs #Retell #OpenClaw #STT #TTS #latency

Dr. Sbaitso compared to my reimplementation in Godot (Sbaitso first)  

Implemented: basic waveform concatenation
Missing: Interpolation, pitch control, prosody, text to phonemes
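
As a rough illustration of what "basic waveform concatenation" means here, the sketch below looks up a stored waveform per unit and joins them end to end. It is an assumption-laden toy, not the Godot code from this post: `tone`, `synthesize`, the inventory keys, and the sample rate are all hypothetical, and sine tones stand in for recorded units. The input is already a list of unit symbols because text-to-phoneme conversion is listed as missing.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value)


def tone(frequency_hz, sample_count):
    """Stand-in for a recorded unit: a sine tone of the given length."""
    return [
        math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
        for n in range(sample_count)
    ]


def synthesize(units, inventory):
    """Concatenate the stored waveform for each unit, in order.

    No interpolation or pitch control is applied, so the joins are abrupt.
    """
    output = []
    for unit in units:
        output.extend(inventory[unit])
    return output


# Hypothetical two-unit inventory and a three-unit utterance.
inventory = {"a": tone(220, 80), "b": tone(440, 160)}
samples = synthesize(["a", "b", "a"], inventory)
```

Smoothing the joins would be the job of the interpolation step that is still on the to-do list.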

I'm very happy with the progress. It will be great to be able to run the voice without needing emulation.

#TTS #DrSbaitso #VoiceSynthesis #TextToSpeech #079 #SCP079

Lucas Meijer (@lucasmeijer)

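The post says Gemini TTS's voice quality is very good, but points out that when a long, 3-minute text is split into multiple chunks, the voice characteristics change from one segment to the next. It asks how to keep the voice consistent in long-form synthesis, illustrating a practical issue with using Google's TTS technology.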

https://x.com/lucasmeijer/status/2050697179111604397

#gemini #tts #speechsynthesis #googleai #audiomodel

Lucas Meijer (@lucasmeijer) on X

gemini TTS is so good, but I need to run it on 3 minutes of text, which is too long. When I split it up in different chunks, the voice sounds different between the different chunks. @OfficialLoganK any tricks?

X (formerly Twitter)
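
The post does not say how the text was split. One common way to chunk long input is at sentence boundaries under a character budget, sketched below. The function name `split_into_chunks` and the `max_chars` default are hypothetical, and this only covers the splitting step; it does nothing about the voice changing between chunks.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own oversized
    chunk rather than being cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the TTS service as a separate request.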

SwitchSynth v1.4.1 - Changelog

Accessibility Fixes:
- Checkboxes in the Languages tab are now properly labeled for screen readers. They are read as a single element instead of a separate checkbox and label.
- The "Use Accessibility Volume" switch is now properly labeled the same way.
- A heading is now shown above the content area announcing which tab you are on (Languages, Voices, or Misc), so screen readers always tell you where you are.

UI Improvements:
- Tabs are now at the bottom of the screen instead of the top.
- Each language in the Voices tab is now a collapsible card. Tap a language to expand its settings panel with voice selection, speech rate, pitch, and volume controls. Everything is contained in one card per language, keeping the interface clean.

Per-Voice Speech Settings:
- Speech rate, pitch, and volume are now configured per voice instead of globally. Each language/script has its own sliders inside its settings card.

eSpeak Fallback:
- When a language is not configured (not ticked and no voice chosen), text in that language now falls back to eSpeak automatically, but only if eSpeak is installed on the device.
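
The fallback rule above can be written as a small predicate. This is a sketch in Python for illustration only, not the app's actual code: `LanguageConfig`, its field names, and `falls_back_to_espeak` are hypothetical, and the rate, pitch, and volume fields simply mirror the per-voice settings described in this changelog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageConfig:
    ticked: bool = False          # checkbox in the Languages tab
    voice: Optional[str] = None   # voice chosen in the Voices tab, if any
    rate: float = 1.0             # per-voice speech rate
    pitch: float = 1.0            # per-voice pitch
    volume: float = 1.0           # per-voice volume


def falls_back_to_espeak(config, espeak_installed):
    """True when text in this language should be routed to eSpeak.

    A language counts as not configured when it is not ticked and no voice
    has been chosen; the fallback applies only if eSpeak is installed.
    """
    not_configured = config is None or (
        not config.ticked and config.voice is None
    )
    return not_configured and espeak_installed
```

What happens when a language is not configured and eSpeak is absent is not stated in the changelog, so the sketch only reports that no fallback takes place.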

Download:
https://github.com/Destranis/SwitchSynthAndroid/releases/tag/v1.4.1

#accessibility #a11y #tts #blind #disabled #android #synthesizer

Release v1.4.1 · Destranis/SwitchSynthAndroid

Full Changelog: v1.4...v1.4.1

GitHub