Akshay (@akshay_pachaar)

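A new 100% open-source TTS model with word-level voice control has been released. Earlier TTS systems could only shift the tone of an entire sentence, but this model lets you assign emotion and intonation to specific words or spans within a sentence, which enables fine-grained vocal direction.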

https://x.com/akshay_pachaar/status/2033922460551418268

#tts #speechsynthesis #opensource #controllability

Akshay 🚀 (@akshay_pachaar) on X

Finally, you can control speech word by word. (Using a new 100% open-source TTS model) Every TTS system before this had the same core limitation. You'd say "speak in an angry tone" and the whole sentence shifted. There was no way to say "be calm here, then laugh right at this

X (formerly Twitter)

田中義弘 | taziku CEO / AI × Creative (@taziku_co)

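LTXStudio's enterprise lip-sync technology supports 175 languages and, besides delivering perfect lip sync, optimizes how the speaker appears for each language region. The announcement includes a video demo and emphasizes per-region, per-language visual optimization that goes beyond plain speech synthesis.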

https://x.com/taziku_co/status/2033654374011179249

#ltxstudio #lipsync #localization #speechsynthesis

田中義弘 | taziku CEO / AI × Creative (@taziku_co) on X

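In addition to supporting 175 languages, it achieves perfect lip sync. @LTXStudio's lip sync for Enterprise doesn't just make the subject talk; it optimizes the speaker's whole appearance for each language region. The end of the video includes Japanese too. Please watch with the sound ON.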

Free download codes:

The 5D-Droid - The Holiest Hole (Deep's Sword Swinging Remix)

"Expect the unexpected in BASS music"

https://getmusic.fm/l/0nrnZb

#ambient #electronic #experimental #idm #downtempo #lofi #beats #triphop #bass #deepmusic #speechsynthesis #london #music

Today we launch Fish Audio S2, a new generation of expressive TTS with absurdly controllable emotion.

- open-source
- sub 150ms latency
- multi-speaker in one pass

Real freedom of speech starts now

https://x.com/FishAudio/status/2031411140820152560

#tts #speechsynthesis #opensource #lowlatency #multispeaker

Local AI Text-to-Speech Demo with Coqui TTS

Coqui TTS is an AI-powered text-to-speech synthesis platform that can automatically convert written text into natural-sounding speech. The system is based on modern deep learning models and can run entirely locally, making it particularly suitable for privacy-friendly applications and offline projects.

In this example, Coqui TTS is used directly through the Python API. This allows the model to be flexibly integrated into custom scripts and controlled automatically, for example to convert text into audio files or to process larger amounts of text.

Since many text-to-speech models can only process very long texts to a limited extent, the input text is divided into smaller sections (chunks) before processing. These are synthesized one after another and then combined into a complete audio output.
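
The chunking step described above can be sketched in plain Python. This is a minimal illustration, not the script used in the demo: the function name `split_into_chunks` and the 250-character limit are assumptions chosen for the example, and the sentence splitting is a simple regular expression rather than a full tokenizer.

```python
import re


def split_into_chunks(text: str, max_chars: int = 250) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Chunks break at sentence boundaries where possible. The limit of 250
    is an illustrative assumption, not a value required by any TTS model.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:].lstrip()
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits into the chunk being built.
            current = candidate
        else:
            # Close the current chunk and start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for number, chunk in enumerate(split_into_chunks("One. Two. Three.", max_chars=10), 1):
        print(number, chunk)
```

Each chunk would then be passed to the synthesis call one after another, and the resulting audio segments joined in the same order.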

In this example, the model is executed locally on the CPU. Although some AI models support GPU acceleration, Coqui TTS can run reliably without specialized hardware and can therefore be used on many different systems.

The audio output generated by the model is initially a raw file. To improve sound quality, additional post-processing is recommended, such as removing clicks or artifacts, slightly smoothing audio transitions, or applying other minor corrections.
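
One simple way to soften clicks at chunk boundaries is a short linear fade at the start and end of every segment before joining. The sketch below uses only the Python standard library and assumes mono 16-bit samples; the helper names, the 22050 Hz sample rate, and the 220-sample fade (about 10 ms at that rate) are assumptions for illustration, not settings taken from the demo.

```python
import array
import sys
import wave


def apply_fades(samples: array.array, fade_len: int) -> array.array:
    """Return a copy of the samples with a linear fade-in and fade-out."""
    out = array.array("h", samples)
    # Never fade more than half of a very short segment.
    n = min(fade_len, len(out) // 2)
    for i in range(n):
        gain = i / n
        out[i] = int(out[i] * gain)
        out[-1 - i] = int(out[-1 - i] * gain)
    return out


def join_chunks(chunks: list[array.array], fade_len: int = 220) -> array.array:
    """Fade every segment and concatenate them in order."""
    joined = array.array("h")
    for chunk in chunks:
        joined.extend(apply_fades(chunk, fade_len))
    return joined


def write_wav(path: str, samples: array.array, sample_rate: int = 22050) -> None:
    """Write mono 16-bit samples to a WAV file (sample rate is an assumption)."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        # WAV data is little-endian.
        data.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(data.tobytes())
```

A dedicated audio editor or library offers more control, for example crossfades or click removal, but a fade of a few milliseconds is often enough to hide hard cuts between segments.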

The Creepypasta used in this demo is in German and contains disturbing content.

https://creepypasta.fandom.com/de/wiki/Trypophobia

Video workflow:

- Recorded with OBS
- Edited in Kdenlive
- Transcoded with VAAPI (H.264)

No cloud, no API keys, real hardware, just Python.
Everything runs on Linux + Python (FOSS), so anyone can set this up.
No GPU? In this case… it doesn't matter.

#AI #TextToSpeech #CoquiTTS #Python #AIVoice #SpeechSynthesis #foss #LocalAI #OpenSourceAI #AItools #ArtificialIntelligence #AIDevelopment

We should do a crowdfunding campaign for a "Starcraft Terran siege tank driver" text-to-speech voice for Piper, so that Orca can angrily read GTK widgets at you with this kind of confident and upbeat intonation: https://youtu.be/dtoIv9BzPHk?t=16

#Piper #Orca #TTS #texttospeech #speechsynthesis #GNOME

Siege Tank All Quotes - StarCraft Remastered

YouTube

Dylan Malone (@dylanmalone)

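Speaklone is a voice generation (or synthesis) app that renders a variety of accents naturally. It is expressive enough that a "young Russian ballerina in the New York Ballet" character was created in about a minute. The tweet also expresses appreciation for the community and open-source contributors, noting that @awnihannun's contributions to MLX will be missed.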

https://x.com/dylanmalone/status/2028138632419172476

#speechsynthesis #speaklone #voicecloning #ai

Dylan Malone (@dylanmalone) on X

Speaklone does interesting accents. Lots of depth to unlock in these powerful models! Meet this young Russian ballerina in the New York Ballet. Took about a minute to create the character. https://t.co/unlMrcYOud We're going to miss @awnihannun on MLX! He's brought us miracles.

MiniMax (official) (@MiniMax_AI)

Announcement that MiniMax Speech-2.8 now powers the Sensei voices in @callmesenseien. The author says they worked with the Hyperbond team to further improve voice quality and emotional expressiveness.

https://x.com/MiniMax_AI/status/2019696741172572596

#minimax #speechsynthesis #voiceai #tacotron

MiniMax (official) (@MiniMax_AI) on X

Excited to see MiniMax Speech-2.8 powering the Sensei voices in @callmesenseien. Great working closely with the Hyperbond team to push voice quality and emotional expressiveness further.

新清士@(生成AI)インディゲーム開発者 (@kiyoshi_shin)

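A tweet sharing the results of testing Qwen3-ASR on the author's own voice. He cut a 7-second clip from a past lecture to use as reference audio and had the model read the text of an ASCII article aloud; he reports that a plausible reading voice was generated from only 7 seconds of reference. The first part of the clip is the reference audio (7 seconds) and the latter part is the synthesized audio (25 seconds).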

https://x.com/kiyoshi_shin/status/2019236962520158225

#qwen3asr #asr #voicecloning #speechsynthesis

新清士@(生成AI)インディゲーム開発者 (@kiyoshi_shin) on X

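I tried Qwen3-ASR with my own voice. I cut 7 seconds from a past lecture, used that voice as the reference, and had it read an ASCII article aloud. It's shocking that a plausible read-aloud voice comes out of just 7 seconds. The first part is the reference audio (7 seconds); the latter part is the read-aloud audio (25 seconds).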

Pocket TTS proves you don't need a GPU for high-quality text-to-speech. 100M parameters, CPU-only, 200ms latency, voice cloning included. The first local TTS that doesn't compromise.

More details here: https://ostechnix.com/pocket-tts-local-text-to-speech-no-gpu/

#PocketTTS #TTS #TextToSpeech #AI #Python #Opensource #KyutaiLabs #SpeechSynthesis #VoiceCloning

Pocket TTS: High-Quality Local Voice Cloning Without GPU - OSTechNix

Pocket TTS delivers high-quality text-to-speech on standard CPUs. No GPU, no cloud APIs. It is the first local TTS with voice cloning.

OSTechNix