Want to run speech‑AI locally? Learn step by step how to generate a Hugging Face read token, set up PersonaPlex with NVIDIA models, and export the token for offline use. We cover token creation, audio codec (Opus) handling, and quick testing. Boost your open‑source projects with secure access! #HuggingFace #AccessToken #SpeechAI #Opus

🔗 https://aidailypost.com/news/how-create-export-hugging-face-read-token-local-speechai
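The token-export step from the guide can be sketched roughly like this. This is a minimal sketch, assuming the default huggingface_hub token file location and the HF_HUB_OFFLINE environment variable; the PersonaPlex-specific setup from the article is not shown:

```python
import os
from pathlib import Path

def export_read_token(token: str, path: str = "~/.cache/huggingface/token") -> Path:
    """Persist a Hugging Face read token where huggingface_hub looks for it
    by default, then switch the hub to offline mode so later runs resolve
    models from the local cache only."""
    token_file = Path(path).expanduser()
    token_file.parent.mkdir(parents=True, exist_ok=True)
    token_file.write_text(token.strip())
    os.environ["HF_HUB_OFFLINE"] = "1"  # skip network lookups from now on
    return token_file
```

Run this once after the first (online) model download; subsequent runs can then work fully offline against the local cache.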

Perplexity (@perplexity_ai)

Perplexity has introduced a 'Voice Mode' feature in Perplexity Computer. Users can now speak to issue queries and commands and carry out tasks through a conversational interface; simply talking enables search, manipulation, and interaction, which should improve both accessibility and productivity.

https://x.com/perplexity_ai/status/2029302896026853379

#perplexity #voicemode #speechai #productivity

Perplexity (@perplexity_ai) on X

Introducing Voice Mode in Perplexity Computer. You can now just talk and do things.

X (formerly Twitter)

Voxtral Transcribe 2 from Mistral AI brings open, production-ready speech AI to everyone: fast, accurate transcription, solid diarization, and support for long, multilingual audio. It is a strong option if you want powerful speech understanding without locking into closed APIs.

#Voxtral #Transcribe2 #MistralAI #SpeechAI #AITranscription #OpenSourceAI #FLOSS

ICONIQ (@ICONIQCapital)

A congratulatory tweet on voice-synthesis platform ElevenLabs announcing its Series D, highlighting how the platform makes speech in many languages accessible worldwide. Co-founders @matiii and @dabkowski_piotr marked the moment the way they intended: through voice.

https://x.com/ICONIQCapital/status/2019068892946342163

#elevenlabs #voice #funding #speechai

ICONIQ (@ICONIQCapital) on X

“Congratulations, @elevenlabsio”– spoken in a few of the many languages their platform helps bring to life. As they announce their Series D, it felt right to mark the moment the way @matiii and @dabkowski_piotr intended: globally, accessibly, and through voice.

X (formerly Twitter)


Demo of a high-quality AI video dubbing system, currently supporting English → French. Pipeline: TIGER (source separation), WhisperX (diarization + STT), Mistral_Tower (translation), CosyVoice3 (TTS). The voices do not yet preserve intonation after translation; improvements are planned. Feedback welcome! #AI #VoiceCloning #SpeechAI #Dubbing #CôngNghệ #AIÂmThanh

https://www.reddit.com/r/LocalLLaMA/comments/1qinq1x/we_have_come_a_long_way_in_voice_prosody_cloning/
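The four-stage chain described in the post can be sketched as a simple pipeline. Each stage is a pluggable callable; the actual project wires in TIGER, WhisperX, Mistral_Tower, and CosyVoice3, and the fields below are placeholders for illustration, not those tools' real APIs:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DubbingPipeline:
    """Video-to-video dubbing: separate speech from background,
    transcribe per speaker, translate, then re-synthesize."""
    separate: Callable    # full mix -> (speech, background)
    transcribe: Callable  # speech -> [(speaker, text), ...]
    translate: Callable   # source text -> target-language text
    synthesize: Callable  # (speaker, text) -> dubbed speech segment

    def dub(self, audio):
        speech, background = self.separate(audio)
        segments = self.transcribe(speech)
        dubbed = [self.synthesize(spk, self.translate(txt))
                  for spk, txt in segments]
        # The background track is returned separately so it can be
        # remixed under the dubbed speech.
        return dubbed, background
```

Keeping the background track out of the STT/TTS path is what lets music and effects survive the dub untouched.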

The voice can be biometric:

More can be inferred from a speech signal than just words, up to and including health, education, and political preferences. Bystanders are affected too, when their voices end up in the background of recordings.

The takeaway: encrypt communication end-to-end throughout, and avoid proprietary voice assistants and cloud transcription.

https://www.telepolis.de/article/Privatsphaere-endet-wo-das-Sprechen-beginnt-11145260.html

#Datenschutz #Privatsphäre #SpeechAI #Spracherkennung #Überwachung #KI #Biometrie #Datenminimierung #OnDevice #EUAIAct

Privacy ends where speaking begins

Computers will soon read health, education, and political views from the voice, even when you are not directly part of the conversation.

heise online

ModelScope (@ModelScope2022)

StepFun's speech model Step-Audio-R1.1 has set a new SOTA on the Artificial Analysis Speech Reasoning leaderboard (96.4% accuracy). It outperforms Grok, Gemini, and GPT-Realtime, and features native end-to-end audio reasoning, audio-native chain-of-thought, and real-time processing.

https://x.com/ModelScope2022/status/2011687986338136089

#speechai #audiomodel #sota #stepaudior1.1

ModelScope (@ModelScope2022) on X

Step-Audio-R1.1 by @StepFun_ai just set a new SOTA on the Artificial Analysis Speech Reasoning leaderboard! 🏆 It outperforms Grok, Gemini, and GPT-Realtime with a 96.4% accuracy rate. ✅ Native Audio Reasoning (End-to-End) ✅ Audio-native CoT (Chain of Thought) ✅ Real-time

X (formerly Twitter)

FOSS Advent Calendar - Door 14: Bring Text to Life with Coqui TTS

Meet Coqui TTS, a powerful, open-source deep learning toolkit for cutting-edge Text-to-Speech. It turns written words into natural, expressive audio using state-of-the-art neural models, all while running completely offline on your own machine.

Coqui TTS supports a wide range of languages and voices, and its real strength lies in flexibility: you can use pre-trained models for instant results or train custom voices with your own datasets. Everything happens locally: your data stays private, and no APIs or subscriptions are required. Whether for accessibility tools, narration, creative projects, or research, Coqui gives you full control over synthetic speech, from tone and pace to emotional delivery.
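As a rough illustration, local synthesis with the toolkit's Python API looks something like this. This is a sketch assuming the `TTS` package is installed and using one of Coqui's published English model names; check the project's model list for current identifiers:

```python
def synthesize_to_file(text: str, out_path: str = "speech.wav") -> str:
    """Generate speech locally with Coqui TTS. After the model is
    downloaded once, inference runs fully offline from the cache."""
    # Imported lazily so this module loads even without the TTS package.
    from TTS.api import TTS
    tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```

Swapping in a multilingual or voice-cloning model is a matter of changing the `model_name` string.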

Pro tip: Experiment with voice cloning or fine-tune a model for a unique vocal character. With Coqui, you’re not just generating speech, you’re crafting it.

Link: https://github.com/coqui-ai/TTS

What would you create with open-source, local TTS: audiobooks, game dialogue, or your own custom assistant voice?

#AdventCalendar #AI #OpenSource #TTS #Python #MachineLearning #CoquiTTS #AIVoices #VoiceSynthesis #LocalAI #FOSS #Privacy #Accessibility #TextToSpeech #CreativeTech #VoiceTech #DeepLearning #ArtificialIntelligence #TechNerds #Innovation #FOSSAdvent #ContentCreation #EthicalAI #VoiceCloning #DevTools #FutureTech #AITools #SpeechAI #linux #ki #adventskalender

Sesame's AI voice model impresses with expressiveness, natural dialogue, and intelligence well beyond Moshi, even though both build on similar underlying technology (Mimi, Llama). The community is trying to work out what drove the leap: training data, loss functions, architecture, LLM integration, or the overall pipeline?

#AI #SpeechAI #TextToSpeech #SesameAI #MoshiAI #LLM #Technology #TríTuệNhânTạo #GiọngNóiAI #CôngNghệ #MôHìnhNgônNgữ

https://www.reddit.com/r/LocalLLaMA/comments/1paj990/why