Supertonic 3 - Ultra-lightweight on-device TTS released, with support for 31 languages and emotion tags

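Supertonic 3 is an **ultra-lightweight on-device TTS (text-to-speech) model** that supports **31 languages** and **10 emotion tags** (<laugh>, <breath>, <scream>, and others). At **99M parameters**, it runs in the browser and on PCs, mobile devices, and boards such as the Raspberry Pi, keeping data private and avoiding network latency. It is an **open model** that permits commercial use, with improved pronunciation accuracy and voice cloning. The model and code are distributed on Hugging Face and GitHub.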
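As a rough illustration of what "inserting tags into the text" could look like on the input side, here is a minimal pre-processing sketch. It does not call Supertonic itself; the function name `split_tagged_text` and the `KNOWN_TAGS` set are assumptions made for this example, and only the three tags named in the post are listed because the other seven are not given there.

```python
import re

# Assumption: only the three tags named in the post. The post says there
# are 10 tags in total but does not list the rest.
KNOWN_TAGS = {"laugh", "breath", "scream"}

TAG_PATTERN = re.compile(r"<([a-z]+)>")


def split_tagged_text(text):
    """Split input into ordered ("text", str) and ("tag", name) pieces.

    Tags outside KNOWN_TAGS are kept but labeled "unknown_tag" so a caller
    can decide whether to drop them or pass them through.
    """
    pieces = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        # Plain text between the previous tag and this one.
        chunk = text[pos:match.start()].strip()
        if chunk:
            pieces.append(("text", chunk))
        name = match.group(1)
        kind = "tag" if name in KNOWN_TAGS else "unknown_tag"
        pieces.append((kind, name))
        pos = match.end()
    # Any text after the last tag.
    tail = text[pos:].strip()
    if tail:
        pieces.append(("text", tail))
    return pieces


if __name__ == "__main__":
    for piece in split_tagged_text("That was great <laugh> let me catch up <breath>"):
        print(piece)
```

Splitting first makes it easy to validate tags before any audio is synthesized, which matters on-device where a failed run costs battery and time.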

https://news.hada.io/topic?id=29522

#tts #ondeviceai #multilingualai #opensourceai #edgeai

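- Supports 31 languages, including Korean
- New emotion tags: insert any of 10 tags such as <laugh>, <breath>, and <scream> into the text to express emotion
- Quality improvements: better pronunciation accuracy, fewer word repetition and omission failures, improved voice cloning
- Model size: 99M parameters
- On-device TTS: full privacy, no network latency
- Ease of deployment: browser,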

GeekNews
So what does this mean for multilinguality? It seems the problems go deeper than we initially anticipated. Got any ideas on how to fix this? Want to hear more? Feel free to contact us through DMs or join our Discord server. 📬
#AIResearch #NLP #LLMs #MultilingualAI

Deepgram released Flux Multilingual, a speech recognition model that handles 10 languages with real-time switching during conversations. The system detects language changes mid-call and processes conversational turns in under 400ms. Available as cloud API or self-hosted at the same price as English-only versions. Could simplify multilingual voice applications that previously required separate detection and routing systems.

#SpeechRecognition #MultilingualAI #VoiceTech

https://www.implicator.ai/deepgram-launches-flux-multilingual-speech-model-with-10-language-mid-call-switching/

Deepgram Launches Flux Multilingual With 10-Language Mid-Cal

Deepgram launched Flux Multilingual, a conversational speech recognition model supporting 10 languages with real-time detection and mid-call code-switching. Uses conversational turn detection at under 400ms. Available as cloud API or self-hosted with EU endpoint support.

Implicator.ai

Microsoft Research (@MSFTResearch)

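Microsoft researchers and data scientists say they are working to narrow gaps in areas such as health care and education for speakers of less common languages. It is a meaningful research update focused on public-service uses of multilingual AI and on reducing language inequality.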

https://x.com/MSFTResearch/status/2041611341127901396

#microsoft #multilingualai #healthcare #education

Microsoft Research (@MSFTResearch) on X

For people who speak the world’s most popular languages, AI is helping improve their health care, education and more. But what about the millions who speak less common languages? Microsoft researchers and data scientists are working to close the gap. https://t.co/Q8bOMznopv

X (formerly Twitter)
AI’s fluency in other languages hides a Western worldview that can mislead users − a scholar of Indonesian society explains | The-14

AI speaks many languages but reflects a Western worldview, risking cultural misguidance, says scholar Gareth Barkin on Indonesian society.

The-14 Pictures

Microsoft Research (@MSFTResearch)

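Paza, an interactive playbook for building robust speech models for low-resource languages, has been released. It provides benchmarking tools that help determine the best approach to translation, data use, and model selection, and it focuses on speech AI development for languages that lack AI infrastructure.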

https://x.com/MSFTResearch/status/2038949540976316664

#speechmodels #lowresource #paza #benchmarking #multilingualai

Microsoft Research (@MSFTResearch) on X

Not every language has the AI infrastructure others do. Paza is an interactive playbook to help you build robust speech models for low-resource languages with benchmarking tools to help you choose the right approach. https://t.co/S2A2AK1CRT

X (formerly Twitter)

Microsoft Research (@MSFTResearch)

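Vibhasha, an interactive playbook that helps with the hard choices in building multilingual AI, has been introduced. It supports decisions such as whether to translate or fine-tune and whether to use one model or many, making it a useful tool for developers designing apps for diverse language communities.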

https://x.com/MSFTResearch/status/2039009826907300292

#multilingualai #vibhasha #playbook #finetuning #aiapps

Microsoft Research (@MSFTResearch) on X

Multilingual AI means making hard tradeoffs. Translate or fine-tune? One model or many? Vibhasha is an interactive playbook to help you build multicultural apps and make those decisions with confidence. https://t.co/5MmB5kUIyx

X (formerly Twitter)

👏 Congratulations on this achievement and all the best for Cecilia’s new role as postdoctoral researcher at the University of Cambridge!

#NLP #PhDDefense #MultilingualAI #CulturalAI #LanguageModels #UKPLab #TUDarmstadt Computer Science, TU Darmstadt

AIトレンド速報|最新ニュース & 活用術 (@AI_Bridge_Japan)

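'OlmOCR-Bench', a multilingual OCR benchmark developed by Allen AI, has been published as an official benchmark dataset on Hugging Face. This makes it possible to compare and evaluate OCR models in a standardized environment, providing an important reference point for multilingual text recognition research.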

https://x.com/AI_Bridge_Japan/status/2026120760423764157

#ocr #benchmark #allenai #huggingface #multilingualai

AIトレンド速報|最新ニュース & 活用術 (@AI_Bridge_Japan) on X

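'OlmOCR-Bench', a multilingual OCR benchmark developed by Allen AI (@allen_ai), has been published as an official benchmark dataset on Hugging Face (@huggingface). This allows the performance of various OCR models to be evaluated in a standardized environment. (via @mervenoyann) https://t.co/me42EnGvyN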

X (formerly Twitter)
I tried Qwen2.5-Coder-7B-Instruct.Q6_K locally with Ollama as the loader, asking it to create a simple Snake game in Python with Pygame. As an extra challenge, I gave the instructions in German.

The game works well: the snake grows correctly, the grid and colors are fine. I just had to give the model a little nudge in two places:

- Don’t change the food color every frame
- Avoid recursive gameLoop() for "Play Again"
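The post does not show the generated code, so the following is only a sketch of what those two fixes usually look like, not the author's actual game. It leaves Pygame out so it runs headless; `spawn_food`, `play_round`, and `run_game` are hypothetical names, and the scripted `answers` list stands in for the player's "Play Again" key presses.

```python
import random

FOOD_COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]


def spawn_food(rng, grid_size):
    """Choose the food's position and color once, when it is spawned."""
    return {
        "pos": (rng.randrange(grid_size), rng.randrange(grid_size)),
        "color": rng.choice(FOOD_COLORS),
    }


def play_round(rng, frames, grid_size=10):
    """Simulate one round and return the food color used on each frame."""
    food = spawn_food(rng, grid_size)
    colors_drawn = []
    for _ in range(frames):
        # Fix 1: drawing reuses the stored color instead of picking a new
        # random color here, which would make the food flicker every frame.
        colors_drawn.append(food["color"])
    return colors_drawn


def run_game(answers, frames_per_round=5, seed=0):
    """Fix 2: an outer loop restarts the game instead of recursion.

    A game loop that calls itself for "Play Again" grows the call stack
    with every restart; a plain while loop does not.
    """
    rng = random.Random(seed)
    answers = iter(answers)
    rounds = []
    while True:
        rounds.append(play_round(rng, frames_per_round))
        # "y" means play again; anything else (or no answer) quits.
        if next(answers, "n") != "y":
            break
    return rounds


if __name__ == "__main__":
    rounds = run_game(["y", "y", "n"])
    print(len(rounds), "rounds played")
```

In a real Pygame version the same shape applies: store the color on the food object at spawn time, and wrap the per-round loop in an outer loop that checks the restart key.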

Qwen2.5 is a great co-pilot that handles most of the work, leaving only minor bugs to correct. German works surprisingly well even though the model's main language is English. It wrote "Schlankkörpers" instead of "Schlangenkörper", but that hardly matters; large models make such slips from time to time too. The model supports many programming languages, including Python, C, C++, Java, JavaScript, HTML/CSS, Bash, and SQL.

Conclusion: It still doesn't work completely without programming knowledge, but as a local assistant Qwen2.5-Coder is excellent.

btw my prompt was: "write the game again."

Video workflow:

- Recorded with OBS
- Edited in Kdenlive
- Transcoded with VAAPI (H.264)

No cloud, real hardware.
Everything runs on Linux + Text Generation Web UI (FOSS), so anyone can set this up.
No GPU? No problem, you can also run it using PyTorch’s CPU backend, just much slower.

Background music: ALICE - CROSS THE BORDER (https://www.youtube.com/watch?v=dcqbWgxW4oU)

#Qwen2 #LLM #LocalAI #Ai #vibecoding #Python #Pygame #CodingAI #FOSS #Linux #SnakeGame #Ollama #AIcoPilot #MultilingualAI #TextGenerationWebUI #OBS #Kdenlive #VAAPI #NoCloud #LocalAIWorkflow