AssemblyAI (@AssemblyAI)

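To address the limits of general-purpose ASR when transcribing medical consultations, AssemblyAI introduced "Medical Mode", which runs on top of Universal-3 Pro. It is enabled with a single parameter, and its core is a correction pass optimized for medical term recognition that reduces misrecognition of specialized terms such as specific drug names.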

https://x.com/AssemblyAI/status/2036956122779906310

#asr #medicalai #speechrecognition #llm #healthcare

AssemblyAI (@AssemblyAI) on X

General-purpose ASR: 95%+ accuracy on a clinical consult. Also general-purpose ASR: gets "hydrochlorothiazide" wrong every time. Introducing Medical Mode — a correction pass on top of Universal-3 Pro optimized for medical entity recognition. Enable it with one parameter.

X (formerly Twitter)

AssemblyAI (@AssemblyAI)

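Medical Mode for clinical workflows has been released. The feature targets the problem that, even when overall speech recognition accuracy is high, errors on key tokens such as drug names make transcripts hard to use in clinical practice.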

https://x.com/AssemblyAI/status/2036822463347302652

#medicalai #speechrecognition #clinicalworkflow #asr #healthcare

AssemblyAI (@AssemblyAI) on X

Medical Mode is now available for clinical workflows. We built Medical Mode because a transcript that's 95% accurate can still be unusable in a clinical setting. Errors in general-purpose ASR are often concentrated on exactly the tokens clinicians care about most: drug names,

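
The point about a 95% accurate transcript still being unusable can be illustrated with a small worked example. This is only a sketch under stated assumptions: the sentence, the misrecognized word, and both helper functions are made up for illustration, and the scoring assumes the two transcripts align word for word (substitution errors only), which is simpler than a real word error rate calculation.

```python
# Illustrative only: the transcripts and helper names below are hypothetical,
# not taken from AssemblyAI or from any real evaluation.

def word_accuracy(reference: list[str], hypothesis: list[str]) -> float:
    """Fraction of words that match position by position.

    Assumes equal-length transcripts (substitutions only, no insertions
    or deletions), which keeps the arithmetic easy to follow.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("this sketch only handles equal-length transcripts")
    matches = sum(r == h for r, h in zip(reference, hypothesis))
    return matches / len(reference)


def key_term_accuracy(
    reference: list[str], hypothesis: list[str], key_terms: set[str]
) -> float:
    """Same position-wise accuracy, restricted to words listed in key_terms."""
    positions = [i for i, word in enumerate(reference) if word in key_terms]
    if not positions:
        raise ValueError("no key terms found in the reference transcript")
    correct = sum(reference[i] == hypothesis[i] for i in positions)
    return correct / len(positions)


# A made-up 20-word instruction containing one drug name.
REFERENCE = (
    "the patient should start hydrochlorothiazide twenty five milligrams once "
    "daily in the morning and return in four weeks for review"
).split()

# The same sentence with only the drug name misrecognized.
HYPOTHESIS = [
    "hydrochloride" if word == "hydrochlorothiazide" else word
    for word in REFERENCE
]

overall = word_accuracy(REFERENCE, HYPOTHESIS)
drugs = key_term_accuracy(REFERENCE, HYPOTHESIS, {"hydrochlorothiazide"})
print(f"overall word accuracy: {overall:.0%}")
print(f"drug-name accuracy:    {drugs:.0%}")
```

In this toy case 19 of 20 words are correct, yet the one wrong word is the drug name, so the overall score looks fine while the key-term score is zero.
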
Categorizing Emacs News items by voice in Org Mode :: Sacha Chua

Chrome extension adjusts video speed based on how fast the speaker is talking

https://github.com/ywong137/speech-speed

#HackerNews #ChromeExtension #VideoSpeed #SpeechRecognition #TechInnovation #OpenSource

Hands on with AI audio generation: GAI voice, music, and sound effects

This is the second post in a series exploring the multimodal possibilities of generative AI. This series will take a detailed, hype-free look at text, image, audio, video, and code generation and explore the creative potential as well as the ethical concerns of GAI. Although Generative AI isn't a new technology, it's definitely been having a hype moment since the release of ChatGPT in November 2022. Unfortunately, the focus has been squarely on the text-based chatbot at the exclusion of […]

https://leonfurze.com/2023/09/25/hands-on-with-ai-audio-generation-gai-voice-music-and-sound-effects/

Nico Martin (@nic_o_martin)

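An announcement that combining MistralAI's Voxtral with Transformers.js and WebGPU enables real-time speech transcription in the browser. It supports a wide range of languages and highlights that a language switch is recognized even in the middle of a sentence, which makes it a meaningful example of low-latency, multilingual web-based ASR (automatic speech recognition).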

https://x.com/nic_o_martin/status/2032087412462022663

#mistralai #voxtral #transformersjs #webgpu #speechrecognition

🤷 Nico Martin (@nic_o_martin) on X

Voxtral (by @MistralAI) + Transformers.js + WebGPU enables real-time speech transcription in the browser. It also supports a wide range of languages, which are even recognized in the middle of a sentence 🚀

🚀🎤 Behold the future of AI: TADA—a magical acronym promising to talk your ear off with "natural" speech! Who knew aligning text and audio was just a matter of stuffing them into the same #AI blender? 🎧🤖 Guess we can finally say goodbye to the charming quirks of robots muttering nonsense—or so they say... 😂

https://www.hume.ai/blog/opensource-tada

#TADA #NaturalSpeech #FutureTech #SpeechRecognition #HackerNews #ngated

Opensourcing TADA: Fast, Reliable Speech Generation Through Text-Acoustic Synchronization

TADA (Text-Acoustic Dual Alignment) is Hume AI's open-source speech-language model that synchronizes text and audio one-to-one.

ElevenLabs: Audio to Text. New Version

https://peertube.eqver.se/w/9QSEZPWAhSv78ScqKmYkk5

testshort_126_en

PeerTube