AssemblyAI (@AssemblyAI)

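AssemblyAI announced that it has brought a voice mode to Claude Code using AssemblyAI's Universal-3 Pro Streaming. Users can now speak their prompts instead of typing them and use Claude Code hands-free. It is worth noting both for its speech recognition accuracy and for what it could mean for developer productivity.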

https://x.com/AssemblyAI/status/2042240955030630629

#claudecode #assemblyai #voiceai #developertools #speechrecognition

AssemblyAI (@AssemblyAI) on X

Vibe coding just leveled up. We brought voice mode to Claude Code using AssemblyAI's Universal-3 Pro Streaming. Why type your prompts when you can just say them? You get insane entity accuracy from AssemblyAI and the full power of Claude Code, all hands-free. Here's the full

X (formerly Twitter)

This dataset includes diverse audio samples with accurate transcriptions, covering multiple languages, accents, and real-world environments. Perfect for building and testing Automatic Speech Recognition (ASR), voice assistants, and NLP systems.

With structured annotations and rich metadata, it helps developers create more accurate, scalable, and reliable voice-based AI solutions. 🚀

#SpeechRecognition #AI #MachineLearning

Whisper was too slow. Vosk was inconsistent. The answer was embarrassingly simple: Android speech recognition over local WiFi, and 80 lines of Python. https://hackernoon.com/the-embarrassingly-simple-voice-input-system-running-my-home-server-workflow #speechrecognition
The Embarrassingly Simple Voice Input System Running My Home Server Workflow | HackerNoon

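The post doesn't include its code, but the architecture it describes (the phone does the recognition, the server only receives text over the LAN) is small enough to sketch. Below is a minimal Python receiver, assuming the phone POSTs each recognized utterance as a plain UTF-8 body to a `/dictate` endpoint on port 8765. The endpoint, port, payload format, and the `RECEIVED` list are hypothetical and not taken from the article.

```python
"""Minimal LAN receiver for dictated text (illustrative sketch).

Assumption: a phone app runs Android speech recognition and POSTs the
recognized text as a UTF-8 body to http://<server>:8765/dictate.
The path, port, and payload format are hypothetical, not the article's.
"""
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Transcripts received so far (hypothetical sink); a real setup might
# instead type them into a terminal or hand them to a command runner.
RECEIVED: list[str] = []
_lock = threading.Lock()


class DictationHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/dictate":
            self.send_error(404)
            return
        # Read exactly the number of bytes the client says it sent.
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8").strip()
        with _lock:
            RECEIVED.append(text)
        self.send_response(204)  # accepted, no response body
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass  # keep the console quiet


def make_server(host: str = "0.0.0.0", port: int = 8765) -> ThreadingHTTPServer:
    """Bind on all interfaces by default so the phone can reach it over WiFi."""
    return ThreadingHTTPServer((host, port), DictationHandler)


if __name__ == "__main__":
    server = make_server()
    print("Listening on port", server.server_address[1])
    server.serve_forever()
```

Keeping recognition on the phone means the server needs nothing beyond the standard library; everything interesting happens in whatever consumes `RECEIVED`.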

RE: https://mastodon.social/@zugaldia/116351933343098498

The "Speed of Sound" app by @zugaldia, once you set up a custom global keyboard shortcut that doesn't conflict with GNOME's, is pretty amazing: https://flathub.org/en/apps/io.speedofsound.SpeedOfSound

This is the first time I've experienced reliable speech recognition for #dictation on the desktop, particularly on #Linux! Until now, I had given up on that being possible.

Works really well in English. It struggles with French, but who doesn't?!

#Whisper #speechrecognition #GNOME #accessibility #a11y

Omar Sanseviero (@osanseviero)

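He describes the new model as exceptionally efficient for its size and says the team incorporated feedback from the past 12 months, substantially strengthening reasoning ("thinking"), multimodal understanding (OCR, speech recognition, object detection), long context, and agentic capabilities. No specific model name is given, but the post reads mainly as a technical update.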

https://x.com/osanseviero/status/2039736380272570478

#multimodal #ocr #speechrecognition #agenticai #longcontext

Omar Sanseviero (@osanseviero) on X

The team cooked a super impressive model, especially for the sizes! We've incorporated all the feedback from the last 12 months: thinking, expanded multimodal understanding (OCR, speech recognition, object detection), longer context, agentic, and more! https://t.co/llozjYtrkJ

🎤🤖 Behold, the latest in buzzword bingo: a speech recognition model that promises to transcribe your every "um" and "uh" with state-of-the-art accuracy! Because clearly, what the modern workplace needs is yet another AI tool to misinterpret your business jargon and turn it into garbled nonsense. 🚀✨
https://cohere.com/blog/transcribe #speechrecognition #AItools #buzzwordbingo #workplaceinnovation #transcriptiontechnology #HackerNews #ngated
Cohere Transcribe: state-of-the-art speech recognition

Unmatched accuracy and speed. Transcribe converts your business’ audio data into precise text for search, analytics, and automation.

Cohere

AssemblyAI (@AssemblyAI)

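To address the limitations of general-purpose ASR when transcribing medical consultations, AssemblyAI introduced "Medical Mode," which runs on top of Universal-3 Pro. It is enabled with a single parameter, and its core is a correction pass optimized for medical terminology that reduces misrecognition of specialized terms such as specific drug names.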

https://x.com/AssemblyAI/status/2036956122779906310

#asr #medicalai #speechrecognition #llm #healthcare

AssemblyAI (@AssemblyAI) on X

General-purpose ASR: 95%+ accuracy on a clinical consult. Also general-purpose ASR: gets "hydrochlorothiazide" wrong every time. Introducing Medical Mode — a correction pass on top of Universal-3 Pro optimized for medical entity recognition. Enable it with one parameter.

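The announcement describes Medical Mode only as "a correction pass on top of Universal-3 Pro," so the following is a toy illustration of the general idea of a post-ASR correction pass, not AssemblyAI's method. It snaps near-miss words to a small, hypothetical drug lexicon using fuzzy string matching from Python's standard `difflib`. The lexicon, the 0.8 similarity cutoff, and the six-letter minimum word length are all assumptions.

```python
"""Toy post-ASR correction pass for drug names (illustrative only).

NOT how AssemblyAI's Medical Mode works internally; it only shows the idea
of a second pass that snaps near-miss tokens to a known medical vocabulary.
The lexicon, cutoff, and length threshold are assumptions.
"""
import difflib
import re

# Hypothetical mini-lexicon; a real system would need a large curated list.
DRUG_LEXICON = ["hydrochlorothiazide", "lisinopril", "metformin", "atorvastatin"]


def correct_drug_names(transcript: str,
                       lexicon: list[str] = DRUG_LEXICON,
                       cutoff: float = 0.8) -> str:
    """Replace words that closely resemble a lexicon entry with that entry."""

    def fix(match: re.Match) -> str:
        word = match.group(0)
        # Skip short words so common words like "daily" are never touched.
        if len(word) < 6:
            return word
        # Best lexicon entry whose similarity ratio is at least `cutoff`.
        candidates = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
        return candidates[0] if candidates else word

    # Apply the fix to every alphabetic run, leaving numbers and spacing intact.
    return re.sub(r"[A-Za-z]+", fix, transcript)
```

For example, `correct_drug_names("Continue hydrochlorathiazide 25 mg daily")` returns `"Continue hydrochlorothiazide 25 mg daily"`. Plain string similarity ignores context, so this toy could also rewrite an ordinary word that happens to resemble a lexicon entry; apart from the length threshold, it does nothing to guard against that.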

AssemblyAI (@AssemblyAI)

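Medical Mode for clinical workflows is now available. It targets a specific problem: even when overall speech recognition accuracy is high, errors on key tokens such as drug names make transcripts hard to use in clinical practice.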

https://x.com/AssemblyAI/status/2036822463347302652

#medicalai #speechrecognition #clinicalworkflow #asr #healthcare

AssemblyAI (@AssemblyAI) on X

Medical Mode is now available for clinical workflows. We built Medical Mode because a transcript that's 95% accurate can still be unusable in a clinical setting. Errors in general-purpose ASR are often concentrated on exactly the tokens clinicians care about most: drug names,

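The "95% accurate but still unusable" point can be made concrete with word error rate (WER), the standard ASR metric: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. In the sketch below, the sentence and the misrecognition are invented for illustration: one wrong word out of 20 gives a WER of 5%, roughly "95% accurate," even though that one word is the drug name. The `entity_recall` helper is a deliberately simple exact-match check, an assumption rather than any standard entity metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row word-level edit distance; `prev` is the DP row for ref[:i-1].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


def entity_recall(key_terms: list[str], hypothesis: str) -> float:
    """Fraction of key terms found verbatim (case-insensitive) in the hypothesis."""
    words = set(hypothesis.lower().split())
    return sum(term.lower() in words for term in key_terms) / len(key_terms)


# Invented 20-word example: only the drug name is misrecognized.
REFERENCE = ("the patient reports dizziness since starting hydrochlorothiazide "
             "last month so we will lower the dose and recheck her blood pressure")
HYPOTHESIS = REFERENCE.replace("hydrochlorothiazide", "hydrochloride")

if __name__ == "__main__":
    print(f"WER: {word_error_rate(REFERENCE, HYPOTHESIS):.0%}")  # WER: 5%
    print(f"Drug-name recall: {entity_recall(['hydrochlorothiazide'], HYPOTHESIS):.0%}")  # 0%
```

A transcript-level score averages over every word, so it says little about whether the few words that matter clinically came through intact; scoring the key terms separately exposes that gap.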
Categorizing Emacs News items by voice in Org Mode :: Sacha Chua