Omar Sanseviero (@osanseviero)

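He describes the new model as remarkably capable for its size. Drawing on feedback from the past 12 months, the team substantially strengthened reasoning ("thinking"), multimodal understanding (OCR, speech recognition, object detection), long-context handling, and agentic capabilities. The post names no specific model and reads mainly as a technical update.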

https://x.com/osanseviero/status/2039736380272570478

#multimodal #ocr #speechrecognition #agenticai #longcontext

Omar Sanseviero (@osanseviero) on X

The team cooked a super impressive model, specially for the sizes! We've incorporated all the feedback from the last 12 months: thinking, expanded multimodal understanding (OCR, speech recognition, object detection), longer context, agentic, and more! https://t.co/llozjYtrkJ

X (formerly Twitter)

🎤🤖 Behold, the latest in buzzword bingo: a speech recognition model that promises to transcribe your every "um" and "uh" with state-of-the-art accuracy! Because clearly, what the modern workplace needs is yet another AI tool to misinterpret your business jargon and turn it into garbled nonsense. 🚀✨
https://cohere.com/blog/transcribe #speechrecognition #AItools #buzzwordbingo #workplaceinnovation #transcriptiontechnology #HackerNews #ngated
Cohere Transcribe: state-of-the-art speech recognition

Unmatched accuracy and speed. Transcribe converts your business’ audio data into precise text for search, analytics, and automation.

Cohere

AssemblyAI (@AssemblyAI)

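To address the limits of general-purpose ASR on medical consultations, AssemblyAI introduced "Medical Mode", which runs on top of Universal-3 Pro. It is enabled with a single parameter. At its core is a correction pass optimized for medical terminology, which reduces misrecognition of specialist terms such as drug names.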

https://x.com/AssemblyAI/status/2036956122779906310

#asr #medicalai #speechrecognition #llm #healthcare

AssemblyAI (@AssemblyAI) on X

General-purpose ASR: 95%+ accuracy on a clinical consult. Also general-purpose ASR: gets "hydrochlorothiazide" wrong every time. Introducing Medical Mode — a correction pass on top of Universal-3 Pro optimized for medical entity recognition. Enable it with one parameter.

X (formerly Twitter)

AssemblyAI (@AssemblyAI)

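Medical Mode is now available for clinical workflows. It targets a specific problem: even when overall speech-recognition accuracy is high, errors on critical tokens such as drug names can make a transcript unusable in clinical practice.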

https://x.com/AssemblyAI/status/2036822463347302652

#medicalai #speechrecognition #clinicalworkflow #asr #healthcare

AssemblyAI (@AssemblyAI) on X

Medical Mode is now available for clinical workflows. We built Medical Mode because a transcript that's 95% accurate can still be unusable in a clinical setting. Errors in general-purpose ASR are often concentrated on exactly the tokens clinicians care about most: drug names,

X (formerly Twitter)
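The "95% accurate but still unusable" point can be made concrete with word error rate (WER), the standard ASR metric: word-level edit distance divided by the number of reference words. The sketch below treats "accuracy" as 1 − WER, which is an assumption about how the figure is meant. The sentence, the `critical_terms` check, and the substitution of "hydroxychloroquine" for "hydrochlorothiazide" are hypothetical illustrations. None of this reflects how AssemblyAI measures accuracy or implements Medical Mode.

```python
def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of an extra word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[len(hyp)]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return word_errors(ref, hyp) / len(ref)


def missed_terms(reference: str, hypothesis: str, critical_terms: set[str]) -> list[str]:
    """Hypothetical check: critical terms present in the reference but absent from the hypothesis."""
    ref, hyp = set(reference.split()), set(hypothesis.split())
    return sorted(t for t in critical_terms if t in ref and t not in hyp)


# Illustrative 20-word sentence; the hypothesis differs by a single drug-name substitution.
reference = ("the patient should continue hydrochlorothiazide twenty five milligrams once daily "
             "and return in two weeks for a blood pressure check")
hypothesis = reference.replace("hydrochlorothiazide", "hydroxychloroquine")

error_rate = wer(reference, hypothesis)
print(f"WER = {error_rate:.2f}, accuracy = {1 - error_rate:.0%}")
print("missed critical terms:", missed_terms(reference, hypothesis, {"hydrochlorothiazide"}))
```

With one substitution in 20 words, the transcript scores 95% word accuracy, yet the only error is the medication name. This is why a per-entity check can matter more than the aggregate score in clinical use.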

Categorizing Emacs News items by voice in Org Mode :: Sacha Chua

Chrome extension adjusts video speed based on how fast the speaker is talking

https://github.com/ywong137/speech-speed

#HackerNews #ChromeExtension #VideoSpeed #SpeechRecognition #TechInnovation #OpenSource
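The core idea of adapting playback speed to speaking rate can be sketched as a clamped ratio: measure the speaker's words per minute over a recent window, then scale playback so the perceived rate approaches a target. This is a hypothetical sketch, not the extension's actual algorithm. The 160 wpm target, the 0.75x to 2.5x bounds, and the choice to play silence at maximum speed are illustrative assumptions, as is the premise that word counts come from captions or a speech recognizer.

```python
def playback_rate(words_in_window: int, window_seconds: float,
                  target_wpm: float = 160.0,
                  min_rate: float = 0.75, max_rate: float = 2.5) -> float:
    """Hypothetical: choose a playback rate so the perceived speaking rate nears target_wpm."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if words_in_window == 0:
        # Assumed design choice: no speech detected, so move through it as fast as allowed.
        return max_rate
    measured_wpm = words_in_window * 60.0 / window_seconds
    rate = target_wpm / measured_wpm  # slow speakers get sped up, fast speakers slowed down
    return max(min_rate, min(max_rate, rate))


print(playback_rate(20, 10.0))  # 120 wpm speaker, sped up toward 160 wpm
print(playback_rate(40, 10.0))  # 240 wpm speaker, slowed down but clamped at 0.75
```

Clamping keeps the rate within a range where speech stays intelligible. A real implementation would probably also smooth the rate across windows so playback does not jump abruptly.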

Hands on with AI audio generation: GAI voice, music, and sound effects

This is the second post in a series exploring the multimodal possibilities of generative AI. This series will take a detailed, hype-free look at text, image, audio, video, and code generation and explore the creative potential as well as the ethical concerns of GAI. Although Generative AI isn't a new technology, it's definitely been having a hype moment since the release of ChatGPT in November 2022. Unfortunately, the focus has been squarely on the text-based chatbot at the exclusion of […]

https://leonfurze.com/2023/09/25/hands-on-with-ai-audio-generation-gai-voice-music-and-sound-effects/

Nico Martin (@nic_o_martin)

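Combining MistralAI's Voxtral with Transformers.js and WebGPU enables real-time speech transcription in the browser. The announcement highlights broad language support, including recognizing a language switch mid-sentence. This makes it a notable example of low-latency, multilingual web-based ASR (automatic speech recognition).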

https://x.com/nic_o_martin/status/2032087412462022663

#mistralai #voxtral #transformersjs #webgpu #speechrecognition

Nico Martin (@nic_o_martin) on X

Voxtral (by @MistralAI) + Transformers.js + WebGPU enables real-time speech transcription in the browser. It also supports a wide range of languages, which are even recognized in the middle of a sentence 🚀

X (formerly Twitter)