🎤🤖 Behold, the latest in buzzword bingo: a speech recognition model that promises to transcribe your every "um" and "uh" with state-of-the-art accuracy! Because clearly, what the modern workplace needs is yet another AI tool to misinterpret your business jargon and turn it into garbled nonsense. 🚀✨
https://cohere.com/blog/transcribe #speechrecognition #AItools #buzzwordbingo #workplaceinnovation #transcriptiontechnology #HackerNews #ngated
Cohere Transcribe: state-of-the-art speech recognition

Unmatched accuracy and speed. Transcribe converts your business’ audio data into precise text for search, analytics, and automation.

Cohere

AssemblyAI (@AssemblyAI)

์˜๋ฃŒ ์ƒ๋‹ด ์Œ์„ฑ์ธ์‹์—์„œ ๋ฒ”์šฉ ASR์˜ ํ•œ๊ณ„๋ฅผ ๋ณด์™„ํ•˜๊ธฐ ์œ„ํ•ด, Universal-3 Pro ์œ„์— ๋™์ž‘ํ•˜๋Š” โ€˜Medical Modeโ€™๋ฅผ ์†Œ๊ฐœํ–ˆ๋‹ค. ๋‹จ์ผ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ํ™œ์„ฑํ™”ํ•˜๋ฉฐ, ์˜๋ฃŒ ์šฉ์–ด ์ธ์‹์— ์ตœ์ ํ™”๋œ ๋ณด์ • ๋‹จ๊ณ„๋กœ ํŠน์ • ์•ฝ๋ฌผ๋ช… ๊ฐ™์€ ์ „๋ฌธ ์šฉ์–ด ์˜ค์ธ์‹์„ ์ค„์ด๋Š” ๊ฒƒ์ด ํ•ต์‹ฌ์ด๋‹ค.

https://x.com/AssemblyAI/status/2036956122779906310

#asr #medicalai #speechrecognition #llm #healthcare

AssemblyAI (@AssemblyAI) on X

General-purpose ASR: 95%+ accuracy on a clinical consult. Also general-purpose ASR: gets "hydrochlorothiazide" wrong every time. Introducing Medical Mode: a correction pass on top of Universal-3 Pro optimized for medical entity recognition. Enable it with one parameter.

X (formerly Twitter)
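The tweet describes Medical Mode only as "a correction pass" over the base transcript. As a rough illustration of what a post-ASR correction pass can look like, here is a toy version that snaps near-miss words onto a drug-name lexicon with fuzzy string matching. This is not AssemblyAI's method: the lexicon, the 0.8 similarity cutoff, the 6-letter minimum, and the function name `correct_drug_names` are all hypothetical choices for the sketch.

```python
import difflib
import re

# Hypothetical toy lexicon; a real system would use a curated clinical vocabulary.
DRUG_LEXICON = ["hydrochlorothiazide", "lisinopril", "metformin", "atorvastatin"]


def correct_drug_names(transcript: str, lexicon: list[str], cutoff: float = 0.8) -> str:
    """Replace words that closely resemble a lexicon entry with that entry.

    Illustrative fuzzy-matching pass only, not AssemblyAI's Medical Mode.
    """
    lowered = [term.lower() for term in lexicon]

    def fix(match: re.Match) -> str:
        word = match.group(0)
        # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
        candidates = difflib.get_close_matches(word.lower(), lowered, n=1, cutoff=cutoff)
        return candidates[0] if candidates else word

    # Only alphabetic runs of 6+ letters are considered, so short common words are left alone.
    return re.sub(r"[A-Za-z]{6,}", fix, transcript)


print(correct_drug_names("Continue hydrochlorathiazide 25 mg daily.", DRUG_LEXICON))
```

A purely string-based pass like this can also "correct" a word into the wrong drug when two names are spelled similarly, which is one reason a production system would likely use acoustic or contextual evidence rather than spelling alone.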

AssemblyAI (@AssemblyAI)

์ž„์ƒ ์›Œํฌํ”Œ๋กœ์šฐ์šฉ Medical Mode๊ฐ€ ๊ณต๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์ธ ์Œ์„ฑ์ธ์‹ ์ •ํ™•๋„๊ฐ€ ๋†’์•„๋„ ์ž„์ƒ์—์„œ๋Š” ์•ฝ๋ฌผ๋ช… ๊ฐ™์€ ํ•ต์‹ฌ ํ† ํฐ ์˜ค๋ฅ˜ ๋•Œ๋ฌธ์— ์‹ค์‚ฌ์šฉ์ด ์–ด๋ ต๋‹ค๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋ ค๋Š” ๊ธฐ๋Šฅ์ž…๋‹ˆ๋‹ค.

https://x.com/AssemblyAI/status/2036822463347302652

#medicalai #speechrecognition #clinicalworkflow #asr #healthcare

AssemblyAI (@AssemblyAI) on X

Medical Mode is now available for clinical workflows. We built Medical Mode because a transcript that's 95% accurate can still be unusable in a clinical setting. Errors in general-purpose ASR are often concentrated on exactly the tokens clinicians care about most: drug names,

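To make the "95% accurate but still unusable" point concrete, here is a word error rate (WER) calculation on a made-up 20-word clinical sentence in which only the drug name is wrong. WER is computed the standard way, as word-level edit distance divided by the number of reference words; the sentence and the substitution of "hydroxychloroquine" are invented for illustration and are not AssemblyAI data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,  # deletion
                dp[i][j - 1] + 1,  # insertion
                dp[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# Made-up 20-word reference; the hypothesis swaps in a different drug name.
REFERENCE = ("the patient reports mild dizziness since starting hydrochlorothiazide "
             "last week we will lower the dose and recheck her blood pressure")
HYPOTHESIS = REFERENCE.replace("hydrochlorothiazide", "hydroxychloroquine")

wer = word_error_rate(REFERENCE, HYPOTHESIS)
print(f"word accuracy: {1 - wer:.0%}")  # 19 of 20 words correct
print("drug name preserved:", "hydrochlorothiazide" in HYPOTHESIS.split())
```

The headline accuracy looks excellent, yet the one wrong word is the one a clinician most needs, which is why entity-level checks matter alongside WER.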
Categorizing Emacs News items by voice in Org Mode :: Sacha Chua

Chrome extension adjusts video speed based on how fast the speaker is talking

https://github.com/ywong137/speech-speed

#HackerNews #ChromeExtension #VideoSpeed #SpeechRecognition #TechInnovation #OpenSource
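The core idea of the extension, scaling playback speed by how fast the speaker talks, can be sketched as: measure words per second over a recent window, divide a target rate by it, and clamp. We haven't checked how the linked extension actually measures speaking rate; this sketch assumes word-level timestamps (e.g., from captions or an ASR pass), and the `Word` class, the 3 words/second target, the 10-second window, and the 0.5x to 3x clamp are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the video
    end: float


def playback_rate(words: list[Word], target_wps: float = 3.0,
                  min_rate: float = 0.5, max_rate: float = 3.0,
                  window: float = 10.0) -> float:
    """Choose a playback rate that brings the perceived speaking rate toward target_wps."""
    if not words:
        return 1.0  # nothing heard yet: play at normal speed
    latest = words[-1].end
    # Only the last `window` seconds of speech count, so the rate tracks changes.
    recent = [w for w in words if w.end >= latest - window]
    span = recent[-1].end - recent[0].start
    if span <= 0:
        return 1.0
    measured_wps = len(recent) / span
    # A slow speaker (low measured_wps) gets sped up, a fast one slowed down.
    return max(min_rate, min(max_rate, target_wps / measured_wps))


# 10 words in 5 s (2 words/s) versus 20 words in 5 s (4 words/s).
slow = [Word(f"w{i}", i * 0.5, (i + 1) * 0.5) for i in range(10)]
fast = [Word(f"w{i}", i * 0.25, (i + 1) * 0.25) for i in range(20)]
print(playback_rate(slow), playback_rate(fast))  # 1.5 0.75
```

The clamp keeps a long pause or a single drawn-out word from pushing playback to an absurd speed.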

Hands on with AI audio generation: GAI voice, music, and sound effects

This is the second post in a series exploring the multimodal possibilities of generative AI. This series will take a detailed, hype-free look at text, image, audio, video, and code generation and explore the creative potential as well as the ethical concerns of GAI. Although Generative AI isn't a new technology, it's definitely been having a hype moment since the release of ChatGPT in November 2022. Unfortunately, the focus has been squarely on the text-based chatbot to the exclusion of […]

https://leonfurze.com/2023/09/25/hands-on-with-ai-audio-generation-gai-voice-music-and-sound-effects/

Nico Martin (@nic_o_martin)

MistralAI์˜ Voxtral๊ณผ Transformers.js, WebGPU ์กฐํ•ฉ์œผ๋กœ ๋ธŒ๋ผ์šฐ์ €์—์„œ ์‹ค์‹œ๊ฐ„ ์Œ์„ฑ ์ „์‚ฌ๊ฐ€ ๊ฐ€๋Šฅํ•ด์กŒ๋‹ค๋Š” ๋ฐœํ‘œ์ž…๋‹ˆ๋‹ค. ๋‹ค์–‘ํ•œ ์–ธ์–ด๋ฅผ ์ง€์›ํ•˜๋ฉฐ ๋ฌธ์žฅ ์ค‘๊ฐ„์— ์–ธ์–ด๊ฐ€ ๋ฐ”๋€Œ์–ด๋„ ์ธ์‹ํ•˜๋Š” ๊ธฐ๋Šฅ์„ ๊ฐ•์กฐํ•˜์—ฌ ์›น ๊ธฐ๋ฐ˜ ASR(์ž๋™ ์Œ์„ฑ์ธ์‹)์˜ ์ €์ง€์—ฐยท๋‹ค๊ตญ์–ด ์ ์šฉ ์‚ฌ๋ก€๋กœ ์˜๋ฏธ๊ฐ€ ํฝ๋‹ˆ๋‹ค.

https://x.com/nic_o_martin/status/2032087412462022663

#mistralai #voxtral #transformersjs #webgpu #speechrecognition

🤷 Nico Martin (@nic_o_martin) on X

Voxtral (by @MistralAI) + Transformers.js + WebGPU enables real-time speech transcription in the browser. It also supports a wide range of languages, which are even recognized in the middle of a sentence 🚀

🚀🎤 Behold the future of AI: TADA, a magical acronym promising to talk your ear off with "natural" speech! Who knew aligning text and audio was just a matter of stuffing them into the same #AI blender? 🎧🤖 Guess we can finally say goodbye to the charming quirks of robots muttering nonsense, or so they say... 😂
https://www.hume.ai/blog/opensource-tada #TADA #NaturalSpeech #FutureTech #SpeechRecognition #HackerNews #ngated
Opensourcing TADA: Fast, Reliable Speech Generation Through Text-Acoustic Synchronization

TADA (Text-Acoustic Dual Alignment) is Hume AI's open-source speech-language model that synchronizes text and audio one-to-one.