This dataset includes diverse audio samples with accurate transcriptions, covering multiple languages, accents, and real-world environments. Perfect for building and testing Automatic Speech Recognition (ASR), voice assistants, and NLP systems.

With structured annotations and rich metadata, it helps developers create more accurate, scalable, and reliable voice-based AI solutions. 🚀

#SpeechRecognition #AI #MachineLearning

Whisper was too slow. Vosk was inconsistent. The answer was embarrassingly simple: Android speech recognition over local WiFi, and 80 lines of Python. https://hackernoon.com/the-embarrassingly-simple-voice-input-system-running-my-home-server-workflow #speechrecognition
The Embarrassingly Simple Voice Input System Running My Home Server Workflow | HackerNoon

RE: https://mastodon.social/@zugaldia/116351933343098498

The "Speed of Sound" app by @zugaldia, once you set up a custom global keyboard shortcut that doesn't conflict with GNOME's, is pretty amazing: https://flathub.org/en/apps/io.speedofsound.SpeedOfSound

This is the first time I have experienced reliable speech recognition for #dictation on the desktop, particularly on #Linux! Until now I had given up on that being a possibility.

Works really well in English. It struggles with French, but who doesn't?!

#Whisper #speechrecognition #GNOME #accessibility #a11y

Omar Sanseviero (@osanseviero)

์ƒˆ ๋ชจ๋ธ์ด ์„ฑ๋Šฅ ๋Œ€๋น„ ํฌ๊ธฐ ํšจ์œจ์ด ๋งค์šฐ ๋›ฐ์–ด๋‚˜๋‹ค๊ณ  ์†Œ๊ฐœํ•˜๋ฉฐ, ์ง€๋‚œ 12๊ฐœ์›”๊ฐ„์˜ ํ”ผ๋“œ๋ฐฑ์„ ๋ฐ˜์˜ํ•ด ์ถ”๋ก  ๋Šฅ๋ ฅ, ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์ดํ•ด(OCRยท์Œ์„ฑ ์ธ์‹ยท๊ฐ์ฒด ํƒ์ง€), ๊ธด ์ปจํ…์ŠคํŠธ, ์—์ด์ „ํŠธ ๊ธฐ๋Šฅ ๋“ฑ์„ ํฌ๊ฒŒ ๊ฐ•ํ™”ํ–ˆ๋‹ค๊ณ  ๋ฐํ˜”์Šต๋‹ˆ๋‹ค. ๊ตฌ์ฒด์  ๋ชจ๋ธ๋ช…์€ ์—†์ง€๋งŒ ๊ธฐ์ˆ  ์—…๋ฐ์ดํŠธ ์„ฑ๊ฒฉ์ด ๊ฐ•ํ•ฉ๋‹ˆ๋‹ค.

https://x.com/osanseviero/status/2039736380272570478

#multimodal #ocr #speechrecognition #agenticai #longcontext

Omar Sanseviero (@osanseviero) on X

The team cooked a super impressive model, specially for the sizes! We've incorporated all the feedback from the last 12 months: thinking, expanded multimodal understanding (OCR, speech recognition, object detection), longer context, agentic, and more! https://t.co/llozjYtrkJ

X (formerly Twitter)
🎤🤖 Behold, the latest in buzzword bingo: a speech recognition model that promises to transcribe your every "um" and "uh" with state-of-the-art accuracy! Because clearly, what the modern workplace needs is yet another AI tool to misinterpret your business jargon and turn it into garbled nonsense. 🚀✨
https://cohere.com/blog/transcribe #speechrecognition #AItools #buzzwordbingo #workplaceinnovation #transcriptiontechnology #HackerNews #ngated
Cohere Transcribe: state-of-the-art speech recognition

Unmatched accuracy and speed. Transcribe converts your business' audio data into precise text for search, analytics, and automation.

Cohere

AssemblyAI (@AssemblyAI)

์˜๋ฃŒ ์ƒ๋‹ด ์Œ์„ฑ์ธ์‹์—์„œ ๋ฒ”์šฉ ASR์˜ ํ•œ๊ณ„๋ฅผ ๋ณด์™„ํ•˜๊ธฐ ์œ„ํ•ด, Universal-3 Pro ์œ„์— ๋™์ž‘ํ•˜๋Š” โ€˜Medical Modeโ€™๋ฅผ ์†Œ๊ฐœํ–ˆ๋‹ค. ๋‹จ์ผ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ํ™œ์„ฑํ™”ํ•˜๋ฉฐ, ์˜๋ฃŒ ์šฉ์–ด ์ธ์‹์— ์ตœ์ ํ™”๋œ ๋ณด์ • ๋‹จ๊ณ„๋กœ ํŠน์ • ์•ฝ๋ฌผ๋ช… ๊ฐ™์€ ์ „๋ฌธ ์šฉ์–ด ์˜ค์ธ์‹์„ ์ค„์ด๋Š” ๊ฒƒ์ด ํ•ต์‹ฌ์ด๋‹ค.

https://x.com/AssemblyAI/status/2036956122779906310

#asr #medicalai #speechrecognition #llm #healthcare

AssemblyAI (@AssemblyAI) on X

General-purpose ASR: 95%+ accuracy on a clinical consult. Also general-purpose ASR: gets "hydrochlorothiazide" wrong every time. Introducing Medical Mode: a correction pass on top of Universal-3 Pro optimized for medical entity recognition. Enable it with one parameter.
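
The gap the post describes, high overall accuracy alongside a missed drug name, can be illustrated with a small sketch. The sentence, the misrecognition "hydrochlorothiazine", and the helper names `word_error_rate` and `entity_recall` below are all hypothetical and chosen for illustration; this is not AssemblyAI's evaluation method or API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def entity_recall(entities: list[str], hypothesis: str) -> float:
    """Fraction of single-word key terms that appear verbatim in the hypothesis."""
    hyp_words = set(hypothesis.lower().split())
    return sum(e.lower() in hyp_words for e in entities) / len(entities)


# Hypothetical 21-word reference transcript and a hypothesis with one wrong word.
reference = (
    "the patient has been taking hydrochlorothiazide twenty five milligrams "
    "once daily for high blood pressure and reports no dizziness or swelling"
)
hypothesis = reference.replace("hydrochlorothiazide", "hydrochlorothiazine")

wer = word_error_rate(reference, hypothesis)
recall = entity_recall(["hydrochlorothiazide"], hypothesis)
print(f"word accuracy: {1 - wer:.1%}, drug-name recall: {recall:.0%}")
```

With one substitution in 21 words, word accuracy stays at roughly 95% while recall on the single drug name drops to zero, which is the kind of mismatch the post points at.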

AssemblyAI (@AssemblyAI)

์ž„์ƒ ์›Œํฌํ”Œ๋กœ์šฐ์šฉ Medical Mode๊ฐ€ ๊ณต๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์ธ ์Œ์„ฑ์ธ์‹ ์ •ํ™•๋„๊ฐ€ ๋†’์•„๋„ ์ž„์ƒ์—์„œ๋Š” ์•ฝ๋ฌผ๋ช… ๊ฐ™์€ ํ•ต์‹ฌ ํ† ํฐ ์˜ค๋ฅ˜ ๋•Œ๋ฌธ์— ์‹ค์‚ฌ์šฉ์ด ์–ด๋ ต๋‹ค๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋ ค๋Š” ๊ธฐ๋Šฅ์ž…๋‹ˆ๋‹ค.

https://x.com/AssemblyAI/status/2036822463347302652

#medicalai #speechrecognition #clinicalworkflow #asr #healthcare

AssemblyAI (@AssemblyAI) on X

Medical Mode is now available for clinical workflows. We built Medical Mode because a transcript that's 95% accurate can still be unusable in a clinical setting. Errors in general-purpose ASR are often concentrated on exactly the tokens clinicians care about most: drug names,

Categorizing Emacs News items by voice in Org Mode :: Sacha Chua

Chrome extension adjusts video speed based on how fast the speaker is talking

https://github.com/ywong137/speech-speed

#HackerNews #ChromeExtension #VideoSpeed #SpeechRecognition #TechInnovation #OpenSource