@cwebber #Mozilla working more on speech models could also help them finally release the Web #SpeechRecognition API in #Firefox. I know it's a hard problem, but now's a good time to get funding for machine learning, and Mozilla is promoting "AI", a label this accessibility infrastructure could fall under.

Right now, as far as I know and have tested, only proprietary browsers support that API out of the box, and Firefox has an in-progress but non-functional implementation. Thank you so much to everyone who has done some of the partial work on the API in a libre browser; your work is *so* appreciated!

Surprisingly, several free, libre, and open-source tools associated with FLOSS movements (#BigBlueButton's main live captioning plugin; MidCamp) currently rely on that API and state they only support Chrome (not Chromium). In the default setup, that sends live audio to Google's servers, which I'm pretty sure then run proprietary models.

GrapheneOS Speech Services version 2 released - GrapheneOS Discussion Forum

RTF tells you how fast the model runs. It doesn't tell you how long users actually wait. This guide covers the four batch transcription metrics you need to know. https://hackernoon.com/rtf-in-speech-ai-isnt-enough-your-2026-guide-for-evaluating-batch-transcription #speechrecognition
RTF in Speech AI Isn't Enough: Your 2026 Guide For Evaluating Batch Transcription | HackerNoon
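
To make the distinction in the post concrete, here is a minimal sketch of real-time factor (RTF, processing time divided by audio duration) next to the time a user actually waits for a batch job. The function names, the queue and overhead terms, and all the numbers are assumptions for illustration; they are not the four metrics from the linked guide.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def user_wait_seconds(queue_seconds: float, processing_seconds: float,
                      overhead_seconds: float = 0.0) -> float:
    """Hypothetical end-to-end wait: time in queue + processing + other overhead."""
    return queue_seconds + processing_seconds + overhead_seconds


# Made-up example: a 10-minute file transcribed in 1 minute,
# after sitting in a job queue for 4 minutes.
audio = 600.0
processing = 60.0
rtf = real_time_factor(processing, audio)
wait = user_wait_seconds(queue_seconds=240.0, processing_seconds=processing)

print(f"RTF: {rtf:.2f}")
print(f"User waits: {wait:.0f} s")
```

In this made-up case the model runs ten times faster than real time, yet the user still waits five minutes, which is the gap the post is pointing at.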


Exactly my kind of humor. #speechrecognition #award

#UnplugBigTech Tip 5: Open-source voice assistant

Say goodbye to Alexa and other voice assistants that listen in on and analyze your conversations. Instead, use a privacy-friendly alternative such as OpenVoiceOS, an open-source voice assistant that is developed by an active community and runs on a Raspberry Pi. That way you keep control of your data.

https://www.openvoiceos.org/

#Alexa #OpenVoiceOS #Sprachassistent #VoiceControl #SpeechRecognition #datenschutz #privacy

Home

Home page of OVOS

Govorun PC: porting offline dictation from Android to Windows in one evening (with Claude)

On Android I use Govorun Lite, an offline dictation app for Russian. Press a button, speak, and the text is inserted. No cloud, no sending your voice to servers. It runs on GigaAM v2 from Sber. There is one problem: nothing like it exists on the PC. The built-in Windows dictation is online. Whisper is either slow or requires a GPU. Third-party services mean the cloud again. I decided to port Govorun to Windows and, to speed things up, took Claude as a pair programmer. This article covers what came of it.

https://habr.com/ru/articles/1031240/

#python #speechrecognition #onnx #windows #llm #голосовой_ввод


Хабр

Amical - Open-source AI dictation app

Cossmology Profile: https://dub.sh/Vk7tPkn

Key People: Haritabh Singh, Naomi Chopra

#SpeechRecognition #OpenSource #OSS #COSS

Deepgram released Flux Multilingual, a speech recognition model that handles 10 languages with real-time switching during conversations. The system detects language changes mid-call and processes conversational turns in under 400ms. Available as cloud API or self-hosted at the same price as English-only versions. Could simplify multilingual voice applications that previously required separate detection and routing systems.

#SpeechRecognition #MultilingualAI #VoiceTech

https://www.implicator.ai/deepgram-launches-flux-multilingual-speech-model-with-10-language-mid-call-switching/

Deepgram Launches Flux Multilingual With 10-Language Mid-Cal

Deepgram launched Flux Multilingual, a conversational speech recognition model supporting 10 languages with real-time detection and mid-call code-switching. Uses conversational turn detection at under 400ms. Available as cloud API or self-hosted with EU endpoint support.

Implicator.ai

Non-lexical conversational sounds (NLCS) impair ASR in clinical documentation.

🔊 NLCS: 2.4% of total words, conveying key clinical info
😷 Google's WER: 40.8%, Amazon's: 57.2% (across all NLCS)
❌ Error rates for clinically relevant NLCS: Google 94.7%, Amazon 98.7%
📝 Total words: 135,647; 3,284 NLCS; 76 conveyed critical data
🗣️ Implications for documentation accuracy are discussed

#ASR #ClinicalDocumentation #SpeechRecognition #AI #NLPSolutions #Pub2Post https://tnyp.me/Npmiz0F4/m
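
As a quick arithmetic check on the figures in the post, the sketch below recomputes the NLCS share from the reported word counts. The helper name `pct` is made up for illustration, and the error counts of 72 and 75 out of 76 are my own inference from the reported percentages, not numbers stated in the post.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_words = 135_647   # from the post
nlcs_words = 3_284      # from the post
critical_nlcs = 76      # from the post

print(pct(nlcs_words, total_words))    # 2.4, matches "2.4% of total words"
print(pct(critical_nlcs, nlcs_words))  # 2.3, share of NLCS carrying critical data

# Assumption: the reported error rates on clinically relevant NLCS are
# consistent with 72 and 75 misrecognized items out of 76.
print(pct(72, 76))  # 94.7
print(pct(75, 76))  # 98.7
```

If that inference holds, only a handful of the 76 clinically relevant sounds were transcribed correctly by either system.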

Neural networks

Learn the basics of neural networks and backpropagation, one of the most important algorithms for the modern world.

YouTube