Built something I think some of you may find useful: WhisperWeb

It’s a web app for turning audio into text quickly and simply in the browser. Great for voice notes, interviews, rough transcripts, and idea capture.

You can check it out here: https://whisperweb.app

I’d love to hear what you think.
#SpeechToText #Transcription #IndieWeb #WebApp #Productivity

Whisper Web — In‑Browser Speech‑to‑Text

Transcribe audio privately in your browser. No uploads. Try it now at whisperweb.app.


AssemblyAI (@AssemblyAI)

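A case study showing how speech-to-text benchmarks can be skewed by problems in the ground-truth files used for evaluation. After Universal-3 Pro launched, some customers reported that the new model was scoring worse. The investigation found that errors in the truth files, rather than the model itself, were the likely cause.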

https://x.com/AssemblyAI/status/2036458488436838663

#speechtotext #benchmark #aimodel #evaluation #machinelearning

AssemblyAI (@AssemblyAI) on X

Most speech-to-text benchmarks are broken. Not because the tools are bad—because the truth files are. When we launched Universal-3 Pro, some customers flagged that their benchmarks showed the new model performing worse than older ones. So we dug in. What we found: the model was

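To make the truth-file point concrete: benchmarks usually score with word error rate (WER), the word-level edit distance divided by the number of words in the reference. The sketch below is a plain Python illustration, not AssemblyAI's evaluation code, and the sentences are invented. It shows that a transcript matching the audio perfectly still gets a 25% WER when the truth file itself contains two mistakes.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


# Invented example: what was actually said, and a truth file with two errors.
spoken = "the quick brown fox jumps over the lazy dog"
flawed_truth = "the quick brown fox jumped over lazy dog"

print(word_error_rate(spoken, spoken))        # 0.0  (model output vs. correct truth)
print(word_error_rate(flawed_truth, spoken))  # 0.25 (same output vs. flawed truth)
```

Real benchmarks also normalize punctuation, casing and numbers before scoring; this sketch only lowercases and splits on whitespace.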

For those who use #murena #eos and miss #speechToText:

Give #futo keyboard a try https://keyboard.futo.org/

FUTO Keyboard

FUTO Keyboard is a modern, privacy-focused keyboard that runs fully offline. Enjoy swipe typing, autocorrect, predictive text, and more—no internet connection required.

What's a good speech-to-text tool for Linux on Wayland?

I want software that makes it easy to start and stop recording and that inserts the spoken text wherever the cursor is. #speechtotext #linux #debian

Why Whisper alone wasn't enough, and how we built a full-fledged speech recognition service

Hi everyone! My name is Natalia, and I'm a machine learning engineer at YooMoney (ЮMoney). We've previously written about how we transcribe audio from internal calls into text. A year has passed and the task has grown: in addition to internal calls, we decided to transcribe all customer support calls and to build a convenient interface for working with audio and text. In this article I'll explain how we managed all of this while improving recognition quality and keeping the whole process inside the corporate perimeter. We tested various solutions and are now sharing our experience so that other teams can adopt proven approaches faster and avoid common mistakes.

https://habr.com/ru/companies/yoomoney/articles/1012870/

#распознавание_речи #speechtotext #whisper #аудиообработка #диаризация #речевая_аналитика #машинное_обучение #vad

Just ran Whisper (OpenAI) completely locally on my system (RX 6700 XT / 16 GB RAM).

Whisper is an open source speech recognition model that can transcribe audio, generate subtitles, and even translate between languages.

Test video: The Reason Why Cancer is so Hard to Beat by Kurzgesagt - In a Nutshell
(https://www.youtube.com/watch?v=uoJwt9l-XhQ)

Setup:

- Whisper installed via pip
- Model: small (fast, good enough for English)
- GPU acceleration via ROCm
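A minimal sketch of the same setup through Whisper's Python API, assuming the pip package `openai-whisper` and `ffmpeg` on the PATH (Whisper uses it to decode audio). The helper name `transcribe` is just for illustration. The device check relies on ROCm builds of PyTorch reporting AMD GPUs through `torch.cuda`, so the same code falls back to the CPU when no GPU is found.

```python
def transcribe(audio_path: str, model_name: str = "small") -> str:
    """Transcribe a file locally with openai-whisper, using the GPU when PyTorch sees one."""
    # Imported inside the function so this snippet loads even without the heavy deps.
    import torch
    import whisper

    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API,
    # so this one check covers both NVIDIA (CUDA) and AMD (ROCm) cards.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model(model_name, device=device)
    # FP16 only helps on the GPU; on the CPU, decode in FP32.
    result = model.transcribe(audio_path, fp16=(device == "cuda"))
    return result["text"]
```

Call it as `transcribe("audio.mp3")`, where the file name is a placeholder for your own recording.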

Result:
~98% accurate transcription with only a few minor errors, already solid for generating subtitles.

Next steps / possibilities:

- Auto-generate subtitles (.srt)
- Correct subtitles with a local LLM
- Translate speech
- Burn subtitles directly into videos
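For the first next step, here is a sketch that turns Whisper-style segments (dicts with `start`, `end` and `text`, as returned in `result["segments"]` by `model.transcribe`) into `.srt` text. The helper names are hypothetical; the whisper CLI can also write subtitles directly with `--output_format srt`.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments with 'start', 'end' (seconds) and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each cue: counter, time range, text; cues are separated by a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

With a result from the previous step, `segments_to_srt(result["segments"])` gives text that can be saved as a `.srt` file and then corrected, translated or burned into the video.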

Video workflow:

- Recorded with OBS
- Edited in Kdenlive
- Transcoded with VAAPI (H.264)

No cloud, just real hardware.
Everything runs on Linux, so anyone can set this up.
No GPU? No problem: you can also run it on PyTorch's CPU backend, just much more slowly.

Background music: End of Me - Ashes Remain [Female Rock Cover by Kryx] (https://www.youtube.com/watch?v=E430M8lKim8)


#Whisper #OpenAI #ROCm #AMD #Linux #SpeechToText #Transcription #Subtitles #FOSS #OpenSource #OfflineAI #localai #Fediverse #nocloud

TestingCatalog News (@testingcatalog)

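What appears to be Hypescribe reportedly supports audio and video from many sources, including YouTube, TikTok, Instagram, Zoom calls, Google Meet, voice memos and MP4 files. It handles 100+ languages and claims up to 99% accuracy. Billing is token-based with no file-length limit, and it can reportedly be tested for free at hypescribe.com.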

https://x.com/testingcatalog/status/2033567051466371556

#transcription #speechtotext #hypescribe #multilingual

TestingCatalog News 🗞 (@testingcatalog) on X

It reportedly works with YouTube, TikTok, Instagram, Zoom calls, Google Meet, voice memos, MP4s, and more. 100+ languages. Up to 99% accuracy. Token-based billing with no file length limits. Free to test it on https://t.co/P2McvyuKBC 👇


TestingCatalog News (@testingcatalog)

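HypeScribe has launched a new AI transcription platform. Paste an audio, video or social media link, or upload an MP3, and within 30 seconds it generates a full transcript, a summary and action items. A built-in AI chat lets you query the resulting structured text.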

https://x.com/testingcatalog/status/2033567049293680795

#transcription #speechtotext #summarization #hypescribe

TestingCatalog News 🗞 (@testingcatalog) on X

HypeScribe launched a new AI transcription platform that turns any audio, video, or social media link into a full transcript, summary, and action items in under 30 seconds. Users can paste a YouTube link or drop an MP3 to get structured, queryable text with a built-in AI chat on

Has #gnome integrated #speechtotext?

Are there speech recognition solutions that work well on Linux?

The way I picture it, it needs a program that runs in the background, can be activated with a keyboard shortcut, and sends the result as if it were being typed on the keyboard.

(Obviously, it's not for me... So it needs to be something simple to use day to day. The processing has to happen locally and, ideally, without AI.)

#reconnaissanceVocale #speechToText #linux #dysgraphie #dys #dysorthographie