Artificial Analysis (@ArtificialAnlys)

Microsoft has released the speech transcription model MAI-Transcribe-1. It scores 3.0% on AA-WER (#4 on the leaderboard) and runs at 69x real-time speed. Developed by the Microsoft AI (MAI) Superintelligence team, it supports 25 languages including English, French, Arabic, and Japanese.

https://x.com/ArtificialAnlys/status/2039862705096659050

#microsoft #speechtotext #transcription #ai #multilingual

Artificial Analysis (@ArtificialAnlys) on X

Microsoft has released MAI-Transcribe-1: a speech transcription model achieving 3.0% on AA-WER (#4), and is fast at 69x real-time. The model was developed by Microsoft AI (MAI)’s Superintelligence team and supports 25 languages including English, French, Arabic, Japanese, and

X (formerly Twitter)
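The "69x real-time" figure in the post is a real-time factor: seconds of audio transcribed per second of wall-clock processing. A minimal sketch of the arithmetic (the 69x speed comes from the post; the one-hour recording is an arbitrary example):

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are transcribed per second of processing."""
    return audio_seconds / processing_seconds

# At 69x real time, a one-hour recording takes roughly 3600 / 69 ≈ 52 seconds.
print(round(3600 / 69, 1))  # 52.2
```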

Wes Roth (@WesRoth)

Willow, maker of a voice dictation app, has released Atlas 1, an STT model for real-time dictation. Unlike general-purpose models such as Whisper, it is a proprietary model designed specifically for the dictation setting, aiming to improve real-time transcription quality.

https://x.com/WesRoth/status/2039538310637601274

#stt #speechtotext #voiceai #dictation #model

Wes Roth (@WesRoth) on X

Willow, the startup behind the popular AI-powered voice dictation app, launched Atlas 1, a proprietary speech-to-text (STT) model designed specifically for real-time dictation. While legacy models (like OpenAI's Whisper) typically score a 5-7% WER on clean audio and plummet to

X (formerly Twitter)

Angry Tom (@AngryTomtweets)

Microsoft has unveiled a new speech recognition model, MAI-Transcribe-1. It is presented as a SOTA speech-to-text model that delivers high-quality transcription quickly and efficiently even in messy, real-world environments.

https://x.com/AngryTomtweets/status/2039724108544704707

#microsoft #speechtotext #transcription #aimodel #stt

Angry Tom (@AngryTomtweets) on X

Microsoft just dropped MAI-Transcribe-1, a new SOTA speech-to-text model. The model is built to deliver high quality transcription in messy, real-world environments, while remaining incredibly fast and efficient. MAI-Transcribe-1 delivers SOTA speech-to-text transcription

X (formerly Twitter)

What are the most ethical options for on-device speech-to-text transcription?

(Ethical here meaning both low CO2 emissions and water use, and also not trained on stolen data)

FFmpeg v8 has support for Whisper, but AFAIK the required models appear to come from OpenAI, which scores poorly on both criteria.

I know Mozilla Common Voice offers data sets, but I don't see any models.

Is such a thing possible?

#ai #speechToText #transcription

I've been trying to get voxtype to work, but it can't see the keyboard.

https://github.com/sk7n4k3d/voxtype/blob/main/docs/USER_MANUAL.md

ERROR Hotkey listener error: No keyboard device found in /dev/input/

Restarting the desktop (logging out and back in) fixed the problem, but I'd appreciate any help on getting this configured properly.
#voxtype #speechtotext

voxtype/docs/USER_MANUAL.md at main · sk7n4k3d/voxtype

Fork of VoxType with remote Whisper server support + KDE Wayland AZERTY fix (wl-copy + Shift+Insert paste) - sk7n4k3d/voxtype

GitHub

Built something I think some of you may find useful: WhisperWeb

It’s a web app for turning audio into text quickly and simply in the browser. Great for voice notes, interviews, rough transcripts, and idea capture.

You can check it out here: https://whisperweb.app

I’d love to hear what you think.
#SpeechToText #Transcription #IndieWeb #WebApp #Productivity

Whisper Web — In‑Browser Speech‑to‑Text

Transcribe audio privately in your browser. No uploads. Try it now at whisperweb.app.

Whisper Web

AssemblyAI (@AssemblyAI)

A case study of how speech-to-text benchmarks can be distorted by problems in the evaluation ground-truth files. After the launch of Universal-3 Pro, some customers reported that the new model benchmarked worse than older ones; on investigation, the likely cause turned out to be errors in the truth files rather than the model itself.

https://x.com/AssemblyAI/status/2036458488436838663

#speechtotext #benchmark #aimodel #evaluation #machinelearning

AssemblyAI (@AssemblyAI) on X

Most speech-to-text benchmarks are broken. Not because the tools are bad—because the truth files are. When we launched Universal-3 Pro, some customers flagged that their benchmarks showed the new model performing worse than older ones. So we dug in. What we found: the model was

X (formerly Twitter)
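AssemblyAI's point, that bad truth files rather than the model can drive a benchmark score, follows directly from how word error rate is computed: WER is the word-level edit distance between the reference (truth file) and the model's hypothesis, divided by the reference length, so an error in the reference is charged against the model even when the model is right. A minimal sketch (the example sentences are invented for illustration, not AssemblyAI's data):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (r != h))     # substitution (free on match)
            prev, d[j] = d[j], cur
    return d[-1] / len(ref)

model_output = "the quick brown fox"
print(wer("the quick brown fox", model_output))   # correct truth file: 0.0
print(wer("the quick browns fox", model_output))  # one truth-file typo: 0.25
```

The second call shows the failure mode from the post: a perfect transcription scores a 25% error rate purely because of a typo in the reference.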

For those who use #murena #eos and miss #speechToText

Give #futo keyboard a try https://keyboard.futo.org/

FUTO Keyboard

FUTO Keyboard is a modern, privacy-focused keyboard that runs fully offline. Enjoy swipe typing, autocorrect, predictive text, and more—no internet connection required.

Why Whisper alone wasn't enough, and how we built a full-fledged speech recognition service

Hi everyone! My name is Natalia, and I'm a machine learning engineer at ЮMoney (YooMoney). We've already written about how we transcribe audio from internal calls into text. A year on, the task has grown: beyond internal meetings, we decided to transcribe all support-line calls and build a convenient interface for working with audio and text. In this article I'll describe how we pulled this off while improving recognition quality and keeping the whole process inside the corporate perimeter. We tested various solutions and are now sharing our experience so that other teams can adopt proven approaches faster and avoid common mistakes.

https://habr.com/ru/companies/yoomoney/articles/1012870/

#speechrecognition #speechtotext #whisper #audioprocessing #diarization #speechanalytics #machinelearning #vad

Why Whisper alone wasn't enough, and how we built a full-fledged speech recognition service

Hi everyone! My name is Natalia, and I'm a machine learning engineer at ЮMoney (YooMoney). We've already written about how we transcribe audio from internal calls into text. A year on, the task has grown: beyond internal meetings, we...

Habr