100% local hold-to-talk speech-to-text for #macOS.

Hold Control to record, release to transcribe and paste. No cloud APIs, no data leaves your machine.

#Swift #SpeechToText #opensource #webdev

https://github.com/matthartman/ghost-pepper

GitHub - matthartman/ghost-pepper: Hold-to-talk speech-to-text for macOS. 100% local, powered by WhisperKit and local LLM cleanup. Hold Control to record, release to transcribe and paste.

Hold-to-talk speech-to-text for macOS. 100% local, powered by WhisperKit and local LLM cleanup. Hold Control to record, release to transcribe and paste. - matthartman/ghost-pepper

GitHub
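The interaction model above (hold to record, release to transcribe and paste) boils down to a two-state machine. A minimal sketch, assuming injected recorder/transcriber/paster components; ghost-pepper itself is written in Swift, so these Python names are purely illustrative and not the project's actual API:

```python
# Sketch of a hold-to-talk state machine. The component names are
# stand-ins: e.g. a microphone capture, a local Whisper model, and a
# helper that types text into the focused app.
class HoldToTalk:
    def __init__(self, recorder, transcriber, paster):
        self.recorder = recorder
        self.transcriber = transcriber
        self.paster = paster
        self.recording = False

    def key_down(self):
        """Control pressed: start capturing audio (idempotent on key repeat)."""
        if not self.recording:
            self.recording = True
            self.recorder.start()

    def key_up(self):
        """Control released: stop capture, transcribe locally, paste the text."""
        if self.recording:
            self.recording = False
            audio = self.recorder.stop()
            self.paster.paste(self.transcriber.transcribe(audio))
```

The idempotent `key_down` matters in practice: a held modifier key fires repeat events, and the recorder should only start once per hold.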

All real-time speech translators are garbage. I wrote my own. It's garbage too, but it's free

I tried everything on the market, spent more on subscriptions than on coffee, and ended up writing my own from scratch. Here's what came out.

I'm sitting on a work call. We're discussing the architecture of a new service. Technically I understand everything: I read English documentation without a dictionary, I review code, I chat fine in Slack. But the moment I have to open my mouth and say something more complex than "I agree", the circus begins. A pause. I grope for words. A colleague has already answered for me. Sound familiar? To me, to the point of grinding my teeth. I'm a CTO, and I've spent the last few years deep in AI integrations. I can build an automated customer-calling system with voice cloning, spin up a fleet of bots to scan Telegram, design an architecture that handles thousands of users for pennies. And on a call I still sound like a foreigner with a phrasebook. Peak irony. So here's the simple picture in my head: I speak Russian, the other person hears English. They answer in English, I hear Russian. In real time. Without ten-second pauses. Without subtitles, actual voice. With any app: Meet, Zoom, Slack, Discord. I went looking. And that's where it started.

https://habr.com/ru/articles/1019458/

#realtime_communications #translations #speechtotext #texttospeech #deepgram #groq #elixir #rust #open_source #voice_ai

All real-time speech translators are garbage. I wrote my own. It's garbage too, but it's free

I tried everything on the market, spent more on subscriptions than on coffee, and ended up writing my own from scratch. Here's what came out AI Open Source Voice AI Real-time translation Deepgram Groq Piper TTS STT TTS LLM...

Habr
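The loop the article describes (STT on incoming audio, an LLM translation hop, then TTS back out) can be sketched roughly as follows. All the callables are injected stand-ins for the services the article names (Deepgram, Groq, Piper); none of this is the project's actual Elixir/Rust code:

```python
# Rough sketch of the speak-Russian / hear-English relay: transcribe an
# audio chunk, translate the text, synthesize and play the translation.
def relay_chunk(audio_chunk, stt, translate, tts, play):
    text = stt(audio_chunk)          # speech-to-text on one chunk
    if text.strip():                 # skip silence / empty partials
        play(tts(translate(text)))   # translated text back out as audio
        return text
    return None
```

Each leg of the pipeline adds latency, which is why the post is so concerned with "real time, without ten-second pauses".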

Artificial Analysis (@ArtificialAnlys)

Microsoft has released the speech transcription model MAI-Transcribe-1. It scores 3.0% on AA-WER (ranking #4) and runs at 69x real time. It was developed by the Microsoft AI (MAI) Superintelligence team and supports 25 languages, including English, French, Arabic, and Japanese.

https://x.com/ArtificialAnlys/status/2039862705096659050

#microsoft #speechtotext #transcription #ai #multilingual

Artificial Analysis (@ArtificialAnlys) on X

Microsoft has released MAI-Transcribe-1: a speech transcription model achieving 3.0% on AA-WER (#4), and is fast at 69x real-time The model was developed by Microsoft AI (MAI)’s Superintelligence team and supports 25 languages including English, French, Arabic, Japanese, and

X (formerly Twitter)
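For scale, the quoted 69x real-time figure works out to under a minute of compute per hour of audio:

```python
# Quick arithmetic on a real-time speedup factor: wall-clock time to
# transcribe is audio duration divided by the speedup.
def transcription_seconds(audio_seconds: float, speedup: float) -> float:
    return audio_seconds / speedup

# One hour of audio at the claimed 69x real time:
print(f"{transcription_seconds(3600, 69):.1f} s")  # ~52.2 s
```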

Wes Roth (@WesRoth)

Willow, maker of a voice dictation app, has released Atlas 1, an STT model for real-time dictation. Unlike general-purpose models such as Whisper, it is a proprietary model designed specifically for dictation settings, aiming to improve real-time transcription quality.

https://x.com/WesRoth/status/2039538310637601274

#stt #speechtotext #voiceai #dictation #model

Wes Roth (@WesRoth) on X

Willow, the startup behind the popular AI-powered voice dictation app, launched Atlas 1, a proprietary speech-to-text (STT) model designed specifically for real-time dictation. While legacy models (like OpenAI's Whisper) typically score a 5-7% WER on clean audio and plummet to

X (formerly Twitter)

Angry Tom (@AngryTomtweets)

Microsoft has unveiled a new speech recognition model, MAI-Transcribe-1. It is billed as a SOTA speech-to-text model that delivers high-quality transcription quickly and efficiently, even in messy real-world environments.

https://x.com/AngryTomtweets/status/2039724108544704707

#microsoft #speechtotext #transcription #aimodel #stt

Angry Tom (@AngryTomtweets) on X

Microsoft just dropped MAI-Transcribe-1, a new SOTA speech-to-text model. The model is built to deliver high quality transcription in messy, real-world environments, while remaining incredibly fast and efficient. MAI-Transcribe-1 delivers SOTA speech-to-text transcription

X (formerly Twitter)

What are the most ethical options for on-device speech-to-text transcription?

(Ethical here meaning both low CO2 emissions and water use, and also not trained on stolen data)

FFmpeg v8 has support for Whisper, but AFAIK the required models come from OpenAI, which scores poorly on both criteria.

I know Mozilla Common Voice offers data sets, but I don't see any models.

Is such a thing possible?

#ai #speechToText #transcription

Built something I think some of you may find useful: WhisperWeb

It’s a web app for turning audio into text quickly and simply in the browser. Great for voice notes, interviews, rough transcripts, and idea capture.

You can check it out here: https://whisperweb.app

I’d love to hear what you think.
#SpeechToText #Transcription #IndieWeb #WebApp #Productivity

Whisper Web — In‑Browser Speech‑to‑Text

Transcribe audio privately in your browser. No uploads. Try it now at whisperweb.app.

Whisper Web

AssemblyAI (@AssemblyAI)

A case study of how speech-to-text benchmarks can be distorted by problems in the evaluation ground-truth files. After the launch of Universal-3 Pro, some customers reported that the new model benchmarked worse than older ones; on investigation, the likely cause turned out to be errors in the truth files rather than the model itself.

https://x.com/AssemblyAI/status/2036458488436838663

#speechtotext #benchmark #aimodel #evaluation #machinelearning

AssemblyAI (@AssemblyAI) on X

Most speech-to-text benchmarks are broken. Not because the tools are bad—because the truth files are. When we launched Universal-3 Pro, some customers flagged that their benchmarks showed the new model performing worse than older ones. So we dug in. What we found: the model was

X (formerly Twitter)
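The failure mode described above is easy to reproduce: word error rate (WER) is computed against a reference transcript, so a mistake in the truth file gets billed to the model. A minimal WER implementation makes the point (a generic sketch, not AssemblyAI's evaluation code; the example sentences are invented):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table for edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

spoken = "let's sync on the q3 roadmap after standup"
model_output = "let's sync on the q3 roadmap after standup"      # model got it right
bad_truth_file = "let's sync on the 3rd roadmap after standup"   # typo in the reference

print(wer(spoken, model_output))          # 0.0 against a correct reference
print(wer(bad_truth_file, model_output))  # 0.125: the reference error is billed to the model
```

With a flawed truth file, a perfect transcript scores a nonzero WER, which is exactly how a better model can appear to benchmark worse.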