Mastodawn

Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS

I built this because I wanted to see how far I could get with a voice-to-text app that used 100% local models so no data left my computer. I've been using a ton for coding and emails. Experimenting with using it as a voice interface for my other agents too. 100% open-source MIT license, would love feedback, PRs, and ideas on where to take it.

https://github.com/matthartman/ghost-pepper

GitHub - matthartman/ghost-pepper: Hold-to-talk speech-to-text for macOS. 100% local, powered by WhisperKit and local LLM cleanup. Hold Control to record, release to transcribe and paste.

Hold-to-talk speech-to-text for macOS. 100% local, powered by WhisperKit and local LLM cleanup. Hold Control to record, release to transcribe and paste. - matthartman/ghost-pepper

GitHub

Show thread

primaprashant 3d ago

Speech-to-text has become integral part of my dev flow especially for dictating detailed prompts to LLMs and coding agents.

I have collected the best open-source voice typing tools categorized by platform in this awesome-style GitHub repo. Hope you all find this useful!

https://github.com/primaprashant/awesome-voice-typing

[dead]

This thread is a support group for people who have each independently built the same macOS speech-to-text app.

Show thread

tpowell 3d ago

I cobbled my own together one night before I came across the thoughtfully-built KeyVox and got to talking shop with its creator. Our cups runneth over. https://github.com/macmixing/keyvox/

GitHub - macmixing/keyvox: KeyVox is a local-first dictation app for Mac and iPhone. Native ecosystem. Open source. An alternative to Wispr Flow.

KeyVox is a local-first dictation app for Mac and iPhone. Native ecosystem. Open source. An alternative to Wispr Flow. - macmixing/keyvox

GitHub

Show thread

karimf 3d ago

In the /r/macapps subreddit, they have huge influx of new apps posts, and the "whisper dictation" is one of the most saturated category. [0]

>“Compare” - This is the most important part. Apps in the most saturated categories (whisper dictation, clipboard managers, wallpaper apps, etc.) must clearly explain their differentiation from existing solutions.

https://www.reddit.com/r/macapps/comments/1r6d06r/new_post_r...

Show thread

pmarreck 3d ago

Are there any better than Superwhisper? Because I haven't found any.

Show thread

fragmede 3d ago

Yeah, but mine does it better because... Oh, hello.

Show thread

lxe 3d ago

hahaha I’m glad I’m just a procedurally generated NPC

I built one for cross platform — using parakeet mlx or faster whisper. :)

Show thread

arkensaw 3d ago

This is great, and I'm not knocking it, but every time I see these apps it reminds me of my phone.

My 2021 Google Pixel 6, when offline, can transcribe speech to text, and also corrects things contextually. it can make a mistake, and as I continue to speak, it will go back and correct something earlier in the sentence. What tech does Google have shoved in there that predates Whisper and Qwen by five years? And why do we now need a 1Gb of transformers to do it on a more powerful platform?

Show thread

com2kid 3d ago

Microsoft OneNote had this back in 2007 or so, granted the speech to text model wasn't nearly as advanced as they are now.

I was actually on the OneNote team when they were transitioning to an online only transcription model because there was no one left to maintain the on device legacy system.

It wasn't any sort of planned technical direction, just a lack of anyone wanting to maintain the old system.

Show thread

adamsmark 3d ago

The accuracy is much lower though.

I've switched away from Gboard to Futo on Android and exclusively use MacWhisper on MacOS instead of the default Apple transcription model.

Show thread

cootsnuck 3d ago

Interesting. My Pixel 7 transcription is barely usable for me. Makes way too many mistakes and defeats the purpose of me not having to type, but maybe that's just my experience.

The latest open source local STT models people are running on devices are significantly more robust (e.g. whisper models, parakeet models, etc.). So background noise, mumbling, and/or just not having a perfect audio environment doesn't trip up the SoTA models as much (all of them still do get tripped up).

I work in voice AI and am using these models (both proprietary and local open source) every day. Night and day different for me.

Show thread

fiatpandas 3d ago

The clean up prompt needs adjusting. If your transcription is first person and in the voice of talking to an AI assistant, it really wants to “answer” you, completing ignoring its instructions. I fiddled with the prompt but couldn’t figure out how to make it not want to act like an AI assistant.