Linux desktop voice control has a gap. Talon costs money. Other tools are X11-only or cloud-dependent.

So I built EasySpeak.

https://www.youtube.com/watch?v=dl5m2Zo1oIE

https://github.com/ctsdownloads/easyspeak/tree/dev?tab=readme-ov-file#easyspeak

- Free and open source (GPL-3.0)
- Fully local: no cloud, no accounts
- Wayland-native
- "Hey Jarvis, open downloads"

Built for RSI, accessibility, or anyone who wants to talk to their computer.

#Linux #OpenSource #Accessibility #VoiceControl #GNOME #Wayland #a11y


Took me over a year. It had many iterations. Once I finally got the tech sorted, I created the GH repo for it. After settling on how I wanted to present this to folks, I am releasing it today and yup, I already have two (minor) bugs filed by me. 🤪

Introducing EasySpeak - Voice control for Linux desktops. Fully local, no cloud, Wayland-native.

Linux desktop voice control has a gap. Talon exists but has a steep learning curve and costs money for the full version. Most other tools are X11-only, abandoned, or cloud-dependent.

So I built EasySpeak.

https://lnkd.in/gvjtDSBy
https://lnkd.in/gsFCfxeE

- Free and open source: GPL-3.0, no paywalls
- Fully local: no cloud, no accounts, no data leaving your machine
- Wayland-native: works on modern GNOME where X11 tools fail
- Simple: say "Hey Jarvis, open downloads" and it works
- Extensible: drop a Python file in plugins/ to add commands

Built for people with RSI, accessibility needs, hands-busy workflows, or anyone who just wants to talk to their computer.

What's working now:

- Wake word activation ("Hey Jarvis")
- Mouse grid navigation ("grid", "3 7 5", "click")
- Head tracking cursor control (experimental)
- Browser control with link hints and tabs
- Dictation with punctuation commands
- App launcher, media controls, volume, brightness

The stack: OpenWakeWord + Whisper + Piper. Everything runs locally.

Still in active development, but it's real and it works. More to come.

#Linux #OpenSource #Accessibility #VoiceControl #GNOME #Wayland #RSI #AssistiveTechnology
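
The mouse grid commands mentioned in the post ("grid", "3 7 5", "click") suggest a numbered grid that is narrowed step by step. Here is a minimal sketch of that idea, assuming a 3x3 grid numbered 1 to 9 row by row. The function names `narrow` and `grid_target` are hypothetical, and EasySpeak's actual grid layout and code may differ.

```python
from typing import Tuple

# (left, top, width, height) in pixels
Rect = Tuple[float, float, float, float]


def narrow(rect: Rect, cell: int, rows: int = 3, cols: int = 3) -> Rect:
    """Return the sub-rectangle for a 1-based cell number, counted row by row.

    Assumption: a 3x3 grid numbered like a phone keypad (1 is top-left).
    """
    if not 1 <= cell <= rows * cols:
        raise ValueError(f"cell must be between 1 and {rows * cols}, got {cell}")
    left, top, width, height = rect
    row, col = divmod(cell - 1, cols)
    cell_w, cell_h = width / cols, height / rows
    return (left + col * cell_w, top + row * cell_h, cell_w, cell_h)


def grid_target(screen: Rect, spoken: str) -> Tuple[float, float]:
    """Map spoken digits such as "3 7 5" to the centre of the final cell."""
    rect = screen
    for token in spoken.split():
        # Each spoken digit shrinks the active area to one cell of the grid.
        rect = narrow(rect, int(token))
    left, top, width, height = rect
    return (left + width / 2, top + height / 2)


if __name__ == "__main__":
    # Three digits on a 1920x1080 screen narrow the area to 1/27 of each side.
    print(grid_target((0, 0, 1920, 1080), "3 7 5"))
```

Each digit divides the active area by three in both directions, so a few spoken digits are enough to reach a small target before saying "click".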

Just a reminder for folks here with questions, I put this together to better clarify stuff.

https://github.com/ctsdownloads/easyspeak/discussions


@matthartley

Nice!

One data point though - #Talon does have a free version. It's the evolving beta which costs money.

@unchartedworlds Ah, that is great! It's important software. Glad to hear this! 😀
@matthartley very nice! This is an awesome software stack. I love the inclusion of openwakeword. If folks don't like "hey, Jarvis" it's easy enough to swap it out for another model. I was just experimenting with a "hey, potato" one with Home Assistant/esphome the other day.
@shredder7579 Yeah, I looked into it myself as well. Settled on Jarvis for now, with moving forward into custom-land later. It's a great feature.

@matthartley oh yeah! I should clarify. I wasn't being critical about "hey, Jarvis". I was applauding that you picked something flexible.

Moreover, this definitely fills a gap I noticed as well. Really excited to try it out!

@shredder7579 Oh no worries, I just wanted to provide context. 🙂

@matthartley this is *fantastic* work. I especially appreciate the implementation of the "grid" keyword. Nice stuff.

I wrote something like this a few years ago but ran into a wall when pyautogui wasn't supporting Wayland. If you're interested, here's the announcement on Reddit back then, which included a video:

https://www.reddit.com/r/Mycroftai/comments/krmn8m/

Code is here: https://gitlab.com/danielquinn/majel

@danielquinn Thanks and thanks for the links as well! 🙏

@matthartley I needed this three years ago when I dislocated my shoulder and couldn't type effectively. I cobbled something together with numen, but it was barely usable.

Bookmarking this for the future!

@matthartley Thank you for this! I work in Accessibility Tech and I need more reasons to move participants to Linux.
@Sammy I appreciate it. I am committed to keeping this going and developing it. It's young, but smoothing out rough edges and packaging are on my radar. 😀
@matthartley agree with @Sammy! Thanks so much for creating this. Accessibility is one of the most common reasons people in my field (education) give me for why they can't recommend or switch to Linux. This will make such a difference. 🫶
@matthartley Can Emacs do it?

@matthartley Having more accessibility options on Linux, especially Wayland, is sorely needed. Thanks for making this thing!

I am fortunate enough to not require stuff like this, but sometimes accessibility helps out in a pinch, and I am grateful for that.

@matthartley Some questions about the language model for your speech-to-text.
Where did you get it? Are there already some libraries?
Will it (or does it) support other languages?

And, possibly a key feature for a lot of (e.g. handicapped) people: is there a way to train it on your own voice, pronunciation, or way of speaking?

I know some people for whom speech-to-text fails because of how they speak.

@M Great questions. The key will be how I can best utilize OpenWakeWord. That's the piece using our existing default model, and it will need to grow.

The challenge, and a doable one, will be folks being able to train their own. So it's on my radar, as is how we best get past English-only models.

All are considerations I'm committed to sorting out over time.

@matthartley @M Train-your-own would be excellent! In the meantime, what accents has it been trained to recognise -- and/or tested on? (That might be useful info in the readme too.) I remember pre-Dragon I could get MacOS's built-in system to work sometimes if I faked the accent just right...
@zeborah @M Great questions. It's brand new, still in alpha dev mode, so just mine at this stage. Part of the tuning phase is to decide on a path forward for accents and training.
@matthartley @M Cool thanks! Training seems like an Extra-Large T-shirt or whatever the cool PMs are calling it these days, but it would be absolutely awesome to have a Linux system that did that!
@matthartley RSI? Repetitive-Strain Injury?

@matthartley Genuine question: what would it take financially to develop it into a fully functioning replacement for Nuance's Dragon NaturallySpeaking for Windows? I can't type or edit anything longer than an email. I also can't code. The one thing missing from the FOSS ecosystem is a proper, fully functional voice-to-text suite. I want to help make it happen.

Also, this is amazing, and thank you.

@RobertoArchimboldi Wow, that's an unexpected question.

I need to understand scope, features (I've never used it), available tech stack on Linux to match it. Without knowing anything about that software, my gut tells me this is the biggest challenge. (cont)

@RobertoArchimboldi That said, the Linux landscape is spattered with a number of solutions (one of which is used here for a plugin).

All that said, the biggest challenge is the lower level stack matching overall functionality.

I have been fighting for Linux users for 20+ years - literally. (cont)

@RobertoArchimboldi I am pretty good with impossible or difficult, but, I need time to understand the landscape, what is mature, what needs lower level dev help, etc.

The immediate plan for EasySpeak is dialing it in, smoothing the rough edges, fine tuning functionality with the plugins (like transcription which is still rough). (cont)

@RobertoArchimboldi Longer term, slowly, carefully work with smarter people than me to begin filling in the missing tech stack to accomplish what you describe plus other missing elements.

That said, thank you for this. Seeing your note makes this effort worthwhile. (cont)

@matthartley I definitely don't have the money. I imagine it is in the £100,000s bracket. I would like to help fundraise. There has to be the money for such an important tool.
@matthartley Wow, that looks incredibly nice.
Something I especially noticed in the demo: why do you need a wakeword for every command? I imagine this would feel more natural: start EasySpeak -> always listening for commands -> pause/exit
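
The flow suggested above (start, keep listening for commands, then pause or exit) can be sketched as a two-state listener. This is only an illustration under stated assumptions: the `Listener` class, the `"pause"` keyword, and the text-based `hear` method are hypothetical and are not taken from EasySpeak, which works on audio rather than on ready-made strings.

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    IDLE = auto()       # only the wake word is accepted
    LISTENING = auto()  # commands are accepted without a wake word


class Listener:
    """Hypothetical two-state listener operating on already-transcribed text."""

    def __init__(self, wake_word: str = "hey jarvis", pause_word: str = "pause"):
        self.wake_word = wake_word
        self.pause_word = pause_word  # assumed keyword, not an EasySpeak command
        self.mode = Mode.IDLE

    def hear(self, utterance: str) -> Optional[str]:
        """Return the command to run, or None if nothing should happen."""
        text = utterance.lower().strip()
        if self.mode is Mode.IDLE:
            if text.startswith(self.wake_word):
                # The wake word opens a session; any trailing words are a command.
                self.mode = Mode.LISTENING
                rest = text[len(self.wake_word):].strip(" ,")
                return rest or None
            return None  # ignore everything else while idle
        if text == self.pause_word:
            self.mode = Mode.IDLE  # back to waiting for the wake word
            return None
        return text


if __name__ == "__main__":
    listener = Listener()
    for phrase in ["open downloads", "Hey Jarvis, open downloads", "close window", "pause"]:
        print(repr(phrase), "->", listener.hear(phrase))
```

One trade-off with an always-listening session is that ordinary conversation near the microphone can be taken as commands, which is one reason a design might prefer a wake word per command.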
@matthartley Would it work for us KDE users, too?
@matthartley Looks awesome, great job!
@matthartley is this gnome-specific or does it work on other desktops?
@pup KDE Plasma planned. Still early alpha.

@matthartley

Sounds interesting! Where can we see the video outside YouTube? Is there a Peertube version?

@matthartley
Sehr interessant, Matt.
Ich hatte vor knapp 30 Jahren "Dragon Dictate". Fand ich schlimm.
Da ich Linux habe, wäre Easy Speak interessant.
Frage: Kennt das Programm auch die deutsche Sprache?

---

Very interesting, Matt.
I had "Dragon Dictate" almost 30 years ago. I found it bad.
Since I have Linux, Easy Speak would be interesting.
Question: Does the programme also know the German language?

@matthartley Added this one to my list of speech recognition frontends: https://slatecave.net/notebook/speech-recognition-frontends/#easyspeak 