Manuel Vacelet Jan 6

Matt (It's really me) Hartley

Linux desktop voice control has a gap. Talon costs money. Other tools are X11-only or cloud-dependent.

So I built EasySpeak.

https://www.youtube.com/watch?v=dl5m2Zo1oIE

https://github.com/ctsdownloads/easyspeak/tree/dev?tab=readme-ov-file#easyspeak

- Free and open source (GPL-3.0)
- Fully local — no cloud, no accounts
- Wayland-native
- "Hey Jarvis, open downloads"

Built for RSI, accessibility, or anyone who wants to talk to their computer.

#Linux #OpenSource #Accessibility #VoiceControl #GNOME #Wayland #a11y

Matt (It's really me) Hartley Jan 6

A little more backstory here on LinkedIn. https://www.linkedin.com/feed/update/urn:li:activity:7414297028702347264/

Voice Control for Linux/GNOME Desktop | Matt H.

Took me over a year. It had many iterations. Once I finally got the tech sorted, I created the GH repo for it. After settling on how I wanted to present this to folks, I am releasing it today and yup, I already have two (minor) bugs filed by me. 🤪

Introducing EasySpeak - Voice control for Linux desktops. Fully local, no cloud, Wayland-native.

Linux desktop voice control has a gap. Talon exists but has a steep learning curve and costs money for the full version. Most other tools are X11-only, abandoned, or cloud-dependent.

So I built EasySpeak.

https://lnkd.in/gvjtDSBy
https://lnkd.in/gsFCfxeE

- Free and open source: GPL-3.0, no paywalls
- Fully local: no cloud, no accounts, no data leaving your machine
- Wayland-native: works on modern GNOME where X11 tools fail
- Simple: say "Hey Jarvis, open downloads" and it works
- Extensible: drop a Python file in plugins/ to add commands

Built for people with RSI, accessibility needs, hands-busy workflows, or anyone who just wants to talk to their computer.

What's working now:

- Wake word activation ("Hey Jarvis")
- Mouse grid navigation ("grid", "3 7 5", "click")
- Head tracking cursor control (experimental)
- Browser control with link hints and tabs
- Dictation with punctuation commands
- App launcher, media controls, volume, brightness

The stack: OpenWakeWord + Whisper + Piper. Everything runs locally.

Still in active development, but it's real and it works. More to come.

#Linux #OpenSource #Accessibility #VoiceControl #GNOME #Wayland #RSI #AssistiveTechnology
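
For readers who have not used a mouse grid before, the idea behind commands like "grid", "3 7 5", "click" can be sketched in a few lines of Python. This is a hypothetical illustration of the general technique, not EasySpeak's code: it assumes a 3x3 grid with cells numbered 1 to 9, left to right and top to bottom, and the function name `grid_target` is made up for this example. Each spoken digit shrinks the active region to one cell, and the click lands at the centre of the final region.

```python
def grid_target(width, height, digits, rows=3, cols=3):
    """Return the (x, y) centre of the region reached by following `digits`.

    Assumption for this sketch: cells are numbered 1..rows*cols,
    left to right, top to bottom. This is not taken from EasySpeak.
    """
    left, top = 0.0, 0.0
    w, h = float(width), float(height)
    for d in digits:
        if not 1 <= d <= rows * cols:
            raise ValueError(f"cell {d} is outside the {rows}x{cols} grid")
        # Convert the 1-based cell number into a zero-based row and column.
        row, col = divmod(d - 1, cols)
        # Shrink the active region to the chosen cell.
        w /= cols
        h /= rows
        left += col * w
        top += row * h
    # The pointer would be placed at the centre of the final region.
    return (left + w / 2, top + h / 2)


if __name__ == "__main__":
    # "3 7 5" on a hypothetical 1920x1080 screen.
    print(grid_target(1920, 1080, [3, 7, 5]))
```

Under those assumptions, three digits on a 3x3 grid narrow the target to 1/729 of the screen area, which is why a short digit sequence is enough to reach most click targets.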

Matt (It's really me) Hartley Jan 9

Just a reminder for folks here with questions, I put this together to better clarify stuff.

https://github.com/ctsdownloads/easyspeak/discussions

Matt (It's really me) Hartley Jan 9

Especially this.

https://github.com/ctsdownloads/easyspeak/discussions/6

Jennifer Moore 😷 Jan 6

@matthartley

Nice!

One data point though - #Talon does have a free version. It's the evolving beta which costs money.

Matt (It's really me) Hartley Jan 6

@unchartedworlds Ah, that is great! It's important software. Glad to hear this! 😀

GradientDescent Jan 6

@matthartley very nice! This is an awesome software stack. I love the inclusion of openwakeword. If folks don't like "hey, Jarvis" it's easy enough to swap it out for another model. I was just experimenting with a "hey, potato" one with Home Assistant/esphome the other day.

Matt (It's really me) Hartley Jan 6

@shredder7579 Yeah, I looked into it myself as well. Settled on Jarvis for now, with plans to move into custom-land later. It's a great feature.

GradientDescent Jan 6

@matthartley oh yeah! I should clarify. I wasn't being critical about "hey, Jarvis". I was applauding that you picked something flexible.

Moreover, this definitely fills a gap I noticed as well. Really excited to try it out!

Matt (It's really me) Hartley Jan 6

@shredder7579 Oh no worries, I just wanted to provide context. 🙂

Daniel Quinn Jan 6

@matthartley this is *fantastic* work. I especially appreciate the implementation of the "grid" keyword. Nice stuff.

I wrote something like this a few years ago but ran into a wall because pyautogui didn't support Wayland. If you're interested, here's the announcement on Reddit back then, which included a video:

https://www.reddit.com/r/Mycroftai/comments/krmn8m/

Code is here: https://gitlab.com/danielquinn/majel

Matt (It's really me) Hartley Jan 6

@danielquinn Thanks and thanks for the links as well! 🙏

Matthew Weier O'Phinney Jan 6

@matthartley I needed this three years ago when I dislocated my shoulder and couldn't type effectively. I cobbled something together with numen, but it was barely usable.

Bookmarking this for the future!

Sammy Jan 6

@matthartley Thank you for this! I work in Accessibility Tech and I need more reasons to move participants to Linux.

Matt (It's really me) Hartley Jan 6

@Sammy I appreciate it. I am committed to keeping this going and developing it. It's young, but smoothing out rough edges and packaging are on my radar. 😀

eLearningTechie Jan 6

@matthartley agree with @Sammy! Thanks so much for creating this. Accessibility is one of the most common reasons people in my field (education) give me for why they can't recommend or switch to Linux. This will make such a difference. 🫶

tusharhero Jan 6

@matthartley Can Emacs do it?

Matt (It's really me) Hartley Jan 6

@tusharhero If I or someone else builds a plugin, yes. https://github.com/ctsdownloads/easyspeak/tree/dev?tab=readme-ov-file#writing-plugins

cynical melomaniac Jan 6

@matthartley Having more accessibility options on Linux, especially on Wayland, is sorely needed. Thanks for making this thing!

I am fortunate enough to not require stuff like this, but sometimes accessibility helps out in a pinch, and I am grateful for that.

Eric Schultz Jan 6

@matthartley this is super cool!

Herr Dennis 🖖🙂 Jan 6

@matthartley So nice! I love it.

Meph Jan 6

@matthartley Some questions about the language model for your speech-to-text.
Where did you get it? Are there already some libraries?
Will it (does it) support other languages?

And, possibly a key feature for a lot of (e.g. handicapped) people: is there a way to train it on your own voice, pronunciation, or way of speaking?

I know some guys for whom speech-to-text fails because of the way they speak.

Matt (It's really me) Hartley Jan 6

@M Great questions. The key will be how I can best utilize OpenWakeWord. That's the piece using our existing default model, and it will need to grow.

The challenge, and a doable one, will be letting folks train their own. So it's on my radar, as is how we best get past English-only models.

All are considerations I'm committed to sorting out over time.

@matthartley Thanks.

@matthartley @M Train-your-own would be excellent! In the meantime, what accents has it been trained to recognise -- and/or tested on? (That might be useful info in the readme too.) I remember pre-Dragon I could get MacOS's built-in system to work sometimes if I faked the accent just right...

Matt (It's really me) Hartley Jan 7

@zeborah @M Great questions. It's brand new, still in alpha dev mode, so just mine at this stage. Part of the tuning phase is to decide on a path forward for accents and training.

Zeborah Jan 7

@matthartley @M Cool thanks! Training seems like an Extra-Large T-shirt or whatever the cool PMs are calling it these days, but it would be absolutely awesome to have a Linux system that did that!

Matt (It's really me) Hartley Jan 7

@zeborah @M 100% for sure :)

Mike Spooner Jan 6

@matthartley RSI? Repetitive-Strain Injury?

Matt (It's really me) Hartley Jan 6

@shelldozer Yep, you got it. 😀

Roberto von Archimboldi Jan 6

@matthartley genuine question, what would it take financially to develop it into a fully functioning replacement for Nuance's Dragon NaturallySpeaking for Windows? I can't type or edit anything longer than an email. I also can't code. The one thing missing from the FOSS ecosystem is a proper, fully functional voice-to-text suite. I want to help make it happen.

also this is amazing and thank you

Matt (It's really me) Hartley Jan 6

@RobertoArchimboldi Wow, that's an unexpected question.

I need to understand the scope, the features (I've never used it), and the tech stack available on Linux to match it. Without knowing anything about that software, my gut tells me this is the biggest challenge. (cont)

Matt (It's really me) Hartley Jan 6

@RobertoArchimboldi That said, the Linux landscape is spattered with a number of solutions (one of which is used here for a plugin).

All that said, the biggest challenge is the lower level stack matching overall functionality.

I have been fighting for Linux users for 20+ years - literally. (cont)

Matt (It's really me) Hartley Jan 6

@RobertoArchimboldi I am pretty good with impossible or difficult, but, I need time to understand the landscape, what is mature, what needs lower level dev help, etc.

The immediate plan for EasySpeak is dialing it in, smoothing the rough edges, fine tuning functionality with the plugins (like transcription which is still rough). (cont)

Matt (It's really me) Hartley Jan 6

@RobertoArchimboldi Longer term, the plan is to slowly, carefully work with people smarter than me to begin filling in the missing tech stack to accomplish what you describe, plus other missing elements.

That said, thank you for this. Seeing your note makes this effort worthwhile. (cont)

Roberto von Archimboldi Jan 6

@matthartley I definitely don't have the money. I imagine it is in the £100,000s bracket. I would like to help fundraise. There has to be the money for such an important tool

Some Dude Jan 6

@matthartley thank you!

despicable_me Jan 6

@matthartley Wow, that looks incredibly nice.
Something I especially noticed in the demo: why do you need a wakeword for every command? I imagine this would feel more natural: start EasySpeak -> always listening for commands -> pause/exit

Larry Garfield Jan 6

@matthartley Would it work for us KDE users, too?

Huriken Jan 6

@matthartley Looks awesome, great job!

Kaizo Hellhound Jan 6

@matthartley is this gnome-specific or does it work on other desktops?

Matt (It's really me) Hartley Jan 6

@pup KDE Plasma planned. Still early alpha.

Blort™ 🐀Ⓥ🥋☣️ Jan 7

@matthartley

Sounds interesting! Where can we see the video outside YouTube? Is there a Peertube version?

Hans Jan 7

@matthartley
Very interesting, Matt.
I had "Dragon Dictate" almost 30 years ago. I thought it was awful.
Since I use Linux, EasySpeak would be interesting.
Question: Does the program also support German?

Slatian Jan 7

@matthartley Added this one to my list of speech recognition frontends: https://slatecave.net/notebook/speech-recognition-frontends/#easyspeak
