Speech input is one of the missing features in #Phosh's stevia. I had looked at several possible solutions but didn't want to pull in a ton more dependencies into stevia itself.

While looking for something completely different I stumbled onto #vosk-server, which runs fully locally but can be talked to via websocket, so I could punch that into the prototype I already had lying around (video has audio):

#LinuxMobile

There's more work needed to make this usable (we don't have arm64 Docker containers of vosk-server and it's not in any distro yet, though most of its dependencies are already in Alpine). There's also room for improvement on the recognition side (it likes to guess "the" when there's no input, as you can see at the end of the video).
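For anyone curious how the websocket side works: this is a minimal sketch of a vosk-server client, assuming the default server address `ws://localhost:2700` and a 16 kHz mono PCM stream. The protocol is: send a JSON config message, stream raw audio chunks, and read back JSON replies with `"partial"` (in-progress) or `"text"` (final) fields.

```python
import json

VOSK_URL = "ws://localhost:2700"  # assumed default vosk-server address
SAMPLE_RATE = 16000               # must match the PCM audio you stream

def config_message(sample_rate: int = SAMPLE_RATE) -> str:
    # First message tells the server the sample rate of the PCM stream
    return json.dumps({"config": {"sample_rate": sample_rate}})

def parse_result(raw: str) -> tuple[bool, str]:
    # The server replies with {"partial": "..."} while decoding and
    # {"text": "..."} once an utterance is final
    msg = json.loads(raw)
    if "text" in msg:
        return True, msg["text"]
    return False, msg.get("partial", "")

async def transcribe(pcm_chunks):
    # Streaming sketch; needs the third-party "websockets" package
    import websockets
    async with websockets.connect(VOSK_URL) as ws:
        await ws.send(config_message())
        for chunk in pcm_chunks:          # raw 16-bit mono PCM bytes
            await ws.send(chunk)
            final, text = parse_result(await ws.recv())
            if final and text:
                yield text
        await ws.send('{"eof" : 1}')      # flush and get the final result
        yield parse_result(await ws.recv())[1]
```

The OSK would feed mic audio in as `pcm_chunks` and commit finalized text to the focused input; partial results could be shown as a live preview.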

If you want to help out, jump into https://gitlab.gnome.org/World/Phosh/stevia/-/merge_requests/279 .

Thanks to dogman in the FLX1 channel for mentioning vosk which led me to vosk-server.

vosk-server offers offline speech recognition for multiple languages: https://github.com/alphacep/vosk-server

Maybe there are other engines we should consider and that also work via websocket or something similar?
@agx I'm using Speechnotes for this, where you can choose from different engines, all running locally. https://flathub.org/en/apps/net.mkiol.SpeechNote

@DF5RE Thanks for the hint! We want OSK integration on mobile so you can use speech input everywhere you enter text. That frees apps from having to care about speech support, and it's the same (switchable) engine for all inputs.

I had looked at the engines used by speech note before and whisper will be an additional one we want to add to stevia once vosk is working properly.

@agx OSK integration would be great, as it avoids the cumbersome copy & paste procedure when using Speech Note. Looking forward to your solution! 😀
@agx wow, this is nice! It seems quite similar to the Home Assistant concept: https://www.home-assistant.io/voice_control/voice_remote_local_assistant/ Maybe having a configurable engine in phosh that can be used by several apps is the way to go, is what just came to my mind.
@agx
wow, that looks pretty cool!

@devrtz @agx
Hello, do you know Handy: https://handy.computer/
I'm not a technician, but it seems to be doing something very similar, and there may be synergies with what you are looking to achieve...

Regards.

@agx I think vosk would be a great way to go and was something I was thinking about some months ago. I'll give this MR a test on my device, thanks!
@agx To me it feels like the mic is now disabled. Maybe it's better to have the icon show the mic's current status instead of what will happen the next time you press the button; as it is, it's confusing and not good UI design.
@agx I wonder, is there a mode for the keyboard to only show the top bar? Would be useful on devices with a mech keyboard.

@benjistokman You mean for the enable/disable button? I think we want a keybinding and maybe an indicator in Phosh's top-bar (rather than the OSK) so we can save that screen real estate?

That would also allow hiding the on-screen keyboard on mobile if one wants speech input only. Does that make sense?

@agx Looks great! Your mic button is backwards to me, though. In my opinion, the icon should show the current state, not the toggle/future state.