Speech input is one of the missing features in #Phosh's stevia. I had looked at several possible solutions but didn't want to pull a ton more dependencies into stevia itself.

While looking for something completely different I stumbled onto #vosk-server, which runs fully locally but can be talked to via websocket, so I could plug it into the prototype I already had lying around (video has audio):
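For context, vosk-server speaks a simple websocket protocol: a JSON config message, raw PCM audio chunks, then an EOF marker, with JSON partial and final results coming back. A rough client sketch, assuming the default port and a chunk size along the lines of the vosk-server examples:

```python
import asyncio
import json
import wave

def extract_text(message) -> str:
    """Pull the recognized text out of a server reply: partial
    hypotheses arrive as {"partial": ...}, finals as {"text": ...}."""
    msg = json.loads(message)
    return msg.get("text", msg.get("partial", ""))

async def transcribe(wav_path: str, uri: str = "ws://localhost:2700"):
    """Stream a mono 16-bit WAV file to a running vosk-server and
    print results as they come in. URI and chunk size are assumptions
    based on the server's default setup."""
    import websockets  # third-party: pip install websockets
    async with websockets.connect(uri) as ws:
        wf = wave.open(wav_path, "rb")
        await ws.send(json.dumps({"config": {"sample_rate": wf.getframerate()}}))
        while data := wf.readframes(4000):
            await ws.send(data)
            print(extract_text(await ws.recv()))  # partial hypotheses
        await ws.send('{"eof" : 1}')
        print(extract_text(await ws.recv()))      # final result

# asyncio.run(transcribe("test.wav"))  # needs a vosk-server running locally
```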

#LinuxMobile

I'm trying to set up voice control for Home Assistant... in Esperanto! There is, as far as I know, only one local option for an Esperanto STT model able to run on a Raspberry Pi: Vosk. And let me tell you, the setup (especially with dockerized Home Assistant) is, uh, a labor of love, let's say.
Mi sukcesos! (I will succeed!)
#homeAssistant #esperanto #vosk #stt #docker #languages

Trying the speech-to-text engine (Vosk) in Kdenlive to add subtitles to some videos I'm working on.

It is mostly right, but sometimes...

#vosk #kdenlive

Oni povas uzi #Vosk por #Esperanto? (You can use #Vosk for #Esperanto?) 🤯 😮
My #whisper plugin development is stalled for now (blame the #AI CEOs). I'm looking into lighter alternatives for the CPU due to the RAM crisis, like #Vosk. On a positive note, #CUDA now runs on #Radeon via #ZLUDA, which means my plugin might also work there with a few tweaks. I just need to get my hands on a GPU for testing. 🐾

How I cut WER from 33% to 3.3% for Russian speech on CPU: comparing GigaAM, Whisper, and Vosk

Over two months I tried three ASR engines, six Whisper models, adaptive chunking, T5 correction, and ensemble voting, and most of those ideas turned out to be dead ends. The article is a detailed breakdown of six dead ends and one find: why Sber's GigaAM hits 3.3% WER on Russian on an ordinary CPU, beating Whisper large-v3-turbo on an RTX 4090 (7.9%) by a factor of 2.4. With benchmarks, code, and honest caveats.

https://habr.com/ru/articles/1002260/

#speechtotext #gigaam #whisper #vosk #onnx #распознавание_речи #WER #голосовой_ввод #ASR #python
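For reference, the WER figures the article headlines are just word-level edit distance divided by the reference length. A minimal, self-contained implementation (my own sketch, not the article's code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of reference words, computed via a
    word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

So a 33% WER means roughly one word in three is wrong, while 3.3% is one in thirty.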

How I cut WER from 33% to 3.3% for Russian speech on CPU: comparing GigaAM, Whisper, and Vosk

I needed offline voice input for Windows: push-to-talk, no cloud, with good Russian recognition. Sounds simple? I thought so too. Two months later...


@techsimplified yes, accuracy issues indeed. The current state is good enough for the initial development tests, but the STT mistakes make the rest of the pipeline mediocre, no matter how good it is. The input is the key.

#Vosk has been great, but I feel I'm bumping into its limits. I am testing #Whisper, which should deliver punctuation and better accuracy; that translates to better interaction with the chatbot, which brings an improved user experience.

I will take a look at your suggestion, but I focus on voice-to-text rather than voice-to-action, as I aim for a conversational experience more than simply executing tasks.

Thanx!

Mother of God, four days to get the #Pitxu's case fans running at different speeds depending on temperature, and silently.

I've learned a lot these days. On a more life-wisdom level: that typical electrical-appliance noise (the kind that makes us replace a device for being old) comes from it emitting an audible frequency, often by mistake, as was my case.

First I connected the fans. They run at 100%.
Then the control pin: using a GPIO library to switch them on and off at a certain temperature.
Then learning about hysteresis, which means letting the fan run until the temperature drops well below the threshold, so it doesn't switch on and off every 5 seconds.
Then converting it to PWM, which lets me vary the speed so it makes less noise.
Discovering how that works, and that at low frequencies the buzzing is seriously annoying. Too much.
Learning that you need an inaudible frequency (~25 kHz), and that the library I use blows up above 10 kHz, and the noise is no fun.
It turns out all the Python libraries do PWM in software; it has to be done in hardware.
Damn the Linux kernel, device tree overlays, and the whole lot.

I made an overlay myself; I now have the channels I need and can drive the fans at the frequency I want.
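The hysteresis-plus-PWM logic described above can be sketched roughly like this; the thresholds, names, and curve here are my own illustrative assumptions, not the actual values running on the machine:

```python
# Sketch of on/off hysteresis plus a temperature -> duty-cycle curve.
# All thresholds are illustrative; the real values may differ.
ON_TEMP = 55.0    # start the fan once the CPU climbs above this (deg C)
OFF_TEMP = 45.0   # only stop it again well below, to avoid constant toggling
FULL_TEMP = 70.0  # run at 100% duty from here up

def next_fan_state(temp_c: float, fan_on: bool) -> bool:
    """Hysteresis: different thresholds for turning on and off,
    so the fan doesn't flap every few seconds around one cutoff."""
    if fan_on:
        return temp_c > OFF_TEMP   # keep running until well below
    return temp_c >= ON_TEMP       # only start above the upper threshold

def duty_for_temp(temp_c: float) -> int:
    """Map temperature to a PWM duty cycle (0-100), linear between
    OFF_TEMP and FULL_TEMP. The result would then be written to a
    hardware PWM channel (e.g. the one exposed by a custom device
    tree overlay), clocked at an inaudible ~25 kHz."""
    if temp_c <= OFF_TEMP:
        return 0
    if temp_c >= FULL_TEMP:
        return 100
    return round(100 * (temp_c - OFF_TEMP) / (FULL_TEMP - OFF_TEMP))
```

Software PWM jitters at these frequencies, which is why the duty cycle has to go to a hardware PWM channel rather than a bit-banged GPIO pin.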

The #Pitxu now breathes in silence, and takes big gulps of air when it needs to.

Between the mic upgrade, what I'm cooking up to switch from #Vosk to #Whisper, and making sure the hardware can hold up the whole infra, I'm starting to feel like getting back to the models again.

@techsimplified it is, completely! I find that having my hands free for actions (and queries) is indeed a game changer. I'm just banging my head trying to make the STT work smoothly.

This project in the pic is a satellite device for my main #Pitxu ongoing build, chaining STT > Chatbot > TTS. As a satellite, it just captures sound, sends it to the "server", and plays the answer. It is a #RaspberryPiZero2, so it can't really host all the engines needed.

As per tooling, the whole pack uses:
- #Vosk (now tinkering with #Whisper)
- #Gemini (now tinkering with #Ollama offline)
- #Piper

But a big chunk of my brain goes to the UX hardware:
- screen for a more human interaction
- soundcard I/O (gosh RPi is not yet polished here)
- GPIO buttons, UPS, PWM fan cases,...
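The STT > Chatbot > TTS chain above is easy to keep engine-agnostic if each stage is just a callable, which is what makes swapping Vosk for Whisper or Gemini for Ollama cheap. A minimal sketch with function names of my own invention:

```python
def run_pipeline(audio_bytes, stt, chat, tts):
    """Chain STT -> chatbot -> TTS. Each stage is a callable so the
    engines can be swapped (Vosk/Whisper, Gemini/Ollama, Piper)
    without touching the pipeline itself."""
    text = stt(audio_bytes)    # audio in, transcript out
    reply = chat(text)         # transcript in, answer out
    return tts(reply)          # answer in, audio out

# The satellite side stays dumb: capture audio, send it to the
# server running this pipeline, play back the returned audio.
```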

I am using the #VOSK voice recognition model on my phone for transcribing #Esperanto speech. Even though it's very small, it has a crazy high success rate and works very well even on a phone. Same for creating subtitles in Kdenlive.
https://alphacephei.com/vosk/models