CW:NSFW!!!
I appreciate devotion, but I admire discipline even more. Impress me with your manners, not your ego.
@techsimplified yes, accuracy issues indeed. The current state is good enough for the initial development tests, but STT mistakes make the rest of the pipeline mediocre, no matter how good it is. The input is the key.
#Vosk has been great, but I feel I'm hitting its limits. I am testing #Whisper, which should deliver punctuation and better accuracy; that translates into better interaction with the Chatbot, which brings an improved user experience.
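One way to put a number on "better accuracy" when comparing #Vosk and #Whisper is word error rate (WER) against a hand-written reference transcript. This is a minimal sketch, not something from either project: the function names are made up, and it strips punctuation first because Whisper adds punctuation, which shouldn't count as a recognition error.

```python
import string

_PUNCTUATION = str.maketrans("", "", string.punctuation)


def _words(text: str) -> list[str]:
    # Lower-case and drop punctuation so "Hello," and "hello" count as the same word.
    return text.lower().translate(_PUNCTUATION).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Before row i is processed, prev[j] is the word-level edit distance
    # between ref[:i-1] and hyp[:j] (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion (reference word missed)
                          curr[j - 1] + 1,     # insertion (extra hypothesis word)
                          prev[j - 1] + cost)  # substitution, or match when cost is 0
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("turn on the kitchen light", "turn of the kitchen lights")` gives 0.4: two substitutions over five reference words. Running both engines over the same handful of recorded commands gives a like-for-like comparison.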
I will take a look at your suggestion, but I am focusing on Voice-to-Text rather than Voice-to-Action, as I aim for a conversational experience more than simply executing tasks.
Thanx!
Holy moly, 4 days to get the fans in the #Pitxu case to run at different speeds depending on the temperature, and silently.
I've learned a lot these days. On a more everyday level: that typical electrical-appliance noise (the kind that makes us replace the thing as worn out) comes from it emitting an audible frequency, often by mistake, as was the case with mine.
First I connected the fans. They run at 100%.
Then the control pin. Use a GPIO library to switch them on and off at a given temperature.
Then learning about hysteresis: the fan keeps running until the temperature is somewhat below the threshold, so it isn't switching on and off every 5 seconds.
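The hysteresis idea fits in a few lines. A sketch, assuming the fan switches on at 60 °C and off again at 50 °C (both thresholds are made-up values) and that the temperature comes from the standard Linux thermal zone file, which reports millidegrees Celsius:

```python
from pathlib import Path

# Standard Linux thermal zone; the value is in millidegrees Celsius.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")


def read_cpu_temp_c(path: Path = THERMAL_ZONE) -> float:
    return int(path.read_text().strip()) / 1000.0


class HysteresisFan:
    """On/off fan control with a dead band between off_at and on_at (illustrative thresholds)."""

    def __init__(self, on_at: float = 60.0, off_at: float = 50.0):
        if off_at >= on_at:
            raise ValueError("off_at must be lower than on_at")
        self.on_at = on_at
        self.off_at = off_at
        self.running = False

    def update(self, temp_c: float) -> bool:
        if not self.running and temp_c >= self.on_at:
            self.running = True
        elif self.running and temp_c <= self.off_at:
            self.running = False
        # Anywhere between off_at and on_at the previous state is kept,
        # which is what stops the fan from toggling every few seconds.
        return self.running
```

On the Pi, the returned boolean would drive whatever GPIO output switches the fan (for instance a gpiozero `OutputDevice`); that wiring is left out here because it depends on the board and the chosen pin.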
Then converting it to PWM, which lets you vary the speed so it makes less noise.
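With PWM, the on/off decision becomes a duty cycle. A hedged sketch of a simple linear fan curve; the 45 °C / 70 °C endpoints and the 30 % minimum duty are illustrative assumptions, not the values used in #Pitxu (some fans won't spin reliably below a minimum duty, hence the floor):

```python
def fan_duty_percent(temp_c: float,
                     min_temp: float = 45.0,
                     max_temp: float = 70.0,
                     min_duty: float = 30.0) -> float:
    """Linear fan curve: 0 % below min_temp, min_duty..100 % in between, 100 % from max_temp up."""
    if max_temp <= min_temp:
        raise ValueError("max_temp must be higher than min_temp")
    if temp_c < min_temp:
        return 0.0
    if temp_c >= max_temp:
        return 100.0
    # Position of temp_c within the [min_temp, max_temp) band, from 0.0 to 1.0.
    fraction = (temp_c - min_temp) / (max_temp - min_temp)
    return min_duty + fraction * (100.0 - min_duty)
```

Below `min_temp` this curve turns the fan off outright, so in practice it would be paired with a hysteresis band around that threshold to avoid the same on/off flapping.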
Discovering how it works, and that at low frequencies the "buzz" drives you nuts. Way too much.
Learning that you have to use a non-audible frequency (~25kHz), that the library I use blows up above 10kHz, and that the noise is no fun.
It turns out all the Python libraries do PWM in software; it has to be done in hardware.
Damn the Linux kernel, the overlays, and their bloody mothers.
I made an overlay myself, now I have the channels I need, and I can drive the fans at the frequency I want.
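Once an overlay exposes a hardware PWM channel, the kernel's generic PWM sysfs interface (`/sys/class/pwm/pwmchipN/`) can drive it without any Python PWM library. A sketch assuming the fan sits on channel 0 of `pwmchip0` (chip and channel numbers depend on the overlay and the board, so treat them as placeholders) and that the script may write to sysfs. At 25 kHz the period is 1 s / 25 000 = 40 000 ns, and the duty cycle is also given in nanoseconds:

```python
from pathlib import Path


def period_ns(freq_hz: float) -> int:
    """PWM period in nanoseconds; 25 kHz gives 40 000 ns."""
    return round(1e9 / freq_hz)


def set_hw_pwm(duty_percent: float,
               freq_hz: float = 25_000,
               chip: Path = Path("/sys/class/pwm/pwmchip0"),  # placeholder chip
               channel: int = 0) -> None:                      # placeholder channel
    """Configure one hardware PWM channel through the Linux PWM sysfs interface."""
    if not 0 <= duty_percent <= 100:
        raise ValueError("duty_percent must be between 0 and 100")
    pwm = chip / f"pwm{channel}"
    if not pwm.exists():
        # Writing the channel number to "export" makes the kernel create pwmN/.
        (chip / "export").write_text(str(channel))
    period = period_ns(freq_hz)
    duty = round(period * duty_percent / 100)
    duty_file, period_file = pwm / "duty_cycle", pwm / "period"
    # The kernel rejects any state where duty_cycle > period, so order the
    # two writes so that every intermediate state stays valid.
    if int(duty_file.read_text()) > period:
        duty_file.write_text(str(duty))
        period_file.write_text(str(period))
    else:
        period_file.write_text(str(period))
        duty_file.write_text(str(duty))
    (pwm / "enable").write_text("1")
```

For example, `set_hw_pwm(40)` would run the fan at 40 % duty on a 25 kHz carrier, above the roughly 20 kHz upper limit of human hearing. Right after an export, udev may need a moment to fix permissions on the new `pwmN` directory.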
#Pitxu now breathes silently, and takes big gulps of air when it needs to.
Between the improved mic, what I'm cooking up to switch from #Vosk to #Whisper, and the hardware properly holding up the whole infra, I'm starting to feel like getting back to the models again.
@techsimplified it is, completely! I find that having my hands free to do actions (and queries) is indeed a game changer. I'm just banging my head trying to make the STT work smoothly.
This project in the pic is a satellite device for my main, ongoing #Pitxu build, which chains STT > Chatbot > TTS. As a satellite, it just captures sound, sends it to the "server" and plays the answer. It is a #RaspberryPiZero2, so it can't really run all the engines needed.
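The satellite only needs a way to ship one utterance to the server and wait for the spoken answer. The post doesn't say which transport #Pitxu uses; as a hypothetical sketch, a minimal length-prefixed framing over a TCP socket (4-byte big-endian length, then the raw bytes) is enough for that request/response round trip:

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian payload length (hypothetical wire format)


def send_frame(sock: socket.socket, payload: bytes) -> None:
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than asked for, so loop until n bytes arrive.
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> bytes:
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def satellite_round_trip(sock: socket.socket, recorded_audio: bytes) -> bytes:
    """Send one captured utterance, then block until the server returns synthesized audio."""
    send_frame(sock, recorded_audio)
    return recv_frame(sock)
```

On the satellite this would be `socket.create_connection((server_host, port))` followed by one `satellite_round_trip(...)` per utterance; `server_host` and `port` are placeholders, and the server side mirrors it with `recv_frame` then `send_frame`.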
As for tooling, the whole stack uses:
- #Vosk (now tinkering with #Whisper)
- #Gemini (now tinkering with #Ollama offline)
- #Piper
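Chaining STT > Chatbot > TTS can stay engine-agnostic if each stage is just a function, so swapping #Vosk for #Whisper or #Gemini for #Ollama becomes a one-line change. A sketch under assumptions: the `openai-whisper` package, a local Ollama server on its default port 11434, and the `piper` CLI on PATH; the model names (`base`, `llama3`, `en_US-lessac-medium.onnx`) are placeholders.

```python
import json
import subprocess
import urllib.request
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains speech-to-text, a chatbot and text-to-speech as plain callables."""
    stt: Callable[[str], str]       # audio file path -> transcript
    chat: Callable[[str], str]      # transcript -> reply text
    tts: Callable[[str, str], str]  # (reply text, output path) -> output path

    def run(self, audio_path: str, reply_path: str) -> str:
        transcript = self.stt(audio_path)
        reply = self.chat(transcript)
        return self.tts(reply, reply_path)


@lru_cache(maxsize=None)
def _whisper_model(name: str):
    import whisper  # openai-whisper; imported lazily so this module loads without it
    return whisper.load_model(name)  # cached: loading a model is slow


def whisper_stt(audio_path: str, model_name: str = "base") -> str:
    return _whisper_model(model_name).transcribe(audio_path)["text"].strip()


def ollama_chat(prompt: str, model: str = "llama3",
                url: str = "http://localhost:11434/api/generate") -> str:
    # Non-streaming request: Ollama answers with one JSON object holding "response".
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def piper_tts(text: str, out_path: str,
              voice: str = "en_US-lessac-medium.onnx") -> str:
    # Piper reads the text from stdin and writes a WAV file to out_path.
    subprocess.run(["piper", "--model", voice, "--output_file", out_path],
                   input=text.encode("utf-8"), check=True)
    return out_path
```

Usage would be `VoicePipeline(stt=whisper_stt, chat=ollama_chat, tts=piper_tts).run("question.wav", "answer.wav")`; on a split setup like the satellite above, this would run on the server side only.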
But a big chunk of my brain goes to the UX hardware:
- screen for a more human interaction
- soundcard I/O (gosh, RPi is not polished here yet)
- GPIO buttons, UPS, PWM fan cases,...
AI Speech Technologies
This page is a collection of notes and links related to AI speech technologies, including Text-to-Speech (TTS), Speech-to-Text (STT), voice synthesis, voice cloning, and other rela(...)
#ai #cloning #speech #stt #synthesis #tts #voice #whisper
https://taoofmac.com/space/ai/speech?utm_content=atom&utm_source=mastodon&utm_medium=social
I hate OpenAI but I had to use Whisper to help someone make accessible content. I hate that I had to use Whisper to do it because it comes from OpenAI.
But I don't know of any other way to get a text transcription from a media file that is free/open. (Besides doing it manually.)
I tell myself that because it's for education and accessibility, it's okay, but I still don't like it.
Transcribing audio to text accurately is often expensive. OpenAI Whisper allows you to do this locally and easily. If you want more information, you can read about it here: https://byandrev.dev/en/blog/using-whisper-to-transcribe-videos/
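For accessible content, subtitles are often more useful than a plain transcript. A sketch that turns Whisper's timestamped segments into an `.srt` file; it assumes the `openai-whisper` package and ffmpeg (which Whisper uses to decode media files) are installed, and `base` is an arbitrary model choice.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> 00:00:03,500."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    # Each segment needs "start" and "end" (seconds) and "text", as Whisper returns them.
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(media_path: str, srt_path: str, model_name: str = "base") -> None:
    import whisper  # openai-whisper; imported lazily
    result = whisper.load_model(model_name).transcribe(media_path)
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))
```

The `whisper` command-line tool can also write SRT directly with `--output_format srt`; doing it in code is mainly useful when the segments need cleaning up first.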