Mastodawn

CW:NSFW!!!
I appreciate devotion, but I admire discipline even more. Impress me with your manners, not your ego.

#ASMR
#Whisper
#FootFetish
#Worship
#GoddessWorship
#Femdom

@techsimplified yes, accuracy issues indeed. The current state is good enough for the initial development tests, but STT accuracy mistakes make the rest of the pipeline mediocre, no matter how good it is. The input is the key.

#Vosk has been great, but I feel I am bumping into its limits. I am testing #Whisper, which should deliver punctuation and better accuracy. That translates to better interaction with the chatbot, which in turn improves the user experience.

I will take a look at your suggestion, but I focus on voice-to-text rather than voice-to-action, as I aim for a conversational experience more than simply executing tasks.

Thanx!

Xavi 1d ago

Good grief, 4 days to get the #Pitxu case fans to run at different speeds depending on the temperature, and silently.

I've learned a lot these past days. On a more everyday level: that typical electrical-appliance noise (the kind that makes us replace the thing because it's "old") happens because it emits an audible frequency, often by mistake, as was my case.

First I connected the fans. They run at 100%.
Then the control pin. Pull in a GPIO library to switch them on and off at a certain temperature.
Then learning about hysteresis, which means keeping the fan running until the temperature is well below the threshold, so it isn't switching on and off every 5 seconds.
Then converting it to PWM, which lets you vary the speed so it makes less noise.
Discovering how it works, and that at low frequencies the "buzz" drives you nuts. Way too much.
Learning that you have to use a non-audible frequency (~25 kHz), and that the library I use blows up above 10 kHz, and the noise is no fun.
It turns out all the Python libraries do PWM in software; it has to be done in hardware.
Damn the Linux kernel, the overlays, and the whole bloody lot.

I wrote an overlay myself, I now have the channels I need, and I can drive the fans at the frequency I want.
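
The hysteresis and variable-speed steps above can be sketched roughly like this. It is only the decision logic, not the actual #Pitxu code: the class name, the thresholds and the minimum duty cycle are all made-up values for illustration, and the hardware PWM output itself (the overlay part) is left out. Only the ~25 kHz figure comes from the post.

```python
from dataclasses import dataclass

# Frequency mentioned in the post; not used by the logic below, only
# shown as the value one would hand to a hardware PWM channel.
PWM_FREQUENCY_HZ = 25_000


@dataclass
class FanController:
    """Hypothetical fan logic: hysteresis for on/off, linear ramp for speed."""

    on_temp: float = 60.0   # assumed: start the fan at or above this (deg C)
    off_temp: float = 50.0  # assumed: stop only once below this (deg C)
    max_temp: float = 80.0  # assumed: full speed at or above this (deg C)
    min_duty: float = 0.3   # assumed: lowest duty cycle while running
    running: bool = False

    def duty_for(self, temp_c: float) -> float:
        """Return the duty cycle (0.0 to 1.0) for the current temperature."""
        # Hysteresis: switching on and switching off use different
        # thresholds, so the fan does not flap around a single value.
        if self.running:
            if temp_c < self.off_temp:
                self.running = False
        elif temp_c >= self.on_temp:
            self.running = True

        if not self.running:
            return 0.0

        # Linear ramp from min_duty at off_temp up to 1.0 at max_temp.
        fraction = (temp_c - self.off_temp) / (self.max_temp - self.off_temp)
        fraction = min(1.0, max(0.0, fraction))
        return self.min_duty + (1.0 - self.min_duty) * fraction
```

With these assumed numbers, a reading of 55 °C gives 0.0 while the fan is idle but a non-zero duty once it has been started at 60 °C, which is exactly the "keep ventilating below the threshold" behaviour.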

#Pitxu now breathes in silence, and takes big gulps of air when it needs to.

Between the microphone improvement, what I'm cooking up to switch from #Vosk to #Whisper, and the hardware properly holding up the whole infra, I'm starting to feel like getting back to the models again.

Xavi 1d ago

@techsimplified it is, completely! I find that having my hands free to do actions (and queries) is indeed a game changer. I'm just banging my head trying to make the STT work smoothly.

This project in the pic is a satellite device for my main #Pitxu ongoing build, chaining STT > Chatbot > TTS. As a satellite, it just captures sound, sends it to the "server" and plays the answer. It is a #RaspberryPiZero2, so it can't really hold all the engines needed.

As for tooling, the whole pack uses:
- #Vosk (now tinkering with #Whisper)
- #Gemini (now tinkering with #Ollama offline)
- #Piper

But a big chunk of my brain goes to the UX hardware:
- screen for a more human interaction
- soundcard I/O (gosh RPi is not yet polished here)
- GPIO buttons, UPS, PWM fan cases,...
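
For readers who want the STT > Chatbot > TTS chain spelled out, a rough sketch could look like this. It is not the #Pitxu code: `make_pipeline` and the three callables are hypothetical placeholders, and in a real setup they would wrap whatever engines are in use (Vosk or Whisper, Gemini or Ollama, Piper).

```python
from typing import Callable


def make_pipeline(
    stt: Callable[[bytes], str],      # hypothetical: audio in, text out
    chatbot: Callable[[str], str],    # hypothetical: user text in, reply out
    tts: Callable[[str], bytes],      # hypothetical: reply text in, audio out
) -> Callable[[bytes], bytes]:
    """Chain three stages into one audio-in, audio-out handler."""

    def handle(audio_in: bytes) -> bytes:
        text = stt(audio_in)      # 1. speech to text
        reply = chatbot(text)     # 2. text to reply
        return tts(reply)         # 3. reply to speech

    return handle
```

The satellite would then only capture audio, send the bytes to wherever `handle` runs, and play back what it returns. Keeping each stage behind a plain callable is one way to swap an engine without touching the rest.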

Tao of Mac 1d ago

AI Speech Technologies

This page is a collection of notes and links related to AI speech technologies, including Text-to-Speech (TTS), Speech-to-Text (STT), voice synthesis, voice cloning, and other rela(...)

#ai #cloning #speech #stt #synthesis #tts #voice #whisper

https://taoofmac.com/space/ai/speech?utm_content=atom&utm_source=mastodon&utm_medium=social

Pete Prodoehl 🍕 2d ago

But like, this is what AI and LLMs *should be* doing. Making accessibility easier. Not generating bullshit art while it burns the planet.

And damn, Python made it so easy to get Whisper installed and running.

But are there alternatives that are free & open?

#ai #whisper

Pete Prodoehl 🍕 2d ago

I hate OpenAI but I had to use Whisper to help someone make accessible content. I hate that I had to use Whisper to do it because it comes from OpenAI.

But I don't know of any other way to get a text transcription from a media file that is free/open. (Besides doing it manually.)

I tell myself because it's for education and accessibility it's okay, but I still don't like it.

#AI #OpenAI #LLM #whisper

Nick 4d ago

@techsimplified for creation I found other tools to be good, such as #LiveCaptions (https://flathub.org/en/apps/net.sapples.LiveCaptions). There are also other, mostly #Whisper-based, tools for non-live caption creation.
But I don't know of any good transcript formatting assistant program (to easily format transcripts with timestamps etc. for upload) or any other tools to assess accessibility, which is why I am asking.
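
Not a full formatting assistant, but the timestamp part can be scripted in a few lines. The sketch below turns a list of segments into SRT subtitle text. It assumes each segment is a dict with `start` and `end` in seconds plus a `text` key, which as far as I know is the shape of the `segments` list in Whisper's Python output; the function names are made up for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from segments shaped like {"start", "end", "text"}."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each SRT cue: counter, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

Writing the returned string to a `.srt` file gives something most video platforms accept for upload, though the cue lengths and line breaks would still need a human pass for readability.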

Andres Parra 4d ago

Transcribing audio to text accurately is often expensive. OpenAI Whisper allows you to do this locally and easily. If you want more information, you can read about it here: https://byandrev.dev/en/blog/using-whisper-to-transcribe-videos/

#ai #openai #whisper

Ramon 4d ago

I have now had a fully local setup with #Homeassistant #Voice running for a week.
#Whisper large v3 and the #LLM run on a Jetson Orin AGX. However, I'm not yet happy with the speed of #ollama. I need to test #vllm or #tensorRT-llm. It could also be down to the model, #gpt-oss:20b, although it does at least interpret ambiguous requests well. But anything over 10 seconds of waiting is too long.
Unfortunately, Whisper also doesn't understand female voices 100% reliably.