@davidrevoy And that is exactly the big lie. The magic of brute-force token-jumping your way to a patch only works if you have gigawatts of Nvidia-filled data centers powered by methane jet engines. And nobody knows the actual cost of any of this. Try it locally and it just makes your M4 laptop, which normally churns out Blender Cycles renders in seconds, crazy hot to the touch, and it takes hours instead of seconds. It seems way less efficient than ... blockchain.
@jimmac True, my local tests here were mostly with https://flathub.org/en/apps/com.jeffser.Alpaca (with the advantage of being easily removed, leaving the computer clean after the test). I tried a few of the available models, but even on my workstation it was slow. Very educational: it makes you imagine how hard some distant CPUs and GPUs must be screaming on the servers of the AI companies, and how they make it totally invisible to the end users asking questions on their phone 'for fun', without even questioning the (hidden) cost of it...
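For anyone who wants to reproduce that kind of local test with a stopwatch rather than a feeling, here's a minimal sketch. It assumes an Ollama server listening on its default port 11434 (Alpaca, as far as I know, runs a local Ollama instance under the hood); the model name "llama3.2" is just an example of a smaller model, not anything from the thread.

```python
import json
import time
import urllib.request

def time_local_generation(prompt: str, model: str = "llama3.2") -> float:
    """Return wall-clock seconds for one full local completion."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # wait for the whole answer so timing stays simple
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        json.load(resp)  # discard the answer; only the wall time matters here
    return time.monotonic() - start

print(f"local generation took {time_local_generation('What is a shader?'):.1f}s")
```

Comparing that number against the same prompt on a hosted service gives a rough sense of how much hardware is hiding behind the "instant" answer.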

@davidrevoy @jimmac that's the huge point Ed Zitron has been making for some time now: if you expose the true cost of this product to the users and bill them for every stupid prompt, oopsie and hallucination, nobody will want to pay for it.
@davidrevoy @jimmac and it's so obvious: anyone with a PC can test it locally, like you did, and see for themselves. And yet nobody cares. It's so frustrating.

@davidrevoy @jimmac Yep. I tried, just to see, whether running a smaller targeted model in something like Ollama would be any more interesting to use with Home Assistant than its built-in parser system (which I've extended with my own automations).

A _slight_ mishearing of me setting a timer caused it to spew out some completely useless garbage, after a significant delay.

Even with HA intent matches taking priority, it'd still screw up interpretations, using the very thing LLMs are _supposed_ to be good at. I'd ask "is the fan on in the hallway?" and it'd say "I don't know about any fans in Home Assistant hallway" or something, but the fan was very much on, and very much in the Hallway room (later, trying the correct HA syntax answered exactly right, VERY fast).
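For contrast, the deterministic path that "correct HA syntax" takes boils down to a direct state lookup, something like this sketch against Home Assistant's REST API (the URL, token, and entity id "fan.hallway" are made-up placeholders, not from the original setup):

```python
import json
import urllib.request

# Placeholder values for illustration only.
HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

def fan_is_on(entity_id: str = "fan.hallway") -> bool:
    # GET /api/states/<entity_id> is Home Assistant's standard REST call
    # for reading one entity's current state.
    req = urllib.request.Request(
        f"{HA_URL}/api/states/{entity_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["state"] == "on"

print("fan is on" if fan_is_on() else "fan is off")
```

No interpretation step, no dice roll: the entity either exists and has a state, or it doesn't.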

I got rid of it. I'd rather be able to ramble at my speaker, say "Pizza pasta put it in your mouth", and have it reply "I'm not aware of any area called 'your mouth'" in 2 seconds, or "Sorry, I didn't understand that" if I'm even more unintelligible (or just 'off' with my command), and only have the STT and TTS overheads on my GPU, than have the dice roller fuck up repeatedly.

The only thing it was halfway decent at was me basically tossing it a JSON dump from the weather forecast command and going "Here, make a conversational thing about the next couple of days". It was actually pretty good at that, but not worth it. I rewrote that as just my own fixed wording, stating exact temps and conditions for each of the next 3 days. My brain can track the similarities when hearing it.
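That rewrite could look something like the sketch below: a fixed template over an assumed forecast shape (the field names and sample data are invented), with no model involved, so the wording comes out identical every day:

```python
# Hypothetical shape for the forecast dump; real HA weather data differs.
forecast = [
    {"day": "Monday", "condition": "partly cloudy", "high": 18, "low": 9},
    {"day": "Tuesday", "condition": "rain", "high": 14, "low": 8},
    {"day": "Wednesday", "condition": "sunny", "high": 21, "low": 11},
]

def spoken_forecast(days: list[dict]) -> str:
    # Same sentence pattern for every day, so the ear can compare them.
    lines = [
        f"{d['day']}: {d['condition']}, high {d['high']}, low {d['low']}."
        for d in days[:3]  # the next three days, as in the post
    ]
    return " ".join(lines)

print(spoken_forecast(forecast))
```

A few lines of template give the exact temps and conditions every time, for free, which is the whole trade-off in miniature.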