Might not be efficient, but at least it... Uhhh, wait, what good does it provide again?
The massive corporate AI operations (LLMs for the most part) are driving up electricity and water usage, negatively impacting communities. They are creating a stock market bubble that will eventually burst. They are sucking up all the hardware, from GPUs and memory to hard drives and SSDs.
On top of all that, they are in such a rush to expand that many of them are installing fossil-fuel power generation while also running the local grid ragged. So they pollute and drive up costs, all for an average 45% rate of incorrect results.
There are a lot of ethical problems too, but those are the direct negatives to tons of people.
Local inference isn't really the issue. Relatively low-power hardware can already reach passable tokens per second on medium to large models (40B to 270B parameters). Of course it won't compare to an AWS Bedrock instance, but it is passable.
The reason you won't get local AI systems, at least not completely, is the restrictive nature of the best models. Most of the actually good models are not open source. At best you'll get a locally runnable GGUF, not open weights, which means the re-training potential is lost. Not to mention that most good, usable solutions are complex interconnected systems, so you're not just talking to one LLM but to a series of models chained together.
But that doesn't mean local inference (not hyperlocal, i.e. "always on your device", but local to your LAN) is impossible or hard. I have a £400 node running 3-4B models at lightning speed with sub-100W (really sub-60W) power usage. For around £1500-2000 you can get a node with similar performance on 32-40B models, and for about £4000, one that does the same with 120B models. Mind you, I'm talking about lightning-fast performance here, not passable.
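To put the sub-60W figure in perspective, here's a quick back-of-the-envelope sketch of what an always-on node costs in electricity per year. The constant 24/7 draw and the £0.25/kWh tariff are assumptions for illustration, not figures from the post; plug in your own.

```python
# Back-of-the-envelope yearly energy use and cost for an always-on inference node.
# ASSUMPTIONS: constant power draw 24/7 (real draw varies with load) and a
# hypothetical flat electricity tariff.

HOURS_PER_YEAR = 24 * 365  # ignores leap years


def yearly_energy_kwh(power_watts: float) -> float:
    """Energy used in one year at a constant draw, in kWh."""
    return power_watts * HOURS_PER_YEAR / 1000


def yearly_cost_gbp(power_watts: float, price_per_kwh: float) -> float:
    """Yearly electricity cost in pounds at a flat price per kWh."""
    return yearly_energy_kwh(power_watts) * price_per_kwh


if __name__ == "__main__":
    for watts in (60, 100):
        kwh = yearly_energy_kwh(watts)
        cost = yearly_cost_gbp(watts, price_per_kwh=0.25)  # hypothetical tariff
        print(f"{watts} W -> {kwh:.1f} kWh/year, about £{cost:.2f}/year")
```

At a constant 60 W that's 525.6 kWh a year, roughly £131 at the assumed tariff.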
At least for me, the small 4-8B models turned out to be pretty useless: extremely prone to hallucinations, not good at multiple languages, and, worst of all, still pretty slow on my machine.
I tried to create a simple note-taking agent with only file I/O tools available. Without reasoning, the models fucked up even the simplest tasks in very creative ways, and with reasoning, it thought about it for 7 before finally doing it.
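For anyone wanting to try this kind of experiment, the "file I/O tools" side of such an agent can be very small. This is a hypothetical sketch: the tool names (list_notes, read_note, write_note), the JSON call format and the notes directory are illustrative choices, not the setup described above, and the model call itself is left out.

```python
import json
import tempfile
from pathlib import Path

# HYPOTHETICAL tool layer for a note-taking agent. The model is assumed to emit one
# JSON call per step, e.g. {"tool": "write_note", "args": {"name": "todo", "text": "..."}};
# the tool names and this call format are illustrative, not any framework's API.


class NoteTools:
    TOOLS = ("list_notes", "read_note", "write_note")

    def __init__(self, root: Path):
        self.root = root.resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        # Keep every note directly inside root; reject names like "../x" or "a/b".
        path = (self.root / f"{name}.txt").resolve()
        if path.parent != self.root:
            raise ValueError(f"invalid note name: {name!r}")
        return path

    def list_notes(self) -> list:
        return sorted(p.stem for p in self.root.glob("*.txt"))

    def read_note(self, name: str) -> str:
        return self._path(name).read_text(encoding="utf-8")

    def write_note(self, name: str, text: str) -> str:
        self._path(name).write_text(text, encoding="utf-8")
        return f"saved {name}"

    def dispatch(self, raw_call: str) -> str:
        """Run one JSON tool call from the model and return the result as JSON text."""
        try:
            call = json.loads(raw_call)
            name = call["tool"]
            if name not in self.TOOLS:
                return f"error: unknown tool {name!r}"
            result = getattr(self, name)(**call.get("args", {}))
        except (ValueError, KeyError, TypeError, OSError) as exc:
            # Feed errors back to the model as text instead of crashing the loop.
            return f"error: {exc}"
        return json.dumps(result)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tools = NoteTools(Path(tmp) / "notes")
        print(tools.dispatch('{"tool": "write_note", "args": {"name": "todo", "text": "buy milk"}}'))
        print(tools.dispatch('{"tool": "list_notes"}'))
        print(tools.dispatch('{"tool": "read_note", "args": {"name": "../secret"}}'))
```

Returning errors as text rather than raising means a confused model gets a chance to correct itself on the next step, which is exactly where small models tend to go off the rails.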
I wouldn’t even recommend using LLMs in place of search engines, since they make stuff up. If it’s providing sources, you can check those, but you have to be rigorous enough to check every detail, which just isn’t realistic. People are lazy.
The best way I’ve heard them described is “bullshit machines”, and I don’t say that because I think they’re stupid, but because they “bullshit” as opposed to lying or telling the truth. When you’re bullshitting, the truth is irrelevant, as long as it sounds good. That’s exactly how LLMs work.
So if there’s a problem that can be solved by bullshitting, that’s where an LLM might be the right tool for the job.
If AI can do your job in minutes, you’re either:
A fool pumping out AI slop that someone else has to fix, without realizing it.
Or
Doing a job that really shouldn’t exist.
LLMs can’t do more than shove out a watered-down average of things they’ve seen before. They can’t really solve problems, they can’t think; all they can do is regurgitate what they’ve seen before. Not exactly conducive to quality.
Try playing tic-tac-toe against ChatGPT, for example 🤣 (just ask for “let’s play ASCII tic tac toe”)
It loses practically every game against my 4yo child, if it even manages to play by the rules.
AI: Trained on the entire internet using billions of dollars. 4yo: Just told her the rules of the game twice.
Currently, the best LLMs are certainly very “knowledgeable” (as in, they “know” much more about most topics than I, or practically any person, do), but they are still far from intelligent.
You should only use them if you are able to verify the correctness of the output yourself.
"See, no matter how much I'm trying to force this sewing machine to be a racecar, it just can't do it, it's a piece of shit"
Despite the similarities, if you misuse LLMs, they won't perform well. You have to treat them as a tool with a specific purpose. In the case of LLMs, that purpose is to take a bunch of input tokens, analyse them, and output the tokens that are statistically most likely to form the "best response". The intelligence lies in putting that together, not in "understanding tic tac toe". Mind you, you can tie in other ML frameworks that are better suited to specific tasks, e.g. you can hook up a chess engine (or a tic-tac-toe engine), and that will beat you every single time.
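As a toy illustration of "output the most likely tokens": the model assigns a score (logit) to every token in its vocabulary, softmax turns those scores into probabilities, and greedy decoding picks the top one. The four-word vocabulary and the scores below are made up; real models have vocabularies of tens of thousands of tokens and often sample instead of always taking the maximum.

```python
import math


def softmax(logits: list) -> list:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(vocab: list, logits: list) -> tuple:
    """Pick the single most probable next token (greedy decoding)."""
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]


if __name__ == "__main__":
    # Made-up scores for the word after "X takes the centre, O takes the".
    vocab = ["corner", "centre", "edge", "banana"]
    logits = [2.0, 1.5, 0.5, -3.0]
    token, p = greedy_next_token(vocab, logits)
    print(token, round(p, 3))
```

Note that the already-taken "centre" still gets a sizeable probability: nothing in this step knows the rules of the game, it only ranks continuations.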
Or, an even better example: instead of asking the LLM to play tic-tac-toe with you, ask it to write a Bash/Python/JavaScript tic-tac-toe game, and try playing against that. You'll be surprised.
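For comparison, this is roughly the kind of program you'd be playing against: a minimal perfect-play tic-tac-toe engine using plain minimax search over every reachable position. It's a sketch, not what any particular LLM would generate; representing the board as a 9-character string of "X", "O" and " " is an arbitrary choice.

```python
from functools import lru_cache
from typing import Optional

# All eight winning lines on a 3x3 board, cells indexed 0-8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Score for X with `player` to move: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    other = "O" if player == "X" else "X"
    scores = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == " "]
    return max(scores) if player == "X" else min(scores)


def best_move(board: str, player: str) -> int:
    """Index (0-8) of an optimal move; the game must not be over yet."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]
    pick = max if player == "X" else min  # X maximises the score, O minimises it
    return pick(moves, key=lambda i: minimax(board[:i] + player + board[i + 1:], other))


if __name__ == "__main__":
    # Engine vs. engine: with perfect play on both sides the game is a draw.
    board, player = " " * 9, "X"
    while winner(board) is None and " " in board:
        i = best_move(board, player)
        board = board[:i] + player + board[i + 1:]
        player = "O" if player == "X" else "X"
    for row in range(3):
        print("|".join(board[3 * row:3 * row + 3]))
    print("result:", winner(board) or "draw")
```

To play against it yourself, read one side's moves with input() and let best_move answer for the other. The best you can get is a draw.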
If LLMs can’t do whatever you tell them based purely on natural language instructions then they need to stop advertising it that way.
It’s not just advertisement that’s the problem, do any of them even have user manuals? How is a user with no experience prompting LLMs (which was everyone 3 years ago) supposed to learn how to formulate a “correct” prompt without any instructions? It’s a smokescreen for blaming any bad output on the user.
Oh, it told you to put glue in your pizza? You didn’t prompt it right. It gives you explicit instructions on how to kill yourself because you talked about being suicidal? You prompted it wrong. It completely makes up new medical anatomical terminology? You have once again prompted it wrong! (Don’t make me dig up links to all those news stories)
It’s funny: the fediverse tends to come down so hard on the side of ‘RTFM’ for anything Linux-related, but with LLMs it’s actually the user’s fault for believing they weren’t being sold a fraudulent product that has no user manual.
Nobody claimed that any sewing machine has PhD level intelligence in almost all topics.
LLMs are marketed as “replaces jobs”, “PhD level intelligence”, “Reasoning models”, “Deep think”.
And yet all that “PhD level intelligence” consistently gets the simplest things wrong.
But prove me wrong. Pick a game, prompt any LLM you like, and share it here (the whole conversation, not just a code snippet).
Even that won't be anywhere close to the efficiency of neurons.
And actual neurons are not comparable to transistors at all. For starters, their behaviour is completely different: they're closer to more complex logic gates built from transistors, they're multi-pathway, AND they aren't nearly as binary as transistors are.
Which is why AI technology needs so much power. We're basically virtualising a badly understood version of our own brains. Think of it like, say, PlayStation 4 emulation: it kinda works, but most details are unknown and therefore don't work well, or at best have a "close enough" approximation of the behaviour, at the cost of more resource usage. And virtualisation will always be costly.
Or, I guess, a better example would be one of the many currently trending translation layers (e.g. SteamOS's Proton, macOS's Rosetta, or whatever Microsoft was cooking up for Windows for the same purpose, but also, kinda, FEX and Box86/Box64) versus virtual machines, the latter being an approximation of how AI relates to our brains (and by AI here I mean neural-network-based AI applications, not just LLMs).
I'm very much aware of FPGA-style attempts; however, I do feel the need to point out that FPGAs (and FPGA-style computing) are even more hardware-constrained than emulation.
For example, the current mainstream emulation FPGA, the DE10-Nano, has about 110K logic elements (LEs), and that gets you only barely passable PS1 emulation (it's primarily great for GBA emulation and for mid-to-late-80s and early-90s console hardware). In fact, it's not even as performant as GBA emulation on ARM: it uses more power, costs more, and its only benefit is execution that's true to the original hardware (which software emulation doesn't always achieve).
Simply put, while FPGAs provide versatility, they're also much less performant than similarly priced SoCs running software emulation of the specific architecture.
It’s not too hard. AI requires a LOT of work. Work requires energy. Some energy is wasted during this and the byproduct is heat. The heat has to be removed for many reasons, and water is very good at doing that.
It’s like sweating, it cools you down. But you need water to sweat.
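A rough worked example of why water use scales with compute: when heat is dumped by evaporating water (as in sweating, or an evaporative cooling tower), every kilogram evaporated carries away its latent heat of vaporisation, roughly 2.45 MJ/kg near room temperature, and 1 kWh is 3.6 MJ. The sketch assumes, as an idealisation, that all of the heat leaves through evaporation; real facilities mix evaporative and other cooling, and the 1 MW load below is hypothetical.

```python
# Rough estimate of water evaporated to remove a given amount of heat.
# ASSUMPTIONS: all heat leaves via evaporation (an idealisation), and the latent
# heat is the approximate room-temperature value.

LATENT_HEAT_J_PER_KG = 2.45e6  # approx. latent heat of vaporisation of water at 20-25 °C
J_PER_KWH = 3.6e6              # 1 kWh = 3.6 MJ exactly


def litres_evaporated(heat_kwh: float) -> float:
    """Litres of water that must evaporate to carry away `heat_kwh` of heat.

    Treats 1 kg of water as 1 litre.
    """
    return heat_kwh * J_PER_KWH / LATENT_HEAT_J_PER_KG


if __name__ == "__main__":
    # A hypothetical 1 MW load running for one hour turns ~1 MWh into heat.
    print(f"{litres_evaporated(1000):.0f} litres per MWh of heat")
```

Under those assumptions it comes to about 1.5 litres per kWh, or roughly 1,470 litres for every MWh of heat.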
actual intelligence
You have a lot of faith in me.
Do you have a link to any research on a push to analog transistors and their properties? I have been reading up on transistors (and vacuum tubes) but haven’t seen any discussion on this.
Also, much lower voltages are typical in modern transistors, around 1-1.5 V.
My mistake, I made it look like a fact; I’ll edit it to say that’s my opinion. That said, here is one I could find:
www.nature.com/articles/s43588-024-00753-x
There's also an article: spectrum.ieee.org/analog-ai-2669898661
This study shows a viable pathway to the efficient deployment of state-of-the-art large language models using mixture of experts on 3D analog in-memory computing hardware.
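For anyone unfamiliar with the "mixture of experts" part: a gating function scores each expert for a given input, only the top-k experts actually run, and their outputs are mixed using weights renormalised over just those k, so only a fraction of the expert weights is used per token. The toy sketch below uses made-up scalar "experts" and fixed gate scores purely to show that routing step; it says nothing about how the paper maps it onto analog hardware.

```python
import math
from typing import Callable, List


def top_k_moe(x: float,
              experts: List[Callable[[float], float]],
              gate_scores: List[float],
              k: int = 2) -> float:
    """Route x to the k highest-scoring experts and mix their outputs.

    Mixing weights are a softmax over the selected experts' scores only.
    """
    chosen = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    m = max(gate_scores[i] for i in chosen)
    weights = [math.exp(gate_scores[i] - m) for i in chosen]
    total = sum(weights)
    # Only the chosen experts are evaluated; the rest are skipped entirely.
    return sum(w / total * experts[i](x) for w, i in zip(weights, chosen))


if __name__ == "__main__":
    # Toy "experts": simple functions standing in for feed-forward sub-networks.
    experts = [lambda v: v + 1, lambda v: 2 * v, lambda v: -v, lambda v: v * v]
    # In a real model these scores come from a learned gating layer.
    gate_scores = [0.1, 2.0, -1.0, 2.0]
    print(top_k_moe(3.0, experts, gate_scores, k=2))
```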
See, the thing is, I watch piss porn. Hear me out. I told my friend that the thing is, to do piss porn, you kind of have to be into it. You could try and fake it, but it wouldn’t be very convincing. So, my contention is, piss porn is more genuine than other types of porn, because the people partaking are statistically more likely to enjoy doing that type of porn. Which is great, I think, because then they really get into it, which is hot. It’s that enjoyment that gets me off. Their enjoyment.
She said, “Krooklochurm, you’re an idiot. Anyone can fake liking getting pissed in the face.”
So I said, “Well, if you’re so adamant, get in the tub and I’ll piss in your mouth, and let’s see if it’s as easy as you claim.”
So she said, “All right. If I can fist you in the ass afterwards.”
Which I felt was a fair deal, so I took it.
My (formal) position was strengthened significantly by the former event. And I can also attest that I could not convincingly fake enjoying being ass-fisted.
What does that have to do with anything, you ask? Genuinity. The real deal. That’s what.