We need to talk about data centres.

For the second or third time this week I've seen someone comment on a new data centre build with the stat that 80% of data is never accessed. Then they talk about the energy and cooling used in modern DCs.

The reality is that data storage is actually incredibly efficient, and uses fuck all power. A hard disk draws less than 10 W and stores multiple users' data.

Storing data, our photos, our memories, our history, is not the problem.

What is? 1/n

The thing driving the need for bigger, more power- and water-hungry data centres is AI. Sparkling autocarrot. Whereas a machine in a rack full of hard disks might consume a couple of hundred watts, a machine loaded up with a typical complement of 8 "AI accelerators" can be pulling in the region of 5 kW. Over an order of magnitude more power than is needed to store the life's photos of hundreds of people.
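A rough back-of-envelope sketch of that comparison (the wattages and disk counts here are illustrative assumptions drawn from the post, not measurements):

```python
# Back-of-envelope: storage machine vs AI-accelerator machine.
# All figures are rough assumptions, not measurements.

DISK_WATTS = 10          # one hard disk, upper bound (from the post)
DISKS_PER_MACHINE = 24   # a typical dense storage chassis (assumption)
AI_MACHINE_WATTS = 5000  # machine with 8 "AI accelerators" (from the post)

storage_machine_watts = DISK_WATTS * DISKS_PER_MACHINE  # ~240 W

ratio = AI_MACHINE_WATTS / storage_machine_watts
print(f"Storage machine: ~{storage_machine_watts} W")
print(f"AI machine: ~{AI_MACHINE_WATTS} W ({ratio:.0f}x more)")

# If one disk holds the photo libraries of, say, 100 people,
# the per-person storage cost is a fraction of a watt:
watts_per_person = DISK_WATTS / 100
print(f"~{watts_per_person:.1f} W per person's photo library")
```

On these assumptions the AI box draws roughly twenty times the whole storage machine, which is where the "order of magnitude" claim comes from.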

And why? To what end?

2/n

I've worked in this industry for over a quarter of a century.

At no point have I found myself thinking "I wish I could just ask the computer to write this email for me" or "I wish the computer could write my code for me". MS is adding Copilot functionality to lots of products. Not as opt-in, but opt-out. And it's a right hassle to turn off. Why? So someone can ask it to write a longer email from a prompt, which the recipient can then ask the AI to summarise for them?

3/n

There are certainly some areas where machine learning (note, I'm not calling it AI) has its uses. Medical research springs to mind. But ubiquitous AI assistance rolled into all our products? Why? It's just using too much power, too many resources, and for what? Sparkling autocarrot.

Encoding the worst of our society in a bit stream. Exacerbating inequality, prejudice, and hate.

In comp sci there's a term. GIGO. Garbage in. Garbage out.

4/n

These large language models are being fed on the combined mass of the world's online content. Your tweets, Facebook posts, forum posts, that blog you forgot you started. All of it is being fed into the black box of the LLM. The internet provides us unprecedented access to the world's information. But it is also an unprecedented collection of hate. We've seen this time and time again. From chatbots that start shouting Nazi propaganda, to CV vetting systems that won't hire women.

5/n

Garbage in. Garbage out.

And what makes this even more terrifying is that when you look at a webpage, it's often hard to tell whether it was generated with sparkling autocarrot or written by a human. If we can't tell, then what hope does the LLM have? And so we're gonna end up with the next generation of models being fed on the output of the previous. This is going to create feedback loops. Reinforcing the worst the model has to offer. Strengthening the hate. The prejudice.

6/n

And because we don't know what was created how, there's no way to control what feeds the models. It's just gonna enshittify. And fast.

AI has all the hallmarks of a bubble. Like crypto before it, and half a dozen other bubbles before that, all tracing their heritage right back to the South Sea Bubble (no, not tulip mania, but that's something for a different thread).

Except this bubble has gone more mainstream. It's consuming way more resources than any before it.

7/n

Water is going to become the next big inequality front. As the climate changes, clean fresh water is going to become harder to come by. More expensive, and more unequal. That same water is being poured over panels in data centres to cool the servers. To cool the AI accelerators, generating content no one asked for. Enshittifying the knowledge base of humanity. Just so a few people can make some money.

8/n

Storing our data, our memories, our photos, on servers in data centres that are built in sensible places isn't inherently a bad thing. And we shouldn't allow ourselves to fall for the trope that 80% of it is never accessed. But building datacentres that use ten times the energy, and need even more water, in deserts and water-stressed areas, to drive sparkling autocarrot that no one asked for. That we should be more vocal about.

9/9.

@quixoticgeek Training GPT-3 took as much water as producing 460 hamburgers.

You raise a very good point about location of data centers. There's plenty of room along the Wisconsin and Michigan coastlines with access to more fresh water than could ever be consumed in a thousand years. If water were priced appropriately in drought-ridden areas, I imagine data centers would be happy to relocate. First step is to fix the broken politics that subsidizes silage/beef

@quixoticgeek as a practical matter, I don't think we can guilt-trip users into pretending a given convenience a) isn't cool after all or b) should not be used. People get into accidents because they're idiots and use their cell phones while they're driving. Even so, cell phones are part of life now.

What we can do is tax/regulate the hell out of large commercial entities that, in making cool new tech convenient, move us 1 cm closer to climate disaster. You don't get to make money for doing that.

@YusufToropov @quixoticgeek

But that's a handwave. "We" won't tax the hell out of them, any more than we tax the hell out of billionaires.

This "incentivize good choices" model is what's gotten us an environment in which everybody is pissing microplastics and the amount of CO2 in the atmosphere went up by 1% in a year in 2023.

Regulation works if people decide to do it. This argument is what's been convincing otherwise smart people not to try to do it.

@abhayakara @quixoticgeek

Answer: vote and organise. For Democrats, if you happen to live in the USA.

And yes, this is why Europe is farther along on this. Americans can't seem to grasp the absurdly high stakes. Every syllable of what you just wrote is why I'm trying to make sure we don't get distracted/manipulated into electing a climate denier slate nationwide. Govt must take this seriously.

Guilt tripping users isn't the answer regardless. That's another distraction in my humble opinion.

@abhayakara @quixoticgeek

And by the way I would start with Exxon when it comes to taxing and regulating the hell out of people

@YusufToropov @quixoticgeek

I agree. But I don't think OP was guilt tripping anyone. OP was reporting what's going on. That's important.

@abhayakara @quixoticgeek

This felt like guilt tripping:

"But building datacentres that use ten times the energy, and need even more water, in deserts, and water stressed areas, to drive sparkling autocarrot that noone asked for. That we should be more vocal about."

I prefer we talk about legislation that keeps companies responsible. I disagree that I don't want AI. I *do* want AI. Inaccurate to say no one is asking for the next wave of development. I am. Even if I don't know what it is.

@YusufToropov @quixoticgeek

Okay, but the ask was literally "that we should be more vocal about" [it].

I think LLMs are interesting, and GAI would also be interesting. But we actually already have GAI. It's called corporations and societies. These are artificial cognitive structures, built by humans, running on a substrate that is known to be intelligent.

So if you're interested in how to improve the state of AI, you don't have to wait!

@abhayakara @quixoticgeek

Just give the people who are building responsible data centers a huge subsidy and slap huge taxes and fines on all the rest. I don't try to dictate what the marketplace is going to create, or what is going to be cool, a year from now. It's pointless.

@YusufToropov @quixoticgeek

Maybe a little polderpolitiek wouldn't hurt, though. I agree that we shouldn't be legislating outcomes, but right now we're deliberately leaving externalities out of the price, which is a subsidy that always and only benefits wasteful approaches. We are not starting with a level playing field, so of course we see abuses.

@YusufToropov @abhayakara @quixoticgeek You’ve missed the point completely. We are wasting electricity on machines performing cheap tricks. Feeding more data into LLMs won’t make them better. The entire bubble is nonsense, pumping carbon into the atmosphere for zero benefit…. In fact, even a detriment to society instead.

@YusufToropov @quixoticgeek

BTW, Americans do grasp the absurdly high stakes, and the majority vote accordingly.

The difference between America and, say, the Netherlands, is that in the Netherlands when Geert Wilders gets the biggest number of seats, the question is, will Yeşilgöz cooperate with him. In the U.S., it's "well, I guess Trump won even though the majority voted against him."

Our news media is Pravda from the 80's, even the "left media." It's very difficult here to get real news.

@YusufToropov @abhayakara @quixoticgeek the reason why the USA is so far behind Europe is simply that our oligarchs have much tighter control over public discourse, because the corporate media are all billionaire owned, and particularly since the execrable Citizens United decision, both major parties are dependent on overlapping subsets of those same billionaires for campaign funding. The oligarchs have captured the public sphere.

@YusufToropov @abhayakara @quixoticgeek

*Your* answer is facile. Yes, organize, and slow the collapse with voting when it'll work, but voting is less and less powerful as time goes on thanks to an unpacked SCOTUS and very well-lubed fascist political regime. National Dems are at best nearly completely useless, either unable or--more likely--unwilling to pursue policies that will alienate bourgeois donors.

Dems aren't the answer, and never can be. Nor are you making sure of anything, expat.

@YusufToropov @abhayakara @quixoticgeek

I doubt you've had your foot on the ground in a while, since you don't seem to understand that climate denial was already the entire sweep in '16. In 2000, when SCOTUS decided an election for a climate denialist and oil exec.

We get the stakes, we're just fucking exhausted. Work is hell and we can barely afford a roof over our heads, and the population that can't grows all the time.

Standard Oil should've been and can still be nationalized.

@YusufToropov @quixoticgeek I fully intend on guilt tripping users into thinking AI isn't cool. Gonna do it as much as I possibly can.
@YusufToropov @quixoticgeek users aren't demanding it, companies are forcing it on them
@quixoticgeek
1. LLMs are useful far beyond sparkling autocorrect. In fact their embeddings are arguably their most useful feature.
2. LLMs provide a lot of different options for accessibility needs. From sparkling autocorrect allowing for much better encoding and decoding of voice data, to helping with attention management.
3. Immersion cooling (stick your server in vegetable oil) could already be used to save water. They don't, because of graft. Why pay $9 per gallon of vegetable oil that will last forever, when you can pay 0.35 cents (roughly 2,500 times less)? States are literally giving companies water and paying them to use more of it.
4. We have known for over a decade how to remove bias from LLMs. It sometimes degrades performance, so they've decided not to do that. In other ways they have debiased it.
5. It's flatly not true that learning from the data it creates will reinforce bias. In fact, given the way they currently are designed, it will reduce bias. Also, if we can't tell that it was made by a machine, then it doesn't matter whether or not it was.
@quixoticgeek i worry AI is going to have a similar curve to email but harsher: email started as this nice efficient service, then got infested with spammers and junk, and now a huge chunk of email is just garbage. AI seems to be skipping the 'efficiency' stage and going straight to the bloat.
@quixoticgeek This. If it's not accessed it uses zero energy.
@dalias While I agree with the main point of @quixoticgeek that pursuit of the (IMO garbage) goals of AI is much more of a problem than data, it's not true that storage costs zero differentially when not accessed. A good number for a mix of spinning disk and flash is about a kW of power usage per petabyte. That would be roughly a megawatt for a medium-sized hyperscaler data center, roughly a percent of its usage. Smaller than the AI/CPU/GPU footprint, but not at all zero or even close to zero.
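The arithmetic behind that estimate, as a sketch (the 1 kW per petabyte figure is the reply's rough number; the stored capacity and total facility draw are assumptions picked to match its "roughly a percent" conclusion):

```python
# Rough storage-power arithmetic from the reply above.
KW_PER_PETABYTE = 1.0    # mixed spinning disk + flash (rough figure)
STORED_PETABYTES = 1000  # a medium-sized hyperscaler DC (assumption)
DC_TOTAL_MEGAWATTS = 100 # total facility draw (assumption)

storage_megawatts = KW_PER_PETABYTE * STORED_PETABYTES / 1000  # 1 MW
share = storage_megawatts / DC_TOTAL_MEGAWATTS
print(f"Storage: ~{storage_megawatts:.0f} MW, ~{share:.0%} of the facility")
```

So on these numbers storage is real but small: about a megawatt out of a hundred.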
@AlanSill @quixoticgeek OK, what I should have said is that, in some sort of asymptotic sense with proper optimization, data not accessed consumes no energy. Imagine an ongoing defrag-like process that migrates data that has not been accessed in a long time to storage devices that are physically powered down, or even a setup where each client's data (at some fine grain) is on a separately-sleepable eMMC chip or similar.
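A toy sketch of that "migrate cold data to powered-down devices" idea (the 90-day threshold and the object layout are invented for illustration; a real system would work at the device or chip level as described above):

```python
import time

COLD_AFTER_DAYS = 90  # data untouched this long can move to sleeping storage

def tier(objects, now=None):
    """Partition objects by last access: hot stays on powered storage,
    cold can migrate to devices that are physically powered down."""
    now = now if now is not None else time.time()
    cutoff = now - COLD_AFTER_DAYS * 86400
    hot = [o for o in objects if o["last_access"] >= cutoff]
    cold = [o for o in objects if o["last_access"] < cutoff]
    return hot, cold

now = time.time()
objs = [
    {"name": "recent.jpg", "last_access": now - 5 * 86400},
    {"name": "2009_album.tar", "last_access": now - 400 * 86400},
]
hot, cold = tier(objs, now)
print([o["name"] for o in hot])   # stays on powered storage
print([o["name"] for o in cold])  # candidate for a powered-down device
```

The interesting engineering is in the grain of the migration: per-client eMMC chips, as suggested, would let cold data approach zero energy asymptotically.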

@quixoticgeek Stochastic Parrots Considered Harmful

Sparkling Autocarrot Considered Harmful

Take a look sometime at Ireland's current and short-term projected datacentre electricity use as a % of total generation: 20% now, 70% by 2030. That's what a friendly corporate-tax-haven regime gets you these days.

@quixoticgeek GenAI also adds nothing to our overall cultural value. Every piece of text or scribble by a human would be valuable to a future historian trying to understand our time. "I had great Pho last night" posted from Longmont, Colorado in 2003 tells a historian about cultural spread and acceptance in our time. A doodle of a person with a selfie stick blocking a vista tells an entire story. GenAI blurs and obscures that. It tells us nothing about any human attitudes, aspirations, or vision.
@quixoticgeek Clear and compelling warning about natural consequences of current round of commercialization of AI research!
The threat to drinking water might be less severe: Microsoft (and maybe others?) have been researching "underwater datacenters" in deep sea water to manage the cooling needs.
@bobhy underwater datacentres are unfortunately still a gimmick. The contents are too difficult to service. And sticking stuff in the sea is not a panacea. The heat still goes somewhere. You heat up the water around the DC, and that changes the ecosystem. We see this with the vents for the cooling water at nuclear power stations. The warm water changes the wildlife there.
@quixoticgeek Agree with all of these caveats, avoid one problem, create another. The trick is to create problems at a somewhat larger scale than you were avoiding the original problems at.
@quixoticgeek The larger point -- once these AI "factories" are built, their owners *need* to crank out "product". How can we get the most social benefit from the ongoing resource cost?
If not LLM-generated love letters (as Edge / Copilot is offering me this morning), then maybe a more limited research assistant to help me find relevant factual information, maybe flag bots and troll farms in social contexts?
Leave the human-oriented creative stuff to humans?

@quixoticgeek I'd like to highlight that "AI" isn't actually anything new, it's just larger than ever now. Which is why this serverfarm issue isn't new. Before, it mostly came in the form of surveillance advertising & personalized recommendations.

*Our need for an income is what drives waste!*

The legit value of serverfarms (though I am sympathetic to arguments that we over-rely on this) is near-entirely in data storage/publishing. And message routing.

The legit value costs near-nothing.

@quixoticgeek one of the things that bothers me is the Snake Oil that is big data.

Computing 101 : data + context = information.

Google, Facebook and other similar horse traders do not have this context element. But sell their wares as customer information.

I don’t need Microsoft to be my editor. I am purposefully using verbose language, because there are steam powered airships in my writing.

There is no place for AI in my life except for my x-rays. Else more snake oil.

@quixoticgeek data centers also store a massive amount of advertising data, which we all try to block and remove using browser extensions. And to sell what? Oversized cars? Luxury items? Marketing scams?
@quixoticgeek If you haven't seen it, Philosophy Tube has a great video on the ethics of "A.I." The last section is on the global labour issues that come with large machine learning models. And they're not the issues we think they are. https://youtu.be/AaU6tI2pb3M?si=G_c85QC-RGoKqK8D
Here's What Ethical AI Really Means


@quixoticgeek

There are over 550,000 private pools in AZ and 1,300,000 pools in CA.
This doesn't include resorts.
Then there are the golf courses..

@quixoticgeek I've heard OLD WEB compared to low-background steel, pre-nuclear testing...

@quixoticgeek
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

We all need to start signing our work.
-----BEGIN PGP SIGNATURE-----

iGwEAREKACwWIQTiPCHthk/086cRTN/KRx/VAWGKSQUCZdtADg4cZGFsZUByZG1w
Lm9yZwAKCRDKRx/VAWGKSageAKCHXHNhCsgXCuCg853RYmdx+6ssoQCeJDnXNTLk
nNPxcQO+qq1BXHStaXA=
=kZbF
-----END PGP SIGNATURE-----

@quixoticgeek You *could* check that signature against key E23C 21ED 864F F4F3 A711 4CDF CA47 1FD5 0161 8A49. Good luck.
@khleedril @quixoticgeek Why? What does signing do that makes it worth using?
@KnightMcLeod @quixoticgeek It means that at any point in the future, if we should meet, I can prove they are my words and not the words of an AI (you'll still have to believe me, but that itself is a human thing, right?)
@quixoticgeek feeding LLM generated pages back into an LLM could be more amusing than that - there's a paper that shows it can lead to model collapse. Effectively poisoning itself. https://arxiv.org/abs/2305.17493
The Curse of Recursion: Training on Generated Data Makes Models Forget

@quixoticgeek This ^^^ is the big deal. We'll go from an LLM fed on Reddit, to an LLM fed on an LLM fed on Reddit. Perfect bullshit storm.

@quixoticgeek That is the term I've been looking for, to summarize LLMs and generative systems to non-tech people. GIGO machines.

#ai #llm #gigo

@quixoticgeek I would have loved that when my employer (i.e. owner) required every employee to send a weekly status of everything we had accomplished that week. After doing this for 3 months it dawned on me he wasn't reading them. So by the 4th month I just started recycling them. Did that until the expected "stop sending all those emails" arrived.
@quixoticgeek "Sparkling autocarrot" is *chef kiss*. 🤣
@quixoticgeek but YouTube has to make their AI algorithm show me videos I don't want to watch so they can show me an ad I don't want to see, which somehow means they can charge more for the ad, rather than showing me videos they already know I want to see and showing me the same ads on those videos instead.
@quixoticgeek I just asked ChatGPT 4 to write a MUMPS program to calculate pi. The resulting program was moderately accurate. I’ve gotten false results from some simple math questions in the past. For instance giving the _last_ 10 digits of pi, or factoring my social security number.
It wrote a summary, then minutes of a meeting based on the Zoom transcript, somehow covering every discussion, while leaving out every interesting fact.
To me, the question of whether it will "succeed", or of whether it should be banned in the USA, isn't useful. Figuring out the characteristic limitations is interesting, for the same reason I want to anticipate Siri's errors, or spell checkers', or doctors' handwriting errors, or UFOs.

@quixoticgeek

It's not just AI.

AI is ofc exponentially worse than data centers but the latter are a huge problem.

@lil_meow_meow if we take out the AI machines we can get a lot more out of the DCs we already have.

@quixoticgeek

Absolutely!

And yet, DCs are a big problem.

I'm not saying abolish them! But they must be powered by renewables and the waste heat (is this the correct term? Or Denglish? 🙃) capitalized on.