Yesterday Cory Doctorow argued that refusal to use LLMs was mere "neoliberal purity culture". I think his argument is a strawman, doesn't align with his own actions, and delegitimizes important political action we need to take in order to build a better cyberphysical world.

EDIT: Discussions under this are fine, but I do not want this to turn into an ad hominem attack on Cory. Be fucking respectful.

https://tante.cc/2026/02/20/acting-ethical-in-an-imperfect-world/

Acting ethically in an imperfect world

Life is complicated. Regardless of what your beliefs or politics or ethics are, the way that we set up our society and economy will often force you to act against them: You might not want to fly somewhere but your employer will not accept another mode of transportation, you want to eat vegan but are […]

Smashing Frames

@tante

I really like and admire @pluralistic and have the utmost respect for him, and that's why I'm totally baffled that he is claiming "fruit of the poisoned tree" arguments are the cause of LLM scepticism.

The objections to LLMs aren't about origins but about what they are doing right now: destroying the planet, stealing labour, giving power over knowledge to LLM owners, etc.

The objections are nothing to do with LLMs' origins, they're entirely about LLMs' effects in the here and now.

@FediThing @tante

Which parts of running a model on your own laptop are implicated in "destroying the planet?" How is checking punctuation "stealing labor?" Or, for that matter "giving power over knowledge to LLM owners?"

@pluralistic @tante

(Hello Mr Doctorow! Just want to make clear I admire you a great deal and this isn't intended as an attack on you!)

Running a local LLM with no connection to outside providers might be a way of avoiding bad stuff, but I am not clear on how this relates to discussing origins of technologies?

It seems like there's ambiguity in your post about whether it applies just to people with homelabs wondering if they should try offline LLMs, or whether you are discussing LLMs as a general technology?

Almost everyone using LLMs will use the online kind, so objections to LLMs are (reasonably IMHO) based on that scenario.

@FediThing @tante

> I am not clear on how this connects to discussing origins of technologies

Because the arguments against running an LLM on your own computer boil down to, "The LLM was made by bad people, or in bad ways."

This is a purity culture standard, a "fruit of the poisoned tree" argument, and while it is often dressed up in objectivity ("I don't use the fruit of the poisoned tree"), it is just special pleading ("the fruits of the poisoned tree that I use don't count, because __").

@pluralistic @tante

Thank you for the responses 🙏

"Because the arguments against running an LLM on your own computer"

...ahhh okay. So was this post aimed more at a very narrow homelab kind of audience?

It's just, as a reader, the article's emphasis on examples of tech origins implies it's trying to defend LLMs in general? This is probably my ignorance as a reader, but it's how it came across to me, and led to bafflement.

@FediThing @tante This is the use-case that is under discussion.

https://pluralistic.net/2026/02/19/now-we-are-six/
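For readers unfamiliar with the use case being defended here, it can be sketched in a few lines. This is a hypothetical illustration, not Doctorow's actual setup: the model name `llama3.2` and the default local Ollama endpoint (`http://localhost:11434/api/generate`) are assumptions, and actually running it requires `ollama serve` with a model pulled locally. The point the sketch makes is that nothing leaves the machine:

```python
import json

# Ollama's default local endpoint; the request never leaves your laptop.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_typo_check_request(text: str, model: str = "llama3.2") -> str:
    """Build the JSON body for a local, offline typo-check request.

    Hypothetical helper for illustration only: it shows the shape of the
    call. Nothing is sent anywhere until you POST this to a locally
    running Ollama instance.
    """
    prompt = (
        "List any punctuation errors or typos in the text below. "
        "Do not rewrite it; only report the issues.\n\n" + text
    )
    return json.dumps({"model": model, "prompt": prompt, "stream": False})

# Sending it (requires `ollama serve` and the model pulled locally):
#   import urllib.request
#   req = urllib.request.Request(
#       OLLAMA_URL,
#       data=build_typo_check_request("Teh cat sat.").encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   reply = json.loads(urllib.request.urlopen(req).read())["response"]
```

Whether this narrow, offline scenario generalizes to LLMs as a whole is exactly what the rest of the thread argues about.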

@pluralistic @tante

Thanks. Can totally see how that makes sense at a technical level for people who run their own offline services.

I think it's the ambiguity that is driving the discourse over this post. People are taking the "refusing to use a technology" section as a defence of LLMs in general?

If the angle was caging LLMs or something like that, it might make it clearer that you aren't endorsing the most common form of LLM?

Anyway, it's your call on this as author; I just wanted to feed back on this because your writing matters and I hope the feedback is helpful.

@FediThing @pluralistic @tante i feel similarly. big tech has taken the notion of AI and LLMs as a cue/excuse to mount a global campaign of public manipulation and massive investment in a speculative project, pumps gazillions of $ into it, and convinces everyone it's inevitable tech that has to be put in every bag of potato chips. the backlash is that anything bearing the name AI or LLM is treated as a poisonous plague, and people unfollow anyone who has touched it in any way or talks about it in any way other than "it's fascist tech, i'm putting a filter in my feed!" (while it IS fascist tech, because it's in the hands of fascists).

in my view the problem is not what LLMs are (what kind of tech), but how they are used and what they extract from the planet when big tech uses them in this monstrous, harmful way. of course there's a big blurred line and tech can't be separated from the political, but... AI is not intelligent (Big Tech wants you to believe that), and LLMs are not capable of intelligence and learning (Big Tech wants you to believe that).

so i feel like a big chunk of the anger and hate should really be directed at the techno-oligarchs, and only partially and much more critically at the actual algorithms in play. it's not LLMs that are harming the planet, but rather the extraction: these companies, who are absolutely evil and are doing whatever the hell they want, unchecked, unregulated.

or as varoufakis said to tim nguyen: "we don't want to get rid of your tech or company (google). we want to socialize your company in order to use it more productively" and, if i may add, safely and beneficially for everyone, not just a few.

@prinlu @FediThing @pluralistic @tante I agree with most things said in this thread, but on a very practical level, I'm curious what training data was used for the model behind @pluralistic's typo-checking ollama?

for me, that training data is key here. was it consensually allowed for use in training?

because as I understand, LLMs need vast amounts of training data, and I'm just not sure how you would get access to such data consensually. would love to be enlightened about this :)

@bazkie @prinlu @FediThing @tante

I do not accept the premise that scraping for training data is unethical (leaving aside questions of overloading others' servers).

This is how every search engine works. It's how computational linguistics works. It's how the Internet Archive works.

Making transient copies of other peoples' work to perform mathematical analysis on them isn't just acceptable, it's an unalloyed good and should be encouraged:

https://pluralistic.net/2023/09/17/how-to-think-about-scraping/

How To Think About Scraping – Pluralistic: Daily links from Cory Doctorow

@pluralistic @prinlu @FediThing @tante I think the difference from search engines is how an LLM reproduces its training data...

as a thought experiment: what if I scraped all your blogposts, then started a blog that makes Cory Doctorow-styled blogposts, which would end up more popular than your OG blog because I throw billions in marketing money at it?

would you find that ethical? would you find it acceptable?

a further thought experiment: let's say you lose most of your income as a result and have to stop blogging and start flipping burgers at McDonald's.

your blog would stop existing, and so, my copycat blog would, too - or at least, it would stop bringing novel blogposts.

this kind of effect is real and will very much hinder cultural development, if not grind it to a halt.

that is a problem - this is culturally unsustainable.

@bazkie @prinlu @FediThing @tante

First: checking for punctuation errors and other typos *in my own work* in a model running on *my own laptop* has nothing - not one single, solitary thing - in common with your example.

Nothing.

Literally, nothing.

But second: I literally license my work for commercial republication and it is widely republished in commercial outlets without any payment or notice to me.

@pluralistic but then you consented to that, right? you are in control of that.

also, my example IS similar - after all, it's data scraped without consent, used to create another work. the typo-checker changes your blogpost based on scraped training data, in the same way my copycat blog creates "my" works based on your scraped posts.

sure, it's on a way different scale - deliberately, to more clearly show the principle - but it's the same thing.

@bazkie

Should we ban the OED?

There is literally no way to study language itself without acquiring vast corpora of existing language, and no one in the history of scholarship has ever obtained permission to construct such a corpus.

@pluralistic I gave it a good thought, and you know what, I'm gonna argue that yes, for me there is a degree of unethical-ness to that lack of permission!

the things that make me not mind that so much are a variety of differences in method and scale:

(*btw just explaining my personal reasons here, not arguing yours)

- every word in the OED was painstakingly researched by human experts to make the most possible sense of it

- coming from a place of passion on the end of the linguists, no doubt

- the ownership of said data isn't "techno-feudal mega-corporations existing under a fascist regime"

- the OED didn't spell the end of human culture (heh) like LLMs very much might.

so yeah. I guess we do agree that, on some level, the OED and an LLM have something in common.

it's the differences in method and scale that make me draw the line somewhere in between them; in a different spot from where you may draw it.

and like @zenkat mentioned elsewhere, it's the whole thing around LLMs that makes me very wary of normalizing anything to do with it, and I concede I wouldn't mind your slightly unethical LLM spellchecker as much, if we didn't live in this horrible context. :)

I guess this has become a bit of a reconciliatory toot. agree to disagree on where we draw the line, to each their own, and all that.

@pluralistic @bazkie

Dictionaries reference the sources they use for examples in the entries themselves.

LLMs lose the references at training time.

You've got this dead wrong.