Yesterday Cory Doctorow argued that refusal to use LLMs was mere "neoliberal purity culture". I think his argument is a strawman, doesn't align with his own actions, and delegitimizes important political action we need to take in order to build a better cyberphysical world.

EDIT: Discussions under this are fine, but I do not want this to turn into an ad hominem attack on Cory. Be fucking respectful

https://tante.cc/2026/02/20/acting-ethical-in-an-imperfect-world/

Acting ethically in an imperfect world

Life is complicated. Regardless of what your beliefs or politics or ethics are, the way that we set up our society and economy will often force you to act against them: You might not want to fly somewhere but your employer will not accept another mode of transportation, you want to eat vegan but are […]

Smashing Frames

@tante

I really like and admire @pluralistic and have utmost respect for him, and that's why I'm totally baffled about why he is claiming that "fruit of the poisoned tree" arguments are the cause of LLM scepticism.

The objections to LLMs aren't about origins but about what they are doing right now: destroying the planet, stealing labour, giving power over knowledge to LLM owners, etc.

The objections are nothing to do with LLMs' origins, they're entirely about LLMs' effects in the here and now.

@FediThing @tante

Which parts of running a model on your own laptop are implicated in "destroying the planet?" How is checking punctuation "stealing labor?" Or, for that matter "giving power over knowledge to LLM owners?"

@pluralistic @tante

(Hello Mr Doctorow! Just want to make clear I admire you a great deal and this isn't intended as an attack on you!)

Running a local LLM with no connection to outside providers might be a way of avoiding bad stuff, but I am not clear on how this relates to discussing origins of technologies?

It seems like there's ambiguity in your post about whether it applies just to people with homelabs wondering if they should try offline LLMs, or whether you are discussing LLMs as a general technology?

Almost everyone using LLMs will use the online kind, so objections to LLMs are (reasonably IMHO) based on that scenario.

@FediThing @tante

> I am not clear on how this connects to discussing origins of technologies

Because the arguments against running an LLM on your own computer boil down to, "The LLM was made by bad people, or in bad ways."

This is a purity culture standard, a "fruit of the poisoned tree" argument, and while it is often dressed up in objectivity ("I don't use the fruit of the poisoned tree"), it is just special pleading ("the fruits of the poisoned tree that I use don't count, because __").

@FediThing @tante

> Almost everyone using LLMs will use the online kind, so objections to LLMs are (reasonably IMHO) based on that scenario.

Except that in this specific instance, you are weighing in on an article that claims it is wrong to run a local LLM for the purpose of checking for punctuation errors.

@pluralistic @tante

Thank you for the responses 🙏

"Because the arguments against running an LLM on your own computer"

...ahhh okay. So was this post aimed more at a very narrow homelab kind of audience?

It's just, as a reader, the article's emphasis on examples of tech origins implies it's trying to defend LLMs in general? This is probably my ignorance as a reader, but it's how it came across to me, and led to bafflement.

@FediThing @tante This is the use-case that is under discussion.

https://pluralistic.net/2026/02/19/now-we-are-six/
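For context, the use-case in the linked post is a local Ollama model proofreading the author's own text. A minimal sketch of what that workflow might look like against Ollama's local HTTP API (the model name, prompt wording, and helper function here are illustrative assumptions, not Doctorow's actual setup):

```python
import json
from urllib import request

# Ollama's default local endpoint; nothing leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_typo_check_request(text: str, model: str = "llama3.2") -> dict:
    """Build a payload asking a local model to flag typos only.
    (Hypothetical helper; model name and prompt are assumptions.)"""
    prompt = (
        "List any punctuation errors or typos in the following text. "
        "Do not rewrite it.\n\n" + text
    )
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_typo_check_request("Its a fine day, isnt it.")

# Uncomment to send to a running local Ollama instance:
# req = request.Request(OLLAMA_URL, data=json.dumps(payload).encode(),
#                       headers={"Content-Type": "application/json"})
# print(json.loads(request.urlopen(req).read())["response"])
```

The point of the sketch is only that the request and response stay on localhost; no third-party provider is involved.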

@pluralistic @tante

Thanks. Can totally see how that makes sense at a technical level for people who run their own offline services.

I think it's the ambiguity that is driving the discourse over this post. People are taking the "refusing to use a technology" section as a defence of LLMs in general?

If the angle was caging LLMs or something like that, it might make it clearer that you aren't endorsing the most common form of LLM?

Anyway, it's your call on this as author, just wanted to feed back on this because your writing matters and I hope feedback is helpful to it.

@FediThing @pluralistic @tante i feel similarly: big tech has taken the notion of AI and LLMs as a cue/excuse to mount a global campaign of public manipulation and massive investment in a speculative project, pumps gazillions of $ into it, and convinces everyone it's inevitable tech to be put in every bag of potato chips. the backlash is then that anything that bears the name of AI or LLM is a poisonous plague, and people are unfollowing anyone who's touched it in any way or talks about it in any other way than "it's fascist tech, i'm putting a filter in my feed!" (while it IS fascist tech, because it's in the hands of fascists).

in my view the problem seems not what LLMs are (what kind of tech), but how they are used and what they extract from planet when they are used by the big tech in this monstrous harmful way. of course there's a big blurred line and tech can't be separated from the political, but... AI is not intelligent (Big Tech wants you to believe that), and LLMs are not capable of intelligence and learning (Big Tech wants you to believe that).

so i feel like a big chunk of anger and hate should really be directed at techno oligarchs and only partially and much more critically at actual algorithms in play. it's not LLMs that are harming the planet, but rather the extraction, these companies who are absolute evil and are doing whatever the hell they want, unchecked, unregulated.

or as varoufakis said to tim nguyen: "we don't want to get rid of your tech or company (google). we want to socialize your company in order to use it more productively" and, if i may add, safely and beneficially for everyone, not just a few.

@prinlu @FediThing @pluralistic @tante I agree with most things said in this thread, but on a very practical level, I'm curious what training data was used for the model used by @pluralistic 's typo-checking ollama?

for me, that training data is key here. was it consensually allowed for use in training?

because as I understand, LLMs need vast amounts of training data, and I'm just not sure how you would get access to such data consensually. would love to be enlightened about this :)

@bazkie @prinlu @FediThing @tante

I do not accept the premise that scraping for training data is unethical (leaving aside questions of overloading others' servers).

This is how every search engine works. It's how computational linguistics works. It's how the Internet Archive works.

Making transient copies of other peoples' work to perform mathematical analysis on them isn't just acceptable, it's an unalloyed good and should be encouraged:

https://pluralistic.net/2023/09/17/how-to-think-about-scraping/

How To Think About Scraping – Pluralistic: Daily links from Cory Doctorow

@pluralistic @bazkie @prinlu @tante

This would be my take:

Search engines direct people to the work they index. They reward labour by directing people towards it.

Scraping without consent for training data lets people reproduce the work without crediting or rewarding the people who actually did the labour. That seems like labour theft?

If it is labour theft, then it isn't sustainable and that's part of why LLMs are so questionable as a technology.

@FediThing @bazkie @prinlu @tante

There are tons of private search engines, indices, and analysis projects that don't direct text to other works.

I could scrape the web for a compilation of "websites no one should visit, ever." That's not "labor theft."

@pluralistic @bazkie @prinlu @tante

Indexing works is a totally different thing to creating knock-offs of works, surely?

What Miyazaki said about AI knock-offs surely illustrates the difference?

@FediThing @bazkie @prinlu @tante

No one is defending "creating knock offs of works." Why would you raise it here? Who has suggested that this is a good way to use LLMs or a good outcome from scraping?

@FediThing @bazkie @prinlu @tante

The argument was literally, "It's not OK to check the punctuation in *your own work* if the punctuation checker was created by examining other peoples' work, because performing mathematical analysis on other peoples' work is *per se* unethical."

@FediThing @bazkie @prinlu @tante

By this standard the OED is unethical.

@pluralistic @FediThing @prinlu @tante I'd say "because performing [automated, mass scale] mathematical analysis on other peoples' work [without their consent] [with the goal of augmenting one's own work] is *per se* unethical" - and in that case, it's a statement I would agree with.

@bazkie @FediThing @prinlu @tante

You've literally just made the case against:

* Dictionaries
* Encyclopedias
* Bibliographies

And also the entire field of computational linguistics.

If that's your position, fine, we have nothing more to say to one another because I think that's a very, very bad position.

@pluralistic @FediThing @prinlu @tante I did not make that case, if you'd properly read my [additions] to the statement.

making dictionaries etc isn't automated on mass scales like feeding training data to LLMs is.

it's a very human job that involves a lot of expertise and takes a lot of time.

@pluralistic @bazkie @FediThing @prinlu @tante I think part of the issue here is that GenAI is being pushed so hard and fast *everywhere* that it's hard to be nuanced about which narrow use-cases might be acceptable or not.

We're living under a massive pro-LLM propaganda campaign. They have already set the terms of the debate with a maximalist position. It's no surprise that the backlash is similarly absolute.

@pluralistic @bazkie @FediThing @prinlu @tante I mean, the Butlerian Jihad was absolutist for a reason. Replicating human thought is a slippery slope that leads to very unpleasant places. It's no surprise that folks say, "I want none of that."
@zenkat good point. I will admit, if we didn't live in a techbro-feudal slopworld, I probably wouldn't mind a non-consensually trained typo-checker LLM all that much.

@zenkat @pluralistic @bazkie @FediThing @prinlu @tante

Butlerian Jihad led to the creation of a feudal empire... Anyway it's a fictional work. In real life, when has the extremist maximalist position worked out? The Soviet Iron curtain? It remains to be seen if the PRC firewall works out long term for them (the maximalist position in both cases: our populations must be forcefully prevented from coming into contact with capitalist degeneracy or our socialist project will fail)

@komali_2 @pluralistic @bazkie @FediThing @prinlu @tante yep, I'm not a fan of maximalist positions myself, and my personal views on LLMs are even more nuanced than Cory's.

But let's be frank, the AI boosters are far more maximalist than any neo-luddite. The Singularity is established dogma for many leaders in tech. The replacement of humanity by infinitely-accelerating technology. Not as metaphor, but as prophecy.

That's as maximalist as it gets. And these are people with immense power and wealth. They can turn their maximalist prophecies into reality. And as you note, the history of maximalist movements is not pretty.

The future these people are pushing is scary as fuck. The pushback is understandable.

@zenkat right, fair, I guess I just get flashbacks to communists in Spain putting anarchists against the wall while the fascists come marching in. In the face of maximalism, nuance might be a survival strategy if nothing else.

At least we can draw comfort from the fact that LLMs are a dead end for any kind of AI that would lead to a singularity, but the True Believers are too blinded by FOMO to realize it

@pluralistic
No, because dictionaries are about language, which is a shared common; encyclopedias are about knowledge, which is a shared common; and bibliographies are a list of works, not a derivative.

Knowledge, language and a list of works cannot be copyrighted. You can use language, knowledge, words from the dictionary. You can quote an encyclopedia when referring to the source. None of that is even relevant to this discussion.

@bazkie @FediThing @prinlu @tante

@pluralistic @bazkie @FediThing @prinlu @tante encyclopedias don't say: look i made a painting, wanna buy?
but all that is kinda offtopic, imo
there are plenty of reasons to not use LLM, besides that commons/copyright stuff, wich are not purist and very much based on real-world-issues. in a perfect world, theft won't be an issue (imo), because we had overcome money and fossil energy. but using genLLM still would be morally wrong, because of its dangers, due to bias and failures. …/
@pluralistic @bazkie @FediThing @prinlu @tante /… also: loss of knowledge. those dangers will never be solved due to the nature of this technology. that's why defending it now, is problematic.
so in a perfect world your use-case would be fine, but your reasoning, which tante criticises, somehow invalidates LLM-criticism as a whole, doesn't it?
(i also doubt that any LLM-spellcheck is better than the built-in langenscheidt that i got, but that's german, might be quite a different ux)

@pluralistic
The argument was "without the consent of the creators of said works." And you know that.

Don't be just another debate bro. Please.

@FediThing @bazkie @prinlu @tante

@pluralistic @bazkie @prinlu @tante

If LLMs were only used for checking grammar that is one thing.

But by far the most common use of LLMs is labour theft through creating knock-offs, and that's something else.

I think the concern is that training data useful for the first case could be useful for the second case too? Hence the questions about where the training data comes from and where it ends up.

Kind of feels like it needs to be strictly ringfenced if it's to be ethical?

@FediThing @bazkie @prinlu @tante

Once again, you are replying to a thread that started when someone wrote that using an LLM to check the punctuation in your own work is ethically impermissible because no one should assemble corpora of other peoples' works for analytical purposes under any circumstances, ever.

@pluralistic @FediThing @prinlu @tante sure, but I'm responding here specifically to your statement that scraping for training isn't unethical per se.
@pluralistic @FediThing @prinlu @tante you keep conveniently malforming the aspect of "mass automated non-consensual scraping with the goal of helping produce works" into "analytical purposes", and I find that rather bad faith

@pluralistic @bazkie @prinlu @tante

I guess the question is if such data is assembled for a legitimate purpose, are there safeguards to stop the same data being used for an illegitimate purpose?

If there aren't any safeguards, then there's a danger the legitimate purpose is used as a shield/figleaf for illegitimate stuff?

@pluralistic @prinlu @FediThing @tante I think the difference from search engines is how LLMs reproduce the training data..

as a thought experiment; what if I'd scrape all your blogposts, then start a blog that makes Cory Doctorow styled blogposts, which would end up more popular than your OG blog since I throw billions in marketing money at it.

would you find that ethical? would you find it acceptable?

further thought experiment; let's say you lose most of your income as a result and have to stop blogging and start flipping burgers at McDonald's.

your blog would stop existing, and so, my copycat blog would, too - or at least, it would stop bringing novel blogposts.

this kind of effect is real and will very much hinder cultural development, if not grind it to a halt.

that is a problem - this is culturally unsustainable.

@bazkie @prinlu @FediThing @tante

First: checking for punctuation errors and other typos *in my own work* in a model running on *my own laptop* has nothing - not one single, solitary thing - in common with your example.

Nothing.

Literally, nothing.

But second: I literally license my work for commercial republication and it is widely republished in commercial outlets without any payment or notice to me.

@pluralistic but then you consented to that, right? you are in control of that.

also my example IS similar - after all, it's data scraped without consent, used to create another work. the typo-checker changes your blogpost based on my training data, in the same way my copycat blog changes 'my' works based on your training data.

sure, it's on a way different scale - deliberately, to more clearly show the principle - but it's the same thing.

@bazkie

Should we ban the OED?

There is literally no way to study language itself without acquiring vast corpora of existing language, and no one in the history of scholarship has ever obtained permission to construct such a corpus.

@pluralistic I gave it a good thought, and you know what, I'm gonna argue that yes, for me there is a degree of unethical-ness to that lack of permission!

the things that make me not mind that so much are a variety of differences in method and scale;

(*btw just explaining my personal reasons here, not arguing yours)

- every word in the OED was painstakingly researched by human experts to make the most possible sense of it

- coming from a place of passion on the end of the linguists, no doubt

- the ownership of said data isn't "techno-feudal mega-corporations existing under a fascist regime"

- the OED didn't spell the end of human culture (heh) like LLMs very much might.

so yeah. I guess we do agree that, on some level, the OED and an LLM have something in common.

it's the differences in method and scale that make me draw the line somewhere in between them; in a different spot from where you may draw it.

and like @zenkat mentioned elsewhere, it's the whole thing around LLMs that makes me very wary of normalizing anything to do with it, and I concede I wouldn't mind your slightly unethical LLM spellchecker as much, if we didn't live in this horrible context. :)

I guess this has become a bit of a reconciliatory toot. agree to disagree on where we draw the line, to each their own, and all that.

@pluralistic @bazkie

Dictionaries reference the sources they use for examples in the entries themselves.

LLMs lose the references at training time.

You've got this dead wrong.

@pluralistic @bazkie @prinlu @FediThing @tante Is DDOSing independent websites an unalloyed good?

@pluralistic @tante After reading so many comments, it is pretty clear who here would be opposing the creation of Napster and torrenting and be defending RIAA... They are also clearly very much against Internet Archive, shadow libraries, etc, simply because they can't take any disagreement.

Who knew running a local LLM, that uses the same energy as watching a youtube video, to spellcheck your own work would bring out such a mob.

@pluralistic @FediThing @tante you’re attempting to legitimize use of an unethical technology for something you don’t actually need a plausible-sounding-wall-of-text generator for

it goes beyond “it’s made by bad people in bad ways”. it’s a “”tool”” that actively causes cognitive decline and psychosis and sucks the soul out of everything it touches. and mind you, promoting and legitimizing it is an act of support for those bad people and their bad ways. your deflection is typical of someone with no regard for ethics

“I installed Ollama” instantly gives a person away as a techbro

  • your not-so-friendly not-so-neighborhood “””liberal”””

@zaire @FediThing @tante

I'm not a liberal, I'm a leftist, so perhaps this is why I disagree with you.

The argument that "something is unethical because someone else used it in an unethical way" is so incoherent that it doesn't even rise to the level of debatability.

@pluralistic @FediThing @tante yea no the thing is you’re acting like a liberal, what we in the biz call a shitlib, and i’m a leftist, and you have things conflated a little

@pluralistic @FediThing @tante The argument that “The argument that “something is unethical because someone else used it in an unethical way” is so incoherent that it doesn’t even rise to the level of debatability.” doesn’t address what i’m saying here at all

again, pretty clear you don’t know what ethics are or how to be ethical in tech

@zaire @pluralistic @FediThing @tante Define “an unethical technology” in a way which doesn’t also include whatever device you’re typing/dictating/writing on.
@ianbetteridge @pluralistic @FediThing @tante okay if you’re gonna present that to me as a gotcha i ain’t following that line of argument
@zaire @pluralistic @FediThing @tante The main reason I ask is that everything you stated after "It's a tool that..." was also claimed about the smartphone, computer, Dungeons and Dragons, and television. And that's just in my lifetime.

@pluralistic @FediThing @tante So this is some kind of spell-checker, which is already in LibreOffice? I'm not sure why I would use that instead.

I use offline AI, esp for visual effects, subtitles, fixing dialogue errors, etc. There are "deep fake technologies" useful for mocap, camera tracking, and such other tedious works. They don't use prompts, and don't generate art, and are trained on your own inputs.

Perhaps we need a new name to differentiate it from the online genAI tech.