I was disappointed to read Cory Doctorow's post where he got weirdly defensive about his LLM use and started arguing with an imaginary foe.

@tante has a very thoughtful reply here:

https://tante.cc/2026/02/20/acting-ethical-in-an-imperfect-world/
A few further comments, 🧵>>

Acting ethically in an imperfect world

Life is complicated. Regardless of what your beliefs or politics or ethics are, the way that we set up our society and economy will often force you to act against them: You might not want to fly somewhere but your employer will not accept another mode of transportation, you want to eat vegan but are […]

Smashing Frames
It was particularly disappointing to see Doctorow misconstrue (and thus, if he is believed, undermine) the work that many of us are doing to shine a light on the ways in which the ideology of "AI" and the specific ways in which LLMs and other "AI" products are created do real harm.
>>

In this context, I feel like reminding people (again) that the stochastic parrots paper was not primarily a response to synthetic text extruding machines (not at all popular in late 2020), but an exploration of the range of harms that had already been documented in the pursuit of LM scale.

https://dl.acm.org/doi/10.1145/3442188.3445922

>>

On the Dangers of Stochastic Parrots | Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency

ACM Conferences

I also want to point out (again) the ways in which lumping together all uses of LMs (like the lumping of technologies into "AI") obscures the issues at hand.

Language modeling is a useful component of many technologies that can be built without extractive, exploitative means. Take the automatic transcription system built by and for the Māori people -- there's a te reo Māori language model that's part of that.
>>

And the transformer architecture represented an important step forward in language modeling, one that brought improvements to things like spell checking (Doctorow's use case).
>>
What we argued in Stochastic Parrots, however, was that you can get those benefits of the transformer architecture without amassing datasets too large to collect with care (meaning consentfully, intentionally, and with the ability to document what's in the data).
>>
@emilymbender "too large to collect with care" resonates strongly.

@emilymbender

"Datasets too large to collect with care" is a great line.

Is there anyone actually doing this?

I know Mozilla is trying this (Common Voice) and they're incredibly unpopular right now. Could we have community support for such an organization?

@gatesvp @emilymbender

You can get community support for just about anything:

  • if it's really in the best interest of the community
  • if it does not intentionally harm others, and adequately compensates those who are harmed by it
  • if you do the necessary work to discuss these things
  • and if you are not primarily motivated by greed, and have suitable defenses against exploitation by those who are.

I'll leave you to ruminate on the failures of Mozilla, a nominally non-profit organization.

@gatesvp @emilymbender as the other reply says, it's all about informed consent and participation (in other words the exact opposite of what we currently have with LLMs), but also one does not necessarily need enormous data sets to do useful things! Rumors of the demise of "obsolete" technologies like WFSTs have been greatly exaggerated 🙂
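To make the "small data" point concrete - a sketch of my own, not anyone's actual system from this thread: a Norvig-style spell corrector gets useful results from nothing more than a modest word-frequency list, no web-scale corpus required. (The toy corpus below is obviously illustrative; a real deployment would use a carefully collected frequency list.)

```python
from collections import Counter

# Toy corpus; a real corrector would use a modest, carefully collected
# word-frequency list -- still nowhere near web scale.
WORDS = Counter("the quick brown fox the lazy dog the end".split())

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit (delete/transpose/replace/insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in ALPHABET]
    inserts = [L + c + R for L, R in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Pick the most frequent known word within one edit, else give up."""
    candidates = [w for w in edits1(word) | {word} if w in WORDS]
    return max(candidates, key=WORDS.get) if candidates else word

# correct("teh") -> "the"
```

The whole thing fits in a few dozen lines, and every word in its "training data" can be accounted for - which is exactly the property that web-scale scrapes give up.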

@emilymbender

> without amassing datasets too large to collect with care

I appreciate your post, and I'm a huge proponent for moving past the transformer.

However, how is worrying about this not choosing "cooperate" every single time in the Prisoner's Dilemma when you know Meta & Co will always choose "defect"?

This is worrying to me, because nefarious actors have no incentive to care about these issues (and actually have plenty of incentives to be hostile to these ideas). Also, they have institutional backing to do so.

@budududuroiu In what sense is this choosing 'cooperate' and also where do you see a prisoner's dilemma here?

@emilymbender

The framing is that there is a Commons of human knowledge that is accessible to virtually everyone (based on how internet protocols currently operate). There's a sort of social contract within these Commons which makes participants feel OK publishing permissively and bearing the costs of compute so that everyone can access the Commons freely.

To 'cooperate' is to abide by this unwritten social contract, respecting /robots.txt, respecting CC.

To 'defect' is to take advantage of this permissiveness to massively scrape the Commons, ignoring the social contract, offloading the costs to the providers in the Commons.
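To make that concrete: part of this social contract is literally written down in a plain-text file. A site that does not consent to training scrapes can say so (GPTBot and CCBot are real crawler user-agents; the file below is just an illustration, not any particular site's policy):

```text
# robots.txt -- opt out of known training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else may index as usual
User-agent: *
Allow: /
```

The file is advisory, not enforceable - "defecting" consists precisely of ignoring it.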

The current SOTA architecture is transformer-based, which requires massive data for training effectively. By cooperating and not engaging in training a 'free as in freedom' LLM, we're 1) losing the benefits of the Commons (as they either get sloppified or people take more information private because of increased compute costs), and 2) we also don't get to build an artifact based on the knowledge of the Commons that can be contributed to the Commons (an open LLM).

If the DeepSeek moment managed to wipe $600bn off Nvidia's market cap, commoditising LLM training would be the death knell of the AI slop hype race: who would pay thousands in tokens to OAI and Anthropic when you could use a GPL-licensed LLM (or whatever permissive license we come up with)?

And you can build and use language models without turning them into the synthetic text extruding machines that are despoiling our information ecosystem.

And even if those are easily accessible, because OpenAI et al want to burn through cash with their demos, we can still refute and refuse the narrative that synthetic text is somehow a panacea to be used across social services (medicine, education) and in science, etc.
>>

Doctorow could have gone into these details; could have said something about how the particular LLM he chose was built (whose data, trained how, how much data, what kind of further data work in RLHF); could have drawn a distinction about use cases.
>>

But instead he wrote a defensive screed, seemingly imagining someone who, on learning of his LLM use, would ascribe to him all of the ills of everyone's LLM production and use.

A missed opportunity, to be sure.

@emilymbender

His position on subjects is distorted by his personal position in society.

It's a common side-effect of successful critics of society. He speaks, now, to the only people he thinks matter, but they are a narrow group of exceptionals, culled from the privileged, with whom he interacts most.

Success has its isolations, and he hasn't confronted this yet . . .

@_chris_real @emilymbender Thank you for this insight into an easy pitfall of being a successful critic. It's something I'd like to keep in mind.

@_chris_real @emilymbender He's been responsive when I've communicated with him, and I'm not a celebrated luminary.

It could be that success has corrupted him. It seems to corrupt everyone.

I thought he was only talking about the ethics of the things, though. Is he actually using them then? For what? I'm curious, as I've only seen the peripheries of discussions about Doctorow and LLMs so far. A link to whatever he said that sparked all this would be welcome.

@mason @_chris_real @emilymbender this is the web version: https://pluralistic.net/2026/02/19/now-we-are-six/#stock-buyback if you search for "llm" on the page you'll find the part this has been talking about
Pluralistic: Six Years of Pluralistic (19 Feb 2026) – Pluralistic: Daily links from Cory Doctorow

@paulsilver @_chris_real @emilymbender Oh, that's disappointing. I've sent him error corrections in the past, but I'd rather see the occasional typo than have him contribute to cooking the planet.

He doesn't talk about the training data for his model, nor whether he's using their cloud services or not. He talks about "purity culture" but disregards ongoing harm.

Thank you for the pointer.

@mason @paulsilver @_chris_real @emilymbender He's running Ollama locally to do a grammar check. Let's not pretend that's a significant use of resources.

@krishooper @mason @emilymbender I've been losing my mind about this. There might be valid criticisms of what he wrote, but like, the idea that he is directly harming the environment with his use case is a straight-up denial of reality, and yet people seem to be saying that en masse.

Like shit man, attack the parts that you feel stick out. Everyone seems to just be copy-pasting their general argument against AI into their replies to his post, despite the fact that a lot of that doesn't apply to that post.

I feel like I'm going crazy. Either I'm missing something, or everybody is just talking past each other.

@emilymbender
Lovely post! Thank you for your well-informed piece. I tend to be in Melanie Mitchell's camp, or @ct_bergstrom 's.
Sometimes both

@emilymbender Thank you for this thoughtful and balanced post/thread! It almost obviates a comment on @tante 's piece that I have been planning to write. I am particularly happy about your point on the Māori language model, on transformers and how to build and use language modeling without extractive, exploitative means!

If I may thus join in there and add another thought ...

>>

@emilymbender @tante

What annoyed me in the arguments I had seen was the shallowness of the sketches of liberatory and emancipatory usages, efforts and perspectives - for @pluralistic , saying "Open Source" seems to be enough, while for @tante such usages, efforts and perspectives seem to be in principle impossible, or only "niche attempts" not worth looking into because they work worse than all the other models anyway. Well, that's not the impression I got when looking at, say, the Institutional Data Initiative, a "Summer School for Women in AI and Data Science" in Addis Ababa, the RAIL workshops, and a host of other movements that I am trying to track from afar, but would love to see much, much more of or eventually become involved with.

>>

@emilymbender @tante @pluralistic

@collinsworth 's post about "AI optimism is a class privilege" made me realize and acknowledge the privilege I have, but then the question is, how can I make good use of that privilege and be an ally for those who are not as privileged as me? Boycotting AI BigTech? Boycotting any (L)LM whatsoever? With some exceptions for akrasia or external constraints being tolerated? Or as long as I am not talking about it in ways that could be understood as affirmative? Probing and developing/distilling datasets and models in more open ways? Developing infrastructure, social relations and collective action in other directions?

>>

@emilymbender @tante @pluralistic @collinsworth

Now, how to be a good ally is not something that any of @tante , @pluralistic , you, or I have any authority to determine, I suppose. But I feel like thinking of the challenge we all face in these terms helps align some ideas and set priorities - well, it helps me, at least.

https://medium.com/@seidymam/summer-school-for-women-in-ai-and-data-science-a56e847156d9

https://sadilar.org/en/rail-2025/

https://joshcollinsworth.com/blog/sloptimism

/fin

Summer school for women in AI and Data Science

Introduction

Medium

@emilymbender He's not making your criticism; that is not a slight to you or your ranking of what's important.

I'm not really sure what the background is here, but it reminds me of how the left so frequently winds up harming its own allies.

@emilymbender

This distinction about use cases is the important point in my view. So much so that I wasn't fully on board with the first paragraphs of the Smashing Frames article (though I loved the rest).

For example, the analogy to wanting to be vegan but accepting vegetarian. I am convinced of the value of reducing our meat consumption and animal farming. But personally I don't find eating meat morally objectionable on principle. If I did, I'd *not* make exceptions.

>>

@emilymbender

Noting that something should be reduced, and concluding that it is morally objectionable in principle are two different things. One allows for compromise and exception, the other should not.

(NB: I know and accept that some people do find any meat consumption morally objectionable in principle. I respect such views, but don't (currently) share them)

Re: veganism "If I did, I'd not make exceptions."

Yeah, I am a vegan. I've even worked as a chef at vegan restaurants.

Life has a "funny" way of testing convictions in my experience.

For me, for example: I have been incarcerated, more than once. Despite requesting vegan meals, such things were never made available to me.

However: I found that others with whom I was incarcerated, were generally more than happy to trade their meals' vegetables, for my meals' meat. Same for milk, etc.

Of all the weird economies that I encountered whilst incarcerated? It certainly seemed as if it was among the more benign. I managed to maintain being vegan as best I could in a food desert, and cultivated some camaraderie from carnivores who were happy with my generosity with things I had no interest in consuming.

I would posit: @[email protected] probably isn't vegan, and isn't writing from a perspective of authority in such realms. Alas, while analogies are perhaps useful for trying to convey an idea, they're also a fundamental logical fallacy that critical thinking classes in junior colleges will typically highlight as something to avoid in writing.

I'll leave you with a vegan joke: "When I was an omnivore, I didn't understand vegetarians. Now that I am vegan, I understand them even less."
@emilymbender are there examples of people doing this well (describing what they chose, why, what data, maybe even how to challenge or improve it?) that others can learn from? Would be v interested.

@sunnydeveloper There is a whole literature on dataset documentation, including Data Statements for NLP. We link to some of the other projects from this page and also have some sample data statements.

https://techpolicylab.uw.edu/data-statements/

Data Statements | Tech Policy Lab

@emilymbender thank you! I want to help people make more informed decisions, and be able to describe their choices - but teaching myself first!

@emilymbender Hi from a random Internet person! I wondered if you have a view on "Sovereign" models like Apertus? Per https://raw.githubusercontent.com/swiss-ai/apertus-tech-report/main/Apertus_Tech_Report.pdf

FWIW I am a genAI sceptic who started out feeling quite positive about this development, but then cooled on it rapidly once I realised that it doesn't address a) environmental impacts, or b) potential harms when genAI is used naively - or for plausible deniability by people doing bad stuff ¯\_(ツ)_/¯

For anyone reading this who hasn't come across Apertus before, there are now several models like this with characteristics such as:

  • Full disclosure of training data
  • robots.txt is respected during scraping
  • Training corpus includes under-represented languages/cultures
  • Measures taken to mitigate harm are documented
  • Code base is open source, not just the weights
@emilymbender That would have required admitting/exposing that the thing he used was unethically trained and violating the consent of the people whose works were used in making it. So of course he didn't do it.
@emilymbender It's so incredibly sad that we have found a method to turn any snippet of text into some numbers that somehow encode the meaning behind it, and yet the most popular use case is just guessing what the next word is
@me @emilymbender They don't encode the meaning. They encode the correlation to the *form* (NOT meaning) of elements of a corpus of past written text.

@dalias @me @emilymbender Semantic analysis/understanding is an avenue of AI that was basically abandoned during the last AI winter, because it turns out to be pretty hard to figure out how to do anything like that. Especially if it's supposed to grow while preserving that property (hardcoding it is difficult but considerably more tractable).

LLMs encode meaning about as well as unicode & UTF-8 do so (i.e. not at all, that's out of scope).

@emilymbender @me

There are many ways #Aiantagonists lose credibility by building inaccurate mythos.

One of which is: they assume AI is frozen in stone with zero development, and because they are outright hostile to the tech they rarely keep up with advancements.

The "guessing text" claim is a case in point.

#kona is an Energy Based Model which presents MATHEMATICALLY PROVABLE answers.

If I had a cent for every time somebody in my timeline talks about stochastic parrots, I'd have 67 cents, and that's just from yesterday.

Angry posts won't fix AI, political engagement will.
Get off your fat arses and activate politically, #regulateai

@me @emilymbender If it makes you feel better, no, we haven't figured out how to turn words into numbers that "somehow encode the meaning behind it". There are no meanings in text, only more words. Without a ton of background information LLMs do not possess, words are nothing but arbitrary symbols. Things like word2vec only tell us what other words a given word tends to hang out next to.
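A toy version of that point - my sketch, not word2vec itself: distributional vectors only record which words appear near which other words, so "similarity" is similarity of contexts, not of meanings.

```python
import math
from collections import defaultdict

corpus = [
    "cats drink milk", "dogs drink water",
    "cats chase mice", "dogs chase cats",
]

# Count co-occurrences within each short line.
vec = defaultdict(lambda: defaultdict(int))
for line in corpus:
    words = line.split()
    for w in words:
        for c in words:
            if c != w:
                vec[w][c] += 1

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(vec[a][k] * vec[b][k] for k in vec[a])
    na = math.sqrt(sum(v * v for v in vec[a].values()))
    nb = math.sqrt(sum(v * v for v in vec[b].values()))
    return dot / (na * nb)

# "cats" and "dogs" come out more similar to each other than to "milk",
# purely because they occur in similar contexts -- the numbers know
# nothing about animals.
```

Scaled up with neural training tricks, this is the family word2vec belongs to: correlations over the *form* of text, exactly as described above.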
@emilymbender Somewhat off-topic: I think some grammatical errors and typos are fine. I really don't like things that are too polished. I work with someone for whom English is a second language, and he obviously uses LLMs to write messages. I'd rather read his own broken English than wonder if the LLM generated what he actually meant. Similarly, I like live music recordings with mistakes and improvisation. Gives things more character, IMO. Over-polishing everything can lead to a type of bland conformity.
@1337 @emilymbender Well said. I had to ask a friend for whom English is a second language that he not use an LLM bot for the same reason.
@1337 @emilymbender this (mis)use of large language models to 'polish' your text is just autotune for words, and about as unhelpful.

@emilymbender

Oh c'mon! Let not facts stand in the way of the Luddite hordes...
...lest you be denounced as a broligarch/techbro by the righteous!

@n_dimension @emilymbender

I'm with Lord Byron on the Luddites:

@ecadre @n_dimension

Heh! I'm often reminded (and the piece below does a gorgeous job with the subject, incl referencing Byron's poem) that Luddism wasn't anti-progress / anti-tech / reactionary :) That's how the victors rewrote it, as they so often do.

https://thenib.com/im-a-luddite/

@emilymbender

I’m a Luddite (and So Can You!) | The Nib

What the Luddites can teach us about resisting an automated future.

The Nib
@n_dimension @emilymbender tell us more about luddism you clearly know a lot about it

@emilymbender @mancube

The #luddites lost.

There were many reasons. Not the least being that the state deployed more soldiers against them than against Napoleon... And hanged 17 ringleaders.

But the main reason was THE LUDDITES NEVER FORMED A NATIONAL POLITICAL REPRESENTATION!

Stop posting angry memes. Get off your fat/scrawny arses and become politically active.
#regulateai

@n_dimension @emilymbender @mancube

It's a little hard to mount an effective political party when the system is hellbent on murdering you (and uses all the propaganda machinery at its disposal to distort your message and memory). Status quo parties like Republicans and Democrats "fight" each other (and even the terrorists who hate us for our freedoms or whatever) like pro wrestlers putting on a cute show. Managed opposition. Anything close to a real Luddite party they sabotage, co-opt, and/or kill with the focus reserved for REAL threats.

People are trying. A lot of folks whose names you'll never know have died or been locked up and tortured, trying.

@emilymbender @mancube @violetmadder

Resisting the oppressor is never easy.

Four of my ancestors were killed by the Nazis, one of whom died from tuberculosis in a concentration camp.

One survived the bloody battle of Monte Cassino, storming the mountain (mainly because he was his division's baker - still counts). My grandfather was the only survivor of his unit fighting the Nazis (he never volunteered).

My great-uncle was captured, incarcerated, and died in a Nazi concentration camp.

#resistance is never easy.
But we don't have to go full #Luddite yet; we have not exhausted political action.

The Luddites didn't even go full Luddite. They felt they had not exhausted political action, then were summarily executed and/or sent to slave colonies. They mailed letters, did public demonstrations, signed petitions, and for their efforts they were beset by bullshitters accusing them of "military-like drills," among other obvious fabrications. They wrecked machinery, then got blamed for death threats. They were blamed for the Pentrich uprising, which itself was caused by the unbearable exploitation of the working class, and not by the working class who justifiably marched on Nottingham. But nobody there was flying some imagined flag of Ned Ludd!

I wasn't there when it went down, but I call it like I see it. A city near me had a lady last week who marched up to city council during a public hearing and held up a petition at them, which they'd been ignoring for months because 19,000 people wanted to revoke the permit that the city granted to ICE to hole up in there. They freaking swarmed her with cops, and the news dutifully reported that in the wake of this vicious attack, city council members were going to have to start carrying firearms to defend themselves. She's been charged with criminal trespassing. In a public hearing, at city hall. People in power lie like that ALL the TIME. Not even a lot of power. Any "representative" of anyone anywhere in the USA is going to lie that protected, civil acts are dangerous violence that must be stopped with more violence. I can't imagine the UK is much better.

Until someone shows me the "death threats and possibly attacked" magistrates, I'm going to assume the magistrates were lying. Even when they were (supposedly) flying Ned's flag, I'm not holding a single Luddite to any extremist lens, until I read some record other than that of the total slimeball Samuel Bamford, claiming the protestors opened fire unprovoked, because when the mill owners fired on them, that was just "to intimidate" and shouldn't have been seen as, uh... shooting them.

I mean um... going "full Luddite" as you put it is probably a bad idea, since they start murdering people long before that point. So I agree. Just... don't expect them to play by any rule other than "kill people until they stop nattering at me."

#opinions #politics #ProbablyWrong #idk

CC: @[email protected] @[email protected] @[email protected]
@emilymbender I always put "AI" in quotes when referring to LLMs. I think the term "AI" was used to make the technology seem more momentous than it is.

@tante small typos if useful to know:
"And that stand lead him into the problematic train of thought" (led)
"Of just reap the fruits..." (or)

And thank you for this piece!!

@emilymbender Thank you for this. I read the piece in Smashing Frames yesterday – very thoughtful, as is your response.
@emilymbender @tante
FYI, there was also a follow-up that I found equally worth reading:
https://tante.cc/2026/02/20/on-alliances/
On Alliances

This morning (it is evening now in freezing Berlin) I wrote an article about a blog post Cory Doctorow had released the day before. In his post Cory made an argument about LLM usage that I criticized: I think his view on technology being neutral and it being possible to “liberate” any technology by making […]

Smashing Frames
@emilymbender @tante Enshittification Man himself getting enshittified? Is nothing in this world sacred?
@funbreaker @emilymbender @tante He never even coined that term; it was used online years before he claimed it