Yesterday Cory Doctorow argued that refusal to use LLMs was mere "neoliberal purity culture". I think his argument is a strawman, doesn't align with his own actions, and delegitimizes important political actions we need to take in order to build a better cyberphysical world.

EDIT: Discussions under this are fine, but I do not want this to turn into an ad hominem attack on Cory. Be fucking respectful.

https://tante.cc/2026/02/20/acting-ethical-in-an-imperfect-world/

Acting ethically in an imperfect world

Life is complicated. Regardless of what your beliefs or politics or ethics are, the way that we set up our society and economy will often force you to act against them: You might not want to fly somewhere but your employer will not accept another mode of transportation, you want to eat vegan but are […]

Smashing Frames

@tante

I really like and admire @pluralistic and have utmost respect for him, and that's why I'm totally baffled as to why he frames LLM scepticism as "fruit of the poisoned tree" arguments.

The objections to LLMs aren't about origins but about what they are doing right now: destroying the planet, stealing labour, giving power over knowledge to LLM owners etc.

The objections have nothing to do with LLMs' origins, they're entirely about LLMs' effects in the here and now.

@FediThing @tante @pluralistic Some people - in fact quite a lot, if my reading is correct - do indeed argue that LLMs can *never* be ethically used because they are "trained on stolen work".

@ianbetteridge @FediThing @tante

Performing mathematical analysis on large corpora of published work is not "stealing."

@pluralistic @ianbetteridge @FediThing @tante If that “mathematical analysis” regurgitates near verbatim works created by other people, it certainly is committing IP theft, and LLMs will happily do that. The “mathematical analysis” is effectively a form of lossy compression on its training data which a prompt can later extract.
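The "lossy compression which a prompt can later extract" claim can be made concrete with a toy sketch (a hypothetical illustration, not anything from the thread and nothing like a real LLM): a character-level n-gram model "trained" on a single short text. With a long enough context, every context in the training text is unique, so a prompt drawn from the training data completes it verbatim — the model has effectively stored the text, and the prompt extracts it.

```python
from collections import defaultdict

# Toy memorization demo (not an LLM): a character-level n-gram model
# "trained" on one short text. Because each ORDER-character context in
# the text occurs only once, a prompt taken from the training data is
# continued verbatim -- the training text is recoverable from the model.
ORDER = 10  # context length in characters (arbitrary choice)

text = ("It is a truth universally acknowledged, that a single man in "
        "possession of a good fortune, must be in want of a wife.")

model = defaultdict(list)
for i in range(len(text) - ORDER):
    # map each ORDER-character context to the character that follows it
    model[text[i:i + ORDER]].append(text[i + ORDER])

def generate(prompt, max_new_chars):
    out = prompt
    for _ in range(max_new_chars):
        continuations = model.get(out[-ORDER:])
        if not continuations:
            break  # context never seen in training: stop generating
        out += continuations[0]  # greedy: first observed continuation
    return out

# A prompt from the training text regurgitates the rest of it.
print(generate("It is a truth", 200))
```

The text, `ORDER`, and the greedy decoding rule are all arbitrary choices for the sketch; real LLM memorization is probabilistic and partial, which is why the post above calls it *lossy* compression.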

@bjn @ianbetteridge @FediThing @tante

Once again, you're talking about *using* a model, not training a model.

Also "IP theft" isn't a thing. Perhaps you mean copyright infringement?

@pluralistic @ianbetteridge @FediThing @tante I’ll give you pedant points for copyright infringement, which is what most people mean by “IP theft”. As for training/using, the difference is somewhat moot. The models are trained to be used, and if trained on copyrighted data without a license, you’ve encoded that data into the model which might then regurgitate it thus facilitating copyright infringement.

@bjn @ianbetteridge @FediThing @tante it is a bedrock of copyright law that devices 'capable of sustaining a substantial non-infringing use' are lawful. Decided in 1984 (SCOTUS/Betamax) and repeatedly upheld.

It is categorically untrue that a model is therefore illegal merely because its output can infringe copyright.

There's not much that's truly settled in American limitations and exceptions, but this is.

@pluralistic
> untrue that merely because a model's output can infringe copyright that the model is therefore illegal.

Mhmmm naaah, overfitting and memorization are very much a thing, especially in the case of LLMs, where they've completely given up on controlling data leaks and where memorization has been demonstrated rather unambiguously, e.g. with the suitesparse example...

Not to imply that "illegal" is bad ofc, or that copyright is justifiable

@bjn @ianbetteridge @FediThing @tante

@nobody @pluralistic @bjn @FediThing @tante Memorisation is very definitely a thing for humans too – ask the ghost of George Harrison, who unconsciously regurgitated "She's so fine" as "My Sweet Lord".

And notably – he got sued for it, and lost, *despite* everyone's acceptance that it wasn't deliberate.

If an LLM regurgitates substantive parts of a work, meeting the legal bar of what would land a human in court, there should be no legal difference – the human who prompted that creation could be sued.

@ianbetteridge @nobody @pluralistic @FediThing @tante

So it turns out LLMs now happily regurgitate great chunks of copyrighted works, with Anthropic's model generating near verbatim the entirety of Harry Potter and The Philosopher's Stone. Not what most people would call "fair use" – and possibly not what the courts will call it either, some time soon.

https://arstechnica.com/ai/2026/02/ais-can-generate-near-verbatim-copies-of-novels-from-training-data/

AIs can generate near-verbatim copies of novels from training data

LLMs memorize more training data than previously thought.

Ars Technica