I was disappointed to read Cory Doctorow's post where he got weirdly defensive about his LLM use and started arguing with an imaginary foe.

@tante has a very thoughtful reply here:

https://tante.cc/2026/02/20/acting-ethical-in-an-imperfect-world/
A few further comments, 🧵>>

It was particularly disappointing to see Doctorow misconstrue (and thus, if he is believed, undermine) the work that many of us are doing to shine a light on how the ideology of "AI" and the specific ways in which LLMs and other "AI" products are created do real harm.
>>

I also want to point out (again) the ways in which lumping together all uses of LMs (like the lumping of technologies into "AI") obscures the issues at hand.

Language modeling is a useful component of many technologies that can be built without extractive, exploitative means. Take the automatic transcription built by and for the Māori people -- there's a te reo Māori language model that's part of that.
>>

And the transformer architecture represented an important step forward in language modeling, one that brought improvements to things like spell checking (Doctorow's use case).
>>

And you can build and use language models without turning them into the synthetic text extruding machines that are despoiling our information ecosystem.

And even if those are easily accessible, because OpenAI et al want to burn through cash with their demos, we can still refute and refuse the narrative that synthetic text is somehow a panacea to be used across social services (medicine, education) and in science, etc.
>>

Doctorow could have gone into these details; could have said something about how the particular LLM he chose was built (whose data, trained how, how much data, what kind of further data work in RLHF); could have drawn a distinction about use cases.
>>
@emilymbender are there examples of people doing this well (describing what they chose, why, what data, maybe even how to challenge or improve?) that others can learn from? Would be v interested.

@sunnydeveloper There is a whole literature on dataset documentation, including Data Statements for NLP. We link to some of the other projects from this page and also have some sample data statements.

https://techpolicylab.uw.edu/data-statements/

@emilymbender thank you! I want to help people make more informed decisions, and be able to describe their choices - but teaching myself first!