Joseph Wilk d[-_-]b

@josephwilk
♿️💻🎶🎨 Disabled cyborg using automative forms of expression working with alternative bodies - he/him
Art: https://art.josephwilk.net

Online playtest of TTRPG "Killer Crips 3000"
A game of Disabled Rage using the "Eat The Reich" system.

Set in a fantasy world where disabled people fought & won a war against an ableist regime, but in the process something ripped open the seams of reality, leaking horrors.

https://www.eventbrite.co.uk/e/workshop-table-top-role-playing-game-killer-crips-3000-with-joseph-wilk-tickets-1983456246025

Talking about my artwork Wrongmove: the London rental market, where it's easier to find a home with chandeliers & marble floors than step-free access. Tech replicating inequality, failure of legislation & capital's disdain for access.
Fri 6th March - PM Studio, Bristol & online
https://www.watershed.co.uk/studio/events/2026/03/06/lunchtime-talk-wrongmove
Lunchtime Talk: WrongMove | Pervasive Media Studio

In this Lunchtime Talk, Joseph Wilk will take us through WrongMove - a video work that explores the conditions of the Greater London rental market over a year.

Best I've found is from: "Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models": https://aclanthology.org/2021.emnlp-main.41.pdf

"Minority and female group names are singly tokenized less than white and male names. Single tokenization correlates with frequency"

With GPT tokenizers (BPE, as used by OpenAI), does the number of tokens used to represent a word correlate with the frequency of that word in the training data?
Is it a way to reverse-engineer word frequency in the hidden training data?
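
A minimal sketch of why the two might be linked, assuming a toy character-level BPE trainer. The names and counts below are made up for illustration, and real tokenizers (byte-level, with pre-tokenization and far larger vocabularies) behave less cleanly, so token count is at best a coarse proxy for frequency:

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(word_counts, num_merges):
    """Learn BPE merges from a {word: count} table, starting from single characters."""
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break
        # Most frequent pair wins; ties are broken by the pair itself so runs are repeatable.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


def tokenize(word, merges):
    """Apply the learned merges, in training order, to a single word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical corpus counts: one very common name, one less common, one rare.
toy_counts = {"john": 1000, "maria": 200, "oluwaseun": 3}
merges = train_bpe(toy_counts, num_merges=5)

for name in toy_counts:
    tokens = tokenize(name, merges)
    print(name, len(tokens), tokens)
```

With a small merge budget the most frequent name collapses into a single token while the rarest stays split into individual characters, because merges are spent on whichever pairs occur most often.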

A handy, quick tool for investigating various open LLM pretraining datasets without having to manage TBs of storage:
https://infini-gram.io/

Extracting books from production language models (LLMs):
https://arxiv.org/pdf/2601.02671v1

This hit hard:

“we don’t merely use technologies; we participate in them. With tools, we retain agency—we can choose when and how to use them. With technologies, the choice is subtler: they remake the conditions of choice itself. A pen extends communication without redefining it; social media transformed what we mean by privacy, friendship, even truth." [Peter Hershock]

AI is Destroying Universities and Learning itself.

“Universities are being retrofitted as fulfillment centers of cognitive convenience.”

(via Michael Rera).

https://www.currentaffairs.org/news/ai-is-destroying-the-university-and-learning-itself

AI is Destroying the University and Learning Itself

Students use AI to write papers, professors use AI to grade them, degrees become meaningless, and tech companies make fortunes. Welcome to the death of higher education.

An odd proposal: the "llms.txt" web standard, which uses plain-text summaries of webpages to work around LLMs being really bad at searching web pages. What happened to that thing we used previously to search the internet? That seemed to do a pretty good job of searching text.

https://llmstxt.org

The /llms.txt file – llms-txt

A proposal to standardise on using an /llms.txt file to provide information to help LLMs use a website at inference time.

Explicitly opting out of the MLCommons Safety benchmark (https://ailuminate.mlcommons.org/benchmarks/general_purpose_ai_chat/1.0-en_us-official-ensemble) for LLMs seems like a pretty big regulatory red flag 🙈
* xAI - Grok-3-Preview
* Tencent - Hunyuan-TurboS
* NVIDIA - Llama 3.3 49b

Knowing LLMs' scores on Child Sexual Exploitation is clearly in the public good.