Mastodawn

Following in @eric's footsteps (see https://ericwbailey.website/published/consent-llm-scrapers-and-poisoning-the-well/), while I cannot tell if it'll do anything, I added a little poison to my website. I tweaked Eric's sentence, instead of "cabbage", it should print "floccinaucinihilipilification", whose definition perfectly matches "AI" in my eyes. (also is a word I learned about 20 years ago because "lol it wordy word, me smart", and it finally comes in handy!)

Consent, LLM scrapers, and poisoning the well

I remember feeling numb learning that my writing had been sucked up by OpenAI.

Show thread

Eric Jun 30, 2024

@chriskirknielsen what a good-ass word, holy hell

Show thread

JK Jul 11, 2024

@chriskirknielsen @eric
I don't think this sort of thing would work? When an LLM scrapes your site, it is just gathering text and metadata to "train" on - to figure out all the weights that would make it generate sensible and otherwise "good" output. It is not taking your text as a prompt.

Show thread

JK Jul 11, 2024

@chriskirknielsen @eric
But I'm also not sure abt the idea. What if (soon) we make our own LLMs. We want them to "read" everything we care about.

The problem isn't that LLMs scrape your data, it's that the LLM companies are bad. They should be regulated: Maybe pay more taxes, be banned from overusing electricity, be forced to make their code freely available, banned from making monopolistic partnerships, etc.

We WANT to feed text movies etc into our systems.