I really think it is time for us to treat LLM usage as another form of metadata, such as licensing.

As a user, I want to know whether my software contains LLM-generated code.

As a developer, I want to know whether a project accepts LLM usage.

As a web-surfer, I want to know whether this content was made by a human.

I don't think we can trust people (especially companies) to disclose their usage, so it's essentially a web-of-trust/web-of-shame.
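As a sketch of what "LLM usage as metadata" could mean in practice: a small manifest next to the license file, with an attestation field to support the web-of-trust angle. Every field name below (`llm-usage`, `attested-by`, `policy`) is invented for illustration; this is not any existing standard.

```python
# Hypothetical "llm-usage" manifest, by analogy with SPDX-style license
# metadata. All field names are invented for illustration only.
import json

ALLOWED_USAGE = {"none", "assisted", "generated", "mixed"}

def validate_manifest(raw: str) -> dict:
    """Parse a hypothetical llm-usage declaration and check its fields."""
    meta = json.loads(raw)
    usage = meta.get("llm-usage")
    if usage not in ALLOWED_USAGE:
        raise ValueError(f"unknown llm-usage value: {usage!r}")
    # Self-disclosure alone is not trustworthy, so a "none" claim
    # should carry at least one third-party attestation.
    if usage == "none" and not meta.get("attested-by"):
        raise ValueError("'none' claims should carry an attestation")
    return meta

example = """
{
  "llm-usage": "none",
  "attested-by": ["https://example.org/~maintainer"],
  "policy": "contributions must disclose LLM assistance"
}
"""
manifest = validate_manifest(example)
print(manifest["llm-usage"])  # → none
```

The attestation requirement is the web-of-trust part: a bare claim is cheap, a claim someone else vouches for is at least auditable.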

Is any RFC already up? I wanna talk about this.

#vibecoding #llm

@berru https://codeberg.org/robida/human.json doesn’t fit the full scope of what you’re asking for, but it's a start
human.json

A lightweight protocol for humans to assert authorship of their website content and vouch for the humanity of others.

@berru in the #406 channel on libera.chat the creator of https://406.fail/ mentioned that they were working on a draft for this
RFC 406i - The Rejection of Artificially Generated Slop (RAGS)

@technomancy lovely, I didn't know about 406.fail! I'm hanging out on IRC, so that's now in my join list, thanks!

@berru that's also why I asked @fdroidorg whether they will also start disclosing it in their catalogue

https://infosec.exchange/@webhat/116294791477295027

webhat (@[email protected])

@[email protected] do you have a not created using AI tag on Fdroid?

@webhat @berru Isn't all software these days tainted by AI yet? Which list did you start, clean-from-ai or the other one? Which one is shorter? 😸

@berru @xgranade The problem is any labeling effort also helps the LLMs avoid model collapse. If LLMs can preferentially train on human content, the bubble will last longer and destroy more.

This doesn't mean we shouldn't label, but it does mean that labeling without other countermeasures against LLM scrapers may help LLM corporations more than it helps humans. IMHO, labeling is best done in spaces where it is difficult for LLM scrapers to gain access.

@skyfaller @xgranade that is true. But I don't see any good countermeasure coming, so I'd advocate it's worth the risk.

@berru @xgranade I would argue there are a number of good countermeasures, such as iocaine for websites: https://iocaine.madhouse-project.org/

This may not be practical on corporate-owned platforms like GitHub, but for actual websites I think there is a lot we can do to block scrapers or make it more difficult for them to function.

At this point I think we should also consider darknets and purely analog distribution systems; people aren't worried enough.

iocaine - the deadliest poison known to AI

@skyfaller @xgranade this only addresses a specific use case, which is bot traffic and unwanted use of material. Which is good! But it doesn't do that much, especially in the short term.

@berru @skyfaller I mean, fair, but also not doing something that humans need because AI vendors can misuse it can't *always* be the right answer.

I don't even disagree that it's *sometimes* the right approach, just there has to be a balance where there's room for us to take care of us.

@skyfaller @berru @xgranade I am forever curious about this but my regular browsing setup triggers their protection so I can't read about it