I really think it is time for us to treat LLM usage as another form of metadata, such as licensing.

As a user, I want to know whether my software contains LLM-generated code.

As a developer, I want to know whether a project accepts LLM usage.

As a web-surfer, I want to know whether this content was made by a human.

I don't think we can trust people (especially companies) to disclose their usage, so it's essentially a web-of-trust/web-of-shame.
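As a sketch of what "LLM usage as metadata" could mean in practice: a small manifest next to the license file, with an attestation field to support the web-of-trust angle. Every field name below (`llm-usage`, `attested-by`, `policy`) is invented for illustration; this is not any existing standard.

```python
# Hypothetical "llm-usage" manifest, by analogy with SPDX-style license
# metadata. All field names are invented for illustration only.
import json

ALLOWED_USAGE = {"none", "assisted", "generated", "mixed"}

def validate_manifest(raw: str) -> dict:
    """Parse a hypothetical llm-usage declaration and check its fields."""
    meta = json.loads(raw)
    usage = meta.get("llm-usage")
    if usage not in ALLOWED_USAGE:
        raise ValueError(f"unknown llm-usage value: {usage!r}")
    # Self-disclosure alone is not trustworthy, so a "none" claim
    # should carry at least one third-party attestation.
    if usage == "none" and not meta.get("attested-by"):
        raise ValueError("'none' claims should carry an attestation")
    return meta

example = """
{
  "llm-usage": "none",
  "attested-by": ["https://example.org/~maintainer"],
  "policy": "contributions must disclose LLM assistance"
}
"""
manifest = validate_manifest(example)
print(manifest["llm-usage"])  # → none
```

The attestation requirement is the web-of-trust part: a bare claim is cheap, a claim someone else vouches for is at least auditable.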

Is any RFC already up? I wanna talk about this.

#vibecoding #llm

@berru https://codeberg.org/robida/human.json doesn’t fit the full scope of what you’re asking for, but it's a start
human.json

A lightweight protocol for humans to assert authorship of their website content and vouch for the humanity of others.

@berru in the #406 channel on libera.chat the creator of https://406.fail/ mentioned that they were working on a draft for this
RFC 406i - The Rejection of Artificially Generated Slop (RAGS)

@technomancy lovely, I didn't know about 406.fail! I'm hanging out on IRC, so that's now in my join list, thanks!

@berru that's also why I asked @fdroidorg whether they will also start disclosing it in their catalogue

https://infosec.exchange/@webhat/116294791477295027

webhat (@[email protected])

@[email protected] do you have a not created using AI tag on Fdroid?

@webhat @berru Isn't all software these days tainted by AI yet? Which list did you start, clean-from-ai or the other one? Which one is shorter? 😸

@berru @xgranade The problem is any labeling effort also helps the LLMs avoid model collapse. If LLMs can preferentially train on human content, the bubble will last longer and destroy more.

This doesn't mean we shouldn't label, but it does mean that labeling without other countermeasures against LLM scrapers may help LLM corporations more than it helps humans. IMHO, labeling is best done in spaces where it is difficult for LLM scrapers to gain access.

@skyfaller @xgranade that is true. But I don't see any good countermeasure coming, so I'd advocate it's worth the risk.

@berru @xgranade I would argue there are a number of good countermeasures, such as iocaine for websites: https://iocaine.madhouse-project.org/

This may not be practical on corporate-owned platforms like GitHub, but for actual websites I think there is a lot we can do to block scrapers or make it more difficult for them to function.

At this point I think we should also consider darknets and purely analog distribution systems; people aren't worried enough.

iocaine - the deadliest poison known to AI

@skyfaller @xgranade this only addresses a specific use case, which is bot traffic and unwanted use of material. Which is good! But it doesn't do that much, especially in the short term.

@berru @skyfaller I mean, fair, but also not doing something that humans need because AI vendors can misuse it can't *always* be the right answer.

I don't even disagree that it's *sometimes* the right approach, just there has to be a balance where there's room for us to take care of us.

@skyfaller @berru @xgranade I am forever curious about this but my regular browsing setup triggers their protection so I can't read about it