I really think it is time for us to treat LLM usage as another form of metadata, such as licensing.

As a user, I want to know if my software contains LLM-generated code.

As a developer, I want to know if a project accepts LLM usage.

As a web-surfer, I want to know whether this content was made by a human.

I don't think we can trust people (especially companies) to disclose their usage, so it's essentially a web-of-trust/web-of-shame situation.

Is there an RFC for this already? I wanna talk about it.

#vibecoding #llm
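To make the idea concrete, here is a minimal sketch of what such a declaration could look like alongside the license in a package manifest. The `llmUsage` field and its values are entirely hypothetical — not part of any existing standard or RFC:

```json
{
  "name": "example-project",
  "license": "MIT",
  "llmUsage": {
    "code": "assisted",
    "docs": "none",
    "policy": "contributions must disclose LLM assistance"
  }
}
```

Like a license field, it only carries information — enforcement would have to come from the community side (the web-of-trust/web-of-shame part).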

@berru @xgranade The problem is that any labeling effort also helps the LLMs avoid model collapse. If LLMs can preferentially train on human content, the bubble will last longer and destroy more.

This doesn't mean we shouldn't label, but it does mean that labeling without other countermeasures against LLM scrapers may help LLM corporations more than it helps humans. IMHO, labeling is best done in spaces where it is difficult for LLM scrapers to gain access.

@skyfaller @xgranade That is true. But I don't see any good countermeasure coming, so I'd argue it's worth the risk.

@berru @xgranade I would argue there are a number of good countermeasures, such as iocaine for websites: https://iocaine.madhouse-project.org/

This may not be practical on corporate-owned platforms like GitHub, but for actual websites I think there is a lot we can do to block scrapers or make it more difficult for them to function.
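As a much weaker complement to tools like iocaine, a site can at least ask known AI crawlers to stay away via robots.txt. This is purely advisory and only affects crawlers that choose to honor it; the user-agent names below are the ones the respective companies document (GPTBot for OpenAI, ClaudeBot for Anthropic, CCBot for Common Crawl, Google-Extended for Google's AI training):

```
# robots.txt — advisory only; well-behaved crawlers honor it, others ignore it
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Since bad actors ignore robots.txt, active defenses like iocaine (which feeds scrapers garbage instead of blocking them) are the more serious option.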

At this point I think we should also consider darknets and purely analog distribution systems, I think people aren't worried enough.


@skyfaller @berru @xgranade I am forever curious about this, but my regular browsing setup triggers their protection, so I can't read about it.