I've done the right thing and it's going to cause pain.

#Gentoo Distribution Kernel configs are now hosted entirely on git.gentoo.org rather than GitHub. If you don't use Gentoo mirrors, you may now be hitting 502s thanks to our LLM overlords. If you use Gentoo mirrors, you may be hitting 404s if the mirrors hit 502s while trying to fetch from our Infra 🤷.

@mgorny

I hear tell that 402 is popular for responding to LLM scrapers when detected, given that it is a client-side, not a server-side, problem. (-:

Although I think that we need a 437 Not Even If You Paid Me response code.

I presume that you are not explicitly detecting them.
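
For what it's worth, a minimal sketch of the 402 idea in Python, assuming detection by User-Agent substring only. The marker strings and the names `SCRAPER_MARKERS`, `status_for_user_agent`, and `app` are hypothetical, and real scrapers frequently disguise their User-Agent, so this is an illustration rather than a working defence:

```python
from http import HTTPStatus

# Hypothetical markers for illustration only; not a real blocklist.
SCRAPER_MARKERS = ("examplebot", "llm-crawler")


def status_for_user_agent(user_agent: str) -> HTTPStatus:
    """Pick 402 for a User-Agent containing a known marker, 200 otherwise."""
    ua = user_agent.lower()
    if any(marker in ua for marker in SCRAPER_MARKERS):
        return HTTPStatus.PAYMENT_REQUIRED
    return HTTPStatus.OK


def app(environ, start_response):
    """Tiny WSGI application that answers with the chosen status line."""
    status = status_for_user_agent(environ.get("HTTP_USER_AGENT", ""))
    body = status.phrase.encode("ascii") + b"\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(f"{status.value} {status.phrase}", headers)
    return [body]
```

It can be tried locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library.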

#HTTP #LLMs

@JdeBP, we have measures but they're insufficient and the service keeps failing.

@mgorny

I've heard before that a WWW front-end to version control is one of the worst cases. The LLM spiders scrape every commit and the load on the back end is massive.

I can see a future where non-commercial people with version control repositories disable all of the mechanisms for strangers to read the code on-line with a WWW browser, and only enable cloning.

#HTTP #AI #LLMs #git

@JdeBP, but the whole point is, we don't want to disable it. It's useful. It's useful to be able to quickly look at the git history without having to clone the whole repository (which also implies a lot of load for large repositories). It's useful to link to specific commits on Bugzilla and let people quickly see what changed there. Not to mention it's convenient to use autogenerated archives for tags as distfile sources (that are normally mirrored automatically, so the load is minimal).

@mgorny

They may not want to, yet end up having to because it is simply too expensive to maintain in the face of the LLM spiders. This is where the future seems to be heading.

#HTTP #AI #LLMs