We apologize for a period of extreme slowness today. The army of AI crawlers just leveled up and hit us very badly.

The good news: We're keeping up with the additional load of new users moving to Codeberg. Welcome aboard, we're happy to have you here. After adjusting the AI crawler protections, performance significantly improved again.

It seems like the AI crawlers learned how to solve the Anubis challenges. Anubis is a tool hosted on our infrastructure that requires browsers to do some heavy computation before accessing Codeberg again. It has saved us tons of nerves over the past months, because instead of manually maintaining blocklists we had a working way of telling "real browsers" apart from "AI crawlers".
However, we can confirm that at least Huawei networks now send the challenge responses, and they do seem to take a few seconds to actually compute the answers. That looks plausible, so we assume the AI crawlers have leveled up their computing power and now emulate more of a real browser's behaviour to bypass the variety of challenges that platforms enabled to keep the bot army out.
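For context on what "solving the challenge" means: Anubis hands the browser a proof-of-work puzzle, so the client has to burn CPU on hashing before it gets through. Here is a minimal sketch of such a scheme, purely for illustration; the exact challenge format, difficulty and verification logic that Anubis uses are not taken from this thread:

```python
#!/usr/bin/env python3
"""Minimal sketch of a SHA-256 proof-of-work challenge, roughly the kind of
work Anubis asks a browser to do. Format and difficulty are assumptions."""

import hashlib
import secrets

DIFFICULTY = 4  # assumed: required number of leading zero hex digits

def solve(challenge: str, difficulty: int = DIFFICULTY) -> int:
    """Client side: grind nonces until the hash has enough leading zeroes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int = DIFFICULTY) -> bool:
    """Server side: a single hash suffices to check the submitted answer."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

if __name__ == "__main__":
    challenge = secrets.token_hex(16)   # per-visitor random challenge
    nonce = solve(challenge)            # this is the part that costs the client CPU time
    assert verify(challenge, nonce)
    print(f"challenge={challenge} nonce={nonce}")
```

The asymmetry is the point: the client has to try many nonces, while the server verifies an answer with a single hash, which is why crawlers that now compute the responses "take a few seconds" per challenge.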

We have a list of explicitly blocked IP ranges. However, a configuration oversight on our part only enforced these ranges on the "normal" routes; the "anubis-protected" routes didn't consider the blocklist at all. That was not a problem while Anubis also kept the crawlers away from those routes.
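To illustrate the shape of that oversight with a generic sketch (made-up routes and ranges, not Codeberg's actual reverse-proxy setup): the blocklist has to be evaluated before every route group, including the Anubis-protected one, otherwise a crawler that solves the challenge is never checked against it.

```python
#!/usr/bin/env python3
"""Generic sketch of the ordering bug described above; route names and IP
ranges are made up and do not reflect Codeberg's actual configuration."""

from ipaddress import ip_address, ip_network

BLOCKED_RANGES = [ip_network("203.0.113.0/24")]  # documentation range as a stand-in

def is_blocked(client_ip: str) -> bool:
    addr = ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)

def handle_request(client_ip: str, path: str) -> str:
    # The fix: consult the blocklist first, for *all* routes.
    if is_blocked(client_ip):
        return "403 Forbidden"

    if path.startswith("/api/"):  # hypothetical "normal" routes
        return "proxy to backend"

    # Hypothetical "anubis-protected" routes: if the blocklist check above is
    # skipped for these, a crawler that solves the challenge walks right in.
    return "proxy to Anubis, then backend"

if __name__ == "__main__":
    print(handle_request("203.0.113.7", "/some/repo"))   # -> 403 Forbidden
    print(handle_request("198.51.100.9", "/some/repo"))  # -> proxy to Anubis, then backend
```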

However, now that they managed to break through Anubis, there was nothing stopping these armies.

It took us a while to identify and fix the config issue, but we're safe again (for now).

For the load average auction, we offer these numbers from one of our physical servers. Who can offer more?

(It was not the "wildest" moment, but the only one for which we have a screenshot)

@Codeberg In the days of single CPU servers (early 90s?) and an interesting filesystem problem, I think I may have seen ~400 at a client site!

@DamonHD @Codeberg got around 800 on a single-CPU server (well, actually a VM) this week. I had blocked lots of Huawei networks, but there's no list of all of them.

I now block some more, but I also reduced MaxServer in Apache httpd, to put an upper cap on this. Getting into swapping is not fun.

@mirabilos @Codeberg Yes, that is bad. I now run servers with no disc swap and tightly capped limits on all services to avoid the dreaded OOM Killer being needed.
@DamonHD @Codeberg no swap at all is actually a bad thing on Linux. Put some in, even if it is just 128 MiB. Linux was designed to work with it.
@mirabilos @Codeberg my scheme has been working well for me (with a little zram) for well over a decade!
@Codeberg @DamonHD it works, sure
@mirabilos @Codeberg I run all my primary services on an RPi3 off grid. It has 1GB RAM and swapping to the memory card would be very slow and would wear it fast.
@mirabilos @Codeberg It also gives better Web response than I can get out of Cloudflare for my target audience. So it meets all my goals.
@mirabilos @Codeberg Those may well be different to your goals for a server.
@mirabilos @DamonHD
AFAIK zram swap has the same positive effects as typical swap in terms of Linux being designed with swap in mind

@Codeberg ouch. This remains a cat-and-mouse game.

At least having them solve the Anubis challenge does cost them extra resources, but if they can do that at scale, it doesn't bode well.

@ikke @Codeberg it costs the planet extra resources too unfortunately :/
@Codeberg wow - that looks scary. Thanks for all your work ❤️
@Codeberg I really wish you contacted me at all about this before going public.
@cadey I'm sorry if this gave you any unwanted or negative attention. I consider it an inevitable development that crawlers will emulate more real-browser features to bypass websites' protections, and today at least one big crawler seems to have started doing so. ~f
@Codeberg Can we continue this conversation over email after my panic subsides? me@xeiaso.net.
@cadey @Codeberg in case it is because of this, don't be too hard on yourself! It sounds like the bots went out of their way to spend insane cloud compute budgets and still didn't really achieve anything. It hurt them more than they gained
@Codeberg It'd be a good time to encourage folks to sign up to https://github.com/sponsors/Xe.
@thesamesam Unfortunately, I'm not sure if encouraging anyone to reinforce the vendor lock-in of Microsoft GitHub by making maintainers financially dependent on that platform is in the spirit of our mission. ~f

@Codeberg @thesamesam

https://liberapay.com/Xe/ you can support Xe through Liberapay, too. Is that something you could more comfortably point people to in a visible post, in order to support that work?

@zkat
I think that Liberapay is currently one of the best options, thank you for pointing this out. Looks like some people found it already.

I think that Codeberg, as a German non-profit, still cannot call for donations to an individual. However, I hope that Codeberg's usage of Anubis has helped with visibility for the project in the past, and made some people chip in.

@thesamesam

@Codeberg yeowsa. this feels like an arms race that is going to get harder :(

@Codeberg This is a great number, but I have seen higher in my career. Unfortunately I either took no screenshots or have lost the ones I had.

5831.24 is pretty good though. Congrats on hitting it, hope your head doesn't hurt. :D

@Codeberg
How much RAM do you have in your machines?
meta/hardware/achtermann.md at main, in the "meta" repo (the organizational repo for Codeberg's infrastructure: documentation, organizing, planning) on Codeberg.org
@Codeberg damn. The only time I've seen numbers like this were when a ceph server went down.
@Codeberg what is the threshold for alerting on this? Grafana/Zabbix/Prometheus?
@Codeberg huh, that's a pretty kernel-heavy workload, so much red
@Codeberg what are they hitting that's so kernel-intensive, is this filesystem stuff or process executions or something?

@jann @Codeberg

It's I/O. If the system were CPU-bound at 5000+ load, you certainly wouldn't be running anything interactive.

More like typing out the command to stop the web service / nft-block web traffic, then waiting for it to appear on the screen and execute, just to restore console interaction 😆
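For anyone who wants to check this distinction on their own machine: on Linux, the load average counts runnable tasks plus tasks in uninterruptible sleep (the "D" state, usually waiting on I/O), so a load in the thousands with mostly D-state processes points at an I/O bottleneck rather than CPU. A small Linux-only sketch, not taken from this thread:

```python
#!/usr/bin/env python3
"""Compare the 1-minute load average with the number of tasks in
uninterruptible sleep ('D' state). Linux-only; reads /proc directly."""

import glob

def one_minute_load() -> float:
    with open("/proc/loadavg") as f:
        return float(f.read().split()[0])

def d_state_tasks() -> int:
    count = 0
    for stat in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(stat) as f:
                data = f.read()
        except OSError:  # process may have exited in the meantime
            continue
        # the state field follows the ')' that closes the comm field
        state = data.rsplit(")", 1)[1].split()[0]
        if state == "D":
            count += 1
    return count

if __name__ == "__main__":
    print(f"load(1m)={one_minute_load():.2f}  D-state tasks={d_state_tasks()}")
```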

@Codeberg thank you for the details. Very interesting. They are worth a blog post.
@Codeberg what if you had challenges for AI to perform that made it mine bitcoin for you and you just block them at the end anyway 🤣
@Codeberg Here goes more™...
@Codeberg How much of that load was actual I/O wait?
@Codeberg Why not just block Huawei Cloud ASN prefixes?
It's easy to get them (e.g. from projectdiscovery)
@lenny If you read the thread, you'll notice that this is exactly what we did, except that we made a mistake. ~f
@Codeberg We sometimes see similar numbers, or 10k+, when a user submits a 64-core job in a single slot and the cgroup limiting kicks in. Bit annoying that load average is fairly useless for that nowadays
@Codeberg Great thread and explanation. Thank you.
@Codeberg so, to clarify, do you have evidence that the bots were actually solving Anubis challenges, or was it due to the configuration issue? (I think it's inevitably going to happen if Anubis gets traction. I'm just curious whether we're already there or not.) Thanks for your work and transparency on all this.
@zacchiro Yes, the crawlers completed the challenges. We tried to verify if they are sharing the same cookie value across machines, but that doesn't seem to be the case.
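A hypothetical sketch of that kind of check, to make it concrete (the access-log layout and the "anubis-auth" cookie name are placeholders, not Codeberg's actual setup): feed log lines on stdin and it reports cookie values that were seen from more than one client IP.

```python
#!/usr/bin/env python3
"""Count how many distinct client IPs reuse each challenge cookie value.
The line format and cookie name below are assumptions for illustration."""

import re
import sys
from collections import defaultdict

# Assumed line shape: client IP first, the cookie somewhere later in the line.
LINE_RE = re.compile(r"^(\S+).*anubis-auth=([A-Za-z0-9._-]+)")

def main() -> None:
    ips_per_cookie: dict[str, set[str]] = defaultdict(set)
    for line in sys.stdin:
        m = LINE_RE.search(line)
        if m:
            ip, cookie = m.groups()
            ips_per_cookie[cookie].add(ip)

    shared = {c: ips for c, ips in ips_per_cookie.items() if len(ips) > 1}
    print(f"{len(shared)} of {len(ips_per_cookie)} cookie values seen from more than one IP")
    for cookie, ips in sorted(shared.items(), key=lambda kv: -len(kv[1]))[:10]:
        print(f"{cookie[:16]}...  {len(ips)} distinct IPs")

if __name__ == "__main__":
    main()
```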

I have a follow up question, though, @Codeberg, re: @zacchiro's question. Is it *possible* that giant human farms of Anubis challenge-solvers actually did it? Or did it all happen so fast that there is no way it could be that?

#Huawei surely could fund such a farm and the routing software needed to get the challenge to the human and back to the bot quickly enough that it might *seem* the bot did it.

@bkuhn
Anubis challenges are not solved by humans. It's not like a captcha. It's a challenge that the browser computes, based on the assumption that crawlers don't run real browsers for performance reasons and only implement simpler clients.

So at least one crawler now seems to emulate enough browser behaviour to pass the Anubis challenge. ~f
@zacchiro

@Codeberg I get it now.

Thanks for taking the time to clue me in.

I'm lucky that I haven't needed to learn about this until now and I'm so sorry you've had to do all this work to fight this LLM training DDoS!

Cc: @zacchiro

@Codeberg
I like the idea of them figuring out solving the Anubis challenge only to be blocked afterward
@efraim @Codeberg ...and spending a good amount of their corporate compute budgets just to walk away empty-handed. I hope they learn, or go bankrupt, or both
@Codeberg are the ip blocklists public?
@nemo Currently not. We wanted to investigate the legal situation with regard to sharing such lists. They currently could contain individuals' IP addresses and likely need to be cleaned up first. ~f
@Codeberg no worries, ty for fighting the good fight o7
@Codeberg Was the solution to increase the proof-of-work difficulty?