OpenAI's crawler just found our family server / cloud services and immediately proceeded to crash Nextcloud within minutes. Fucking fantastic.

Is there some nice, up-to-date write-up on the different tools to protect yourself against this?
#AI #AISlop #AttackOfTheMachines #selfHosting

@Natanox It's a bit of work, but I'd suggest something like #NetBird or #tailscale to keep your private things private.

The only real downside I see so far is that on mobile devices (iOS in my case) it increases battery consumption to a noticeable degree.

@Fishd Not all family members are inclined to install these tools everywhere, and it would also break things like Nextcloud's password-protected share links for anyone outside the network we want to send things to.

I'll probably go with something like Anubis + iocaine instead.

@Natanox Fair points.

My concern with those tools is that you're just playing whack-a-mole, and your opponent has more resources than you and is sufficiently motivated (by way of investment capital) to defeat you.

The same goes for folks suggesting fighting back by 'poisoning the well': that assumes you have spare compute power and a significant energy supply.

@Fishd Also fair points.

Though these tools that fight back are a community effort, so there's a lot of brainpower going into them as well. Basically a giant, well-funded army with only a few bright minds against a motivated guerrilla force, but digital.

It's a shitty situation, but for now I want to keep our infrastructure as accessible as possible to convince some family members that corpo clouds indeed aren't inherently better.

@Natanox @Fishd
Robots.txt doesn't help?
OpenAI even seems to publish lists of IP addresses so you can block it: https://developers.openai.com/api/docs/bots
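If you go the IP-list route, the core of it is a CIDR membership test. This is a minimal sketch, assuming you've already downloaded OpenAI's published ranges as CIDR strings; the ranges below are RFC 5737 documentation placeholders, not OpenAI's real addresses, and `build_networks` / `is_crawler_ip` are hypothetical helper names. In practice this check usually lives in the firewall or reverse proxy rather than in application code.

```python
import ipaddress

# Hypothetical placeholder ranges (RFC 5737 documentation blocks), NOT OpenAI's
# real crawler ranges; substitute the list you download from OpenAI.
CRAWLER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def build_networks(cidrs):
    """Parse CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(c) for c in cidrs]


def is_crawler_ip(client_ip, networks):
    """Return True if client_ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(client_ip)
    # Membership between mismatched IP versions (v4 vs v6) simply yields False.
    return any(addr in net for net in networks)


if __name__ == "__main__":
    nets = build_networks(CRAWLER_RANGES)
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, "blocked" if is_crawler_ip(ip, nets) else "allowed")
```

Parsing the list once and reusing the network objects keeps the per-request check cheap; the catch, as noted in the thread, is that this only covers crawlers that actually stick to their published ranges.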

@FoxVK @Fishd Now that AI crawlers have found us, I'd rather go straight to something like Anubis. Even *if* some of them care enough, plenty of them don't, or even actively circumvent blocks (e.g. Anthropic is known for that).

To be fair though, I didn't check the default robots.txt of the Nextcloud Docker image beforehand, so this might've been preventable. But at least now I know the bad actors will get to us soon.
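For checking what a given robots.txt actually tells OpenAI's crawlers, Python's standard `urllib.robotparser` can evaluate it offline. This is a minimal sketch: the robots.txt content is illustrative (not Nextcloud's shipped file), the user-agent tokens `GPTBot` and `OAI-SearchBot` are taken from OpenAI's crawler documentation as I understand it (verify against the linked page), and `parse_robots` is a hypothetical helper name. It only tells you what a crawler that honors robots.txt *should* do.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt asking two of OpenAI's documented crawler user agents
# to stay away from the whole site, while allowing everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Allow: /
"""


def parse_robots(text):
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = parse_robots(ROBOTS_TXT)
    for agent in ("GPTBot", "OAI-SearchBot", "SomeBrowser"):
        print(agent, rp.can_fetch(agent, "/index.php/s/example"))
```

To test a deployed instance instead, `RobotFileParser` also has `set_url()` and `read()` to fetch the live file before calling `can_fetch()`.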