Wondered why our latest podcast episode didn’t show up on https://workingdraft.de this morning. In our headless WP we preschedule releases and @11ty builds the front-facing site daily. Turns out an AI bot broke the build: our log-parsing stats step choked on its UA string:

Mozilla/5.0 (compatible; Thinkbot/0.5.8; +In_the_test_phase,_if_the_Thinkbot_brings_you_trouble,_please_block_its_IP_address._Thank_you.)

"if_the_Thinkbot_brings_you_trouble" 🖕
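We don't know what the stats step actually looks like, but here is a minimal sketch of how a UA string like this can trip naive log parsing. The log line is invented around the real Thinkbot UA (IP, path, and timestamp are placeholders): a plain whitespace split mangles every field after the first quoted one containing spaces, while a regex that respects the combined log format's quoting recovers the UA intact.

```python
import re

# Invented combined-log line carrying the real Thinkbot UA string.
line = (
    '203.0.113.7 - - [12/Mar/2025:06:00:01 +0100] '
    '"GET /feed/ HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Thinkbot/0.5.8; '
    '+In_the_test_phase,_if_the_Thinkbot_brings_you_trouble,_please_block_its_IP_address._Thank_you.)"'
)

# Naive approach: split on whitespace and hope field 11 is the UA.
# The spaces inside the quoted request and UA shift every later field.
naive_ua = line.split()[11]  # only the first fragment of the UA

# Sturdier approach: parse the combined log format's quoted fields.
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$'
)
m = LOG_RE.match(line)
ua = m.group(7) if m else None  # the full, unmangled UA string
```

The whitespace split yields the fragment `"Mozilla/5.0`, while the regex returns the whole UA; a stats step built on the former is exactly the kind of thing an exotic UA can break.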

Working Draft

Weekly podcast for frontend devs, design engineers, and web developers

@Schepp @11ty 🤦‍♂️
@heydon @Schepp @11ty haha this bot also went straight into my honeypot*… repeatedly.
* a directory on my website that is only mentioned in robots.txt with a Disallow and not linked anywhere.
So this motherboardfucker (excuse my French) is actually reading the robots.txt, but then sees a Disallow as an invite.
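For context, such a honeypot is nothing more than a disallowed path in robots.txt that nothing else links to (the directory name here is made up; the real one obviously stays unpublished):

```text
User-agent: *
Disallow: /secret-trap/
```

Any client that requests `/secret-trap/` can only have learned about it from robots.txt, so every hit there is a crawler treating the Disallow as a map.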

@webrocker @heydon @Schepp @11ty

I thought robots.txt was completely disregarded, but most AI companies publish their IP address ranges, so you can write some redirect rules to block them from scraping your site.

I think there's one specific company that was completely opaque about that and published false IP addresses: Perplexity (I couldn't think of the name straight away). So surely there are other companies doing the same thing.

https://rknight.me/blog/perplexity-doesnt-give-a-shit-about-consent/

Perplexity Doesn’t Give a Shit About Consent

Perplexity proving yet again they don't care about the rules

@lukeharby @heydon @Schepp @11ty I wonder how else, if not via my robots.txt entry, the bots would discover my unlinked directory. To be fair, there are only a few hits per day in there, but this "Thinkbot" (and its user-agent string) made a lasting impression.
@webrocker @heydon @Schepp @11ty I had an interesting AI encounter the other day about which rules AIs obey. Maybe I need to write a blog article about it…
@MoritzGlantz @heydon @Schepp @11ty Inspired by this incident, I have now completed my feeble defense against those bots that visit my hidden directory. Their IP is saved in a NoSQL store, and the single entry point to my website checks the current visitor's IP against that store and returns a 403 if the IP matches. I successfully locked myself out of my website by visiting my hidden dir. Yay.
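The trap-then-403 flow described above can be sketched as a tiny WSGI app (an assumption for illustration; the actual site's entry point and storage are different). A plain set stands in for the NoSQL store, and the honeypot path is made up.

```python
# Minimal sketch of the honeypot blocklist, assuming a WSGI entry point.
# A set stands in for the NoSQL store; /secret-trap/ is a made-up path.
blocked_ips = set()
HONEYPOT_PREFIX = "/secret-trap/"

def app(environ, start_response):
    ip = environ.get("REMOTE_ADDR", "")
    path = environ.get("PATH_INFO", "")

    # Anyone requesting the honeypot directory gets remembered...
    if path.startswith(HONEYPOT_PREFIX):
        blocked_ips.add(ip)

    # ...and every remembered IP gets a 403 from then on -- including
    # you, if you visit the trap yourself without an allowlist.
    if ip in blocked_ips:
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden"]

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]
```

The self-lockout in the post falls straight out of this logic: the trap doesn't distinguish the site owner from a bot, so an allowlist for your own address is the missing piece.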

@webrocker Why not send them a multigigabyte file to crawl? Or pollute them with Heydon's script?

@MoritzGlantz @heydon @Schepp @11ty

@Lippe @webrocker @heydon @Schepp @11ty A zip bomb maybe? 🤔

@MoritzGlantz Thought of this, but will they crawl zip files at all?

@webrocker @heydon @Schepp @11ty

@Lippe @MoritzGlantz @heydon @Schepp @11ty Well, if I don't want them to waste resources on my site, how would serving them gazillions of bytes help?

@Schepp By accident I yesterday discovered that the website I host for the local hiking club my parents are in consumed a whopping 170 GB of traffic in February, 230 GB in January, 190 in December, you get it... Over the past 12 months this accumulated to 1.2 TB of traffic, picking up steam since July.
Just now, looking into some of the logs, I see lots of Bytedance, lots of Facebook crawler, and such...

It's a tiny WordPress site, where the club shares some pictures of recent hikes and info on the next...