My wife’s website keeps going down this evening. I’m not sure why (and, as usual, I’m not in the easiest location to debug). Lots of connections in the logs from a user agent of “meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)” though… 🤔

Meta Web Crawlers - Sharing - Documentation - Meta for Developers

This page lists the User Agent (UA) strings that identify Meta’s most common web crawlers and what each of those crawlers are used for.

Jonathan B ✈️🪄👨🏻‍💻6d ago

All of them are going to /shop/basket/?remove_item=…&_wpnonce=…&add-to-cart=…

Have turned on strict bot mode in WP Defender for now and they’re getting 403 responses.

Site hasn’t gone down since… 🤔

Jonathan B ✈️🪄👨🏻‍💻6d ago

The Meta docs page linked says they honour robots.txt, which would appear to be rubbish: the one generated by WordPress contains a couple of lines that should cover the requests they’re making, I think:

Disallow: /*?add-to-cart=
Disallow: /*?*add-to-cart=
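
As a rough check of whether those two rules cover the requests, here is a minimal Python sketch of robots.txt wildcard matching as described in RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end, otherwise the rule is a prefix match). The function name and the query-string values are hypothetical placeholders standing in for the elided ones in the logged URL; this is not the matcher any particular crawler uses.

```python
import re


def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path plus query."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and let every '*' match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the prefix-match behaviour.
    return re.match(regex, path) is not None


# Placeholder values: the real parameter values are elided in the logs.
requested = "/shop/basket/?remove_item=abc&_wpnonce=def&add-to-cart=123"

print(robots_pattern_matches("/*?add-to-cart=", requested))   # False
print(robots_pattern_matches("/*?*add-to-cart=", requested))  # True
```

Under these assumptions only the second rule applies, because `add-to-cart` is not the first query parameter, but one matching Disallow rule is enough for a compliant crawler to skip the URL.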

I might just pull the (long) list of source IPs, which they show how to get from Whois, and block the lot with Caddy.

#wordpress #bots #robotstxt

Leon Cowle

@jmb 2 things.
1. It won't surprise me if Meta is ignoring their own published robots.txt guidance.
2. This is part of my day job, and I've seen COUNTLESS bad actors abuse the crawler UAs from Meta/ChatGPT/etc. So by all means try to block Meta's IPs, but it won't surprise me at all if the connections you're seeing aren't from Meta after all.
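
One way to test point 2 is to compare the source IPs in the logs against the published ranges before blocking anything. The sketch below is a minimal Python example using the standard `ipaddress` module; the function name is hypothetical, and the CIDR blocks are documentation-only placeholder ranges, not Meta's real ones, which would need to come from the Whois lookup mentioned earlier in the thread.

```python
import ipaddress
from typing import Iterable


def ip_in_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version.
        if addr.version == net.version and addr in net:
            return True
    return False


# Placeholder ranges reserved for documentation, NOT Meta's actual ranges.
placeholder_ranges = ["192.0.2.0/24", "2001:db8::/32"]

print(ip_in_ranges("192.0.2.55", placeholder_ranges))    # True
print(ip_in_ranges("198.51.100.7", placeholder_ranges))  # False
```

Any logged address that claims the Meta user agent but falls outside the real ranges would then be a candidate for spoofed traffic.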