In my case, I was checking the nginx access.log and realised that many of the hits come from the user agent "facebookexternalua". I took a sample and checked every IP in it on
https://www.abuseipdb.com. Most of the IPs belong to Facebook, and many of them have been reported in that database.
This is the only information I found about "facebookexternalua":
https://udger.com/resources/online-parser?Fheaders=User-Agent%3A+facebookexternalua&Fip=31.13.127.16&test=822&action=analyze
Here is a sample of my log so you can check for yourself:
grep facebookexternalua /var/log/nginx/access.log
66.220.149.8 - - [26/Aug/2025:04:41:02 +0000] "POST /fedi/jrballesteros05/inbox HTTP/1.1" 403 118 "-" "facebookexternalua"
66.220.149.35 - - [26/Aug/2025:04:41:31 +0000] "POST /fedi/shared-inbox HTTP/1.1" 403 118 "-" "facebookexternalua"
31.13.127.21 - - [26/Aug/2025:04:47:39 +0000] "POST /fedi/jrballesteros05/inbox HTTP/1.1" 403 118 "-" "facebookexternalua"
66.220.149.62 - - [26/Aug/2025:04:54:56 +0000] "POST /fedi/shared-inbox HTTP/1.1" 403 118 "-" "facebookexternalua"
31.13.127.16 - - [26/Aug/2025:05:03:56 +0000] "POST /fedi/jrballesteros05/inbox HTTP/1.1" 403 118 "-" "facebookexternalua"
2a03:2880:12ff:8:: - - [26/Aug/2025:05:31:20 +0000] "POST /fedi/shared-inbox HTTP/1.1" 403 118 "-" "facebookexternalua"
2a03:2880:2ff:8:: - - [26/Aug/2025:05:33:43 +0000] "POST /fedi/jrballesteros05/inbox HTTP/1.1" 403 118 "-" "facebookexternalua"
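To pull the distinct IPs out of a log like the one above before looking them up, a small Python sketch along these lines could work. The function name hits_per_ip is made up for illustration, the two sample lines are copied from the log above, and it assumes the client IP is the first space-separated field of each line.

```python
from collections import Counter


def hits_per_ip(lines, agent="facebookexternalua"):
    """Count log lines per client IP for lines that mention the given user agent."""
    counts = Counter()
    for line in lines:
        if agent in line:
            # Assumption: the client IP is the first field of the log line.
            counts[line.split(" ", 1)[0]] += 1
    return counts


sample = [
    '66.220.149.8 - - [26/Aug/2025:04:41:02 +0000] "POST /fedi/jrballesteros05/inbox HTTP/1.1" 403 118 "-" "facebookexternalua"',
    '2a03:2880:12ff:8:: - - [26/Aug/2025:05:31:20 +0000] "POST /fedi/shared-inbox HTTP/1.1" 403 118 "-" "facebookexternalua"',
]

for ip, hits in hits_per_ip(sample).most_common():
    print(ip, hits)
```

To run it against the real log, the sample list can be replaced with an open file handle on /var/log/nginx/access.log.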
I added the nginx configuration from here:
https://github.com/ai-robots-txt/ai.robots.txt
But I modified it slightly, because I had to add "facebookexternalua", which is not in the repo. So when nginx detects that user agent, it automatically answers with a 403 (Forbidden) code.
Since Facebook kept hitting the server anyway, I decided to create my own custom "fail2ban" rule.
The filter file /etc/fail2ban/filter.d/metabot.local contains a regex that matches log lines whose user agent starts with "facebookexterna" (which covers "facebookexternalua"):
#Created by chucho
[INCLUDES]
# Load regexes for filtering
before = botsearch-common.conf
[Definition]
failregex = ^<HOST>\s+\-\s+\-\s+\[.*?\]\s+["](GET|POST|HEAD)\s+.*?["]\s+.*?["]facebookexterna.*?["]
ignoreregex =
datepattern = {^LN-BEG}%%ExY(?P<_sep>[-/.])%%m(?P=_sep)%%d[T ]%%H:%%M:%%S(?:[.,]%%f)?(?:\s*%%z)?
              ^[^\[]*\[({DATE})
              {^LN-BEG}
journalmatch = _SYSTEMD_UNIT=nginx.service + _COMM=nginx
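As a rough sanity check of the failregex above, it can be tried against the sample log lines with Python's re module. This is only an approximation: fail2ban expands <HOST> to its own, more elaborate pattern and handles the timestamp separately, so here <HOST> is replaced by a simple `\S+` group as a stand-in. The Mozilla line is an invented counter-example.

```python
import re

# failregex from the filter, with <HOST> swapped for a simple stand-in group
# (an assumption for illustration, not fail2ban's real expansion).
FAILREGEX = (
    r'^<HOST>\s+\-\s+\-\s+\[.*?\]\s+["](GET|POST|HEAD)\s+.*?["]'
    r'\s+.*?["]facebookexterna.*?["]'
)
pattern = re.compile(FAILREGEX.replace("<HOST>", r"(?P<host>\S+)"))

lines = [
    '66.220.149.8 - - [26/Aug/2025:04:41:02 +0000] "POST /fedi/jrballesteros05/inbox HTTP/1.1" 403 118 "-" "facebookexternalua"',
    '2a03:2880:12ff:8:: - - [26/Aug/2025:05:31:20 +0000] "POST /fedi/shared-inbox HTTP/1.1" 403 118 "-" "facebookexternalua"',
    # Invented line with an ordinary browser user agent; it should not match.
    '192.0.2.10 - - [26/Aug/2025:05:40:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

for line in lines:
    match = pattern.search(line)
    print(match.group("host") if match else "no match")
```

For the real thing, fail2ban ships the fail2ban-regex tool, which tests a filter file against a log file using fail2ban's own parsing.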
And this is the content of "/etc/fail2ban/jail.d/metabot.conf", which uses the previous filter and bans the IP immediately. I use "nftables" for the ban action.
[metabot]
dbpurgeage = 3d
logpath = %(nginx_access_log)s
port = http,https
backend = auto
journalmatch = _SYSTEMD_UNIT=nginx.service + _COMM=nginx
banaction = nftables-multiport
enabled = true
filter = metabot
# Ban for 1 hour if there is "facebookexternalua" in the Nginx logs
bantime = 3600
maxretry = 1
# It will be incrementing the ban time if the IP is persistent
bantime.increment = true
bantime.factor = 1
bantime.formula = ban.Time * (1<<(ban.Count if ban.Count<20 else 20)) * banFactor
It starts by banning the IP for 1 hour; if the IP hits again, it bans it for 2 hours, and the ban time keeps doubling from there. I even have IPs banned for more than 24 hours.
2025-08-26 05:55:32,588 fail2ban.actions [401352]: NOTICE [metabot] Ban 173.252.95.15
2025-08-26 05:55:32,589 fail2ban.observer [401352]: INFO [metabot] IP 173.252.95.15 is bad: 5 # last 2025-08-25 10:45:44 - incr 1h to 1d 8h
2025-08-26 05:55:32,589 fail2ban.observer [401352]: NOTICE [metabot] Increase Ban 173.252.95.15 (6 # 1d 8h -> 2025-08-27 13:55:31)
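The doubling follows from the bantime.formula in the jail. As a rough illustration, the same arithmetic can be written in Python. The function name ban_seconds is made up for this sketch, and it assumes ban.Count is the number of previous bans recorded for the IP, which is consistent with the "is bad: 5 ... incr 1h to 1d 8h" line in the log above.

```python
def ban_seconds(ban_time, ban_count, ban_factor=1):
    """Mirror of: ban.Time * (1<<(ban.Count if ban.Count<20 else 20)) * banFactor"""
    exponent = ban_count if ban_count < 20 else 20  # the exponent is capped at 20
    return ban_time * (1 << exponent) * ban_factor


# bantime = 3600 and bantime.factor = 1, as in the jail above
for count in range(7):
    hours = ban_seconds(3600, count) / 3600
    print(f"previous bans: {count} -> banned for {hours:g} h")
```

With 5 previous bans this gives 3600 * 32 = 115200 seconds, which is 32 hours, the "1d 8h" reported by fail2ban.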
The problem with this is that it fills my firewall with blocked IPs, but it is working at the moment. If "meta", or whoever is using "facebookexternalua" as the agent, starts spoofing the "user agent", this is not going to work anymore.
Maybe your problem is different from mine.
CC:
@simendsjo@fosstodon.org