Curriculum Vitae:
PhD degree from Aarhus University
Worked at Google Zürich and London
Partner at Intempus Timeregistrering (now part of Visma)
Operating nat64.net/
Which scenario are you dealing with?
The retransmitted TCP packets make that hypothesis sound unlikely.
I agree with the recommendation to disable DNS lookups in sshd, but I don’t expect it to solve unrelated issues such as MTU problems.
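For reference, here is what that change looks like, assuming a stock OpenSSH setup with its configuration in /etc/ssh/sshd_config:

```
# /etc/ssh/sshd_config
# Skip the reverse DNS lookup on incoming connections. This avoids
# long login delays when the resolver is slow or unreachable.
UseDNS no
```

Reload or restart sshd afterwards, with whatever command your init system uses, for the change to take effect.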
Based on the symptoms you describe, my first guess is a PMTU problem.
If this were Linux, I would recommend trying to change advmss on your routes to 1220 on both client and server. I am guessing FreeBSD has a similar setting to tweak the MSS, but I have no idea what it would be called.
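To make the Linux side concrete, a sketch with iproute2; the gateway and interface names are placeholders for whatever your default route actually uses:

```
# Re-advertise an MSS of 1220 on the default IPv6 route:
# 1280 (IPv6 minimum MTU) - 40 (IPv6 header) - 20 (TCP header) = 1220.
# Replace fe80::1 and eth0 with your actual gateway and interface.
ip -6 route change default via fe80::1 dev eth0 advmss 1220
```

An MSS of 1220 keeps every TCP segment within the 1280-byte IPv6 minimum MTU, so the segments get through even when path MTU discovery is broken.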
This got me wondering if there is a way to tell a crawler that crawling this site is permitted, but only if it uses IPv6.
Simply serving different versions of robots.txt depending on address family won’t achieve that, since the crawler will silently assume that whichever version it received applies over both address families.
I am guessing they load robots.txt before each intended fetch to verify that the URL in question is permitted. If they primarily want resources that are not permitted, that would explain why they fetch robots.txt more often than anything else.
Of course, caching robots.txt would be better. The only problem is that you may end up fetching a URL which is no longer permitted because you used an outdated copy of robots.txt.
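As a sketch of what the cached variant could look like, using Python’s standard urllib.robotparser (the bot name, site URL, and one-hour lifetime are all made-up values):

```python
import time
import urllib.robotparser

ROBOTS_URL = "https://example.com/robots.txt"  # placeholder site
MAX_AGE = 3600  # seconds before the cached copy counts as outdated (arbitrary)

_parser = None
_fetched_at = 0.0

def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Check url against a cached robots.txt, re-fetching it when stale."""
    global _parser, _fetched_at
    if _parser is None or time.time() - _fetched_at > MAX_AGE:
        _parser = urllib.robotparser.RobotFileParser(ROBOTS_URL)
        _parser.read()  # downloads and parses robots.txt
        _fetched_at = time.time()
    return _parser.can_fetch(user_agent, url)

print(is_allowed("https://example.com/some/page"))
```

The shorter the lifetime, the smaller the window in which an outdated copy can permit a fetch that the current robots.txt forbids; re-reading before every single fetch closes that window entirely, at the cost described above.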
If you want a crawler to be extra well behaved you could take this approach:
But I think that’s probably a bit too advanced for an AI company to work out.