kasperd

@kasperd@westergaard.social
99 Followers
105 Following
2.2K Posts
Currently testing this platform to decide whether it's the future of social networking.

Curriculum Vitae:
PhD degree from Aarhus University
Worked at Google Zürich and London
Partner at Intempus Timeregistrering - now part of Visma
Operating nat64.net/

Which scenario are you dealing with?

  • One server many clients
  • One client many servers
  • One client with many mounts from the same server

The retransmitted TCP packets make that hypothesis sound unlikely.

I agree with the recommendation to disable DNS lookups in sshd. But I don’t expect it to solve other issues, such as MTU problems.
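
For reference, the sshd option in question is UseDNS; in the usual config file the change looks like this:

  # /etc/ssh/sshd_config
  # Don’t do reverse DNS lookups on incoming connections.
  UseDNS no

Reload sshd afterwards for the change to take effect.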

Based on the symptoms you describe, my first guess is a PMTU problem.

If this was Linux I would recommend trying to change advmss on your routes to 1220 on both client and server. I am guessing FreeBSD has a similar setting which can tweak the MSS, but I have no idea what it would be called.
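
On Linux that would look something like the following; the gateway and interface names are placeholders for illustration, not taken from your setup:

  # Clamp the MSS advertised on the default route to 1220 bytes.
  ip route change default via 192.0.2.1 dev eth0 advmss 1220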

Imagine what kind of bullshit Trump is going to be spewing when he realizes that he is not going to get the prize.
Do they publish the information necessary to validate the watermark?
Doing so means the software vendor takes the losses for the fault of the hardware vendor. Whether that’s fair or not depends on what sort of arrangement there is between the two vendors.

This got me wondering if there is a way to tell a crawler that crawling this site is permitted, but only if you use IPv6.

Simply serving different versions of robots.txt depending on address family won’t achieve that since the crawler will silently assume the version of robots.txt it received applies in both cases.

I am guessing they load robots.txt before each intended fetch to verify that the URL is permitted. If they primarily want resources that are not permitted, it would explain why they fetch robots.txt more often than anything else.

Of course caching robots.txt would be better. The only problem with that is that you may end up fetching a URL which is no longer permitted because you used an outdated version of robots.txt.

If you want a crawler to be extra well-behaved you could take this approach (sketched in code after the list):

  • If your cached robots.txt is older than 24 hours, or you haven’t cached it at all, then retrieve robots.txt.
  • If your cached robots.txt is less than 24 hours old and doesn’t permit the desired URL, then you don’t retrieve anything.
  • If your cached robots.txt is between 1 minute and 24 hours old and does permit the URL you intend to fetch, then you fetch robots.txt again to ensure the desired URL is still permitted.
  • If your cached robots.txt is less than 1 minute old and does permit the desired URL, you trust the cache.
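
A minimal sketch of that policy in Python, with hypothetical fetch_robots() and is_allowed() helpers standing in for the actual downloading and parsing of robots.txt:

  import time

  CACHE_MAX_AGE = 24 * 60 * 60   # 24 hours
  RECHECK_AGE = 60               # 1 minute

  def fetch_robots():
      # Placeholder: download and parse robots.txt, e.g. with urllib.robotparser.
      ...

  def is_allowed(rules, url):
      # Placeholder: check url against the parsed rules.
      ...

  def may_fetch(url, cache):
      """Return True if url may be fetched; cache holds 'rules' and 'fetched_at'."""
      now = time.time()
      age = now - cache['fetched_at'] if cache else None

      # Older than 24 hours, or never cached: retrieve robots.txt.
      if age is None or age > CACHE_MAX_AGE:
          cache.update(rules=fetch_robots(), fetched_at=now)
          return is_allowed(cache['rules'], url)

      # Less than 24 hours old and the URL is not permitted: retrieve nothing.
      if not is_allowed(cache['rules'], url):
          return False

      # Between 1 minute and 24 hours old and the URL is permitted:
      # fetch robots.txt again to make sure the URL is still permitted.
      if age > RECHECK_AGE:
          cache.update(rules=fetch_robots(), fetched_at=now)
          return is_allowed(cache['rules'], url)

      # Less than 1 minute old and the URL is permitted: trust the cache.
      return True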

But I think that’s probably a bit too advanced for an AI company to work out.

Yes, that’s much better than “Move fast and break the entire country”.
The article seems to imply that developers like this. That’s not necessarily the case.