Curriculum Vitae:
PhD degree from Aarhus University
Worked at Google Zürich and London
Partner at Intempus Timeregistrering (now part of Visma)
Operating nat64.net/
Which scenario are you dealing with?
The retransmitted TCP packets make that hypothesis sound unlikely.
I agree with the recommendation to disable DNS lookups in sshd, but I don’t expect it to solve unrelated issues such as MTU problems.
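For reference, here is what that change looks like, assuming a stock OpenSSH setup with its configuration in /etc/ssh/sshd_config:

```
# /etc/ssh/sshd_config
# Skip the reverse DNS lookup on incoming connections. This avoids
# long login delays when the resolver is slow or unreachable.
UseDNS no
```

Reload or restart sshd afterwards, with whatever command your init system uses, for the change to take effect.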
Based on the symptoms you describe, my first guess is a PMTU problem.
If this were Linux, I would recommend trying to change advmss on your routes to 1220 on both client and server. I am guessing FreeBSD has a similar setting to tweak the MSS, but I have no idea what it would be called.
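To make the Linux side concrete, a sketch with iproute2; the gateway and interface names are placeholders for whatever your default route actually uses:

```
# Re-advertise an MSS of 1220 on the default IPv6 route:
# 1280 (IPv6 minimum MTU) - 40 (IPv6 header) - 20 (TCP header) = 1220.
# Replace fe80::1 and eth0 with your actual gateway and interface.
ip -6 route change default via fe80::1 dev eth0 advmss 1220
```

An MSS of 1220 keeps every TCP segment within the 1280-byte IPv6 minimum MTU, so the segments get through even when path MTU discovery is broken.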
This got me wondering if there is a way to tell a crawler that crawling this site is permitted, but only if it uses IPv6.
Simply serving different versions of robots.txt depending on address family won’t achieve that, since the crawler will silently assume that whichever version it received applies over both address families.
I am guessing they load robots.txt before each intended fetch to verify that the URL in question is permitted. If they primarily want resources that are not permitted, that would explain why they fetch robots.txt more often than anything else.
Of course, caching robots.txt would be better. The only problem is that you may end up fetching a URL which is no longer permitted because you used an outdated copy of robots.txt.
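As a sketch of what the cached variant could look like, using Python’s standard urllib.robotparser (the bot name, site URL, and one-hour lifetime are all made-up values):

```python
import time
import urllib.robotparser

ROBOTS_URL = "https://example.com/robots.txt"  # placeholder site
MAX_AGE = 3600  # seconds before the cached copy counts as outdated (arbitrary)

_parser = None
_fetched_at = 0.0

def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Check url against a cached robots.txt, re-fetching it when stale."""
    global _parser, _fetched_at
    if _parser is None or time.time() - _fetched_at > MAX_AGE:
        _parser = urllib.robotparser.RobotFileParser(ROBOTS_URL)
        _parser.read()  # downloads and parses robots.txt
        _fetched_at = time.time()
    return _parser.can_fetch(user_agent, url)

print(is_allowed("https://example.com/some/page"))
```

The shorter the lifetime, the smaller the window in which an outdated copy can permit a fetch that the current robots.txt forbids; re-reading before every single fetch closes that window entirely, at the cost described above.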
If you want a crawler to be extra well behaved you could take this approach:
But I think that’s probably a bit too advanced for an AI company to work out.