A few random thoughts from yesterday’s failure of my #Akkoma instance:
- Elixir libraries used by #Pleroma and #Akkoma suck at #IPv6 - this shows up as rather puzzling error messages like “I can’t resolve your Postgres server name”, when the real cause is that the name only resolves to AAAA; same for the egress HTTP proxy
- the Postgres library can be made to work with rather ugly workarounds which don’t always survive version upgrades, and for the HTTP proxy I had to set up a local IPv4-only Squid proxy exclusively so that Akkoma could reach the Internet
- Fediverse instances in general massively increase the load on the local DNS resolver, because the whole point of federation is continuous updates sent to thousands of other instances, each with its own domain name, spread over hundreds of the new TLDs
- the sheer number of distinct domains and TLDs in particular impacts #DNSSEC validation and caching - nothing wrong with Fediverse here, you just need to be aware of it as a scaling problem
- if you’re not aware of it, you can mistakenly blame the resolver problems on the Fediverse software when “things stop working”, as they did yesterday - Fediverse was only one of the impacted services, but the most visible one for me
The primary solution was to spread the DNS resolution load across the two Unbound instances I already had on the edge firewalls, one of which had so far been idling as a redundant standby. The secondary one was to enable RFC 8767 serving of stale responses in Unbound.
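
For anyone wanting to try the stale-serving part, here is a minimal sketch of what it can look like in unbound.conf. The option names are standard Unbound settings, but the values are assumptions based on the ranges RFC 8767 suggests, not a copy of any production config, so tune them to your own setup:

```
server:
    # Keep answering from cache after the TTL has expired (RFC 8767 behaviour)
    serve-expired: yes
    # Assumed value: stop serving a stale record 1 day (in seconds) after it expired
    serve-expired-ttl: 86400
    # Assumed value: try a fresh resolution for 1800 ms before falling back to stale data
    serve-expired-client-timeout: 1800
    # TTL (in seconds) put on stale answers handed to clients
    serve-expired-reply-ttl: 30
```

With a non-zero client timeout Unbound still attempts normal resolution first and only answers with expired data if the upstream lookup does not finish in time, which is the behaviour RFC 8767 recommends.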