🧵 Today I learned about NAT collisions in my Kubernetes cluster, which uses WireGuard (KubeSpan) to mesh the network between my home nodes and an edge node.

#kubernetes #talos #network #homelab

I run a Talos Linux cluster with several nodes at home and a single bare-metal edge node at OVH, connected via KubeSpan (WireGuard mesh).

All home nodes share the same public IP and advertise the same endpoint `<home_isp_ip>:51820` to the remote peer.

The mesh mostly works because each node initiates outbound connections and the NAT assigns each one a different ephemeral source port, so WireGuard can tell them apart.

But when a tunnel drops and the OVH node needs to re-establish the connection, it only knows `<home_isp_ip>:51820` for all peers. It can't distinguish them, so recovery is unreliable and causes flapping.

Fix: a unique port forward per node (51821-51824) and Talos endpoint filters to stop advertising the default :51820.

Now when a tunnel drops, the OVH node has a dedicated port to reach each home node directly.
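
A minimal sketch of why the shared endpoint is ambiguous and the per-node ports are not. The node names, the placeholder IP, and the count of four home nodes (inferred from the 51821-51824 range) are assumptions for illustration, not values from my actual config:

```python
from collections import defaultdict

HOME_IP = "203.0.113.10"  # placeholder (documentation range), stands in for <home_isp_ip>
NODES = ["home-1", "home-2", "home-3", "home-4"]  # hypothetical node names


def peers_by_endpoint(endpoints):
    """Group peers by the endpoint the remote side would dial."""
    index = defaultdict(list)
    for node, endpoint in endpoints.items():
        index[endpoint].append(node)
    return dict(index)


# Before: every home node advertises the same public IP and default port.
shared = {node: f"{HOME_IP}:51820" for node in NODES}

# After: one forwarded port per node, starting at 51821.
unique = {node: f"{HOME_IP}:{51821 + i}" for i, node in enumerate(NODES)}

for label, plan in (("shared", shared), ("unique", unique)):
    for endpoint, peers in peers_by_endpoint(plan).items():
        print(label, endpoint, peers)
```

With the shared plan a single endpoint maps to all four peers, so the edge node has nothing to tell them apart by. With the unique plan each endpoint maps to exactly one peer.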

Also found a weird bug in the KubeSpan config: `FilterIPs` processes rules sequentially, so in `["!ip/32", "0.0.0.0/0"]` the deny rule silently does nothing. The deny has to come after the allow.
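
Here is a small model of that ordering behavior. It is a sketch that assumes rules are applied in order to an initially empty allow set (plain entries add, `!` entries subtract), as described above. It is not Talos's actual implementation, and `apply_filters` / `is_allowed` are hypothetical helper names:

```python
import ipaddress


def apply_filters(rules):
    """Apply rules in order to an initially empty allow list."""
    allowed = []
    for rule in rules:
        if not rule.startswith("!"):
            allowed.append(ipaddress.ip_network(rule))
            continue
        deny = ipaddress.ip_network(rule[1:])
        remaining = []
        for net in allowed:
            if net.version != deny.version or not net.overlaps(deny):
                remaining.append(net)  # untouched by this deny rule
            elif deny != net and deny.subnet_of(net):
                # Carve the denied range out of the larger allowed network.
                remaining.extend(net.address_exclude(deny))
            # Otherwise the deny covers the whole network, so drop it.
        allowed = remaining
    return allowed


def is_allowed(ip, rules):
    """Check whether an address survives the ordered filter list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in apply_filters(rules))


node_ip = "203.0.113.7"  # placeholder address
print(is_allowed(node_ip, ["!203.0.113.7/32", "0.0.0.0/0"]))  # deny first
print(is_allowed(node_ip, ["0.0.0.0/0", "!203.0.113.7/32"]))  # deny last
```

In this model the deny-first list subtracts from an empty set and then allows everything, so the address is still allowed. Only the deny-last list actually excludes it.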

This would have been much simpler if KubeSpan allowed overriding WireGuard's listenPort per node.

Instead of the whole extraAnnouncedEndpoints + filters workaround, I could just override WireGuard's ListenPort (5182x) per node and do a simple port forward. But KubeSpan hardcodes it to 51820.

There's an existing issue about this: https://github.com/siderolabs/talos/issues/9038
