Another user posted the Tailscale blog post that discusses their speedup techniques: tailscale.com/blog/more-throughput/
It’s likely that the kernel implementation could use similar techniques to surpass the performance of the userspace implementation that Tailscale uses, but no one has put in the work to make the kernel implementation as sophisticated as the userspace one.
Surpassing 10Gb/s over Tailscale
Hi, it’s us again. You might remember us from when we made significant performance-related changes to wireguard-go, the userspace WireGuard® implementation that Tailscale uses. We’re releasing a set of changes that further improves client throughput on Linux. We intend to upstream these changes to WireGuard as we did with the previous set of changes, which have since landed upstream.