Mastodon

When we look at the performance of QUIC, we often find that the limiting factor is the CPU consumption of socket calls and crypto processing. That is true, but congestion control also matters quite a bit. In this blog post (https://www.privateoctopus.com/2026/01/30/cpu_bound.html) I explain how fixing the "max RTT" measurement and the pacing algorithm in C4 improved the results of picoquic tests on loopback from "worse than BBR" to "better than Cubic".

Performance of C4 when CPU bound

Back in November 2025, when testing C4 on a loopback address, I observed that C4 achieved lower data rates than Cubic or even BBR. Since “performance under loopback” was not a high-priority scenario, I filed that in the long pile of issues to deal with later.

Then, in early January 2026, I read a preprint of a paper by Kathrin Elmenhorst and Nils Aschenbruck titled “2BRobust – Overcoming TCP BBR Performance Degradation in Virtual Machines under CPU Contention” (see: https://arxiv.org/abs/2601.05665). In that paper, they point out that BBR achieves lower-than-nominal performance when running on VMs under high CPU load, and trace that to pacing issues. Pacing in an application process involves periodically waiting until the pacer has accumulated enough tokens. If the CPU is highly loaded, the waiting system call can return later than the specified maximum wait time, and pacing thus slows the connection more than expected. In such conditions, they suggest programming a pacing rate above the nominal rate, and show that this improves performance.
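The token-waiting problem described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the pacing code of picoquic, C4 or BBR. The function names `paced_throughput` and `required_boost` are hypothetical. The model also assumes that every wait overshoots by a fixed `oversleep` delay and that the bucket holds a single packet. The sketch shows how late wake-ups combined with a capped token bucket lose credit, and how programming a higher pacing rate (the hypothetical `boost` factor) can compensate within this model.

```python
def paced_throughput(nominal_rate, packet_size, bucket_size, oversleep,
                     duration=1.0, boost=1.0):
    """Achieved send rate (bytes/s) of a toy token-bucket pacer.

    Assumption: every wait for tokens overshoots the requested delay by a
    fixed `oversleep` seconds, standing in for a busy CPU. Tokens are
    credited for the time actually slept, but the bucket is capped at
    `bucket_size`, so credit earned during the overshoot is lost whenever
    the bucket is already full.
    """
    assert bucket_size >= packet_size, "bucket must hold at least one packet"
    pacing_rate = nominal_rate * boost  # programmed rate, possibly above nominal
    now, tokens, sent = 0.0, float(bucket_size), 0
    while now < duration:
        if tokens >= packet_size:
            # Enough credit: send one packet immediately.
            tokens -= packet_size
            sent += packet_size
            continue
        # Request a sleep just long enough to earn the missing tokens...
        requested = (packet_size - tokens) / pacing_rate
        # ...but the (hypothetical) loaded scheduler wakes us up late.
        slept = requested + oversleep
        now += slept
        tokens = min(bucket_size, tokens + slept * pacing_rate)
    return sent / now


def required_boost(nominal_rate, packet_size, oversleep):
    """Boost restoring the nominal rate in this toy model (one-packet bucket).

    With boost k, each cycle lasts interval / k + oversleep; setting that
    equal to the nominal interval gives k = interval / (interval - oversleep).
    Only meaningful when oversleep < interval.
    """
    interval = packet_size / nominal_rate  # nominal time between packets
    return interval / (interval - oversleep)


if __name__ == "__main__":
    R, P = 1_250_000, 1250  # 10 Mbit/s with 1250-byte packets: one packet per ms
    print(paced_throughput(R, P, P, oversleep=0.0005))  # roughly 2/3 of R
    k = required_boost(R, P, 0.0005)                    # about 2.0
    print(paced_throughput(R, P, P, oversleep=0.0005, boost=k))  # close to R
```

With a 1 ms packet interval and 0.5 ms of oversleep per wait, this toy pacer delivers about two thirds of the nominal rate, and a boost of about 2 brings it back. In this model the same boost applied with no oversleep would overshoot the nominal rate, so a real stack would presumably need to size the boost from measured delays or rely on other limits, such as the congestion window.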