Okay, hopefully that's it for #NTP for now:

https://scottstuff.net/posts/2025/05/19/ntp-limits/

I'm seeing up to 200 ns of difference between the various GPS devices on my desk (one is an outlier; the others are much closer to each other), plus 200-300 ns of network-induced variability on NTP clients, giving me somewhere between 200 and 500 ns of total error, depending on how I measure it.

So, it's higher than I'd really expected to see when I started, but *well* under my goal of 10 μs.

The Limits of NTP Accuracy on Linux

Lately I’ve been trying to find (and understand) the limits of time syncing between Linux systems. How accurate can you get? What does it take to get that? And what things can easily add measurable amounts of time error? After most of a month (!), I’m starting to understand things.

This is kind of a follow-on to a previous post, where I walked through my setup and goals, plus another post where I discussed time syncing in general. I’m trying to get the clocks on a bunch of Linux systems on my network synced as closely as possible so I can trust the timestamps on distributed tracing records that occur on different systems. My local network round-trip times are in the 20–30 microsecond (μs) range and I’d like clocks to be less than 1 RTT apart from each other. Ideally, they’d be within 1 μs, but 10 μs is fine.

It’s easy to fire up Chrony against a local GPS-backed time source (technically GNSS, which covers multiple satellite-backed navigation systems, not just the US GPS system, but I’m going to keep saying “GPS” for short) and see it claim to be within X nanoseconds of GPS, but it’s tricky to figure out if Chrony is right or not. Especially once it’s claiming to be more accurate than the network’s round-trip time (20 μs or so), the amount of time needed for a single CPU cache miss (50-ish nanoseconds), or even the amount of time that light would take to span the gap between the server and the time source (about 5 ns per meter).

I’ve spent way too much time over the past month digging into time, and specifically the limits of what you can accomplish with Linux, Chrony, and GPS. I’ll walk through all of that here eventually, but let me spoil the conclusion and give some limits:

GPSes don’t return perfect time. I routinely see up to 200 ns differences between the 3 GPSes on my desk when viewing their output on an oscilloscope. The time gap between the 3 sources varies every second, and it’s rare to see all three within 20 ns of each other.

Even the best GPS timing modules that I’ve seen list ~5 ns of jitter on their datasheets. I’d be surprised if you could get 3-5 GPS receivers to agree within 50 ns or so without careful management of consistent antenna cable length, etc.

Even small amounts of network complexity can easily add 200-300 ns of systemic error to your measurements.

Different NICs and their drivers vary widely in how good they are for sub-microsecond timing. From what I’ve seen, Intel E810 NICs are great, Intel X710s are very good, Mellanox ConnectX-5s are okay, Mellanox ConnectX-3s and ConnectX-4s are borderline, and everything from Realtek is questionable.

A lot of Linux systems are terrible at low-latency work. There are a lot of causes for this, but one of the biggest is random “stalls” due to the system’s SMBIOS running to handle power management or other activities, “pausing” the observable computer for hundreds of microseconds or longer. In general, there’s no good way to know if a given system (especially a cheap one) will be good or bad for timing without testing it. I have two cheap mini PC systems with inexplicably bad time syncing behavior (1300-2000 ns) and two others with inexplicably good time syncing (20-50 ns). Dedicated server hardware is generally more consistent.

All in all, I’m able to sync clocks to within 500 ns or so on the bulk of the systems on my network. That’s good enough for my purposes, but it’s not as good as I’d expected to see.
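The network-induced error figures above trace back to how NTP estimates offset: a four-timestamp exchange that assumes the outbound and return path delays are equal. A minimal sketch of that arithmetic (plain Python with illustrative timestamps, not code from the post):

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP offset/delay from one request/response exchange.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive (all in seconds).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # estimated client clock error
    delay = (t4 - t1) - (t3 - t2)         # round-trip network delay
    return offset, delay

# Symmetric case: 10 us each way, 5 us server turnaround, clocks in sync.
print(ntp_offset_delay(0.0, 10e-6, 15e-6, 25e-6))  # offset ~0, delay ~20 us

# Add 2 us of extra delay on the outbound path only: the estimated
# offset is now wrong by half the asymmetry, i.e. ~1 us.
print(ntp_offset_delay(0.0, 12e-6, 17e-6, 27e-6))
```

Any systematic one-way asymmetry (queueing, a slow NIC path, routing) lands directly in the offset estimate at half its size, which is why it shows up as error the client can't see.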

scottstuff.net

@laird

> and everything from Realtek is questionable

I feel like that should be a blanket statement for everything, not just latency.

@laird @oxidecomputer I wonder if the lack of a BIOS could be a demonstrable advantage here when it comes to time syncing.
@shironeko @oxidecomputer Maybe. The lack of SMBIOS (or similar) running at a higher-than-ring-0 priority would probably help, but not as much (IMO) as having people around who actually understand the hardware in detail and are able to optimize away sources of error without having to treat giant chunks of the system as a black box. Which presumably *also* describes Oxide.
@laird @oxidecomputer yeah, I'm imagining some paravirtualized interface where the hypervisor can give all the VMs extremely accurate time; that would be quite awesome.
@laird I was always told NTP hits a wall around 500 μs, and you need PTP (IEEE 1588) to go further (typically 10-20 μs). Though that was through multiple switches with other traffic flowing.

@AMS I suspect that that's outdated. Chrony seems to stretch much further down than ntpd, and PTP is probably quite a bit better than that today, *although* this probably depends on exactly what you're trying to measure and what the cost of missing your clock-accuracy SLA is.

I'm seeing median errors around 500 ns and P99 probably around 1000 ns (although that's tricky to measure since Chrony's internal PLL should be able to handle short spikes in network noise). However, the *worst* case, with a congested network, a switch reboot, maybe some GPS jamming, an antenna failure, etc -- who knows?

@laird Nice! Fun isn't it? I eventually went to a single GPS disciplined time base (RPi-4 based) and NTP to everything else in my network. Like you I got bitten by the assumption of symmetrical xmit/recv times and discovered that my workbench NUC was receiving packets over the backbone network (1GbE) but responding over the instrument network (100M). Bad routing tables from me debugging routing to test gear 🙂
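For a rough sense of scale on that 1GbE-out / 100M-back split: if only serialization delay differed (it won't be the only factor; the frame size and link speeds here are illustrative assumptions), NTP's symmetric-delay assumption converts half the one-way difference directly into clock error:

```python
def serialization_delay(frame_bytes, link_bps):
    """Seconds needed to clock one frame onto the wire."""
    return frame_bytes * 8 / link_bps

FRAME = 90  # rough size of an NTP packet with UDP/IP/Ethernet overhead

out = serialization_delay(FRAME, 1_000_000_000)  # ~0.72 us on 1GbE
back = serialization_delay(FRAME, 100_000_000)   # ~7.2 us on 100M

# NTP splits the measured round trip evenly between the two directions,
# so the one-way asymmetry appears as a systematic offset of half its size.
bias = (back - out) / 2
print(f"~{bias * 1e6:.2f} us of systematic offset")
```

Several microseconds of bias from that one mismatched link dwarfs the sub-microsecond numbers in the post, which is why asymmetric routing is worth hunting down first.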

@ChuckMcManis Three of my NTP servers are on the same /22 with most of the WiFi, etc devices at home. I ended up injecting a /32 for each into OSPF to get around some fun asymmetrical routing problems. That helped quite a bit. Moving them all to a different network would probably be better but it would have involved pulling a bunch of extra wires and would have invalidated all of my existing testing. So maybe some day.

I'm not at all convinced that having a single, simple 1G switch with all of the timing traffic on it wouldn't be a substantial improvement in accuracy, but I have an increasing number of systems without any 1G interfaces (including all of my test systems from this episode), so it's not going to happen.

@laird Interesting, I've mostly got static routing tables set up, as overall the network is fairly simple. But using some /32s to fix the two biggest losers (downstairs to upstairs, and my lab to the backbone) might be simpler. My biggest gripe was files on my NAS appearing to be "in the future" when the NAS and the machine I was working on were out of sync. As long as everyone is within a couple of ms, it's all good.

@ChuckMcManis I went on a redundancy kick a few years ago after losing a bunch of time to switch reboots and a failure or two, so most of my network is dynamically routed L3, with VxLAN + EVPN for L2 over the top of that.

Bizarrely, it's been *massively* more reliable than the less complex L2 network was before that.

@laird Heh. I vacillate between making things more robust and not wanting to make things too much like work 😃 I had a chance to drop in new wiring and brought everything down to two (or in the case of the test equipment, three) layers: device -> local switch -> backbone switch. That, plus three wireless networks (primary, guest, and IoT), which are all bidirectionally firewalled.