Okay, this is weird. My one ConnectX-4 system *really* dislikes running #linuxptp.
I'm currently using #Chrony for NTP sync; in *some* cases linuxptp/ptp4l gives better results, but *not* with ConnectX-4 NICs. I'm judging result quality by tracking the RMS offset from `chronyc tracking`, as collected by the Prometheus Chrony collector.
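For anyone who wants to check the same number by hand, here's a minimal Python sketch that pulls the RMS offset out of `chronyc tracking` output. It assumes the usual `RMS offset      : 0.000000035 seconds` line format; the function names are mine, not part of chrony or the Prometheus collector.

```python
import re
import subprocess

# Matches the "RMS offset" line in `chronyc tracking` output (value in seconds).
RMS_OFFSET_RE = re.compile(
    r"^RMS offset\s*:\s*([0-9.eE+-]+)\s+seconds", re.MULTILINE
)


def parse_rms_offset_ns(tracking_output: str) -> float:
    """Return the RMS offset from `chronyc tracking` text, in nanoseconds."""
    match = RMS_OFFSET_RE.search(tracking_output)
    if match is None:
        raise ValueError("no 'RMS offset' line found in tracking output")
    return float(match.group(1)) * 1e9


def read_rms_offset_ns() -> float:
    """Run `chronyc tracking` locally and return the RMS offset in ns.

    Assumes `chronyc` is on PATH and can reach the local chronyd.
    """
    result = subprocess.run(
        ["chronyc", "tracking"], capture_output=True, text=True, check=True
    )
    return parse_rms_offset_ns(result.stdout)
```

Calling `read_rms_offset_ns()` in a loop before and after starting `ptp4l` should be enough to see the jump without any Prometheus setup.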
As a steady state, my test system sees 25-40ns of RMS offset time. Nice and steady. Chrony is talking to 3 local NTP servers plus some pool servers, with `hwtimestamp *` set but no `refclock` config.
Just starting up `ptp4l` in the background (where it syncs PTP time from the network onto the NIC's PTP Hardware Clock, but *doesn't* touch the system clock) causes chrony's tracking error to jump from ~40ns to ~900ns. Stopping `ptp4l` makes the errors go away immediately.
In this state, there shouldn't be *any* interaction between Chrony and ptp4l at all, but I see a 20x or greater increase in timing error.
Disabling `hwtimestamp *` doesn't help at all. I could see some weird dependency on the PHC when HW timestamps are used, but the problem is still there without them.
This is on Ubuntu 24.04, with Linux 6.8.0-59 and linuxptp 4.0-1ubuntu1. It's using an MCX456A-ECAx NIC (psid MT_2190110032). I originally observed this with FW 12.27.4000; upgrading to the latest FW (12.28.2302) and rebooting showed no change.
I don't see similar problems w/ ConnectX-5.
*In addition to chrony being unhappy*, ptp4l isn't really syncing correctly, either. It's logging RMS errors that look mostly random, between 20 and 50000ns of error, with no real indication of settling down into sync.
For comparison, my pair of Intel X710 systems run with <15ns of RMS error via the same linuxptp build and same switch infra.
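If you want to compare ptp4l's RMS figures across NICs like this, a rough Python sketch for summarizing them from a saved log follows. It assumes ptp4l's summary lines have the usual `rms N max N freq ...` shape; the helper names are hypothetical, and any log path you feed it is your own.

```python
import re
import statistics
from typing import Iterable, List

# Matches ptp4l summary lines such as:
#   ptp4l[123.456]: rms  787 max 1208 freq -38601 +/- 1071 delay  -14 +/-   0
RMS_LINE_RE = re.compile(r"\brms\s+(\d+)\s+max\s+(\d+)")


def rms_values(lines: Iterable[str]) -> List[int]:
    """Collect the rms field (ns) from every ptp4l summary line."""
    values = []
    for line in lines:
        match = RMS_LINE_RE.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


def summarize(values: List[int]) -> dict:
    """Return count/min/median/max so a non-converging servo stands out."""
    if not values:
        raise ValueError("no rms samples found")
    return {
        "count": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }
```

A healthy servo should show min, median, and max close together after the first few samples; a spread of several orders of magnitude is the "not settling" pattern described above.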