2AM

Brain: "hey, what if we needed to build an NTP server to handle 100k qps?"

Me: "What?"

@kwf Always interested in a 2am thought exercise, but ... I thought NTP's client backoff strategy would make sustaining this level of qps unnecessary, by design
@tychotithonus But what if you're handling tens of millions of clients?

@kwf Hmm, fair. Thundering herd (massive, forced synchronized restart) aside -- which should be extremely rare, and only happen if your tens of millions of clients were recovering from a massive power/comms failure that forced them all back to minpoll simultaneously ...

... after each individual peer's initial burst/minpoll flurry, settling down to maxpoll (1024 seconds, which most clients would be running at most of the time) ... I'd expect 10k qps to handle 10m peers, and 100k qps could handle 100m peers ... but would be near capacity.
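The back-of-the-envelope arithmetic above can be sketched directly (this is just the steady-state estimate, assuming every client has settled at maxpoll and polls exactly once per interval):

```python
# Steady-state NTP server load estimate, assuming all clients
# poll once per maxpoll interval (2^10 = 1024 seconds).
MAXPOLL_SECONDS = 1024

def steady_state_qps(num_clients: int, poll_interval: int = MAXPOLL_SECONDS) -> float:
    """Average queries per second if every client polls once per interval."""
    return num_clients / poll_interval

print(steady_state_qps(10_000_000))   # ~9.8k qps for 10M clients
print(steady_state_qps(100_000_000))  # ~97.7k qps for 100M clients
```

Which matches the numbers in the thread: 10k qps comfortably covers 10M peers, and 100M peers lands right around the 100k qps mark, i.e. near capacity.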

But also, since local drift offset is calculated and stored on each client, and since I would expect most clients to support that quenching / Kiss-of-Death thingie ... I'd expect near-capacity conditions to be brief, absorbable, and very low impact for actual time synchronization.

In other words: Dr. Mills thought about this pretty hard. 😁

@kwf Hmm, though now that I think about it, persistence of local drift, vs VM instantiation and behavior, shifts this traditional assumption I'm making, too!
@tychotithonus Then layer on all sorts of abuse you see on the public WAN, and the burst capacity you want to have to be able to handle 100k qps baseline starts getting less trivial.
@kwf Totally agreed - and you have more experience with that problem surface than I do!
D-Link Firmware Abuses Open NTP Servers - Slashdot

DES writes "FreeBSD developer and NTP buff Poul-Henning Kamp runs a stratum-1 NTP server specifically for the benefit of networks directly connected to the Danish Internet Exchange (DIX). Some time last fall, however, D-Link started including his server in a hardcoded list in their router firmware....