2AM

Brain: "hey, what if we needed to build an NTP server to handle 100k qps?"

Me: "What?"

@kwf good brain.
But still, dear brain: that's the domain of a not-very-large FPGA impl, I'd say. Leave the poor network engineer alone.
@kwf I know someone you need to talk to if you want to do this :)
@kwf then you have the excuse I'm always looking for to buy a https://www.leobodnar.com/shop/index.php?main_page=product_info&cPath=120&products_id=365 😁
LeoNTP Time Server 1200 : Leo Bodnar Electronics

LeoNTP model 1200 is a Stratum 1 NTP time server with a GPS-synchronised reference clock source, a custom design by Leo Bodnar Electronics. Key feature: maximum performance, reaching 100% of 100 Mbps network speed at more than 100,000 time requests per second.

@kwf Always interested in a 2am thought exercise, but ... I thought NTP's client backoff strategy would make sustaining this level of qps unnecessary, by design
@tychotithonus But what if you're handling tens of millions of clients?

@kwf Hmm, fair. Thundering herd (massive, forced synchronized restart) aside -- which should be extremely rare, and only happen if your tens of millions of clients were recovering from a massive power/comms failure that forced them all back to minpoll simultaneously ...

... after each individual peer's initial burst/minpoll flurry, settling down to maxpoll (1024 seconds, which most clients would be running at most of the time) ... I'd expect 10k qps to handle 10m peers, and 100k qps could handle 100m peers ... but would be near capacity.
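The steady-state estimate above is easy to sanity-check: clients sitting at maxpoll (2^10 = 1024 seconds) each contribute one query per 1024 seconds, so aggregate qps is roughly clients / 1024. A minimal sketch (assumed round numbers, not measurements):

```python
# Back-of-the-envelope check of the maxpoll math: N clients all polling
# every 2**10 = 1024 seconds generate about N / 1024 requests per second.

MAXPOLL_INTERVAL = 2 ** 10  # 1024 seconds, the common NTP maxpoll default


def steady_state_qps(clients: int, poll_interval: int = MAXPOLL_INTERVAL) -> float:
    """Average queries/second from `clients` all at the same poll interval."""
    return clients / poll_interval


print(steady_state_qps(10_000_000))   # ~9.8k qps for 10M peers
print(steady_state_qps(100_000_000))  # ~97.7k qps for 100M peers
```

Which is why 100k qps is "near capacity" for 100M peers: there's almost no headroom left for clients bursting at minpoll.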

But also, since local drift offset is calculated and stored on each client, and since I would expect most clients to support that quenching / Kiss-of-Death thingie ... I'd expect near-capacity conditions to be brief, absorbable, and very low impact for actual time synchronization.
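The two mechanisms mentioned here are the client's exponential poll backoff (minpoll to maxpoll) and the RFC 5905 "RATE" Kiss-of-Death code, which tells a client to slow down immediately. A hedged sketch of that client-side behavior (not ntpd's actual implementation; class and method names are illustrative):

```python
# Sketch of NTP client poll management: back off exponentially from
# minpoll toward maxpoll while the clock is stable, and jump straight
# to maxpoll if the server sends a "RATE" Kiss-of-Death code.

MINPOLL = 6   # 2**6  = 64 s
MAXPOLL = 10  # 2**10 = 1024 s


class PollState:
    def __init__(self) -> None:
        self.poll_exp = MINPOLL  # start at minpoll after restart

    def interval(self) -> int:
        return 2 ** self.poll_exp

    def on_good_sync(self) -> None:
        # Stable clock: double the poll interval, halving server load each step.
        if self.poll_exp < MAXPOLL:
            self.poll_exp += 1

    def on_kiss_of_death(self, code: str) -> None:
        # "RATE" kiss code (RFC 5905): the server is asking us to quench now.
        if code == "RATE":
            self.poll_exp = MAXPOLL


state = PollState()
for _ in range(3):
    state.on_good_sync()
print(state.interval())  # 512 (2**9) after three good syncs
state.on_kiss_of_death("RATE")
print(state.interval())  # 1024, immediately at maxpoll
```

This is why a briefly overloaded server recovers quickly: every quenched or backed-off client multiplies the headroom for the rest.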

In other words: Dr. Mills thought about this pretty hard. 😁

@kwf Hmm, though now that I think about it, persistence of local drift, vs VM instantiation and behavior, shifts this traditional assumption I'm making, too!
@tychotithonus Then layer on all sorts of abuse you see on the public WAN, and the burst capacity you want on top of a 100k qps baseline starts looking less trivial.
@kwf Totally agreed - and you have more experience with that problem surface than I do!
@kwf It turns out that's surprisingly little hardware.