Like, the interesting story for me here is path dependence.
I don’t think we set out to write a bidding scheduler design so much as we set out not to have dependencies on Raft consensus (run a single global high-volume Raft cluster some time and see how you end up feeling about distributed consensus).
Like, the inception of `flyd` was literally: “pull the driver code out of Nomad, and make it not depend on Raft”.
But once you do that, you can’t easily do a central planning scheduler anymore. You become Orchestration Milton Friedman. A totally different tech tree.
@tqbf This is fascinating!
Cloud Foundry had a similar journey in its orchestration system.
It started with a very fancy pub/sub-based system without a central orchestration node. This was hard to debug, fragile, etc.
Then they rewrote it with an auction-based central coordinator in Go, called Diego, that used etcd and Consul for state.
Then, finally, they migrated from etcd and Consul to SQL because GOOD LORD those things were a pain to run.
@tqbf If you haven't taken a look at Diego you might find it interesting as an example of a pretty production-hardened orchestrator that also makes very different choices from k8s.
https://github.com/cloudfoundry/diego-release
https://github.com/cloudfoundry/diego-design-notes
(The notes are out of date but the fundamentals haven't changed *that* much since that period. IIRC the big change is that more logic got moved into it out of the CF API.)
@tqbf Your thread triggered me to pass your hiring page on to several folks who have worked on that system and *especially* on its CLI. (Which, for reasons you noted in your article, is where a lot of the complex logic that makes Cloud Foundry powerful lives.)
Your interview process is very well-targeted for the kinds of folks I suspect you're looking for.