Carving The Scheduler Out Of Our Orchestrator

A deep dive into container scheduling and Flyd, our new orchestrator.


Like, the interesting story for me here is path dependence.

I don’t think we set out to write a bidding-scheduler design so much as we set out not to have dependencies on Raft consensus (run a single global high-volume Raft cluster some time and see how you end up feeling about distributed consensus).

Like, the inception of `flyd` was literally: “pull the driver code out of Nomad, and make it not depend on Raft”.

But once you do that, you can’t easily do a central planning scheduler anymore. You become Orchestration Milton Friedman. A totally different tech tree.
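The market-style alternative to a central planner can be sketched in a few lines: the scheduler proposes a job, each worker bids on it using only its own local state, and the best bid wins, no global consensus required. A minimal Go sketch of the idea; all names here (`Worker`, `Bid`, `PlaceJob`) are invented for illustration, not flyd's actual API:

```go
package main

import "fmt"

// Worker models a host that bids on jobs using only its own local state.
type Worker struct {
	Name    string
	FreeMem int // MB available
}

// Bid returns a score for a proposed job, or false if the worker
// can't run it at all. Higher scores mean a better fit.
func (w Worker) Bid(jobMem int) (int, bool) {
	if jobMem > w.FreeMem {
		return 0, false // can't host the job
	}
	// Prefer the worker with the most headroom left after placement,
	// which spreads memory pressure across the fleet.
	return w.FreeMem - jobMem, true
}

// PlaceJob asks every worker for a bid and picks the best one.
// No global view of cluster state is required, only responses.
func PlaceJob(workers []Worker, jobMem int) (string, bool) {
	best, bestScore, found := "", -1, false
	for _, w := range workers {
		if score, ok := w.Bid(jobMem); ok && score > bestScore {
			best, bestScore, found = w.Name, score, true
		}
	}
	return best, found
}

func main() {
	fleet := []Worker{
		{"host-a", 2048},
		{"host-b", 8192},
		{"host-c", 512},
	}
	winner, ok := PlaceJob(fleet, 1024)
	fmt.Println(winner, ok) // host-b wins: most headroom after placement
}
```

The point of the sketch is what's *absent*: there's no shared cluster-state store to keep consistent, so losing a worker just means losing its bids.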

@tqbf This is fascinating!

Cloud Foundry had a similar journey in its orchestration system

It started with a very fancy pub/sub based system without a central orchestration node. This was hard to debug, fragile, etc.

Then they rewrote it with an auction-based central co-ordinator in Go, called Diego, that used etcd and consul for state.

Then, finally, migrated from etcd and consul to SQL because GOOD LORD those things were a pain to run.

@tqbf The Auctioneer AFAIK also started out pretty sophisticated and ended up being pretty simple, because it turns out the job is *mostly* about balancing memory across the cell fleets with the kind of workloads Diego runs. It doesn't have your requirements around global distribution, though.
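If the job really is mostly balancing memory across the cell fleet, the "simulation" style of test the thread mentions below can be tiny: place a stream of jobs with a most-free-memory-first rule and check that usage stays spread out. A hypothetical sketch (none of this is Diego's actual code):

```go
package main

import "fmt"

// place puts a job on whichever cell has the most free memory,
// the simple rule that keeps memory pressure balanced.
// It returns the chosen cell's index, or -1 if nothing fits.
func place(freeMem []int, jobMem int) int {
	best := -1
	for i, free := range freeMem {
		if free >= jobMem && (best == -1 || free > freeMem[best]) {
			best = i
		}
	}
	if best >= 0 {
		freeMem[best] -= jobMem
	}
	return best
}

func main() {
	// Three identical 8 GB cells, then a stream of twelve 1 GB jobs.
	cells := []int{8192, 8192, 8192}
	for i := 0; i < 12; i++ {
		place(cells, 1024)
	}
	// Most-free-first lands four jobs on each cell.
	fmt.Println(cells) // [4096 4096 4096]
}
```

A real simulation would add job churn and heterogeneous sizes, but the assertion stays the same: no cell ends up much more loaded than its peers.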

@tqbf If you haven't taken a look at Diego you might find it interesting as an example of a pretty production-hardened orchestrator that also makes very different choices from k8s.

https://github.com/cloudfoundry/diego-release
https://github.com/cloudfoundry/diego-design-notes

(The notes are out of date but the fundamentals haven't changed *that* much since that period. IIRC the big thing that's changed is mostly that more logic got moved into it out of the CF API.)

@tqbf I can't find it now but somewhere in there they've got a set of "simulation" tests that they used to figure out the implications of choices they were making with the auctioneer.

@nat This is very cool, thank you!

@tqbf Your thread triggered me to pass your hiring page on to several folks who have worked on that system and *especially* on its CLI. (Which, for reasons you noted in your article, is where a lot of the complex logic that makes Cloud Foundry powerful lives.)

Your interview process is very well-targeted for the kinds of folks I suspect you're looking for.

@tqbf The SQL, I should note, still has a consensus algorithm for most production deployments, since by default it uses Galera.

A very carefully managed, hardened Galera that is not allowed to get up to any SHIT.

And getting it there took years and many painful outages and data loss incidents.

But, Cloud Foundry is designed to run workloads in data centers that can't access the internet so it's gotta bring and manage its own SQL DB.