Carving The Scheduler Out Of Our Orchestrator

A deep dive into container scheduling and Flyd, our new orchestrator.

Fly

Like, the interesting story for me here is path dependence.

I don’t think we set out to write a bidding scheduler design so much as we set out not to have dependencies on Raft consensus (run a single global high-volume Raft cluster some time and see how you end up feeling about distributed consensus).

Like, the inception of `flyd` was literally: “pull the driver code out of Nomad, and make it not depend on Raft”.

But once you do that, you can’t easily do a central planning scheduler anymore. You become Orchestration Milton Friedman. A totally different tech tree.
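To make the central-planning vs. market contrast concrete, here’s a toy sketch of bid-style placement. This is not `flyd`’s actual protocol; `Host`, `bid`, `request_placement`, and the “tightest fit wins” scoring rule are all made up for illustration. The point is only that each host answers from its own local state, and nothing holds a global picture of the cluster.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """A hypothetical worker that only knows its own free memory."""
    name: str
    free_mb: int

    def bid(self, need_mb: int) -> Optional[int]:
        """Return a bid (memory left over after placement), or None to decline.

        The host decides using only local state; no shared cluster view.
        """
        if need_mb > self.free_mb:
            return None
        return self.free_mb - need_mb


def request_placement(hosts: List[Host], need_mb: int) -> Optional[Host]:
    """Ask every host for a bid and pick the tightest fit (assumed rule)."""
    bids = []
    for host in hosts:
        offer = host.bid(need_mb)
        if offer is not None:
            bids.append((offer, host))
    if not bids:
        return None  # nobody bid; the caller has to retry or give up
    _, winner = min(bids, key=lambda pair: pair[0])
    winner.free_mb -= need_mb  # the winner claims the resources locally
    return winner
```

A central planner would instead keep the whole `hosts` table in one consistent place (which is where Raft comes in) and make the decision itself.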

(everything past “extract driver, lose Raft” is JP’s, who I’d link to if I understood Mastodon. The design his team came up with is very elegant, and also, what’s the word I’m looking for, “rigid in a good way”, oh right RIGOROUS, which is not something you can say about any of the code I wrote in nomad-firecracker).

If we were just scheduling whole apps, I think the Omega designs would have kept scaling indefinitely (we’d have ended up federating somehow).

But as a consequence of chasing this design and becoming orchestration libertarians, we’re not just scheduling apps anymore; the same scheduler design makes it super easy for us to let customers spin random VMs up, to sandbox code, to respond to web requests, to run background jobs, that sort of thing.

I don’t like hyping what we do up in articles, but I’ll do it here, apparently. :)

Here’s a paper we really should have cited in this post: Sparrow.

https://cs.stanford.edu/~matei/papers/2013/sosp_sparrow.pdf

Motivation: schedule jobs on clusters in response to HTTP queries: ✅.

Deliver sub-second scheduling by relaxing constraints, running many schedulers w/o a complete picture of available resources: ✅.

Optimize scheduling with P2C: ❌ (Sparrow does this, we don’t. We should consider it!)

Run diverse jobs without a single long-running queuing executor (ie, running arbitrary Docker containers): ❌ (Sparrow explicitly doesn’t do this, and we have to.)
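For anyone who hasn’t run into P2C (“power of two choices”) from the checklist above: instead of polling every worker, you sample two at random and take the less loaded one. A minimal sketch of that general technique, not of Sparrow’s batch-sampling variant and not of anything in `flyd`; `Worker`, `load`, and the function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    load: int = 0  # hypothetical load metric, e.g. queued or running tasks


def pick_p2c(workers, rng=random):
    """Sample two distinct workers at random and return the less loaded one."""
    if not workers:
        raise ValueError("no workers to choose from")
    if len(workers) == 1:
        return workers[0]
    a, b = rng.sample(workers, 2)
    return a if a.load <= b.load else b


def place_tasks(workers, n_tasks, rng=random):
    """Place n_tasks one at a time with P2C, bumping the chosen worker's load."""
    for _ in range(n_tasks):
        pick_p2c(workers, rng).load += 1
    return workers
```

The appeal is that each placement needs only two load probes, so a scheduler can make decent decisions without a complete picture of the cluster.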

@tqbf reminds me of that old Erlang on Xen demo where they tried to build an Erlang unikernel. The demo would boot a new VM, reply to an HTTP request, and shut down in no time, back in 2009.

@tqbf The path dependence here is not the idea of pulling the driver out of Nomad, but the fact that earlier in your career you built Stockfighter.