Mastodawn

A recent pub. Links below.

Challenge: The edge cloud is awesome because it promises low latency to clients. But it is confined to small, resource-constrained "micro-datacenters".

How can we provide bounded latency, in a multi-tenant, dense edge cloud environment?

What: Edge-RT, published in RTSS, targets edge systems that need to maintain both high (line-rate) throughput *and* strongly bounded end-to-end request processing latency.

We run thousands of per-client Network Function (NF) chains serving ML inference and control tasks, and meet request deadlines much more effectively than Linux and EdgeOS (which we build on).

How #1: We use end-to-end *packet* scheduling, and NFs inherit the priority of the packets as they are processed. Thus the system keeps its eye on the goal of controlling latency for each request, despite processing across many "processes".
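To make the inheritance idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Edge-RT's code: the `Packet` and `NF` types, the `effective_deadline` rule (earliest deadline among buffered packets), and `pick_next` are all hypothetical names invented for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Packet:
    request_id: int
    deadline_us: int  # absolute end-to-end deadline of the request (assumed unit)


@dataclass
class NF:
    name: str
    queue: Deque[Packet] = field(default_factory=deque)

    def effective_deadline(self) -> Optional[int]:
        """The NF inherits the urgency of its most urgent buffered packet."""
        if not self.queue:
            return None
        return min(p.deadline_us for p in self.queue)


def pick_next(nfs: List[NF]) -> Optional[NF]:
    """Run the NF whose buffered packets carry the earliest deadline."""
    runnable = [nf for nf in nfs if nf.queue]
    if not runnable:
        return None
    return min(runnable, key=lambda nf: nf.effective_deadline())


# Usage: the NF holding the packet with the earliest deadline runs first.
firewall = NF("firewall")
inference = NF("inference")
firewall.queue.append(Packet(request_id=1, deadline_us=900))
inference.queue.append(Packet(request_id=2, deadline_us=300))
inference.queue.append(Packet(request_id=3, deadline_us=1200))
chosen = pick_next([firewall, inference])
```

Note that `inference` runs at the urgency of request 2 even while it holds request 3, which has a much later deadline. That is the buffering problem the next point addresses.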

How #2: This only works when we control which packets (with which deadlines) are buffered for NF processing. Too much buffering: NFs execute at an inappropriately high priority. Too little: throughput tanks. We create "deadline-bounded batching"!
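One way to picture deadline-bounded batching is the Python sketch below. It assumes a batch may only contain packets whose deadlines fall within a fixed window of the most urgent one. The function name `take_batch` and the parameters `window_us` and `max_batch` are hypothetical, and the paper's actual policy may differ.

```python
import heapq
from typing import List, Tuple

# A pending packet is (deadline_us, packet_id); the list is kept as a min-heap.
Pending = Tuple[int, int]


def take_batch(pending: List[Pending], window_us: int, max_batch: int) -> List[Pending]:
    """Pop packets in deadline order, stopping at the batch-size cap or at the
    first packet whose deadline is more than window_us after the earliest one."""
    batch: List[Pending] = []
    if not pending or max_batch <= 0:
        return batch
    limit = pending[0][0] + window_us  # latest deadline allowed in this batch
    while pending and len(batch) < max_batch and pending[0][0] <= limit:
        batch.append(heapq.heappop(pending))
    return batch


# Usage: packet 3 has a much later deadline, so it stays buffered upstream
# instead of riding along at the priority of packet 1.
pending = [(100, 1), (120, 2), (500, 3), (130, 4)]
heapq.heapify(pending)
batch = take_batch(pending, window_us=50, max_batch=8)
```

A larger `window_us` gives bigger batches and better throughput, but lets later-deadline packets run at an inflated priority. A smaller one does the opposite.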

How #3: To control the costs of inter-core event notification (e.g. IPIs) and scheduling overheads, we use periodic event processing and create a new "constant-time Earliest Deadline First" (EDF) algorithm that is O(1).
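For intuition on how EDF-style selection can avoid a log-time heap, here is a generic bucketed sketch in Python. It is not the algorithm from the paper. It assumes deadlines are quantized into a fixed number of time slots, with a bitmap marking the non-empty slots, so insert and pop-earliest each touch a bounded number of words. The class `BucketedEDF` and its parameters are hypothetical. Ordering is only exact to slot granularity, and deadlines beyond the window are clamped to its last slot.

```python
from collections import deque
from typing import Any, Deque, List, Optional, Tuple


class BucketedEDF:
    def __init__(self, num_slots: int = 64, slot_us: int = 100) -> None:
        self.num_slots = num_slots
        self.slot_us = slot_us
        self.slots: List[Deque[Tuple[int, Any]]] = [deque() for _ in range(num_slots)]
        self.bitmap = 0     # bit i set <=> slots[i] is non-empty
        self.base_slot = 0  # absolute slot index where the window starts

    def _first_occupied_offset(self) -> int:
        """Offset from base_slot to the earliest non-empty slot (bitmap != 0)."""
        start = self.base_slot % self.num_slots
        mask = (1 << self.num_slots) - 1
        # Rotate right so that bit 0 corresponds to base_slot.
        rotated = ((self.bitmap >> start) | (self.bitmap << (self.num_slots - start))) & mask
        return (rotated & -rotated).bit_length() - 1  # index of lowest set bit

    def insert(self, item: Any, deadline_us: int) -> None:
        slot = deadline_us // self.slot_us
        # Clamp late or far-future deadlines into the current window.
        slot = min(max(slot, self.base_slot), self.base_slot + self.num_slots - 1)
        idx = slot % self.num_slots
        self.slots[idx].append((deadline_us, item))
        self.bitmap |= 1 << idx

    def pop_earliest(self) -> Optional[Any]:
        if self.bitmap == 0:
            return None
        idx = (self.base_slot + self._first_occupied_offset()) % self.num_slots
        _, item = self.slots[idx].popleft()
        if not self.slots[idx]:
            self.bitmap &= ~(1 << idx)
        return item

    def advance(self, now_us: int) -> None:
        """Slide the window forward, but never past a slot that still holds work."""
        target = now_us // self.slot_us
        if self.bitmap:
            target = min(target, self.base_slot + self._first_occupied_offset())
        self.base_slot = max(self.base_slot, target)


# Usage: items come out in deadline-slot order, including across wrap-around.
q = BucketedEDF(num_slots=8, slot_us=100)
q.insert("c", 650)
q.insert("a", 120)
q.insert("b", 310)
first = q.pop_earliest()
q.advance(300)
q.insert("d", 950)  # lands in a slot that wraps around the ring
rest = [q.pop_earliest() for _ in range(4)]
```

With a fixed slot count, the bitmap fits in one machine word (or a small fixed number of them), which keeps both operations constant-time in a C implementation that uses a find-first-set instruction.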

Background: We build on EdgeOS, which enables memory-dense computation with featherweight processes and strong isolation properties. We use DPDK for networking, and build everything on our Composite micro-kernel.

Wenyuan Shao is the main researcher and is a fantastic systems hacker. He's likely graduating with his PhD and will be on the job market around Sept. We got the "best student paper" award, which is a testament to all of his hard research.

Paper: https://www2.seas.gwu.edu/~gparmer/publications/rtss22edgert.pdf
Presentation: https://www2.seas.gwu.edu/~gparmer/publications/rtss22edgert_pres.pdf
EdgeOS: https://www2.seas.gwu.edu/~gparmer/publications/atc20edgeos.pdf
Related pubs: https://www2.seas.gwu.edu/~gparmer/pubs.html

Bonus: We empirically demonstrate why eBPF, kernel-bypass (DPDK), and Linux kernel deadline scheduling alone are not close to sufficient for this domain. We're proud of the background section, if you want a small intro to some of these technologies.