my student Seong-Heon talking about one of our new projects at NYU!
(We’re at NJPLS today, come say hi)
in TypeDis (conditionally accepted at POPL!), we develop a type system that enforces **disentanglement** statically, at compile time. This project was led by Alexandre Moine here at NYU, in close collaboration with Stephanie Balzer at CMU.
Disentanglement is all about parallel GC. It decomposes the heap into pieces that can be traced independently, in parallel, with no synchronization.
How does disentanglement work?
The idea is (1) give every task its own local heap, and (2) enforce that the heaps of any two _concurrent_ tasks never directly point to each other, i.e., the heaps are "disentangled".
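For intuition, here is a toy model of that invariant (my own sketch in Python, not MaPLe's actual runtime; the task names, `parent`, and `owner` tables are all made up). Each object lives in the heap of the task that allocated it, and the heap is disentangled when every pointer targets data owned by an ancestor task:

```python
# Toy model (my own, not MaPLe's): each task owns a local heap, and tasks
# form a fork tree.  "left" and "right" are concurrent children of "root".
parent = {"root": None, "left": "root", "right": "root"}
owner = {"r0": "root", "l0": "left", "x0": "right"}  # object -> owning task

def ancestors(task):
    # Walk up the fork tree from a task to the root.
    while task is not None:
        yield task
        task = parent[task]

def disentangled(pointers):
    """Every pointer must target data owned by an ancestor of the pointing
    object's task.  Then each heap can be traced using only itself and its
    ancestors, with no synchronization against concurrent tasks."""
    return all(owner[dst] in set(ancestors(owner[src]))
               for src, dst in pointers)
```

Under this model, a pointer from `l0` up to root data `r0` is fine, while a pointer from `l0` across to the concurrent sibling's `x0` is entanglement.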
We've been exploring this idea over the past ~10 years in MaPLe (https://github.com/mpllang/mpl). Today, in MaPLe, disentanglement is checked and managed dynamically, and we've put a ton of work into ensuring that this dynamic management cost is nearly zero (see https://dl.acm.org/doi/10.1145/3591284 and https://dl.acm.org/doi/10.1145/3547646).
However, nearly-zero cost only holds if your program is disentangled; as soon as a violation is detected, the GC has to start synchronizing across concurrent tasks and we're back to the tricky world of reasoning about the synchronization overheads of GC.
So, the question is: Can we reason about disentanglement _statically_, at the source level, and enforce it automatically at compile-time? In other words: can we statically guarantee that the GC remains fully parallel, and never has to synchronize across concurrent tasks?
Yes! Enter TypeDis. Similar to region types, TypeDis associates a task identifier or "timestamp", δ, with every allocation. E.g., the type string@δ indicates that the string was allocated at time δ. Unboxed types (raw integers, booleans, etc.) don't need these annotations.
At compile-time, TypeDis computes a partial order over timestamps (derived from the nested fork-join structure of parallel tasks) and enforces that all pointers flow _backwards_ in time. Thinking of a tree of tasks (where parents fork into their children), this corresponds to an **up-pointer invariant**: the data of child tasks can only contain pointers to ancestor data. In this way, you can never have pointers between concurrent siblings/cousins/etc.
One of the surprising things we discovered is that this "backwards-in-time" / "up-pointer invariant" concept can be elegantly encoded in the type system as a form of subtyping that we call subtiming. The whole system is essentially just standard unification + subtiming.
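One way to picture subtiming concretely (my own sketch, not the paper's formal system): encode each timestamp δ as the allocating task's path in the fork tree. Then δ1 happens before δ2 exactly when δ1's path is a prefix of δ2's, subtiming is just this prefix order, and a legal pointer is one whose target's timestamp precedes its source's:

```python
# A sketch (mine, not the paper's formalism): timestamps as paths in the
# fork tree.  The root task is (), its two children are (0,) and (1,), etc.

def happens_before(d1, d2):
    """d1 precedes d2 iff d1's path is a prefix of d2's, i.e. d1's task
    is an ancestor of (or equal to) d2's task."""
    return d2[:len(d1)] == d1

def pointer_ok(src, dst):
    """Subtiming check: a pointer stored in data stamped src may only
    target data stamped with an ancestor timestamp dst, so pointers
    flow backwards in time (the up-pointer invariant)."""
    return happens_before(dst, src)
```

Here a child at path (0,) may point up to root data at (), but concurrent siblings (0,) and (1,) are unordered, so neither may point at the other.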
Of course, there are tons of juicy details and intricacies that cannot fit into (even this many) tweets. Take a look at the preprint to see all the nitty gritty stuff!
https://cs.nyu.edu/~am15509/publications/typedis.pdf
We got really fantastic feedback during reviews (thank you POPL reviewers!) and we're looking forward to updating this preprint. Stay tuned for the final paper.
parallel Boyer-Moore majority selection is a nice bit of code; here it is in MaPLe.
this algorithm seems to be folklore -- the original Boyer-Moore algorithm is sequential, but I've found at least two mentions of the parallel algorithm in the wild.
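the MaPLe source isn't reproduced here, but the folklore idea can be sketched in Python: each element becomes a (candidate, excess-count) pair, and two summaries combine by cancelling mismatched candidates. The combine isn't strictly associative in general, but any grouping yields the true majority whenever one exists, which is exactly what a parallel tree reduction needs (see Oleg's writeup for the precise correctness claim):

```python
from functools import reduce

# A sketch, not the MaPLe source: Boyer-Moore summaries as
# (candidate, excess) pairs that cancel against each other.
def combine(a, b):
    (x, m), (y, n) = a, b
    if x == y:
        return (x, m + n)
    return (x, m - n) if m >= n else (y, n - m)

def majority(xs):
    # In a parallel setting, this left fold becomes a tree reduction
    # over contiguous blocks of the input.
    cand, _ = reduce(combine, ((x, 1) for x in xs))
    # Boyer-Moore only yields a *candidate*; a second pass verifies it.
    return cand if sum(x == cand for x in xs) > len(xs) // 2 else None
```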
Oleg Kiselyov discusses its correctness here:
https://okmij.org/ftp/Algorithms/map-monoid-reduce.html#BM
and, Adam Blank alludes to the parallel algorithm on the bottom of page 6 in this set of lecture notes:
https://courses.cs.washington.edu/courses/cse332/17wi/sections/08/section08-solutions.pdf
so this leaves me wondering -- has this parallel Boyer-Moore algorithm always been folklore? Is there a definitive reference for it?
when I first moved to NYC I started making two lists — places to go, and places I’ve been 🚶🏼♂️
now, almost exactly a year later, I’ve been to ~250 places and counting
love this city… so much to see
trying my hand at FPGA programming using Clash, an HDL based on Haskell.
Starting with something _very_ simple (just a 7-segment display that increments every second), but still, it’s satisfying to actually get something running.
Source code here: https://github.com/shwestrick/basys-3-clash-counter-example
Clash is interesting because it’s both high-level and very low-level at the same time. You can get a lot of power out of the Haskell ecosystem, but at the end of the day, you’re still reasoning at the RTL level.
(at least, this is my impression after just a few hours with Clash)
this semester, I've been using MaPLe to teach parallel programming in a special topics course at NYU. It's going well--students learn about parallel algorithm design and can immediately implement and test their ideas without too much headache
a brief history of MaPLe by pubs: