This is a re-recording of the talk I gave at EGRAPHS 2023, a workshop attached to PLDI in June 2023. Unfortunately the streaming setup in the room was not working,…
At the time we cut over (2+ years ago) compile time was at parity and we saw a small improvement in code quality. Hard to compare because at the same time I built aegraphs I built the integration with ISLE (our DSL for rewrites) so our rewrite database has grown significantly since then.
But there was a *ton* of engineering to get that: tricks so our SSA dataflow is implicitly a trivial aegraph, and "pay-as-you-go" in general. And pervasive single-pass thinking.
More in the talk that was linked above of course; but also I haven't ever formally published all of these tricks (and it's one of those hypothetical papers I'm not sure would make sense or be easy to get past reviewers -- a lot of interlocking design decisions, no single academic insight). At some point I should at least do a mega-blogpost. Someday!
It's possible I'm missing some nuance here too but aegraphs do permit multiple representation (concise version: they keep multi-nodes-per-class and cost-based extraction, and single fixpoint, but canonicalize once at creation and never recanonicalize if a new equivalence "comes in by the side door").
One can still have the blowup inherent in, say, adding commutativity/associativity rules and storing all permutations of a big sum or running into timeouts or limits. In Cranelift we have limits but we've also picked a path where we don't do all this -- because of the ISLE DSL's pattern-match compiler, "rules are cheap and nodes are expensive" so we write multiple versions of a left-hand side if we need to do so. Or, yes, pick a canonical direction to rewrite.
With recent algorithmic improvements and easy-to-use libraries, equality saturation is being picked up for hardware design, program synthesis, theorem proving, program optimization, and more. Existing work on using equality saturation for program optimization makes use of external equality saturation libraries such as egg, typically generating a single optimized expression. In the context of a compiler, such an approach uses equality saturation to replace a small number of passes. In this work, we propose an alternative approach that represents equality saturation natively in the compiler's intermediate representation, facilitating the application of constructive compiler passes that maintain the e-graph state throughout the compilation flow. We take LLVM's MLIR framework and propose a new MLIR dialect named eqsat that represents e-graphs in MLIR code. This not only provides opportunities to rethink e-matching and extraction techniques by orchestrating existing MLIR passes, such as common subexpression elimination, but also avoids translation overhead between the chosen e-graph library and MLIR. Our eqsat intermediate representation (IR) allows programmers to apply equality saturation on arbitrary domain-specific IRs using the same flow as other compiler transformations in MLIR.