So who does e-graphs for equality saturation in a real compiler? I want to hear about performance impact both to the compiler and the resulting program. @regehr maybe?
@j2kun @regehr Cranelift's aegraphs aren't conventional egraphs but that might be the closest thing I can think of. https://docs.rs/cranelift-egraph/latest/cranelift_egraph/
cranelift_egraph - Rust

ægraph (aegraph, or acyclic e-graph) implementation.

@j2kun @regehr There's talk by @cfallin at https://vimeo.com/843540328 with slides at https://cfallin.org/pubs/egraphs2023_aegraphs_slides.pdf. He might be able to weigh in on some of the experience report elements you're asking about (although the talk does cover some of that).
ægraphs: Acyclic E-graphs for Efficient Optimization in a Production Compiler

This is a re-recording of the talk I gave at EGRAPHS 2023, a workshop attached to PLDI in June 2023. Unfortunately the streaming setup in the room was not working,…

Vimeo
@j2kun @regehr @cfallin I'm not sure if it would qualify as an egraph to most people since I've always seen their appeal (despite the fundamental issue of combinatorial blow-up with any kind of congruence closure/saturation loop) as tackling the hard cases of symmetric rewrites, which aegraphs punt on. So depending on what you want to use egraphs for, it may or may not be applicable.
@j2kun @regehr @cfallin But it does address some of the classical issues with coarse-grained pass ordering fixed points by doing hash consing (GVN) fused with rewrites. You can do very similar things with e.g. LuaJIT's FOLD engine at https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/lj_opt_fold.c#L33 or my take with smart constructors and tail calls rather than LuaJIT's table-driven trampoline at https://gist.github.com/pervognsen/96d116fff14d95ffe51cf084c8604d64. Cranelift's integration with mixed declarative/procedural rewrite rules is very cool, though.
LuaJIT/src/lj_opt_fold.c at v2.1 · LuaJIT/LuaJIT

Mirror of the LuaJIT git repository. Contribute to LuaJIT/LuaJIT development by creating an account on GitHub.

GitHub
@pervognsen @j2kun @regehr Yes to all of the above: we do "egraphs in name and halfway in spirit" in the sense that we address the pass ordering problem with the One True Fixpoint Loop, but we don't "go back and recanonicalize" ever -- with lots more nuance about why that's generally OK for the sorts of rewrite rules that we use.

@pervognsen @j2kun @regehr

At the time we cut over (2+ years ago) compile time was at parity and we saw a small improvement in code quality. Hard to compare because at the same time I built aegraphs I built the integration with ISLE (our DSL for rewrites) so our rewrite database has grown significantly since then.

But there was a *ton* of engineering to get that: tricks so our SSA dataflow is implicitly a trivial aegraph, and "pay-as-you-go" in general. And pervasive single-pass thinking.

@pervognsen @j2kun @regehr

More in the talk that was linked above of course; but also I haven't ever formally published all of these tricks (and it's one of those hypothetical papers I'm not sure would make sense or be easy to get past reviewers -- a lot of interlocking design decisions, no single academic insight). At some point I should at least do a mega-blogpost. Someday!

@cfallin @j2kun @regehr I use something very similar and agree it works great. But unless I'm missing something about aegraphs (the core idea in a different guise seems so familiar to me that it's very possible I overlooked something) they don't address the issue of having to play kingmaker by choosing particular canonical forms, which egraphs promised to do away with. Of course that comes with all the attendant costs of egraphs as well. Which is why I don't use egraphs. :)

@pervognsen @j2kun @regehr

It's possible I'm missing some nuance here too but aegraphs do permit multiple representation (concise version: they keep multi-nodes-per-class and cost-based extraction, and single fixpoint, but canonicalize once at creation and never recanonicalize if a new equivalence "comes in by the side door").

@pervognsen @j2kun @regehr

One can still have the blowup inherent in, say, adding commutativity/associativity rules and storing all permutations of a big sum or running into timeouts or limits. In Cranelift we have limits but we've also picked a path where we don't do all this -- because of the ISLE DSL's pattern-match compiler, "rules are cheap and nodes are expensive" so we write multiple versions of a left-hand side if we need to do so. Or, yes, pick a canonical direction to rewrite.

@cfallin @j2kun @regehr Thanks, I feel like I'm the one missing some nuance there. I need to grab lunch but will try to work through an example to see what if anything goes wrong.
@pervognsen @j2kun yeah I don't know of much besides Cranelift that has found its way into production, although someone I was talking to was mentioning something in LLVM, perhaps gvn or newgvn, that has something more or less equivalent
@j2kun @regehr cranelift is the main production level one. You may find this interesting https://arxiv.org/abs/2505.09363 pushing towards an egraph mlir dialect. Some other things to look at https://github.com/philzook58/awesome-egraphs I think some existing database query optimziers do something egraph like?@mwillsey
eqsat: An Equality Saturation Dialect for Non-destructive Rewriting

With recent algorithmic improvements and easy-to-use libraries, equality saturation is being picked up for hardware design, program synthesis, theorem proving, program optimization, and more. Existing work on using equality saturation for program optimization makes use of external equality saturation libraries such as egg, typically generating a single optimized expression. In the context of a compiler, such an approach uses equality saturation to replace a small number of passes. In this work, we propose an alternative approach that represents equality saturation natively in the compiler's intermediate representation, facilitating the application of constructive compiler passes that maintain the e-graph state throughout the compilation flow. We take LLVM's MLIR framework and propose a new MLIR dialect named eqsat that represents e-graphs in MLIR code. This not only provides opportunities to rethink e-matching and extraction techniques by orchestrating existing MLIR passes, such as common subexpression elimination, but also avoids translation overhead between the chosen e-graph library and MLIR. Our eqsat intermediate representation (IR) allows programmers to apply equality saturation on arbitrary domain-specific IRs using the same flow as other compiler transformations in MLIR.

arXiv.org
@sandmouth @regehr @mwillsey I know Tobias but somehow I missed this paper 😅
@j2kun @regehr Other folks have covered this pretty well! Basically, egg-style search and saturate is far too expensive for most general purpose compilers. It has found some use in domain specific situations, or when smart people like @cfallin go make their own version specialized to their performance needs!