EGRAPHS talk yesterday on Cranelift's (a)egraphs mid-end was great fun; slides are now up [0]! Unfortunately the streaming setup was broken in our workshop room, so no recording exists (sorry!).

Thanks again to @mwillsey, @ztatlock and the rest of the egraphs-workshop crew for inviting me. Was fantastic to finally meet some collaborators and other compiler folks in person, too; only regret is not having more time to catch up!

[0] https://cfallin.org/pubs/egraphs2023_aegraphs_slides.pdf

@cfallin @mwillsey @ztatlock Damn, I really wish there was a recording. The slides look like it would've been a very enjoyable watch.

@cfallin @mwillsey @ztatlock Okay, the slides were definitely also informative by themselves.

One thing I struggled with when I tried to read the previous materials about acyclic e-graphs (mostly the cranelift RFC), is how you can avoid rebuilding the congruence closure without having to arbitrarily pick one of the representatives for the hash-consing table in some rare situations. I think I got it now, though.

If I understand correctly:

  • you do away with any invariants of the form “the hash-consing table always refers to the latest version of an e-class”
  • instead, you have the invariant that the hash-consing table points to the version of the e-class that includes exactly those nodes that would’ve been rediscovered if you re-ran the rewrites from scratch on this particular representation
    • there is one exception to this: rules like (sub x x) => 0 can use information from the union-find that has been discovered “accidentally” when simplifying earlier expressions
  • you continue maintaining the union-find data structure like a traditional e-graph, but you avoid ever looking anything up by the canonical id assigned by the union-find, instead only ever checking whether two particular nodes are equivalent

Does this sound right? If so, it is interesting how you recovered some of the directionality of the rewrites that e-graphs usually lack.

Also, have you considered the possibility of some limited normalization on the e-nodes before looking them up in the hash-cons? I’m mostly thinking of stuff like: for commutative operators, make sure the LHS has a lower e-class id. This seems like a good way of merging a + b with b + a without having to introduce full commutativity into the rewrites.

@mei @ztatlock @mwillsey yes, I want to make an actual recording sometime; it's too bad the room equipment wasn't working!

I think you've got it basically right. We add hashcons entries after all rewrites for all the nodes we progressed through in the rewrite chain, pointing to the union of all. But if another node unions onto the eclass later, we miss that. Eager rewriting though so generally OK.

@mei @ztatlock @mwillsey

There is some complexity around how we do the lookup: we have a custom hash and equality impl that uses the union-find. Some reasoning about that here: https://github.com/bytecodealliance/wasmtime/issues/6126

Cranelift: GVN depends on value definition order matching visit order · Issue #6126 · bytecodealliance/wasmtime

If the input CLIF does not define values in increasing numeric order, then the egraph GVN map may not recognize that two instructions are the same. .clif Test Case test optimize precise-output set ...

GitHub
@mei @ztatlock @mwillsey In brief the idea is that the hash-key actually does change as the union-find is updated, but we don't re-hash; but this is ok (I think) because of the specific hashtable impl that the RawTable API implies. The worst that happens is we miss some hits unnecessarily, in the case that something else does union on later and we would have picked it up.
@mei @ztatlock @mwillsey incidentally, my work wasn't the only at the workshop to move back toward directionality a bit -- Yihong Zhang's talk about tree automata (blogpost: https://effect.systems/blog/ta-completion.html) introduced a weaker-than-eqsat formalism as well. There's probably something more to the overlap; I haven't had the chance to explore it more though!
Ensuring the Termination of EqSat over a Terminating Term Rewriting System