Spent too long chasing virtual register chains to see that the underlying problem is an ordering issue... phis happen simultaneously, and I can make the emulator cheat less with its upsilon/phi implementation to avoid failing here, but will have to sort out unraveling this for actual code generation. Time to read a bit more, as I'm sure this is a well explored problem.

10 = phi 2, 11
8 = phi 1, 10
5 = phi 3, 15
4 = phi 0, 14

#Projects #Compiler

Just reordering these would solve the issue here, but I suspect you can encounter a situation where two variables end up swapping values, which will require more subtle handling. Now I want to figure out a test case that actually provokes that situation.

edit: definitely easy to provoke with a simple loop that swaps two variables each time around.

@swetland I think the keyword you might want to look for is "parallel move": https://bernsteinbear.com/blog/linear-scan/
Linear scan register allocation on SSA

A linear scan through the history of… linear scan register allocation.

Max Bernstein
@swetland This is a well known problem though. For instance here is our test case for it in metajit.cpp: https://github.com/can-lehmann/metajit.cpp/blob/main/tests/test_cfg.cpp#L127 (block params instead of phis, but fundamentally the same thing)
metajit.cpp/tests/test_cfg.cpp at main · can-lehmann/metajit.cpp

A meta-tracing JIT. Contribute to can-lehmann/metajit.cpp development by creating an account on GitHub.

GitHub

@CanLehmann I've been playing with phi/upsilon "pizlo form" as it seems like a reasonable way to represent things.

(https://gist.github.com/pizlonator/79b0aa601912ff1a0eb1cb9253f5e98d)

Pizlo SSA Form (short version)

Pizlo SSA Form (short version). GitHub Gist: instantly share code, notes, and snippets.

Gist
@swetland Oh interesting! Its seems kind of like block args but doing the arg passing using upsilon instructions and the argument declarations with the phi instructions. The fact that it allows upsilon/phis in the middle of blocks is pretty neat.
@swetland It seems like the ability to have phis/upsilons in the middle of BBs might be a bit of a pain to handle in codegen though, isn't it?
@CanLehmann I'm just getting to register allocation and code-gen, so I guess I'm about to find out! (though currently I always generate phis at the top of and upsilons at the bottom of basic blocks)

@swetland Will be interesting to see how it works out. I think the trivial thing is to implement the semantics directly by having two regs per Phi X, X' and then

upsilon(A, ^X) => X' = A
X = phi() => X = X'

That should implement the parallel move semantics correctly, but is quite wasteful, so you probably don't actually want to do it. My fear is just that you can get weird cases where phis and upsilons are mixed in the same sequence. The upsilon for a phi also does not have to be in a...

@swetland predecessor basic block. So you could get stuff like

bb1:
upsilon(^X)
branch bb2, bb3
bb2:
jump bb4
bb3:
jump bb4
bb4:
X = phi()

and it should still resolve.

@swetland This is certainly interesting to explore though. Maybe there is a nice solution to all of this, I just don't know.
@swetland Maybe it works out if you actually do the two regs per phi approach and aggresively fold mov instructions. I.e. if possible eliminate movs by allocating the same physical register for source and target registers.
@CanLehmann Amusingly, I sort of started that way by accident since I was generating a (superfluous) move instruction on every store to a local. Once I elided those and started further discarding trivial upsilon/phi pairs, some test cases started breaking, which lead me to realizing that ordering was going to be an issue.
@CanLehmann Yeah, it's been fun. The whole project has been a mix of "I've built a lot of frontends, interpreters, bytecode runtimes, etc, but never a full to-the-metal compiler" and "I'd like to build a fully self-hosted compiler". Part 2 is pretty much there... stage0 is a transpiler to C written in C, stage1 is written in itself and compiles to unoptimized straight-line risc-v-ish assembly, stage2 is a rebuild of that inside an emulator, and now stage3 is working on a "real" backend...
@swetland Getting it selfhosting is a really cool milestone! I remember the last time I did a self-hosting compiler (for some LISP-like thing) I was really annoyed at having to fix all these bugs in the non-self-hosting compiler, because I would much rather write code in my own language 😆 I think the bytecode interperter in between is maybe a good approach to circumvent this issue a bit, since it allows you to leave behind the initial compiler earlier.
@CanLehmann Yeah, and it keeps everything as instructions that fit within 2-or-3-arg form (vs traditional phis with arbitrary argument lists) which seems like a nice property, though I'm still using a traditional looking phi sort of thing while assembling basic blocks at the moment... not sure if I can collapse directly to placing phi/upsilon instructions easily yet.
@CanLehmann Awesome. That definitely is the issue I'm looking at.