so i have been writing a small vm. i started off with wingo's garbage collector and started laying everything out.
one thing that i figured would make a big difference to performance is to cut allocations by having immediates - values that aren't hidden behind a pointer. to support these, i steal a few bits from each value to use as a tag, so the runtime can tell an immediate from a pointer. you save the allocation cost and you save the gc tracing cost.
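for anyone who hasn't seen the trick, here's a minimal sketch of one common way to do it. this is an assumption for illustration, not necessarily my layout: it uses the low bit of a word as the tag (1 = small integer, 0 = heap pointer), relies on heap objects being at least 2-byte aligned, and every name in it (`value`, `make_fixnum`, and so on) is made up.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical representation: every vm value is one machine word */
typedef uintptr_t value;

#define TAG_MASK   ((uintptr_t)1)
#define TAG_FIXNUM ((uintptr_t)1)   /* low bit 1: immediate integer       */
                                    /* low bit 0: aligned pointer to heap */

static inline int is_fixnum(value v)       { return (v & TAG_MASK) == TAG_FIXNUM; }
static inline int is_heap_pointer(value v) { return (v & TAG_MASK) == 0; }

/* pack an integer into the word itself: no allocation, nothing to trace.
   the top bit of n is lost, so fixnums are one bit narrower than intptr_t. */
static inline value make_fixnum(intptr_t n) {
    return ((uintptr_t)n << 1) | TAG_FIXNUM;
}

/* assumes >> on a negative signed value is an arithmetic shift, which is
   implementation-defined in c but true on mainstream compilers */
static inline intptr_t fixnum_value(value v) {
    return (intptr_t)v >> 1;
}

/* pointers go in unchanged; alignment guarantees the low bit is already 0 */
static inline value value_from_pointer(void *p) {
    assert(((uintptr_t)p & TAG_MASK) == 0);
    return (value)(uintptr_t)p;
}

static inline void *pointer_from_value(value v) {
    assert(is_heap_pointer(v));
    return (void *)v;
}
```

a tracing gc that sees a word with the low bit set can skip it outright, which is where the tracing saving comes from.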
it probably should have dawned on me earlier that in order to make this fast, you still need a certain amount of compiler infrastructure, otherwise you'll be repeatedly checking tags everywhere. so of course i need to integrate a JIT of some kind. and this is where things get tedious.
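to make the tag-checking problem concrete, here's a rough sketch under the same made-up assumptions (low-bit fixnum tag, hypothetical names, a `tag_checks` counter that exists only to count). the generic loop re-checks the accumulator and the element on every iteration; the specialized loop is the kind of thing a compiler could emit once it knows the accumulator is always an integer, so it keeps it untagged and only checks the incoming element. overflow and the non-integer slow path are left out on purpose.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t value;            /* hypothetical: low bit 1 = fixnum */

static unsigned long tag_checks;    /* instrumentation for this example only */

static int is_fixnum(value v)          { tag_checks++; return (v & 1) == 1; }
static value make_fixnum(intptr_t n)   { return ((uintptr_t)n << 1) | 1; }
static intptr_t fixnum_value(value v)  { return (intptr_t)v >> 1; }

/* interpreter-style add: knows nothing about its operands, so it checks both */
static value generic_add(value a, value b) {
    if (is_fixnum(a) && is_fixnum(b))
        return make_fixnum(fixnum_value(a) + fixnum_value(b));
    return 0;   /* placeholder: a real vm would dispatch to a slow path here */
}

/* two tag checks, one untag pair and one retag per element */
static value sum_generic(const value *xs, size_t n) {
    value acc = make_fixnum(0);
    for (size_t i = 0; i < n; i++)
        acc = generic_add(acc, xs[i]);
    return acc;
}

/* accumulator stays a raw integer: one tag check per element, one retag total */
static value sum_specialized(const value *xs, size_t n) {
    intptr_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_fixnum(xs[i]))
            return 0;   /* placeholder: bail out to the generic path */
        acc += fixnum_value(xs[i]);
    }
    return make_fixnum(acc);
}
```

halving the checks in one loop is a small win on its own. the point is that without something tracking what's already known about a value, every operation pays the full price every time.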
llvm is arguably the natural answer, except it's huge. but perhaps the clincher is that you basically need to understand the ABI for every platform you support, because llvm doesn't handle this stuff for you.
i've looked at a few alternatives now, but none of them are quite right. most of them aren't really embeddable in c, and the ones that are don't seem great for my purposes (e.g. i'd need to teach them about the simd intrinsics they're missing, and it's not obvious they're even built to support more than a common subset).
so i'm actually considering the thing i've been avoiding for years: building my own JIT. 😬