also making register machine interpreters is so easy. like, what's a register? well that's something that holds a value! how many registers can i have? well it's a VM so as many as i like, so every named variable has its own register! where do i put these registers? well i have this nifty thing called a stack that grows and shrinks where i can simply allocate space for all the "registers" (read: variables) a function uses

and it works way faster than a stack machine because you're not constantly doing multiple pushes and pops just to do a simple addition
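to make that concrete, here's a minimal sketch in python of what i mean. the opcode names, the encoding, and the slot layout are all invented for illustration:

```python
# minimal sketch of a register-style interpreter: every variable gets a
# fixed slot in the current frame, so a "register" is just a list index.
# opcode names and encodings are invented for illustration.

LOADK, ADD, MOV = range(3)

def run(code, nslots, consts):
    frame = [0] * nslots              # one slot per named variable/temp
    for op, a, b, c in code:
        if op == LOADK:               # frame[a] = consts[b]
            frame[a] = consts[b]
        elif op == ADD:               # frame[a] = frame[b] + frame[c]
            frame[a] = frame[b] + frame[c]
        elif op == MOV:               # frame[a] = frame[b]
            frame[a] = frame[b]
    return frame

# a = b + c, with slots 0=a, 1=b, 2=c
prog = [
    (LOADK, 1, 0, 0),  # b = 2
    (LOADK, 2, 1, 0),  # c = 3
    (ADD,   0, 1, 2),  # a = b + c -- one instruction, no pushes/pops
]
print(run(prog, 3, [2, 3])[0])  # -> 5
```

the addition is a single dispatch instead of push/push/add/pop, which is where the speedup comes from.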

@eniko This post got stuck in my head for a while (I wanna stop doing what I'm doing and make a VM too 😔) and I just started wondering: how do you handle temporaries?

Stack machines are cute, everything is by default a "temporary", but with a register machine you have to pass a register name/idx to ops, right? So 'a = a + (b + c)' needs to do _t1 = b + c, then a = a + _t1, right? (assuming 3op)

Do you put the logic in the compiler to figure out the lifetimes of temps and how many you need?

@mort so in this case i'm not doing that since i'm basically doing a type of assembly, so the user is explicit about where everything goes and there are no temporaries*

but i have done this before and it's not so bad. temporaries are single-use, so it's easy to "reuse" a slot for a temporary when the temporary has been spent, which is basically whenever it gets read from. at that point you can return it to the pool of available temporaries
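a sketch of what that pool can look like at compile time, assuming temp slots are just integer indexes past the named variables (numbering invented):

```python
# sketch of a compile-time pool of temporary slots: hand one out per
# temporary, return it as soon as the temporary is read, since temps
# are single-use. slot numbering past the named variables is assumed.

class TempPool:
    def __init__(self, first_slot):
        self.next_slot = first_slot   # slots below this are named vars
        self.free = []

    def alloc(self):
        if self.free:                 # reuse a spent temporary's slot
            return self.free.pop()
        slot = self.next_slot
        self.next_slot += 1
        return slot

    def release(self, slot):          # call once the temp has been read
        self.free.append(slot)

pool = TempPool(first_slot=4)
t0 = pool.alloc()                     # slot 4
t1 = pool.alloc()                     # slot 5
pool.release(t1)
pool.release(t0)
t2 = pool.alloc()                     # reuses slot 4
print(t0, t1, t2)  # -> 4 5 4
```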

this also lets you optimize. imagine `a = b + 2` results in:

temp0 = b + 2
a = temp0

because you know `temp0` is temporary and can only be read once, you can read back up through the instructions and replace the initial assignment to `temp0` with an assignment directly to `a`, as long as `a` is not read anywhere between the assignment to `temp0` and the assignment to `a`

in fact you can do this for non-temporaries too, although then you have to make sure there are no intervening reads of either variable. this is where the real power of a register machine comes from
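a sketch of that rewrite as a pass over three-operand tuples (the instruction format is invented). here the two instructions are adjacent, so the no-intervening-reads check is trivially satisfied:

```python
# sketch of the "read back up" rewrite: a temp that is written once and
# then immediately copied into a variable can be written into the
# variable directly. instruction format is invented for illustration.

def fold_temp_move(code, is_temp):
    out = []
    for ins in code:
        op, dest, *srcs = ins
        if (op == "mov" and is_temp(srcs[0]) and out
                and out[-1][1] == srcs[0]):   # previous ins wrote the temp
            prev = out.pop()
            out.append((prev[0], dest) + tuple(prev[2:]))  # retarget it
            continue
        out.append(ins)
    return out

code = [("add", "temp0", "b", "2"), ("mov", "a", "temp0")]
print(fold_temp_move(code, lambda r: r.startswith("temp")))
# -> [('add', 'a', 'b', '2')]
```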

@mort because you can condense what is a stack machine's

push b
pushimm 2
add
pop temp0
push temp0
pop a

into a single operation

addimm a b 2

@mort
* this isn't entirely true because in my lisp i will quietly assign literals to "temporary" stack slots so i don't have to have a bunch of different variations of the same opcodes. so the compiler actually transforms `setf $tblfoo "myfield" 5` into

setstr $const_str_myfield "myfield"
setimm $const_int_5 5
setf $tblfoo $const_str_myfield $const_int_5

but i don't have to worry about reuse here because they're constants so every time they're used they'll point to the same hidden variable
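a sketch of that interning, with invented names based on the listing above; the point is that the same literal always maps to the same hidden slot, so the setup instruction is emitted once and the slot never needs to be released:

```python
# sketch of interning literals into hidden constant slots. slot names
# and the setstr/setimm ops mirror the invented listing above.

class ConstSlots:
    def __init__(self):
        self.slots = {}    # literal -> hidden slot name
        self.setup = []    # instructions that initialize the slots

    def slot_for(self, value):
        if value not in self.slots:
            name = f"$const_{type(value).__name__}_{value}"
            op = "setstr" if isinstance(value, str) else "setimm"
            self.slots[value] = name
            self.setup.append((op, name, value))
        return self.slots[value]

c = ConstSlots()
args = [c.slot_for("myfield"), c.slot_for(5), c.slot_for(5)]
print(c.setup)
# -> [('setstr', '$const_str_myfield', 'myfield'),
#     ('setimm', '$const_int_5', 5)]
print(args)
# -> ['$const_str_myfield', '$const_int_5', '$const_int_5']
```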

@mort sorry if this is kind of a lot to take in all at once, if anything's unclear or you have follow up questions let me know

@mort also consider that temporaries in a register machine do follow the same pattern as a stack machine. take `a = (b + 1) / 3` for example. in a stack machine this is

push b
push 1
add
push 3
div
pop a

in a register machine it's

mov t0 b
mov t1 1
add t0 t0 t1

the `add` consumes t0 and t1 (pops them off a stack of temporaries) so the top of the stack of temporaries is t0, and that's why it adds into t0 instead of t2. then:

mov t1 3
div t0 t0 t1

same thing here, t0 and t1 are used up, which shrinks the "stack", so store into t0. then

mov a t0

@mort if i put them side-by-side it's easy to see the similarity

mov t0 b ; push b [b]
mov t1 1 ; push 1 [b][1]
add t0 t0 t1 ; add [b+1]
mov t1 3 ; push 3 [b+1][3]
div t0 t0 t1 ; div [(b+1)/3]
mov a t0 ; pop a []

but the cool thing about register machines is that you can go back up and start replacing temps with the original values so this shrinks down to

mov t1 1
add t0 b t1
mov t1 3
div a t0 t1

or if you have support for immediate values

add t0 b 1
div a t0 3
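here's a sketch of both rewrites as one pass over three-operand tuples (format invented). it defers `mov tN x` copies and substitutes them at the temp's single read, and it retargets a write that's immediately copied into a variable; the retarget is only valid here because the instructions are adjacent, a real pass would check for intervening reads:

```python
# sketch: fold "mov tN x" into later uses of tN, and retarget a write
# that is immediately copied into a variable. temps are single-use, so
# a remembered copy can be dropped after its first read.

def shrink(code, is_temp):
    out = []
    copies = {}                                   # temp -> original source
    for op, dest, *srcs in code:
        srcs = [copies.pop(s, s) for s in srcs]   # replace temp reads
        if op == "mov" and is_temp(dest):
            copies[dest] = srcs[0]                # defer the copy entirely
            continue
        if (op == "mov" and out and is_temp(srcs[0])
                and out[-1][1] == srcs[0]):       # prev ins wrote this temp
            prev = out.pop()
            out.append((prev[0], dest) + tuple(prev[2:]))
            continue
        copies.pop(dest, None)                    # dest is overwritten
        out.append((op, dest, *srcs))
    return out

code = [
    ("mov", "t0", "b"), ("mov", "t1", "1"), ("add", "t0", "t0", "t1"),
    ("mov", "t1", "3"), ("div", "t0", "t0", "t1"), ("mov", "a", "t0"),
]
print(shrink(code, lambda r: r.startswith("t")))
# -> [('add', 't0', 'b', '1'), ('div', 'a', 't0', '3')]
```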

@eniko Ah, I think the thing I was overlooking is that the state of the runtime stack in the stack machine case is statically known. So instead of emitting a 'push' instruction, I can increment a compile time counter, and when emitting an instruction which would've popped the stack of a stack machine, I can decrement the counter.
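A sketch of that compile-time counter, with invented names; driving it with `a = (b + 1) / 3` produces the same six instructions as the register listing above:

```python
# Sketch: track the would-be stack depth at compile time and turn every
# push/pop into a temp-register index. Names and format are invented.

class TempStack:
    def __init__(self):
        self.depth = 0
        self.max_depth = 0            # how many temp slots to allocate

    def push(self):                   # where a stack machine would push
        slot = self.depth
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        return slot                   # emit code that writes t<slot>

    def pop(self):                    # where a stack machine would pop
        self.depth -= 1
        return self.depth             # emit code that reads t<depth>

# compile a = (b + 1) / 3
ts, code = TempStack(), []
code.append(("mov", f"t{ts.push()}", "b"))            # push b
code.append(("mov", f"t{ts.push()}", "1"))            # push 1
rhs, lhs, dst = ts.pop(), ts.pop(), ts.push()
code.append(("add", f"t{dst}", f"t{lhs}", f"t{rhs}")) # add
code.append(("mov", f"t{ts.push()}", "3"))            # push 3
rhs, lhs, dst = ts.pop(), ts.pop(), ts.push()
code.append(("div", f"t{dst}", f"t{lhs}", f"t{rhs}")) # div
code.append(("mov", "a", f"t{ts.pop()}"))             # pop a
print(code)   # mov t0 b / mov t1 1 / add t0 t0 t1 /
              # mov t1 3 / div t0 t0 t1 / mov a t0
```

`max_depth` ends up at 2, which tells the compiler how much temp space the frame needs.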

I guess I'm making a language with a register based VM soon 🤷 Combined with some techniques I wrote about at https://mort.coffee/home/fast-interpreters/, it could turn out pretty fast.


@eniko Good news: I gave in and made a VM! Not quite a "register VM", but each stack frame allocates a certain amount of stack space and the bytecode deals with indexes into that stack. I think I can make a compiler which generates great code out of this too.

I wanna get on to codegen!

... but I have to write a parser first 😭😭😭

Honestly writing a parser isn't even that bad; it's writing my 1000000th buffered reader implementation that irks me. It requires so much thinking, getting it wrong is so easy, and thinking of the right test cases for the unit tests is annoying, and it's gonna make me set up a system for unit testing, and I'm gonna get sidetracked writing my tenth first 80% of a unit testing framework

You know what? I'm gonna write this compiler in Raku (Perl 6).

Why can't every language have first-class grammars‽

#perl #raku #PLDev

@mort You can get pretty far with #Perl 5* #RegularExpressions. Here's @perlancar’s #CPAN module based on @randalschwartz’s minimal #JSON parser as a single #regex: https://metacpan.org/pod/JSON::Decode::Regexp

Full docs on conditional #regexes, including the special `(DEFINE)` form that merlyn used: https://perldoc.perl.org/perlre#condition-yes-pattern-no-pattern

* #RakuLang hasn't been called #Perl6 for four years now. You're deadnaming the language.
