I spent some time looking at GBA stuff after posting about it a few days ago. It's been so long since I've touched ARM32 that I'd forgotten the insane shit you can do in one instruction, e.g. LDMEQFD SP!, {R0, R2-R5, PC}.
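
For anyone who hasn't decoded that mnemonic in a while, here is a rough Python model of what the one instruction does: check the EQ condition, pop six words off a full-descending stack, write the new stack pointer back, and branch because PC is in the list. The dict-based register file and memory and the function name `ldmeqfd_sp_writeback` are assumptions made up for this sketch, and it ignores the normal PC increment when the condition fails.

```python
# Toy model of ARM32 `LDMEQFD SP!, {R0, R2-R5, PC}`.
# regs: dict of register number -> value
# mem:  dict of word-aligned address -> 32-bit word
# Both representations are hypothetical, chosen only for illustration.

SP, PC = 13, 15

def ldmeqfd_sp_writeback(regs, mem, z_flag, reglist=(0, 2, 3, 4, 5, 15)):
    """Conditionally pop `reglist` from a full-descending stack."""
    if not z_flag:
        # EQ condition failed: the instruction has no effect
        # (the real CPU would just move on to the next instruction).
        return regs
    addr = regs[SP]
    # LDM always pairs the lowest register number with the lowest address,
    # regardless of the order the registers are written in the source.
    for r in sorted(reglist):
        regs[r] = mem[addr]
        addr += 4
    # The `!` suffix writes the final address back to the base register.
    regs[SP] = addr
    # PC was in the list, so loading it doubles as a return/branch.
    return regs
```

So a conditional "restore callee-saved registers, pop the stack, and return" is one instruction, which is the whole joke.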

Nick Ludlam 18h ago

@pervognsen Is that still technically RISC? Or has RISC just shifted its baseline because of how execution architecture has matured?

Per Vognsen 17h ago

@nick I'm pretty sure that LDM would have worked as-is with the original ARM1 instruction set so this was there in the beginning. ARM has never been RISC in any meaningful sense. It's a load/store architecture with a bunch of GPRs but that's about it. I guess if you wanted to be snide, you could say that it shares in the earliest RISC tradition of shipping parts of your microarchitecture as the ISA (barrel shifter, predication, etc) like MIPS did with branch delay slots and imprecise exceptions.

Fabian Giesen 17h ago

@pervognsen @nick FWIW the string-ish multi-loads were in early POWER as well and that was definitely sticker-label RISC

Per Vognsen 17h ago

@rygorous @nick The pièce de résistance is combining it with predication and the PC as a pseudo-GPR. Now we're cooking.

Fabian Giesen 17h ago

@pervognsen @nick reference on early POWER multi-loads https://bitsavers.org/pdf/ibm/IBM_Journal_of_Research_and_Development/341/ibmrd3401E.pdf pp. 7-10 starting with "The RS/6000 architecture has adopted the following strategy for dealing with misaligned data."

The load-multiple section starts on p. 9 with "Another aspect of including string operations..."

Fabian Giesen 17h ago

@pervognsen @nick I will say that they are IMO bang on the money here on _all_ counts - calling out that

a) mem copies/string copies etc. are important and usually unaligned
b) Alpha-esque "we give you a way to do SWAR loops for this" only gets you so far,
c) for load/store multiple, that function prologues/epilogues are the key use case

other ISAs have struggled to learn that lesson 30 years later...
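
To make point (b) concrete, here is a minimal sketch of the kind of SWAR loop in question: the classic "does this 64-bit word contain a zero byte" bit trick, used to scan a NUL-terminated buffer eight bytes at a time. This is the generic portable trick, not Alpha's actual byte-manipulation instructions, and the names `has_zero_byte` and `swar_strlen` are made up for illustration. It also assumes the buffer really is NUL-terminated.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
LO = 0x0101010101010101  # 0x01 in every byte lane
HI = 0x8080808080808080  # 0x80 in every byte lane

def has_zero_byte(word):
    """True if any of the 8 bytes of a 64-bit word is zero."""
    # Subtracting 1 from each lane borrows into the top bit only for
    # lanes that were zero (or >= 0x81); `& ~word` filters out the latter.
    return ((word - LO) & ~word & HI & MASK64) != 0

def swar_strlen(buf):
    """Length of a NUL-terminated bytes object, 8 bytes per step."""
    i = 0
    # Word-at-a-time main loop: stop at the first word containing a NUL.
    while i + 8 <= len(buf):
        word = int.from_bytes(buf[i:i + 8], "little")
        if has_zero_byte(word):
            break
        i += 8
    # Byte-wise tail: locate the NUL inside the last word or leftover bytes.
    while buf[i] != 0:
        i += 1
    return i
```

Even in this toy version the main loop is the easy part. The byte-wise tail (and, on real hardware, the unaligned head) is where the "only gets you so far" complaint comes from.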

Wolf480pl 17h ago

@rygorous @pervognsen @nick

> The architecture allows for the partial completion of an operation and the generation of an alignment-check interrupt when the data crosses a cache-line boundary. System software can then complete the instruction by fixing up the affected registers or memory locations.

this has EINTR vibes

Per Vognsen 17h ago

@wolf480pl @rygorous @nick Regarding EINTR vibes, this is also true with something like REP MOVSB at page boundaries if there are soft faults. Or interrupts for that matter, but it happens even with exceptions, analogous to the cache line case.
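
A rough Python model of why that works: the copy keeps all of its progress in the architectural registers, so a fault partway through leaves them pointing at the first byte not yet copied, and re-running the same instruction just finishes the job. The `state` dict, the `unmapped` set standing in for a not-yet-present page, and the `PageFault` exception are all assumptions made up for this sketch. It also assumes the direction flag is clear (addresses count upward).

```python
class PageFault(Exception):
    """Stand-in for a fault taken in the middle of the copy."""

def rep_movsb(state, mem, unmapped=frozenset()):
    """Copy state['rcx'] bytes from [rsi] to [rdi], one byte per iteration."""
    while state["rcx"] != 0:
        if state["rsi"] in unmapped or state["rdi"] in unmapped:
            # rsi/rdi/rcx already reflect every byte copied so far,
            # so the handler can fix things up and simply re-execute.
            raise PageFault()
        mem[state["rdi"]] = mem[state["rsi"]]
        state["rsi"] += 1
        state["rdi"] += 1
        state["rcx"] -= 1

# Usage: the first attempt faults halfway, the retry completes the copy.
mem = bytearray(b"ABCDEFGH" + bytes(24))
state = {"rsi": 0, "rdi": 16, "rcx": 8}
try:
    rep_movsb(state, mem, unmapped={4})
except PageFault:
    rep_movsb(state, mem)  # "page is now mapped", resume from saved state
```

The difference from the RS/6000 case is who does the fixing up: here the restart state is just the registers, so nobody has to patch anything by hand.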

Wolf480pl 17h ago

@pervognsen @rygorous @nick

hmm are there any Unix syscalls that can partially happen and then return EINTR? I guess not... read() and write() can partially complete but then they return a length, and you don't get to know if it was short because of a signal...

so it looks like IBM's string instructions requiring "fixing up registers or memory" is even worse
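
For comparison, the short-count convention for write() described above usually gets handled with a loop like this minimal sketch. `write_all` is a hypothetical helper name. `os.write` is the real Python wrapper and returns the number of bytes actually written. Since Python 3.5 the interpreter also retries the call on EINTR for you unless the signal handler raises.

```python
import os

def write_all(fd, data):
    """Keep calling os.write until every byte of `data` has been written."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write may accept fewer bytes than offered; the return value
        # says how far it got, so resume from there.
        total += os.write(fd, view[total:])
    return total
```

The caller never learns why a write came up short, only how much is left, which is all it needs in order to continue.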

Wolf480pl 17h ago

@pervognsen @rygorous @nick
but my point was more about "it's an edge case we don't want to handle, let's create a new edge case one layer up and let those folks handle it"

Fabian Giesen 17h ago

@wolf480pl @pervognsen @nick check out how MIPS handles exceptions triggered from branch delay slots one day :P

Fabian Giesen 17h ago

@wolf480pl @pervognsen @nick (you can't just save the address of the faulting instruction and resume there, because if it's in a branch delay slot, now you end up falling through the branch instead of executing it)
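
A small sketch of how MIPS deals with it: when the faulting instruction sits in a delay slot, EPC is set to the address of the branch instead, and the BD bit in the Cause register records that this happened. The two function names are made up for illustration, and the model assumes 4-byte instructions.

```python
def mips_exception_entry(fault_pc, in_delay_slot):
    """Return (epc, bd) as recorded for an exception at `fault_pc`."""
    if in_delay_slot:
        # Point EPC at the branch so that resuming re-executes the branch
        # and then its delay slot; BD tells software EPC was backed up.
        return fault_pc - 4, True
    return fault_pc, False

def faulting_instruction_address(epc, bd):
    """What a handler has to compute to find the instruction that faulted."""
    return epc + 4 if bd else epc
```

So every handler that wants to inspect or emulate the faulting instruction has to check BD first, and emulating it means also working out where the branch would have gone.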

Fabian Giesen 17h ago

@wolf480pl @pervognsen @nick likewise why are MIPS k0 and k1 registers "reserved for the kernel"? Can't the kernel save its own regs when it needs to? And why do they need to be reserved all the time, can't they just be reserved around syscalls or something? :P

Wolf480pl 16h ago

@rygorous @pervognsen @nick
Hmm so upon an exception, a MIPS CPU only:
- disables interrupts
- saves PC in EPC
- fills the Cause register
- jumps to a hard-wired address
?

So it doesn't save any of the GPRs for you, and unlike in ARM, there is no separate copy of a subset of registers for each type of exception?

Wolf480pl 16h ago

@rygorous @pervognsen @nick

So to save a register, you need an address to save it to. I'm guessing on MIPS you don't get to put a literal address in the store instruction.

So you need to put the address in a register.

Some other CPUs may save the stack pointer for you, and replace it with one defined in the exception vector. But not MIPS.

So you will have to clobber one of the user's registers to build an address to save the registers to.

Hence k0.
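
The reasoning above can be sketched as a toy Python model. Everything in it is an assumption for illustration: the `SAVE_AREA` address, the short `SAVED` register list, and the dict-based register file and memory. A real handler would do this in a few assembly instructions, building the address in k0 and then storing relative to it.

```python
SAVE_AREA = 0x80001000  # hypothetical kernel save-area address
SAVED = ["at", "v0", "v1", "a0", "a1", "sp", "ra"]  # illustrative subset

def handler_entry(regs, mem):
    """Save user registers, using k0 as the only scratch register."""
    # The first thing the handler does clobbers a register. That is only
    # safe because user code is never allowed to rely on k0.
    regs["k0"] = SAVE_AREA
    for i, name in enumerate(SAVED):
        mem[regs["k0"] + 4 * i] = regs[name]

def handler_exit(regs, mem):
    """Restore user registers, again via k0."""
    regs["k0"] = SAVE_AREA
    for i, name in enumerate(SAVED):
        regs[name] = mem[regs["k0"] + 4 * i]
```

Since an interrupt can arrive between any two user instructions, the scratch register has to be off limits all the time, not just around syscalls.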