Mastodawn

I spent some time looking at GBA stuff after posting about it a few days ago. It's been so long since I've touched ARM32 that I'd forgotten the insane shit you can do in one instruction, e.g. LDMEQFD SP!, {R0, R2-R5, PC}.
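
For anyone who hasn't touched ARM32 in a while, here is a rough Python sketch of what that single instruction does: a conditional multi-register pop that also returns. It is a toy model under stated assumptions, not an emulator: `ldmeqfd`, `REG_ORDER`, the register dict and the word-addressed `mem` dict are all made up for illustration.

```python
# Toy model only: registers are dict entries, memory is a dict keyed by word address.
REG_ORDER = [f"R{i}" for i in range(13)] + ["SP", "LR", "PC"]

def ldmeqfd(regs, mem, z_flag, reg_list, base="SP"):
    """Model LDMEQFD base!, {reg_list}: conditional pop from a full-descending stack."""
    if not z_flag:
        # EQ condition fails: the whole instruction behaves like a no-op.
        return regs
    addr = regs[base]
    # The lowest-numbered register is loaded from the lowest address.
    for name in sorted(reg_list, key=REG_ORDER.index):
        regs[name] = mem[addr]
        addr += 4
    # The "!" suffix writes the final address back to the base register.
    regs[base] = addr
    return regs

regs = {name: 0 for name in REG_ORDER}
regs["SP"] = 0x1000
mem = {0x1000 + 4 * i: v for i, v in enumerate([10, 20, 30, 40, 50, 0x8000])}
ldmeqfd(regs, mem, True, ["R0", "R2", "R3", "R4", "R5", "PC"])
```

Loading PC last is what turns the pop into a function return, and the EQ condition makes the whole thing optional.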

Nick Ludlam 19h ago

@pervognsen Is that still technically RISC? Or has RISC just shifted its baseline because of how execution architecture has matured?

Per Vognsen 19h ago

@nick I'm pretty sure that LDM would have worked as-is with the original ARM1 instruction set so this was there in the beginning. ARM has never been RISC in any meaningful sense. It's a load/store architecture with a bunch of GPRs but that's about it. I guess if you wanted to be snide, you could say that it shares in the earliest RISC tradition of shipping parts of your microarchitecture as the ISA (barrel shifter, predication, etc) like MIPS did with branch delay slots and imprecise exceptions.

Fabian Giesen 19h ago

@pervognsen @nick FWIW the string-ish multi-loads were in early POWER as well and that was definitely sticker-label RISC

Per Vognsen 19h ago

@rygorous @nick The pièce de résistance is combining it with predication and the PC as a pseudo-GPR. Now we're cooking.

Fabian Giesen 19h ago

@pervognsen @nick reference on early POWER multi-loads https://bitsavers.org/pdf/ibm/IBM_Journal_of_Research_and_Development/341/ibmrd3401E.pdf pp. 7-10 starting with "The RS/6000 architecture has adopted the following strategy for dealing with misaligned data."

The load-multiple section starts on p. 9 with "Another aspect of including string operations..."

Fabian Giesen 19h ago

@pervognsen @nick I will say that they are IMO bang on the money here on _all_ counts - calling out that

a) mem copies/string copies etc. are important and usually unaligned
b) Alpha-esque "we give you a way to do SWAR loops for this" only gets you so far,
c) for load/store multiple, that function prologues/epilogues are the key use case

other ISAs have struggled to learn that lesson 30 years later...
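
As an illustration of what a "SWAR loop" building block looks like, here is the well-known zero-byte test on a 64-bit word, sketched in Python. This is the generic bit trick, not Alpha's actual byte-manipulation instructions, and `has_zero_byte` is just a name picked for the example.

```python
LO = 0x0101010101010101  # 0x01 in every byte lane
HI = 0x8080808080808080  # 0x80 in every byte lane

def has_zero_byte(word):
    """Return True if any of the 8 bytes in a 64-bit word is zero."""
    # Subtracting 1 from each lane borrows into the top bit only where the lane
    # was 0x00 (or already had its top bit set, which "& ~word" filters out).
    return ((word - LO) & ~word & HI) != 0

no_terminator = has_zero_byte(0x4142434445464748)
with_terminator = has_zero_byte(0x4142430045464748)
```

This handles aligned words nicely, but the unaligned head and tail of a string still need separate handling, which is roughly where "only gets you so far" kicks in.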

Wolf480pl 18h ago

@rygorous @pervognsen @nick

> The architecture allows for the partial completion of an operation and the generation of an alignment-check interrupt when the data crosses a cache-line boundary. System software can then complete the instruction by fixing up the affected registers or memory locations.

this has EINTR vibes
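
The shape of that contract, sketched as a toy in Python: the "hardware" step is allowed to stop early at a boundary and report how far it got, and the caller resumes from there, much like retrying after a short write. Everything here (`copy_until_boundary`, `copy_all`, the 64-byte `line_size`) is a hypothetical stand-in and not how POWER actually implements it.

```python
def copy_until_boundary(src, dst, start, length, line_size=64):
    """Copy up to `length` bytes from `start`, stopping at the next line boundary.

    Returns the number of bytes actually copied (possibly fewer than requested).
    """
    end = start + length
    boundary = (start // line_size + 1) * line_size
    stop = min(end, boundary)
    dst[start:stop] = src[start:stop]
    return stop - start

def copy_all(src, dst, start, length, line_size=64):
    """The 'system software' side: keep resuming until the whole copy is done."""
    done = 0
    while done < length:
        done += copy_until_boundary(src, dst, start + done, length - done, line_size)
    return done

src = bytes(range(200))
dst = bytearray(200)
first_step = copy_until_boundary(src, dst, 60, 20)  # stops at offset 64
total = copy_all(src, dst, 60, 100)
```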

Fabian Giesen 18h ago

@wolf480pl @pervognsen @nick also how REP MOVS/STOS, the new ARM mem block copies/sets, ARM SVE loads/stores (first fault lane!) etc. work! (At page not cache line level)

Tom Forsyth 17h ago

@rygorous @wolf480pl @pervognsen @nick And gather/scatter 🙂

Fabian Giesen 17h ago

@TomF @wolf480pl @pervognsen @nick well they don't actually work so.... (ever since GDS)

Tom Forsyth 16h ago

@rygorous @wolf480pl @pervognsen @nick Oh, I had not kept up to date with this. Fun!

Fabian Giesen 16h ago

@TomF @wolf480pl @pervognsen @nick I mean the instructions are still there but they just bail into full microcode fallback now

Tom Forsyth 14h ago

@rygorous @wolf480pl @pervognsen @nick I'm a little surprised these cores don't have a segregated mode on a chicken bit for all their register files by now. How many bugs of essentially the same format is this now?

Fabian Giesen 14h ago

@TomF not nearly as many as there are different named exploits, a lot of them were Intel doctoring around on symptoms because the real underlying issue was a fundamental problem with the cache access path design that was unfixable without a major uArch rev

Fabian Giesen 13h ago

@TomF specifically the Spectre stuff (which boils down to data-dependent branches cause data to leak into branch history) was exploitable ~everywhere, on every uArch and every ISA, and arguably not really Intel's fault, it's a fundamental issue with speculation.

The thing that really reamed Intel, Meltdown/L1TF and friends, was an unforced mistake in their L1 access path design.

Fabian Giesen 13h ago

@TomF Namely, everyone else either does privilege checks up front, or at most did them in parallel with the access path and made sure to mux in 0 on the data returns in case of privilege check failure.

Intel did the privilege checks in parallel/late and made the instruction raise an exception on retirement, but forwarded the actual privileged data (that you weren't supposed to be able to read) onwards to dependent insns regardless.
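
A deliberately oversimplified Python model of the difference between the two designs. In both, the load faults at retirement; what differs is the value that dependent instructions see in the meantime. The names (`speculative_load`, `transient_probe`, `zero_on_fail`) and the "set of touched probe lines" standing in for a cache side channel are assumptions made for illustration, not a description of any real pipeline.

```python
SECRET_ADDR = 0x40

def speculative_load(memory, addr, allowed, zero_on_fail):
    """Return (value forwarded to dependent insns, faults_at_retirement)."""
    value = memory[addr]
    if not allowed and zero_on_fail:
        # Design A: mux in 0 on the data return when the privilege check fails.
        value = 0
    # Design B (zero_on_fail=False): the real data is forwarded regardless.
    return value, not allowed

def transient_probe(memory, addr, allowed, zero_on_fail):
    """A dependent load touches a probe line indexed by the forwarded value."""
    touched_lines = set()
    value, faults = speculative_load(memory, addr, allowed, zero_on_fail)
    touched_lines.add(value)  # stands in for the cache footprint left behind
    return touched_lines, faults

memory = {SECRET_ADDR: 0x2A}
safe = transient_probe(memory, SECRET_ADDR, allowed=False, zero_on_fail=True)
leaky = transient_probe(memory, SECRET_ADDR, allowed=False, zero_on_fail=False)
```

In the zeroing design the footprint is the same whatever the secret is; in the forwarding design the footprint encodes the secret even though the load itself never retires.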

Fabian Giesen 13h ago

@TomF As for GDS, I am really surprised that all the Spectre-era exploits apparently did not cause Intel to do an internal audit of all speculative state and see if it might leak to attackers.

I am not surprised that the bug exists in Skylake/SKX era uArchs, and it would be totally fine if Intel found this in a post-Spectre security audit but kept quiet about it until it was discovered externally or similar, but it doesn't look like that's what happened.

Fabian Giesen 13h ago

@TomF Instead, from the response (and the fact that it affects many post-SKL uArchs), the likely conclusion is that they still hadn't gone over all shared and potentially security-sensitive state in the memory access path with a fine-toothed comb by 2023, 5 years after Meltdown, which is disappointing to say the least.

Fabian Giesen 13h ago

@TomF one would assume that by the third time you step on that particular rake, you maybe start looking for these issues on your own and try to prevent them even if someone hasn't fed you a PoC exploit yet