@pervognsen @nick reference on early POWER multi-loads https://bitsavers.org/pdf/ibm/IBM_Journal_of_Research_and_Development/341/ibmrd3401E.pdf pp. 7-10 starting with "The RS/6000 architecture has adopted the following strategy for dealing with misaligned data."
Load-multiple section starts. on p. 9 "Another aspect of including string operations..."
@pervognsen @nick I will say that they are IMO bang on the money here on _all_ counts - calling out that
a) mem copies/string copies etc. are important and usually unaligned
b) Alpha-esque "we give you a way to do SWAR loops for this" only gets you so far,
c) for load/store multiple, that function prologues/epilogues are the key use case
other ISAs have struggled to learn that lesson 30 years later...
> The architecture allows for the partial
completion of an operation and thegeneration of an
alignment-check interrupt when the datacrosses a cache-
line boundary. System softwarecan then complete the
instruction by fixing up the affected registersor memory
locations.
this has EINTR vibes
hmm are there any Unix syscalls that can partially happen and then return EINTR? I guess not... read() and write() can partially complete but then they return a length, and you don't get to know if it was short because of a signal...
so it looks like IBM's string instructions requiring "fixing up registers or memory" is even worse