so i can't directly map paired singles to altivec ops but altivec gives me 32 free registers for my emitted instructions to manipulate. fair, i guess. let's see if i can do 1:1 regalloc lol
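
rough sketch of what the 1:1 idea could look like in the emitter, in c. all the names (`host_vr_for_guest_fpr`, `encode_vx`, `emit_vaddfp_for_guest`) are made up for illustration. the only hard facts are the VX-form layout (primary opcode 4) and `vaddfp` having extended opcode 10. this isn't claiming `vaddfp` is a faithful stand-in for any paired single op (rounding and fpscr behaviour differ), it just shows guest register numbers dropping straight into a host encoding:

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical 1:1 allocation: guest paired-single register fN lives in host vN */
static inline unsigned host_vr_for_guest_fpr(unsigned guest_fpr)
{
    assert(guest_fpr < 32);   /* both register files have 32 entries */
    return guest_fpr;
}

/* VX-form: primary opcode 4 | vD | vA | vB | 11-bit extended opcode */
static inline uint32_t encode_vx(unsigned vd, unsigned va, unsigned vb, unsigned xo)
{
    assert(vd < 32 && va < 32 && vb < 32 && xo < 2048);
    return (4u << 26) | (vd << 21) | (va << 16) | (vb << 11) | xo;
}

/* emit vaddfp on the host registers that back three guest registers
 * (illustration only, says nothing about which guest op this replaces) */
static inline uint32_t emit_vaddfp_for_guest(unsigned fd, unsigned fa, unsigned fb)
{
    return encode_vx(host_vr_for_guest_fpr(fd),
                     host_vr_for_guest_fpr(fa),
                     host_vr_for_guest_fpr(fb),
                     10 /* vaddfp */);
}
```

e.g. `emit_vaddfp_for_guest(1, 2, 3)` comes out as `0x1022180a`, which is `vaddfp v1,v2,v3`.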

as there is no way of directly moving values between fprs and vector registers, i will have to regularly store and reload them through memory to do the swap. if i manage to only talk to cache during those operations i think it won't be that slow
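
to make the bounce concrete, here's a portable c model of it (not real emitted code). it assumes the two halves of a paired single are available as two single-precision values and that there's one 16-byte aligned scratch slot that hopefully stays hot in cache. `scratch_slot`, `vec4f` and both function names are hypothetical; the `memcpy`s stand in for what would be `stfs`/`lfs` on the fpr side and `lvx`/`stvx` on the vector side:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* one 16-byte slot; lvx/stvx ignore the low 4 address bits, so keep it aligned */
typedef struct { _Alignas(16) unsigned char bytes[16]; } scratch_slot;

/* stand-in for a host vector register: four single-precision lanes */
typedef struct { float lane[4]; } vec4f;

/* fpr -> vector: two 4-byte stores into the slot, then one 16-byte load */
static vec4f fpr_pair_to_vector(scratch_slot *slot, float ps0, float ps1)
{
    vec4f v;
    memset(slot->bytes, 0, sizeof slot->bytes);   /* unused upper lanes = 0.0f */
    memcpy(slot->bytes + 0, &ps0, sizeof ps0);    /* models stfs ps0 */
    memcpy(slot->bytes + 4, &ps1, sizeof ps1);    /* models stfs ps1 */
    memcpy(&v, slot->bytes, sizeof v);            /* models lvx */
    return v;
}

/* vector -> fpr: one 16-byte store, then two 4-byte loads */
static void vector_to_fpr_pair(scratch_slot *slot, vec4f v, float *ps0, float *ps1)
{
    memcpy(slot->bytes, &v, sizeof v);            /* models stvx */
    memcpy(ps0, slot->bytes + 0, sizeof *ps0);    /* models lfs ps0 */
    memcpy(ps1, slot->bytes + 4, sizeof *ps1);    /* models lfs ps1 */
}
```

so each swap is three memory ops in either direction, all hitting the same slot, which is why keeping that slot in cache matters so much.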

the main culprit is the paired single instructions with Rc=1, which make me update the condition register (cr1 gets the top four fpscr bits: FX, FEX, VX, OX) aaaaaaaa idk how much load/store this will cost (fortunately it seems Rc is rarely set for paired singles?)
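
for reference, this is all the Rc=1 update amounts to semantically, written as plain c over a 32-bit cr and fpscr value. the helper names are hypothetical, and it says nothing about how the fpscr bits get computed in the first place when the actual math ran on the vector unit (that's the expensive part):

```c
#include <assert.h>
#include <stdint.h>

/* write a 4-bit value into cr field N (cr0 is the most significant nibble) */
static inline uint32_t set_cr_field(uint32_t cr, unsigned field, uint32_t value)
{
    assert(field < 8 && value < 16);
    unsigned shift = 28 - 4 * field;
    return (cr & ~(0xFu << shift)) | (value << shift);
}

/* Rc=1 on a floating-point op: cr1 <- fpscr[FX, FEX, VX, OX] (top nibble) */
static inline uint32_t apply_rc_to_cr1(uint32_t cr, uint32_t fpscr)
{
    return set_cr_field(cr, 1, fpscr >> 28);
}
```

the other seven cr fields are left alone, so only cr1 needs to be kept in sync when Rc is set.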

would have been neat to use dcbi so the ephemeral cacheline never gets written back to memory at all, but it seems that it's usually unimplemented and supervisor-only anyway :(