furiously clicking through the histories of various microprocessors on wikipedia so i can figure out who to blame for the fact that you can't push/pop a single byte to the stack on the gameboy in an atomic fashion (you can only push/pop register pairs). a fact that led to a stupid bug that took me two days to figure out
(as far as i can tell, the gameboy's SM83 has it because the Z80 has it, and the Z80 has it because the Intel 8080 has it, but I can't figure out what the Intel engineers were thinking when they decided this is how it would work on the 8080. Stanley Mazor wrote about the design process of the 8080 here https://ieeexplore.ieee.org/document/4287219 but on the topic of push and pop he only says "Push and Pop instructions were needed for each of the three register pairs.")

@aparrish

I think it was just a practical convention inherited from those who came before them at IBM and other places.

From the perspective of business machines, it would rarely make sense to push data that's not word-sized data.

Which is still the same rationale ARM, Intel, AMD follow to date.

@aparrish

But you can still do

DI
DEC SP
LD (SP), A
EI

For most intents and purposes indistinguishable from a theoretical PUSH A

@haitchfive alas the gameboy doesn't let you load directly to SP—you either have to go through HL (LD SP, HL or LD HL, SP+n) or use the PUSH/POP instructions and adjust the stack pointer after with INC SP/DEC SP. (which leads to the problem I encountered: if an interrupt occurs between the POP and the DEC SP, a byte on the stack gets overwritten by the interrupt handler pushing the return address!)
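
A toy Python model of the hazard described above, for anyone following along. It is a sketch, not an SM83 emulator. It assumes a descending stack where POP reads the low byte from SP and the high byte from SP+1, and where interrupt dispatch pushes a 2-byte return address just below SP. The names Cpu, pop_then_dec, demo and the 0xEE marker value are made up for illustration.

```python
class Cpu:
    """Minimal stack model: 16 bytes of memory and a stack pointer."""

    def __init__(self):
        self.mem = bytearray(16)
        self.sp = 16

    def push16(self, hi, lo):
        # PUSH rr: SP drops by 2, low byte at SP, high byte at SP+1
        self.sp -= 2
        self.mem[self.sp] = lo
        self.mem[self.sp + 1] = hi

    def pop16(self):
        # POP rr: returns (high, low), SP rises by 2
        lo, hi = self.mem[self.sp], self.mem[self.sp + 1]
        self.sp += 2
        return hi, lo

    def interrupt(self):
        # Assumption: dispatch pushes a return address (marked 0xEE here)
        # and the handler's return pops it again.
        self.push16(0xEE, 0xEE)
        self.pop16()


def pop_then_dec(cpu, interrupt_between):
    """The unsafe 'pop one byte': POP rr, then DEC SP."""
    _hi, lo = cpu.pop16()      # top byte lands in the low register
    if interrupt_between:
        cpu.interrupt()        # SP is now above a byte that is still live
    cpu.sp -= 1                # DEC SP
    return lo


def demo(interrupt_between):
    cpu = Cpu()
    cpu.mem[15], cpu.mem[14] = 0x11, 0x22   # two live bytes, 0x22 on top
    cpu.sp = 14
    popped = pop_then_dec(cpu, interrupt_between)
    # Report the popped byte, the final SP, and the byte now on top.
    return popped, cpu.sp, cpu.mem[15]


print(demo(False))
print(demo(True))
```

In this model the popped byte and the final SP are the same in both runs, but the byte left on top of the stack is 0x11 without the interrupt and 0xEE with it: the return address has overwritten it.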

@aparrish Ahhh I see, I didn't know that was unavailable on the SM83, I assumed perfect backwards compatibility.

Can you disable and enable interrupts though, to guard the unsafe parts of the atomic operation?

@haitchfive yeah but at that point i'm using 10+ cycles to push or pop. i'm deciding between guarding with DI/EI (slow) or just always PUSH/POPping two bytes to the stack but only actually using one of them (faster but wastes memory)

@aparrish

Yeah probably the latter.

You might even find uses for the second byte.

@aparrish @haitchfive It can be very frustrating. In trying to optimise a drawing loop recently i wanted to PUSH and POP A, but i needed Flags to be preserved across the POP, so i couldn't use POP AF. I can't remember what i did in the end, but yknow.
@haitchfive @aparrish backwards compatibility with what? The Z80 doesn't have _any_ LD instructions that use (SP); PUSH and POP are, on the Z80, the only way to indirect via SP. (and, FWIW, the only way to "read" SP is to store it in memory: LD (NN), SP ).

@drj

Compatibility in the sense of supporting the specific snippet I posted earlier.

DI
DEC SP
LD (SP), A
EI
@haitchfive but i don't understand what CPU you think can run that snippet, the Z80 cannot.

@drj Sorry, something more like

DI
EXX
DEC SP
LD HL, 0
ADD HL, SP
LD (HL), A
EXX
EI

I wasn't implying I could deliver a complete Z80 solution on social media, but here we are

@haitchfive Oh yeah, ADD HL, SP i had forgotten about that.
@aparrish @haitchfive if you want to do these byte shenanigans, you have to PUSH then INC; and to reverse you have to DEC then POP. Which i _think_ is interrupt safe, and may even end up restoring to the same 8-bit register (but it trashes the other one in case of interrupt, haha).

@drj

Yeah that feels right, but no, that's machine-specific, not Z80-specific.

@drj @haitchfive alas it is not interrupt safe (which is what caused the annoying bug that took me so long to figure out in the first place!)

@aparrish I think there is an interrupt safe sequence tho. And i think it is DEC then POP. Because you avoid leaving live data below SP (at addresses < SP). In the DEC POP sequence (DEC SP, then POP HL), L gets trashed: it is loaded with an unpredictable byte from memory.
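
The claim that PUSH then INC SP, and DEC SP then POP, never leave live data below SP can be checked with a small Python model. This is a sketch under assumptions, not an emulator: the stack grows downward, the byte travels in the high register of the pair (H, with L as the throwaway), and an interrupt pushes a 2-byte return address below SP and pops it again on return. Stack, push_byte, pop_byte, round_trip and the JUNK/IRQ values are hypothetical names for illustration.

```python
from itertools import product

JUNK = 0xAA   # whatever happens to be in L when we push
IRQ = 0xEE    # marker for the return address an interrupt writes


class Stack:
    def __init__(self, size=16):
        self.mem = bytearray(size)
        self.sp = size

    def push16(self, hi, lo):      # PUSH rr
        self.sp -= 2
        self.mem[self.sp] = lo
        self.mem[self.sp + 1] = hi

    def pop16(self):               # POP rr -> (hi, lo)
        lo, hi = self.mem[self.sp], self.mem[self.sp + 1]
        self.sp += 2
        return hi, lo

    def irq(self, fire):
        # Assumed interrupt behaviour: push return address, then return.
        if fire:
            self.push16(IRQ, IRQ)
            self.pop16()


def push_byte(s, value, fire):
    s.push16(value, JUNK)   # PUSH HL, byte in H
    s.irq(fire)             # possible interrupt between the two instructions
    s.sp += 1               # INC SP: drop the unwanted low byte


def pop_byte(s, fire):
    s.sp -= 1               # DEC SP: SP now points one below the live byte
    s.irq(fire)             # possible interrupt between the two instructions
    hi, _lo = s.pop16()     # POP HL: H gets the byte, L gets garbage
    return hi


def round_trip(fires):
    """Push three bytes and pop them back, with an optional interrupt
    before and inside every byte operation (12 boundaries in total)."""
    s, f = Stack(), iter(fires)
    for b in (0x11, 0x22, 0x33):
        s.irq(next(f))
        push_byte(s, b, next(f))
    out = []
    for _ in range(3):
        s.irq(next(f))
        out.append(pop_byte(s, next(f)))
    return out, s.sp


# Try every combination of interrupts at the 12 boundaries.
all_safe = all(
    round_trip(fires) == ([0x33, 0x22, 0x11], 16)
    for fires in product([False, True], repeat=12)
)
print(all_safe)
```

Within this model every interrupt pattern returns the bytes in the right order, because at each instruction boundary all live bytes sit at or above SP.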

I do think this is a lot of shenanigans, but i suppose i can see the point if you have a VM or mini-Forth-like that is doing a lot of byte-oriented stack ops. And re timing, i can see your original complaint. Now we are looking at 16/17 clocks instead of what morally should be 7.
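
The 16/17 figure can be reproduced from the usual Z80 T-state counts. A small Python sketch of the arithmetic; the table below uses the documented Z80 timings (not the Game Boy's SM83 timings, which differ), and the variable names are just for illustration.

```python
# Z80 T-states per instruction, from the standard Z80 timing tables.
T_STATES = {
    "PUSH qq": 11,
    "POP qq": 10,
    "INC SP": 6,
    "DEC SP": 6,
    "LD (HL),r": 7,   # a plain 8-bit store, the "morally 7" comparison
}

push_byte_cost = T_STATES["PUSH qq"] + T_STATES["INC SP"]
pop_byte_cost = T_STATES["DEC SP"] + T_STATES["POP qq"]

print(push_byte_cost, pop_byte_cost, T_STATES["LD (HL),r"])
```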

(sorry, i got rather nerd-sniped by the problem)

@drj hmmm, thank you for thinking this through! i had push then inc for my "push byte" word but it hadn't occurred to me to do dec then pop for "pop byte" (and in testing I just assumed that interrupts were breaking both of them). i might try this out!
@drj (works like a charm, btw, thank you for being open to the nerd snipe)
@haitchfive yeah that makes sense, since the stack is mainly for 16-bit addresses on these machines. i also imagine it might have been easier to just reuse the mechanisms for loading the value of the (16-bit) stack pointer to the stack with 16-bit register pairs, rather than making specific circuitry for the individual 8-bit registers? (but of course i know nothing about microprocessor design)
@aparrish My hunch is that 16-bit addressing in the Z80 was a big bet, and there was plenty of engineering effort dedicated to making that consistent.

@aparrish @haitchfive
It seems to be a result of the hardware design; registers were laid out and controlled in pairs.

I found it here: https://www.righto.com/2013/03/register-file-8085.html and https://www.righto.com/2014/10/how-z80s-registers-are-implemented-down.html

@aparrish i have been thinking about the timings of memory stores and fetches on the z80. PUSH and POP are quite efficient in space and clocks; largely because the target address is not in the instruction stream (it's SP). There are no other instructions to store or fetch a 16-bit register to an indirect target (LD HL, (nn) is a _lot_ slower). Hmm, so much for the joy of 16-bit PUSH and POP.
Why no 8-bit PUSH and POP? I suppose if both is not an option, i would rather have only 16-bit ops. And i wonder if at least one 16-bit PUSH and POP is needed for some systems programming reason involving interrupts (one 16-bit op is atomic, but two 8-bit ops would not be).
@aparrish Is it the Texan company that in the late 1960s made a luggable terminal that Intel cribbed their CPU designs from, or am I thinking of the x86?

@acb @aparrish that was Datapoint/CTC, and also it was for the 8008 instruction set.

The 8008 only had a very simple call stack, so it didn't need push/pop instructions. Its addresses were only 14 bits wide.