I think it was just a practical convention inherited from those who came before them at IBM and other places.
From the perspective of business machines, it would rarely make sense to push data that isn't word-sized.
ARM, Intel, and AMD still follow the same rationale today.
But you can still do:
DI
DEC SP
LD (SP), A
EI
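For most intents and purposes, that is indistinguishable from a hypothetical PUSH A. (Read LD (SP), A as shorthand: if this is a Z80, there is no such opcode and the store would have to go through HL.)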
@aparrish I think there is an interrupt-safe sequence tho, and I think it is DEC then POP, because you avoid leaving live data below SP (at addresses < SP). In the DEC POP sequence I have illustrated below, L gets trashed: it is loaded with an unpredictable byte from memory.
I do think this is a lot of shenanigans, but I suppose I can see the point if you have a VM or a mini-Forth-like that is doing a lot of byte-oriented stack ops. And re timing, I can see your original complaint: now we are looking at 16/17 clocks instead of what morally should be 7.
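For anyone who wants to poke at the interrupt-safety argument, here is a rough Python toy model, a sketch and not a real emulator. Assumptions on my part: Z80 timings (DEC SP = 6 T-states, POP HL = 10), an interrupt that does nothing except overwrite the two bytes just below SP with a return address, and a reversed "POP then DEC" ordering that I made up purely for contrast. The class and function names are hypothetical.

```python
RET_ADDR = (0xDE, 0xAD)  # arbitrary return-address bytes, purely illustrative


class ToyStack:
    """Toy model of a descending stack with Z80-like DEC SP and POP HL."""

    def __init__(self, sp=0x8000):
        self.mem = bytearray(0x10000)
        self.sp = sp
        self.h = 0
        self.l = 0
        self.t = 0  # T-state counter (assumed Z80 timings)

    def push_byte(self, value):
        # Set-up helper, not a real instruction: put one byte on the stack.
        self.sp -= 1
        self.mem[self.sp] = value

    def dec_sp(self):
        self.sp -= 1
        self.t += 6

    def pop_hl(self):
        self.l = self.mem[self.sp]      # low byte comes from (SP)
        self.h = self.mem[self.sp + 1]  # high byte comes from (SP+1)
        self.sp += 2
        self.t += 10

    def interrupt(self):
        # The CPU pushes a return address below SP. After the handler
        # returns, SP is back where it was, but those two bytes stay clobbered.
        self.mem[self.sp - 1] = RET_ADDR[0]
        self.mem[self.sp - 2] = RET_ADDR[1]


def pop_byte_safe(cpu, irq=False):
    # DEC SP first: the wanted byte never sits below SP.
    cpu.dec_sp()
    if irq:
        cpu.interrupt()
    cpu.pop_hl()
    return cpu.h  # L holds whatever garbage was under the stack


def pop_byte_unsafe(cpu, irq=False):
    # Made-up contrast: POP first, then step SP back down by one.
    cpu.pop_hl()
    if irq:
        cpu.interrupt()  # the next stack byte is below SP right now
    cpu.dec_sp()
    return cpu.l


def demo(pop_fn):
    cpu = ToyStack()
    cpu.push_byte(0x11)  # ends up second from the top
    cpu.push_byte(0x22)  # top of stack
    popped = pop_fn(cpu, irq=True)
    return popped, cpu.mem[cpu.sp], cpu.t


if __name__ == "__main__":
    print("safe:  ", [hex(x) for x in demo(pop_byte_safe)])
    print("unsafe:", [hex(x) for x in demo(pop_byte_unsafe)])
```

Both orderings hand back the right byte, but only DEC-then-POP leaves the next stack entry intact when an interrupt lands in the middle. The counter also lands on the 16 clocks mentioned above, under the assumed timings.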
(Sorry, I got rather nerd-sniped by the problem.)