Soooo, turns out the variable doesn't get set/used correctly by sshd for some reason??? So disabling the hardware acceleration didn't actually fix it; it seems I just got very lucky at that moment.
What *does* reliably work, though, is forcing ssh to use only a specific processor. CPUs 0 and 1 are consistently broken, the other CPUs work just fine. So it *is* a CPU issue.
I uploaded the script for this, in case you want to test if your computer is affected (needs passwordless access to root via ssh):
https://0x0.st/8w6y.sh
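
For anyone who wants the per-CPU idea without the script: a minimal sketch of pinning a command to one logical CPU at a time, assuming Linux (`os.sched_setaffinity` does not exist elsewhere). The names `run_pinned` and `probe_cpus` are made up for illustration, and this is not the linked script.

```python
import os
import subprocess

def run_pinned(cmd, cpu):
    """Run cmd with the child process restricted to one logical CPU (Linux only)."""
    def pin():
        # Runs in the child between fork and exec; pid 0 means "this process".
        os.sched_setaffinity(0, {cpu})
    return subprocess.run(cmd, preexec_fn=pin, capture_output=True, text=True)

def probe_cpus(cmd):
    """Run cmd once per CPU this process may use and return {cpu: exit code}."""
    return {cpu: run_pinned(cmd, cpu).returncode
            for cpu in sorted(os.sched_getaffinity(0))}
```

Something like `probe_cpus(["ssh", "root@localhost", "true"])` (the target is a placeholder) would then show which CPUs produce failing connections. A single run per CPU may not be enough if the failure is intermittent.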
Digging down further, it seems the issue is indeed with the encryption: chacha20-poly1305 and aes{128,256}-gcm are broken, other ones work fine. So at least I have a workaround now for any ssh issue I encounter: Restrict the ciphers.
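
A rough sketch of that workaround as a helper that builds the ssh command line. Assumptions: the cipher names are spelled the way OpenSSH lists them (with the `@openssh.com` suffix), and the ssh version is recent enough to accept a leading `-` in `Ciphers`, which removes entries from the default list. `ssh_argv` is a hypothetical name.

```python
# Ciphers reported as broken above, in OpenSSH's own spelling (assumed).
BROKEN = (
    "chacha20-poly1305@openssh.com",
    "aes128-gcm@openssh.com",
    "aes256-gcm@openssh.com",
)

def ssh_argv(target, *remote_cmd, exclude=BROKEN):
    """Build an ssh command line that drops `exclude` from the default ciphers."""
    # A leading "-" in the Ciphers value removes the listed ciphers
    # from the default set instead of replacing the whole list.
    return ["ssh", "-o", "Ciphers=-" + ",".join(exclude), target, *remote_cmd]
```

The resulting list can be handed to `subprocess.run`. The same `Ciphers` value should also work in `~/.ssh/config` if you want it permanently.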
Since OpenSSL doesn't offer a way to run these algorithms in the shell (related? maybe, idk), I grab one of the examples from python-cryptography (which uses OpenSSL) and adapt it for testing:
https://0x0.st/8wIR.py
Turns out encrypting the same data with the same key material twice in a row can have different results. Uhh... that definitely shouldn't be the case.
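
The check itself is simple: encrypt fixed data with a fixed key and nonce over and over, and compare every result against the first. A minimal sketch of that idea, not the linked script. `first_mismatch` and `chacha_encryptor` are illustrative names, and the second one needs the third-party `cryptography` package.

```python
import os

def first_mismatch(encrypt, rounds=1000):
    """Call encrypt() repeatedly; return the index of the first result that
    differs from the very first one, or None if all results agree."""
    reference = encrypt()
    for i in range(1, rounds):
        if encrypt() != reference:
            return i
    return None

def chacha_encryptor(size=1 << 20):
    """Return a zero-argument callable that encrypts fixed random data
    with a fixed key and nonce, so every call should give identical bytes."""
    from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
    key, nonce, data = os.urandom(32), os.urandom(12), os.urandom(size)
    cipher = ChaCha20Poly1305(key)
    # Nonce reuse is only OK here because the output never leaves the test.
    return lambda: cipher.encrypt(nonce, data, None)
```

On a healthy machine `first_mismatch(chacha_encryptor())` should return `None`. Any integer means two encryptions of identical input disagreed.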
Now I've got something to test without running ssh. By setting the magic variable OPENSSL_ia32cap, I can disable the hardware acceleration again. By toggling different capability bits, I figure out the thing that was causing issues is AVX2.
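
A sketch of running a test command with the variable set only for that child process. Big assumption: the mask `:~0x20` is my reading of the OpenSSL documentation (the part after the colon covers CPUID leaf 7, where bit 5 is AVX2). The post doesn't say which exact value was used, so double-check before relying on it. `run_with_ia32cap` is a made-up name.

```python
import os
import subprocess

# Assumption: ":~0x20" clears the AVX2 bit (CPUID leaf 7, EBX bit 5).
# The exact value used in the original test is not given in the post.
NO_AVX2 = ":~0x20"

def run_with_ia32cap(cmd, mask=NO_AVX2):
    """Run cmd with OPENSSL_ia32cap set in the child's environment only."""
    env = dict(os.environ, OPENSSL_ia32cap=mask)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
```

Running the encryption test once normally and once through `run_with_ia32cap` gives a direct before/after comparison for a single capability bit.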
Sooo... the issue is definitely the first CPU core, with both threads affected and the AVX2 instructions broken in *some* way, leading to mangled encrypted data, which in turn leads to connection issues & silent data corruption with scp.
For the record, the affected CPU is an "AMD Ryzen 7 PRO 5850U", as included in a Lenovo T14 Gen 2 (Type 20XL). So if you have a T14 Gen 2... you may want to check the scripts.