Microsoft says “Prism” translation layer does for Arm PCs what Rosetta did for Macs
S0 standby is borderline unusable on many PCs. On Apple silicon Macs it’s damn near flawless.
My current laptop is probably the last machine to support S3 standby and I do not look forward to replacing it and being forced back into a laptop that overheats and crashes in my backpack in less than 15 minutes. On my basic T14 it works OK for the most part, but my full-fat ThinkPad P1 with an i9 is in S0 standby for longer than a few minutes, and sometimes uses more power than when it was fully on. Maybe Meteor Lake with its LP E-cores will fix this, but I doubt it.
There’s nothing stopping x86-64 processors from being power efficient. This article is pretty technical but gives a really good explanation of why that’s the case: chipsandcheese.com/…/why-x86-doesnt-need-to-die/
It’s just that traditionally Intel and AMD earn most of their money from the server and enterprise sectors where high performance is more important than super low power usage. And even with that, AMD’s Z1 Extreme also gets within striking distance of the M3 at a similar power draw. It also helps that Apple is generally one node ahead.
On the x86 architecture, RAM is used by the CPU and the GPU has a huge penalty when accessing main RAM. It therefore has onboard graphics memory.
On ARM this is unified so GPU and CPU can both access the same memory, at the same penalty. This means a huge class of embarrassingly parallel problems can be solved quicker on this architecture.
It’s been a while since I’ve coded on the Xbox, but at least on the 360, the memory wasn’t really unified as such. You had 10 MB of EDRAM that formed your render target, and then there were specialised functions to copy the EDRAM output to DRAM. So it was still separated, and while you could create buffers in main memory and access them in the shaders, that came at some penalty.
It’s not that unified memory can’t be created, but it’s not the architecture of a PC, where peripheral cards communicate over the PCI bus, with great penalties to touch RAM.
Well, the current-generation consoles both use x86-64 CPUs with only a single set of GDDR6 memory shared across the CPU and GPU, so I’m not sure you have such a penalty anymore.
It’s not that unified memory can’t be created, but it’s not the architecture of a PC, where peripheral cards communicate over the PCI bus, with great penalties to touch RAM.
Are there any tests showing the difference in memory access of x86-64 CPUs with iGPUs compared to ARM chips?
Here is a great article on the topic. Basically, x86 spends a comparatively enormous amount of energy ensuring that its strong memory guarantees are not violated, even in cases where such violations would not affect program behavior. As it turns out, the majority of modern multithreaded programs only occasionally rely on these guarantees, and including special (expensive) instructions to provide these guarantees when necessary is still beneficial for performance/efficiency in the long run.
For additional context, the special sauce behind Apple’s Rosetta 2 is that the M family of SoCs actually implements an x86 memory model mode that is selectively enabled when executing dynamically translated multithreaded x86 programs.
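To make the memory-guarantee point a bit more concrete, here is a rough C++ sketch of the classic "message passing" pattern, which is one of the few places a typical program actually depends on ordering between threads. The names (`payload`, `ready`, `run_message_passing`) are made up for illustration and aren't from any article or codebase. On x86 the release store and acquire load typically compile to ordinary moves, because the hardware already provides that ordering for every access. On ARM the compiler typically has to emit special load/store instructions or barriers, but only at the spots where the program asks for them.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // plain, non-atomic data being handed over
std::atomic<bool> ready{false};  // flag that publishes the payload

void producer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish: earlier writes
                                                   //    become visible to any
                                                   //    thread that acquires
}

int consumer() {
    // Spin until the flag is set. The acquire load pairs with the release
    // store above, so the read of payload below cannot be reordered before it.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // sees 42 because of the release/acquire pairing
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

If both orderings were downgraded to `std::memory_order_relaxed`, the language would no longer guarantee that the consumer sees 42. A translated x86 binary carries no such annotations, since the hardware ordering was implicit, which is presumably why a translator running on a weaker memory model needs either conservative barriers or a hardware mode like the one described above.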