Optimizing IRQ latency on the STM32H743 @ 480 MHz, perhaps for NES ROM emulation... Best result so far: 100 nanoseconds input-to-output latency when the vector table and the IRQ handler are relocated to Tightly-Coupled Memory without making HAL calls. Not bad, but the GPIO controller (several buses away) looks like the real performance killer here. WARNING: buggy code, see correction https://mk.absturztau.be/notes/ajvb448y305b01i4. #electronics #STM32
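For anyone following along, the vector table relocation can be sketched roughly as below. This is not the code from the post (which is flagged as buggy), just a minimal illustration. The names `relocate_vectors`, `VECTOR_COUNT` and `VECTOR_ALIGN` are made up here, and the entry count of 166 (16 system exceptions plus 150 device interrupts) is an assumption to check against the reference manual for your part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed for the STM32H743: 16 system exceptions + 150 device interrupts. */
#define VECTOR_COUNT 166u
/* VTOR needs the table aligned to its entry count rounded up to a power of
   two, times 4 bytes: 166 -> 256 entries -> 1024 bytes. */
#define VECTOR_ALIGN 1024u

/* Copy `count` vector entries from `src` to `dst`, then point the register
   at `vtor` to the new table. `dst` is expected to live in TCM; placing it
   there is the linker script's job. Call with interrupts disabled. */
void relocate_vectors(volatile uint32_t *vtor, uint32_t *dst,
                      const uint32_t *src, size_t count)
{
    assert(((uintptr_t)dst % VECTOR_ALIGN) == 0);
    memcpy(dst, src, count * sizeof(uint32_t));
#if defined(__ARM_ARCH_7EM__)
    __asm volatile ("dsb" ::: "memory");      /* finish the copy first */
#endif
    *vtor = (uint32_t)(uintptr_t)dst;
#if defined(__ARM_ARCH_7EM__)
    __asm volatile ("dsb\n\tisb" ::: "memory"); /* make the new VTOR visible */
#endif
}
```

On the target this would be called with `vtor` set to `(volatile uint32_t *)0xE000ED08u` (SCB->VTOR) and `src` set to the Flash copy of the table, normally at `0x08000000`. Taking the register as a parameter just keeps the sketch easy to check on a PC.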
Still optimizing IRQ latency on the STM32H743 @ 480 MHz. Just enabled the I-cache and D-cache, and the IRQ latency dropped from 100 ns to 70 ns. 🚀 But the caches shouldn't make any difference if everything is already in TCM, so my code is still touching slow memory somewhere. The stack perhaps, which is still in "normal" RAM. The slow Flash perhaps also makes it slower to preempt main() if an instruction is stuck in a wait state. Need to check everything carefully... #electronics #STM32
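To put those numbers in perspective, it helps to convert them to CPU cycles. A small sketch, assuming the core really runs at 480 MHz; the helper names are made up for illustration. Arm's commonly quoted interrupt entry latency for the Cortex-M7 is 12 cycles with zero-wait-state memory, which is the figure used for comparison. The measured input-to-output time also includes the GPIO path in both directions, so the two are not directly comparable.

```c
#include <stdint.h>

#define CPU_HZ 480000000u   /* assumed core clock: 480 MHz */

/* Nanoseconds to CPU cycles, rounded to nearest. */
uint32_t ns_to_cycles(uint32_t ns)
{
    return (uint32_t)(((uint64_t)ns * CPU_HZ + 500000000u) / 1000000000u);
}

/* CPU cycles to nanoseconds, rounded to nearest. */
uint32_t cycles_to_ns(uint32_t cycles)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000u + CPU_HZ / 2u) / CPU_HZ);
}
```

With these, 100 ns is 48 cycles and 70 ns is about 34 cycles, while a 12-cycle entry would be 25 ns. So even with caches on, roughly 20 cycles are going somewhere other than exception entry.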
@niconiconi I was about to ask if cache even matters for the stack and then realised it's probably the most important thing to cache unless the arch has some sort of SRAM block purely for the stack.
@gsuberland @niconiconi SRAM for latency-critical memory: which TCM is!
Q: Does the M7 have the dual-stack thing with a main SP and a process SP? In that case, it sounds pretty doable to just have the ISR stack in TCM.
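For reference: yes, Cortex-M cores including the M7 have banked MSP and PSP, and handler mode always runs on MSP. So one way to get the ISR stack into TCM would be to point the initial MSP (word 0 of the vector table) at the top of DTCM, and optionally move thread-mode code onto PSP in ordinary RAM. A rough, hardware-untested sketch of the PSP switch follows; `control_select_psp` and `thread_use_psp` are hypothetical names, not CMSIS functions.

```c
#include <stdint.h>

/* CONTROL.SPSEL (bit 1): 0 = thread mode uses MSP, 1 = thread mode uses PSP. */
#define CONTROL_SPSEL (1u << 1)

/* Pure helper: the CONTROL value that selects PSP for thread mode. */
uint32_t control_select_psp(uint32_t control)
{
    return control | CONTROL_SPSEL;
}

/* Switch thread mode to a process stack whose top is `psp_top`.
   Handlers keep using MSP. Call this very early (e.g. from the reset
   handler): the caller's locals stay behind on the old stack. */
void thread_use_psp(uint32_t psp_top)
{
#if defined(__ARM_ARCH_7EM__)
    uint32_t control;
    __asm volatile ("msr psp, %0" :: "r"(psp_top));
    __asm volatile ("mrs %0, control" : "=r"(control));
    control = control_select_psp(control);
    /* ISB so the following instructions use the new stack pointer. */
    __asm volatile ("msr control, %0\n\tisb" :: "r"(control) : "memory");
#else
    (void)psp_top;  /* no-op when built for a non-Cortex-M host */
#endif
}
```

If main() does not need its own stack in slow RAM, the simpler option is to skip PSP entirely and just place the one MSP stack in DTCM via the linker script.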