This is quite a good "reference sheet for optimization techniques", with benchmarks of how useful each one is:
https://hackaday.com/2024/07/13/c-design-patterns-for-low-latency-applications/
The article links to the paper, and there is sample code on GitHub:
https://github.com/0burak/imperial_hft
It would be interesting to run this code (or an equivalent) on dev kits and see whether the results are similar.

