Load-store conflicts
meshoptimizer implements several geometry compression algorithms that are designed to take advantage of redundancies common in mesh data and to decompress quickly - targeting many gigabytes per second in decoding throughput. One of them, the index decoder, has recently shown significant and unexpected variance in performance across multiple compilers and compiler releases; upon closer investigation, the differences can mostly be attributed to a single microarchitectural detail that is not often talked about. So I thought it would be interesting to write about it.
