@regehr i don't have a concrete one that i can share immediately, but i had someone ask me "why are atomics needed? this works perfectly fine" about a concurrent spsc circular buffer queue that only worked because it relied on compilation-unit boundaries acting as compiler barriers, word-sized loads and stores being atomic, and x86 tso ordering. running the same program and its tests on arm breaks immediately, and so does turning on -flto
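
for illustration, here's a minimal sketch of what a version with explicit atomics might look like, assuming c++17. this is not the code from that conversation; `SpscQueue`, `try_push`, and `try_pop` are made-up names, and it assumes exactly one producer thread and one consumer thread.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// hypothetical spsc ring buffer; all names here are illustrative.
// T is assumed to be default-constructible and copyable.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N >= 2, "one slot stays empty to tell full from empty");

    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next slot to read; written only by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write; written only by the producer

public:
    // producer side: returns false when the queue is full
    bool try_push(const T& value) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % N;
        // acquire pairs with the consumer's release store to head_,
        // so the slot is known to be fully read before it is overwritten
        if (next == head_.load(std::memory_order_acquire)) {
            return false;
        }
        buf_[t] = value;
        // release publishes the element write before the new tail becomes visible
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // consumer side: returns std::nullopt when the queue is empty
    std::optional<T> try_pop() {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        // acquire pairs with the producer's release store to tail_,
        // so the element write is visible before it is read
        if (h == tail_.load(std::memory_order_acquire)) {
            return std::nullopt;
        }
        T value = buf_[h];
        // release tells the producer the slot is free only after the read is done
        head_.store((h + 1) % N, std::memory_order_release);
        return value;
    }
};
```

the acquire/release pairs state the ordering that the original code was getting for free from x86 tso, and `std::atomic` stops the compiler from reordering or caching the index accesses no matter how much it can see across translation units, so neither arm nor -flto changes the behavior.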