Overall, our results show that the newer CAS instructions do not perform well on TX2 and A64FX, and the older LL-SC instructions can bring significant performance improvements on all Arm-based systems tested.

LOL, LMAO https://www.researchgate.net/publication/370682772_A_Study_on_the_Performance_Implications_of_AArch64_Atomics
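
For anyone who wants to see the two code paths the quote is talking about, here is a minimal sketch, not anything taken from the paper. The function name `cas_increment` is made up for illustration. It is a plain C11 compare-and-swap retry loop; which instructions it turns into depends on how you compile it, so the same source can exercise either LL-SC or the newer CAS instructions.

```c
#include <stdatomic.h>

/* Hypothetical helper (not from the paper): atomic increment written as an
 * explicit compare-and-swap retry loop. Returns the value after the add. */
long cas_increment(_Atomic long *counter)
{
    long expected = atomic_load_explicit(counter, memory_order_relaxed);

    /* On failure, atomic_compare_exchange_weak_explicit writes the current
     * value of *counter into `expected`, so the next attempt retries with
     * fresh data. The weak form may also fail spuriously, hence the loop. */
    while (!atomic_compare_exchange_weak_explicit(
               counter, &expected, expected + 1,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* retry with the refreshed `expected` */
    }
    return expected + 1;
}
```

With GCC on AArch64, compiling with `-O2 -S -march=armv8-a -mno-outline-atomics` should give an `ldaxr`/`stlxr` (LL-SC) loop, while `-march=armv8.1-a` should give a `casal` instruction instead. This only shows the codegen difference; it says nothing by itself about which one is faster on a given chip.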