I still can't figure out the strange gacha game of GPU memory alignment and read/write speed. When malloc() gives you an SSR pointer, the speed is 740 GB/s; otherwise it's 620 GB/s. I now suspect it's not just alignment but a memory channel/bank interleaving effect: depending on where the array lands in memory, the number of DRAM channels/banks that get a chance to interleave and take part in a transaction jumps up and down. Unfortunately, AMD does not have documentation for Vega 20.
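
A toy model of the kind of effect I suspect, not Vega 20's real address mapping (which is exactly the undocumented part). Assumptions, all made up for illustration: consecutive 256-byte blocks rotate linearly across 16 channels, and a copy kernel touches the same 2 KiB index window of a source and a destination array at the same time. The names `channel_of` and `channels_touched` are hypothetical too.

```python
def channel_of(addr, interleave_bytes=256, num_channels=16):
    """Hypothetical linear interleave: block k of memory lives on channel k % num_channels."""
    return (addr // interleave_bytes) % num_channels


def channels_touched(bases, window_bytes, interleave_bytes=256, num_channels=16):
    """Channels hit when every array in `bases` is accessed over the same
    index window [0, window_bytes) at once. Bases are assumed block-aligned."""
    touched = set()
    for base in bases:
        for offset in range(0, window_bytes, interleave_bytes):
            touched.add(channel_of(base + offset, interleave_bytes, num_channels))
    return touched


# Source at 0, destination exactly one interleave period (16 * 256 = 4096 B) away:
# both windows map onto the same channels.
print(len(channels_touched([0, 4096], 2048)))         # 8

# Shift the destination by another 2048 B: its window lands on the other half.
print(len(channels_touched([0, 4096 + 2048], 2048)))  # 16
```

In this model, the same kernel keeps either half or all of the channels busy depending only on where the allocator happened to put the second array, which is the sort of pointer lottery the numbers above look like. Whether the real hardware behaves anything like this is a guess.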
