I (clam) messed up https://chipsandcheese.com/2024/01/20/inside-qualcomms-adreno-530-a-small-mobile-igpu/ because I thought it ran in wave64 based on testing the wrong thing and reading Mesa code.
I now think it runs in wave32 mode and has two scheduling partitions per SP. That's based on tests I wrote to investigate Adreno 730 (which is a really interesting architecture in its own right), which I went back and ran on Adreno 530.
The first is a simple divergence penalty test, which measures throughput with OpenCL threads being coherent at different granularity. It works as expected and shows Pascal gets best throughput with 32 or higher, and GCN does so with 64 or higher. Adreno gets better throughput with 32 or higher, suggesting wave32