@aeva To give a concrete example: suppose you're doing some simple compute shader where all you're doing is
cur_pixel = img.load(x, y)
processed = f(cur_pixel, x, y)
img.store(x, y, cur_pixel)
and you're dispatching 16x16 thread groups, (x,y) = DispatchThreadID, yada yada, all totally vanilla, right?

