@aeva And caches are usually built to have multiple "banks" that each handle a fraction of a cache line. Let's say our hypothetical cache has sixteen 16-byte banks to cover each 256B cache line.
Well, all the requests we get from that nice sequential load go into the first 2 banks, and the rest get nothing.
That's lopsided, and it will often mean you lose a lot of your potential cache bandwidth, because you only actually get the full bandwidth if your requests are nicely distributed over memory.
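To make the imbalance concrete, here's a rough sketch of the hypothetical cache above (sixteen 16-byte banks per 256B line). The address-to-bank mapping, the `bank_of` and `bank_histogram` helpers, and both access patterns are assumptions made up for illustration; real hardware may well hash or swizzle addresses differently.

```python
from collections import Counter

LINE_BYTES = 256                        # cache line size from the example above
BANK_BYTES = 16                         # bytes handled by each bank
NUM_BANKS = LINE_BYTES // BANK_BYTES    # 16 banks

def bank_of(addr):
    # Assumed mapping: the bank is picked by the byte offset within the line.
    return (addr % LINE_BYTES) // BANK_BYTES

def bank_histogram(addrs):
    # Count how many requests land on each bank.
    counts = Counter(bank_of(a) for a in addrs)
    return [counts.get(b, 0) for b in range(NUM_BANKS)]

# Hypothetical lopsided pattern: 16 requests that only ever touch
# the first 32 bytes of 8 different cache lines.
lopsided = [line * LINE_BYTES + off for line in range(8) for off in (0, 16)]

# Hypothetical well-spread pattern: 16 requests covering one whole line.
spread = [i * BANK_BYTES for i in range(NUM_BANKS)]

print(bank_histogram(lopsided))
print(bank_histogram(spread))
```

Same number of requests both times, but in the lopsided case everything piles onto banks 0 and 1 while the other 14 sit idle. If you assume each bank can serve one request per cycle, the busiest bank sets the pace, so the lopsided batch takes several times longer than the spread-out one.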

