When vectorization speeds up a loop, the SIMD math isn't always the reason.
Sometimes the speedup comes from being forced to overlap independent dependency chains.
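
To make "overlapping independent dependency chains" concrete, here is a minimal scalar sketch with no SIMD at all. It assumes a branchless lower-bound binary search. The names `lower_bound_one` and `lower_bound_interleaved` are hypothetical and are not taken from the linked article. In a single search, each load address depends on the previous comparison, so the loop is one serial chain. Running several searches in lockstep gives the CPU several such chains that do not depend on each other.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Branchless lower bound: index of the first element >= key
// (sorted.size() if there is none). Every step needs the `base`
// computed by the previous step, so one search is one dependency chain.
std::size_t lower_bound_one(const std::vector<int>& sorted, int key) {
    std::size_t base = 0;
    std::size_t len = sorted.size();
    while (len > 1) {
        const std::size_t half = len / 2;
        // Skip the lower half when its last element is still < key.
        base += (sorted[base + half - 1] < key) ? half : 0;
        len -= half;
    }
    if (len == 1) {
        base += (sorted[base] < key) ? 1 : 0;
    }
    return base;
}

// The same search for K keys in lockstep. `len` and `half` do not depend
// on the keys, so all K searches share them. The K updates inside the
// inner loop are independent of one another, which is the property a
// vectorized version has to expose as well.
template <std::size_t K>
std::array<std::size_t, K> lower_bound_interleaved(
        const std::vector<int>& sorted, const std::array<int, K>& keys) {
    std::array<std::size_t, K> base{};  // all chains start at index 0
    std::size_t len = sorted.size();
    while (len > 1) {
        const std::size_t half = len / 2;
        for (std::size_t i = 0; i < K; ++i) {
            base[i] += (sorted[base[i] + half - 1] < keys[i]) ? half : 0;
        }
        len -= half;
    }
    if (len == 1) {
        for (std::size_t i = 0; i < K; ++i) {
            base[i] += (sorted[base[i]] < keys[i]) ? 1 : 0;
        }
    }
    return base;
}
```

Both functions return the same indices. Whether the interleaved form is actually faster depends on the array size, the hardware, and the compiler, so treat it as something to measure rather than a guaranteed win.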

Exposing More Parallelism Is the Hidden Reason Why Some Vectorized Loops Are Faster - Not Vectorization per se - Johnny's Software Lab
I was preparing an article about Highway – portable vectorization library by Google – so I ported a few examples from my vectorization workshop from AVX to Highway. One of the examples was vectorized binary search. I assume most readers are familiar with simple binary search. It looks something like this: We take a lookup…



