
Day 20 of Advent of Compiler Optimisations!

Loop over 65,536 integers doing comparisons — that's 65,536 iterations, right? Wrong! With the right flags, the compiler processes 8 integers per iteration using SIMD instructions. Same number of assembly instructions, 8× the throughput. What's the trick that makes this possible?
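For a concrete picture of the kind of loop in question, here is a minimal sketch. It is not the code from the blog post: the function name `count_above`, the threshold comparison, and the constant `kCount` are all assumptions made for illustration. With optimisation and a 256-bit vector target enabled (for example `-O3 -mavx2` on GCC or Clang), a loop shaped like this is a typical candidate for auto-vectorisation, handling eight 32-bit integers per vector instruction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical element count, matching the 65,536 integers in the post.
constexpr std::size_t kCount = 65536;

// Counts the elements strictly greater than `threshold`.
// Each iteration is independent of the others and integer addition can be
// reordered freely, so the compiler may compare and accumulate several
// elements at once.
std::uint32_t count_above(const std::array<std::int32_t, kCount>& values,
                          std::int32_t threshold) {
    std::uint32_t count = 0;
    for (std::int32_t v : values) {
        count += (v > threshold) ? 1u : 0u;
    }
    return count;
}
```

Whether the loop is actually vectorised depends on the compiler, its version, and the flags, so it is worth checking the generated assembly rather than assuming.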

#AoCO2025

SIMD City: Auto-vectorisation — Matt Godbolt’s blog

Doing more with less: vectorising can speed your code up 8x or more!


Xarn Dec 21

@mattgodbolt Now do it with floats 🙃

(I spent a lot of time trying to convince GCC/Clang to optimize various vectorizable float loops with just local assumptions, without the big guns of `-fchange-how-floats-work-globally`, but they are surprisingly bad at that.)
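The float case is harder because floating-point addition is not associative, so a compiler following strict IEEE semantics cannot reorder a plain running sum into vector lanes on its own. One local workaround, sketched below, is to write the reordering into the source by keeping several independent partial sums. The name `sum_lanes` and the lane count of 8 are assumptions for illustration, and the result can differ in rounding from a strict left-to-right sum. Whether a given compiler turns this into SIMD code is not guaranteed.

```cpp
#include <cstddef>

// Sums `n` floats using 8 independent partial sums.
// The summation order is fixed by the source code, so the compiler does not
// need permission to reassociate float additions in order to map the inner
// loop onto vector lanes.
float sum_lanes(const float* data, std::size_t n) {
    constexpr std::size_t kLanes = 8;  // assumed width: 8 floats per 256-bit vector
    float partial[kLanes] = {};

    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            partial[lane] += data[i + lane];
        }
    }

    // Combine the partial sums, then add any leftover tail elements.
    float total = 0.0f;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        total += partial[lane];
    }
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```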


matt godbolt

@horenmar give me ten minutes... You'll see...