Day 20 of Advent of Compiler Optimisations!

Loop over 65,536 integers doing comparisons — that's 65,536 iterations, right? Wrong! With the right flags, the compiler processes 8 integers per iteration using SIMD instructions. Same number of assembly instructions, 8× the throughput. What's the trick that makes this possible?

Read more: https://xania.org/202512/20-simd-city
Watch: https://youtu.be/d68x8TF7XJs

#AoCO2025

SIMD City: Auto-vectorisation — Matt Godbolt’s blog

Doing more with less: vectorising can speed your code up 8x or more!
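
To make the idea in the post concrete, here is a minimal sketch of the kind of loop a compiler can auto-vectorise. The function name `count_above` and the threshold comparison are hypothetical, chosen for illustration; they are not taken from the blog post. Assuming flags along the lines of `-O3 -mavx2` on GCC or Clang, the compiler may process eight 32-bit integers per iteration, since that is how many fit in one 256-bit AVX2 register.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: count how many elements are greater than `threshold`.
// The loop body is a single comparison plus an add, with no dependency between
// iterations other than the running total, which is the shape auto-vectorisers
// tend to handle well.
int count_above(const int* data, std::size_t count, int threshold) {
    assert(data != nullptr || count == 0);
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Written as a branch-free add of a 0/1 value, so it maps naturally
        // onto a vector compare followed by a vector accumulate.
        total += (data[i] > threshold) ? 1 : 0;
    }
    return total;
}
```

Pasting this into Compiler Explorer and comparing the output at `-O2` and at `-O3 -mavx2` is one way to see whether the loop was vectorised; the exact instructions depend on the compiler and version.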

@mattgodbolt Speaking of which: Go is getting experimental SIMD intrinsics. See https://github.com/golang/go/issues/73787

Is there any hope of getting a version with that enabled in compiler explorer? It would greatly help discussions, I believe, because it would make it easy to link to snippets that generate suboptimal sequences.

Only involves x86_64 so far. Requires building the compiler with GOEXPERIMENT=simd (given that there's tip, maybe you do custom builds already?)

simd/archsimd: architecture-specific SIMD intrinsics under a GOEXPERIMENT · Issue #73787 · golang/go

Update (12/16/2025): The AMD64 low-level SIMD package is now available in Go 1.26 RC1 under GOEXPERIMENT=simd. Also, the package is renamed to simd/archsimd, per #76473. See #73787 (comment). Upd...

@Merovius we do custom builds of many compilers! Feel free to submit a PR: we have documentation on how to add new compilers :)
@mattgodbolt 👍 I'll look into it
@Merovius @mattgodbolt Just to clarify, I’m 90% sure it only requires setting GOEXPERIMENT=simd when building the application (go build), not when building the compiler. A stock compiler is fine.
@prattmic @mattgodbolt for this one it needed to be enabled at build time as well, but I’ll try it out before submitting a PR, thanks

@Merovius @mattgodbolt @prattmic

Just verified, plain build does the right thing w/ GOEXPERIMENT=simd (and of course I am cross-compiling to amd64 and then using Apple’s emulation, as one does).