Day 11 of Advent of Compiler Optimisations!

A clever loop that counts set bits using the "clear bottom bit" trick: value &= value - 1. Works great, generates tight assembly. But change one compiler flag to target a slightly newer CPU and something extraordinary happens to your loop. The compiler spots a pattern you didn't even know was there. What replaces your careful bit manipulation?

Read more: https://xania.org/202512/11-pop-goes-the-weasel-er-count
Watch: https://youtu.be/Hu0vu1tpZnc

#AoCO2025

@mattgodbolt So I tried this out with Rust and saw the same codegen you describe (kinda crazy!). However, when I tried the -march=sandybridge thing, it didn't emit the xor eax, eax you mentioned. I did see it with C++, though.
How would one go about benchmarking whether the Sandy Bridge popcnt issue is real, so that I could try to fix this in rustc?
@fp Take a look at things like uarch-bench by Travis Downs; measuring this means setting up very subtle timing loops with carefully crafted dependency chains.