Day 22 of Advent of Compiler Optimisations!

Comparing a string_view against "ABCDEFG" should call memcmp, right? Watch what Clang actually generates — no function call at all, just a handful of inline instructions using some rather cunning tricks. How does it compare 7 bytes so efficiently when they don't fit in a single register?

Read more: https://xania.org/202512/22-memory-cunningness
Watch: https://youtu.be/kXmqwJoaapg

#AoCO2025

@mattgodbolt WTF. I'm really disappointed that Clang is not smart enough to do:

t1:
        cmp     rdi, 1              ; is length 1?
        jne     .LBB0_1             ; if not 1, goto "return false"
        cmp     byte ptr [rsi], 65  ; is the byte 65 ('A')?
.LBB0_1:
        sete    al                  ; set result to 0 or 1 accordingly
        ret                         ; return
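
Read back into C++, the listing above amounts to a length check followed by a single byte compare. The source function being discussed is not shown in the thread, so the sketch below assumes it compares a string_view against "A"; the name and signature are guesses made for illustration.

```cpp
#include <string_view>

// Hypothetical source-level equivalent of the hand-written assembly above.
// Assumption: the original function compares a string_view against "A".
bool t1(std::string_view sv) {
    // Check the length first so sv[0] is only read when it exists.
    return sv.size() == 1 && sv[0] == 'A';
}
```

The short-circuit `&&` mirrors the `jne` in the listing: the byte is only inspected once the length is known to be 1.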
@riley these kinds of things are always worth asking the Clang devs about and/or filing bugs to report :). Even if we don't know how to improve the compiler ourselves, a respectful issue request can help them help us!

@mattgodbolt Yeah, that would seem like a good idea, but I'm afraid I'm probably juggling too many rabbit-holes for my dwindling spoon collection already.

@mattgodbolt quibble: 7 bytes absolutely do fit into a single register. They can't be read with a single load though, because string_view does not guarantee a null byte at the end.

@horenmar oh! Fair point, yes :-)
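
One way to compare exactly seven bytes without touching an eighth is to use two 4-byte loads that overlap by one byte (bytes 0..3 and bytes 3..6). The sketch below spells that idea out in C++. The function name `equals_abcdefg` is made up for illustration, and this shows the general technique under that assumption, not the exact instruction sequence Clang emits.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if the len bytes at p are exactly "ABCDEFG".
// Only bytes p[0..6] are read, so a possibly unmapped eighth byte is never touched.
bool equals_abcdefg(const char* p, std::size_t len) {
    if (len != 7) return false;

    static const char expected[] = "ABCDEFG";  // 7 characters plus a terminator

    std::uint32_t lo, hi, expected_lo, expected_hi;
    std::memcpy(&lo, p, 4);                      // bytes 0..3
    std::memcpy(&hi, p + 3, 4);                  // bytes 3..6, overlapping byte 3
    std::memcpy(&expected_lo, expected, 4);
    std::memcpy(&expected_hi, expected + 3, 4);

    // XOR is zero only where the inputs match; OR folds both halves into one test.
    return ((lo ^ expected_lo) | (hi ^ expected_hi)) == 0;
}
```

Byte 3 is checked twice, which costs nothing and avoids both a third load and any read past the end. With a string_view, `data()` and `size()` would supply the two arguments.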

@mattgodbolt my first thought was to just read 8 bytes into one register and mask off the eighth before or after the compare. then i remembered that the eighth might not even be mapped because the start address might be unaligned.

that got me thinking: will there be something about alignment coming up?

@tiwe no plans for alignment: I'm not sure what I could write. What kind of thing have you seen compilers do? (My main experience is with x86 where alignment isn't really an issue these days)

@mattgodbolt i've heard of penalties for unaligned accesses at least for some SIMD stuff on x86 and on older ARMs. there are also architectures (ppc?) where you can only access memory in larger aligned words so the compiler needs to do shifting for individual bytes.

i tried to produce something with alignas but had no luck in that regard: https://aoco.compiler-explorer.com/z/jWWvnbxed

please also see how the hover thing for the string in the .quad is shown in reverse, you seem to account for little-endian architectures, but here (on s390x) it is not needed.

Compiler Explorer - C++ (s390x gcc 15.2.0)

#include <cstddef>
#include <string_view>

using namespace std::literals;

template <std::size_t len, std::size_t align>
struct alignas(align) str_arr { char a[len]; };

bool t(str_arr<7,1> const * a) { return std::string_view(a->a, sizeof(a->a)) == "ABCDEFG"sv; }
bool t(str_arr<8,1> const * a) { return std::string_view(a->a, sizeof(a->a)) == "ABCDEFGH"sv; }
bool t(str_arr<7,4> const * a) { return std::string_view(a->a, sizeof(a->a)) == "ABCDEFG"sv; }
bool t(str_arr<8,4> const * a) { return std::string_view(a->a, sizeof(a->a)) == "ABCDEFGH"sv; }
bool t(str_arr<8,8> const * a) { return std::string_view(a->a, sizeof(a->a)) == "ABCDEFGH"sv; }
bool t(str_arr<4,4> const * a) { return std::string_view(a->a, sizeof(a->a)) == "ABCD"sv; }
bool t(str_arr<4,1> const * a) { return std::string_view(a->a, sizeof(a->a)) == "ABCD"sv; }
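
On the earlier point about machines that can only load whole aligned words: getting at one byte then takes a wide load followed by a shift and a mask. Here is a rough sketch of that sequence in portable C++. The helper name `byte_from_word` and the 4-byte word size are assumptions chosen for illustration, not taken from any particular architecture.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fetch byte `index` (0..3) of a 4-byte word using only a
// whole-word load plus shift and mask, as a word-only machine would have to.
unsigned char byte_from_word(const unsigned char* aligned_word, std::size_t index) {
    std::uint32_t word;
    std::memcpy(&word, aligned_word, sizeof word);  // the single wide load

    // Detect at run time which end of the word holds the first byte in memory.
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    const bool little_endian = (first == 1);

    // Little-endian: byte 0 is the low 8 bits. Big-endian: byte 0 is the high 8 bits.
    const std::size_t shift = little_endian ? 8 * index : 8 * (3 - index);
    return static_cast<unsigned char>((word >> shift) & 0xffu);
}
```

A real compiler for such a target knows the byte order at compile time, so the run-time probe here only exists to keep the sketch portable.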

@mattgodbolt regarding endianness: years ago i tried to replace usage of htonl/ntohl with a loop in a templated function. it works great with clang, but gcc only sees it in one direction, not the other: https://aoco.compiler-explorer.com/z/Wc1PGPTGE
i had found something in gcc's bug tracker back then, but it is still unfixed.
Compiler Explorer - C++

#include <array>
#include <cstdint>
#include <ranges>
#include <type_traits>

template<typename T>
auto host_to_network (T value) noexcept
{
    static_assert(std::is_unsigned_v<T>, "shift is only supported on unsigned");
    std::array<char, sizeof(T)> ret;
    // Fill from the last byte backwards so the most significant byte ends up first.
    for (auto & byte : ret | std::views::reverse) {
        byte = static_cast<char>(value & 0xff);
        if constexpr (sizeof(T) > 1)
            value >>= 8;
    }
    return ret;
}

template<typename T>
auto network_to_host (std::array<char, sizeof(T)> const buf) noexcept
{
    static_assert(std::is_unsigned_v<T>, "shift is only supported on unsigned");
    T ret{0};
    // Shift in one byte at a time, most significant byte first.
    for (auto const byte : buf) {
        if constexpr (sizeof(T) > 1)
            ret <<= 8;
        ret |= static_cast<uint8_t>(byte);
    }
    return ret;
}

template auto host_to_network<uint8_t> (uint8_t value) noexcept;
template auto network_to_host<uint8_t> (std::array<char, sizeof(uint8_t)> const buf) noexcept;
template auto host_to_network<uint16_t> (uint16_t value) noexcept;
template auto network_to_host<uint16_t> (std::array<char, sizeof(uint16_t)> const buf) noexcept;
template auto host_to_network<uint32_t> (uint32_t value) noexcept;
template auto network_to_host<uint32_t> (std::array<char, sizeof(uint32_t)> const buf) noexcept;
template auto host_to_network<uint64_t> (uint64_t value) noexcept;
template auto network_to_host<uint64_t> (std::array<char, sizeof(uint64_t)> const buf) noexcept;