Does anybody know why gcc and clang can't optimize away this bit shift on most architectures?

#include <stdint.h>

uint16_t test(uint16_t *buf) {
    return (uint16_t)((char *)buf)[0] | ((uint16_t)((char *)buf)[1] << 8);
}

I tested on godbolt.org, and only powerpc really showed good optimization.

On all little-endian systems this should be equivalent to just *buf, which compiles to much shorter assembly.
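One thing that may matter here: whether plain char is signed is implementation-defined, so on targets where it is signed a byte of 0x80 or above gets sign-extended before the OR, and the result is no longer the same as *buf. A minimal sketch of the difference, assuming unsigned bytes were intended (load_le16, load_le16_signed and load_native are made-up names for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble a little-endian 16-bit value byte by byte.
   unsigned char keeps bytes >= 0x80 from being sign-extended. */
static uint16_t load_le16(const void *p) {
    const unsigned char *b = (const unsigned char *)p;
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Hypothetical helper: what the same expression computes when the
   bytes are read through a signed character type. */
static uint16_t load_le16_signed(const void *p) {
    const signed char *b = (const signed char *)p;
    return (uint16_t)((uint16_t)b[0] | ((uint16_t)b[1] << 8));
}

/* Hypothetical helper: native-endian load through memcpy, which makes
   no assumption about the alignment of p. */
static uint16_t load_native(const void *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

With the bytes {0x80, 0x01}, load_le16 gives 0x0180 while load_le16_signed gives 0xFF80, so a compiler cannot legally turn the signed version into a plain 16-bit load. On a little-endian target load_le16 and load_native should agree for any input.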

OK, slight follow-up question:

Is this UB if ptr does not have the correct alignment for uint16_t?

(unsigned char *)(uint16_t *)ptr

The reason I ask is that __builtin_assume_aligned(buf, 2) seems to allow further optimization on arches where alignment matters, even though I would have expected that alignment to be implicit in the pointer type.
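For reference, a sketch of the two variants being compared, assuming GCC or Clang (the builtin is a compiler extension, and test_plain and test_aligned are names made up for this example):

```c
#include <stdint.h>

/* Byte-wise little-endian load with no explicit alignment promise. */
uint16_t test_plain(const uint16_t *buf) {
    const unsigned char *b = (const unsigned char *)buf;
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Same load, but the compiler is told explicitly that buf is
   2-byte aligned. __builtin_assume_aligned returns its first
   argument as a void pointer carrying that assumption. */
uint16_t test_aligned(const uint16_t *buf) {
    const unsigned char *b = __builtin_assume_aligned(buf, 2);
    return (uint16_t)(b[0] | (b[1] << 8));
}
```

Both functions return the same value for a correctly aligned buffer. Only the generated code may differ, which is easy to compare side by side on godbolt.org.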