I can't stop thinking about the LLM-generated compiler that passes all the unit tests but emits inner loops that benchmark over 150,000x slower than a gcc debug build. I couldn't possibly have intentionally come up with such a funny demonstration of the value of genuine expertise https://harshanu.space/en/tech/ccc-vs-gcc/
CCC vs GCC

A Guide to comparing Claude Code Compiler with GCC

Harshanu

@0xabad1dea It’s so diabolically bad I don’t know how you do it. We’re not talking about gcc -O3 here, which does some truly herculean things, we’re talking about GCC with basically every optimization disabled. I don’t understand how the generated code wouldn’t run within a finite constant factor of gcc here, you just have to spit out the dumbest possible assembly for a given input source.

You just know there’s some absolutely horrific workarounds going in here because it’s apocalyptically bad in utterly incomprehensible ways.

@erincandescent @0xabad1dea making it all even funnier, there’s a full set of optimization passes in the implementation

@regehr @0xabad1dea i know! there’s presumably a whole Source -> AST -> SSA -> Multiple optimization passes -> Assembly pipeline going on here! what on earth is it even doing in there that the output is this embarrassingly bad?!

The output would be quite frankly embarrassing for a single-pass source -> assembly/machine code translator (which you can do for a half-reasonable subset of C in 2kB of C code, see e.g. OTCC) but there’s an entire optimization pipeline in there?!

@erincandescent @0xabad1dea I took a very quick look at the code for some of the passes and they're at least superficially plausible. I think one would have to actually run the compiler to see what they're doing. perhaps working together to produce that amazingly slow code, like maybe each pass adds a bunch of copies and the stupid AI forgot copy propagation. something like that feels likely.

@regehr @erincandescent the blogger's assessment is that the main issue in the SQL loop is that it shuttles every single variable read/write through one single register, because once there are more variables than registers it doesn't know what else to do.

@0xabad1dea @erincandescent well that's technically a register allocator

@regehr @0xabad1dea and it’s a bad one but it’s like a 10x factor of bad at worst. and I say that only really because all of the mov big_offset(%rbp), %reg and back are probably huge and giving the instruction decoder indigestion.

@erincandescent @0xabad1dea @regehr I'd love to know if this is really the problem. The blog post itself shows signs of having been AI-generated, and it contains a whole section "Why Subqueries Are 158,000x Slower" that makes no sense to me