Mastodawn

juniyonul Feb 27

I can't stop thinking about the LLM-generated compiler that passes all the unit tests but emits inner loops that benchmark over 150,000x slower than a gcc debug build. I couldn't possibly have intentionally come up with such a funny demonstration of the point of genuine expertise https://harshanu.space/en/tech/ccc-vs-gcc/

CCC vs GCC

A Guide to comparing Claude Code Compiler with GCC

Harshanu

Lesley Lai Feb 12

@0xabad1dea I have a feeling that this writing relies on LLMs way too much

abadidea Feb 12

@lesley sometimes I feel like the only person in tech who knows how to write three consecutive paragraphs all by herself

@lesley @0xabad1dea There's a disclaimer at the bottom of the blog post stating that "The benchmark design, test execution, analysis and writing were done by a human with AI helping where needed."

Ratsnake Games 🔞 Feb 12

@sodiboo @0xabad1dea @lesley

"where needed"

"WHERE NEEDED"????

For which part of this process do you NEED AI help?

*screams into screaming pillow*

Nina Kalinina Feb 12

@0xabad1dea makes two of us. The CCC isn't the flex AI proponents think it is, but there aren't enough people who can understand that it should have been a cautionary tale rather than a sensational headline. :(

Nina Kalinina Feb 12

@0xabad1dea like, I'll bait; great stuff, unsupervised agent produced something that can compile some C code that in a certain definition can be called "working", but absolutely not ready for any sort of production usage.
The agent has multiple reference implementations, an extensive test suite, and C is literally based on an extremely well-defined standard. AI proponents claim that we're in an era where all we need is to provide a specification, and the agents will just implement the thing for us. This CCC thing is proof that they quite literally can't; it's difficult to think of a commercial software project that would have a specification better defined than the C standard. And a vanilla C compiler isn't all _that_ complicated; it's literally the kind of thing many undergrad SWE students build as a class project (yes yes, lots of caveats and simplifications). You'd think Anthropic could improve on their CCC with the agents until they get the compiler working at least as well as tcc does, but 1/2

Mad Engineering Feb 12

@0xabad1dea @nina_kali_nina AI is about where I was as a junior in college: some impressive accomplishments, but you wouldn't trust its code to be efficient, elegant, or maintainable.

Nina Kalinina Feb 12

@madengineering @0xabad1dea unlike juniors, it cannot produce novel things; anything I've seen reportedly done by AI ended up being a rip-off of something that was in the training data set, at best translated to a different programming language

Nina Kalinina Feb 12

@madengineering @0xabad1dea just the other day I read a post about how awesome Claude is: it took a 37-year-old game binary and produced a TypeScript port (which no one has seen published yet). But on closer evaluation it turns out this AI TypeScript port is likely based on a currently maintained C port, and the JS port of the game is definitely in the training data too. Stealing machines be stealing.

@nina_kali_nina
...the sad thing is a tool that could look at that binary and point you at the repo of the port would have been a genuinely impressive advancement for search

The technology they're building could actually be useful if they stopped throwing money in the furnace and were more conservative in applying it. Though of course you likely don't need ML for that anyway, and I'm pretty sure the mass buildout is using AI as an excuse for massive resource and land grabs that they already wanted to do sooo

@madengineering @0xabad1dea

@nina_kali_nina @madengineering @0xabad1dea I’ve seen this myself as well. Reviewed a draft PR for some enhancement for our internal compiler generated by Claude. It’s weird and too complicated and my spidey sense is tingling. I go look for the same module name in the rust compiler. It’s basically the same structure, but simplified.

And I think that’s generally what people are seeing when they see an LLM being “creative”. They’re just unable to comprehend how vast a pool of human work it’s cribbing from.

@madengineering @0xabad1dea @nina_kali_nina And unlike juniors, it cannot internalize the lessons you teach it. You can correct it, and it will take that into account. But as soon as that slides off the context window, it’s gone.

Mad Engineering Feb 12

@0xabad1dea @nina_kali_nina @bytex64 I've improved since college.

So Claude, consider this a challenge. I've upped my game, now up yours.

Nina Kalinina Feb 12

@0xabad1dea but the blog post announcing the CCC quite literally says that the agents made the code base unmaintainable and cannot fix any more bugs without introducing new ones. So, that's a fail too.

And then looking at it from a practical perspective: if I want a C compiler, I can get one for free, and I have multiple options: clang, gcc, pcc, tcc, chibicc, and probably many more. If for some reason I want to add support for a new platform in them, I can, too. It's been done too many times to count. Why would I want to spend merely 20 grand on building a thing that is, by all sensible benchmarks, at best a toy?

I have an answer, and I don't like it. If I wanted to undermine labour, if I wanted to destroy FOSS, if I wanted to steal human work and resell it, this is exactly what I'd do. And I have yet to be shown that there are any other real motivations behind such projects.

2/2

Thomas Depierre Feb 12

@nina_kali_nina @0xabad1dea I have another answer, but it is not satisfying.

They really have no idea what they are doing and they are so privileged they never had to really face reality.

Nina Kalinina Feb 12

@Di4na @0xabad1dea it's not a happier* answer, yeah...

prom™️ Feb 12

@nina_kali_nina @0xabad1dea I think people are also motivated by fascination and simple bragging, but the larger point is also lost on me. Putting the tech aside for a second, on a business level I only see those corps stealing and hyping.

aburka 🫣 Feb 12

@nina_kali_nina @0xabad1dea also, the source of all of those real compilers was used (stolen, for the GPL ones) to create the AI agent, so the exercise is testing on the training set

@0xabad1dea very interesting read

neobot_flag_trans

@0xabad1dea claude has a fucking compiler.
what the fuck.
are we vibecompiling alongside vibecoding now

Cat 🐈🥗 (D.Burch)

@thing @0xabad1dea Don't worry, it passed all vibeunit vibetests!

Tzimisce Flesh Feb 12

@catsalad @thing @0xabad1dea Vibecoded unit tests
Call that vibechecks.

goedelchen Feb 12

@flesh @0xabad1dea @catsalad @thing For Germans: Today is Viberfastnacht.

@0xabad1dea oh my God a shitty compiler was made from fragments of plagiarized code by a sentence recombinator. The world is ending

🔻 aetios 🇪🇺 Feb 12

@0xabad1dea I knew it. My first response to this was 'I bet it generates horrible executables' and I'm not even a computer engineer.

Chris [list of emoji] Feb 12

I wonder how it does with the tests that gcc passes but the agent didn't have access to.

Fritz Adalis Feb 12

@0xabad1dea @catsalad
It's not slow, it's being careful.

Else, Someone Feb 12

@0xabad1dea amazing benchmarking

goatcheese Feb 12

@0xabad1dea Their choice of using rust for ccc is also not innocuous: using rust means benefitting from the expertise of all the rust contributors.
For example, unsafe blocks aside (i didn't check if ccc has any), you can't praise this compiler for not segfaulting if rustc made sure it can't!

@goatcheese @0xabad1dea
This is a very good point. The "yay the compiler runs without crashing" part would be far less likely achieved were it writing the compiler in C.

@da77a9
I - don't think that's necessarily accurate. (Context: I'm a C programmer and compiler hacker, and not at all a fan of these things)

But I think that's downplaying the fact that creating a rust implementation of any given feature is harder than doing the same in C, since it forces you to pay attention to e.g. lifetimes.

Any competent C programmer would be doing that anyway even if the compiler doesn't force them to, but - there are arguments both ways here tbh

"Rustc made sure it can't!" Is... not really accurate.

If you write c code with some kinds of bad memory management, it'll compile but segfault.

If you write the equivalent rust, it won't compile

So necessarily, in order to compile at all, it must be doing the same memory management that it'd need to do in C. The compiler checking its work probably makes it easier, because it's constant small feedback which is more likely in the training set, but with unit tests available anyway, the increased complexity of writing rust over writing C is possibly a wash.

@goatcheese @0xabad1dea

@da77a9 @goatcheese @0xabad1dea @pixx I imagine the rust compiler providing you with tips for how to fix errors helps a lot, because a Ralph can just try the suggestions and see if they make the errors go away.

goatcheese Feb 12

@da77a9 @0xabad1dea @pixx I would guess that the breadcrumbs given by rustc's error messages are more conducive to the LLM landing on working code than runtime crashes using C. Also what you or I find hard may not matter to an LLM that doesn't "understand" like a human does.

I guess my point was that this "from-scratch implementation" is a fiction: an LLM could not have invented the first C compiler any more than it could have invented rustc. It's spitting out a remixed version of code and techniques invented by many, many, extremely ingenious humans. At a cursory glance it has wow factor, but ultimately all LLM vendors are greedy ingrates trying to extract value from the work of others

@goatcheese @0xabad1dea @pixx
Oh absolutely it can't invent. But

1) rust compile errors prevent some versions of Frankenstein's compiler from even lurching off the table, so Frankenstein has to try again with different body parts (feedback loop)

2) a train of borrowed fragments of rust, that pass that fitness test, and that fit together probabilistically, from a sample of rust code that works (as well as compiles) is more likely to stay on the rails than the same in C.

It is interesting that "correct but a memory hog" (no ownership conflicts, but lifetime management issues?) is evident from the comparison of gcc vs ccc execution.

I'm not suggesting any sort of magical properties from rust - just that it removes some degrees of freedom.

Frank Bennett 🇯🇵 Feb 12

@0xabad1dea There's a subtoot I could post here about code written by *humans* indifferent to my own experience with a particular problem space … but I won't.

Ray McCarthy Feb 12

@0xabad1dea And it was essentially an "open book exam" for Claude.

@0xabad1dea I saw the hype around this and couldn't think of anything constructive to say. It's unfortunate that it didn't plagiarize a tiny C bootstrap compiler of a few thousand lines rather than spewing out a low-quality copy of a monster. It's not hard to write a tiny C compiler. I'd trust the output from said tiny C compiler to be correct before I'd trust this thing. But most importantly, of all the things that a pattern-replicating machine should be able to replicate, it's a tedious, repeated pattern set of code (lexers, recursive descent parsers, codegen from AST...) to process a well-specified language. It doesn't seem to demonstrate any surprising capability, nor any real utility. I'd be more interested (and terrified) to see results from ML learning how to emit machine code that makes the generated code pass the program's unit tests while also getting the best benchmark scores... I like horror movies...

Graeme 🏴󠁧󠁢󠁳󠁣󠁴󠁿 Feb 12

@0xabad1dea I wonder who picked rust for it? It should have written the compiler in C, then compiled itself.... 😀

Moses Izumi Feb 12

@0xabad1dea
Please don't call it that CCC next time.
I clicked on this thread through @nina_kali_nina 's reply in the vain hope that it was going to be about something cool, not yet another heinous misuse of an overgrown proprietary chatbot.

abadidea Feb 12

@moses_izumi @nina_kali_nina ... I didn't call it anything at all? but yes that's the thing's name, that I didn't pick, not sure what you expect me (a person making fun of it for existing) to do about it?

Moses Izumi Feb 12

@0xabad1dea @nina_kali_nina I just thought it was morbidly funny that it shared an acronym with the moderately famous hacking convention Chaos Communication Congress (actually called C3 but you get the point): I didn't mean it as a direct response.

Would be even grosser if said acronym was coined by Anthropic themselves.

Nina Kalinina Feb 12

@moses_izumi @0xabad1dea indeed, CCC is what Anthropic's employee called the project; it's not something we made up, sorry. It's kind of like Gemini the protocol vs Gemini the lying bot

jn (BE side) Feb 14

@nina_kali_nina @moses_izumi @0xabad1dea ClaudeCC, if need be. i don't let them steal the name of my favorite nerd club

Dan Frumin Feb 12

@moses_izumi @0xabad1dea @nina_kali_nina it's literally the name of the thing

Erin 💽✨ Feb 12

@0xabad1dea It’s so diabolically bad I don’t know how you do it. We’re not talking about gcc -O3 here, which does some truly herculean things; we’re talking about GCC with basically every optimization disabled. I don’t understand how the generated code wouldn’t run within a finite constant factor of gcc here; you just have to spit out the dumbest possible assembly for a given input source.

You just know there’s some absolutely horrific workarounds going in here because it’s apocalyptically bad in utterly incomprehensible ways.

Erin 💽✨ Feb 12

@0xabad1dea …the more i ruminate on it the more i think digging into the output (which is rather difficult given the poor quality and lack of debugging symbols) would find that it’s done something like sometimes implementing multiplication iteratively or something. it’s really astoundingly bad.

big awoo notation Feb 12

@[email protected] @0xabad1dea looking at the disassembly makes me think it has invented 6502-64 /j

John Regehr Feb 12

@erincandescent @0xabad1dea making it all even funnier, there’s a full set of optimization passes in the implementation

Erin 💽✨ Feb 12

@regehr @0xabad1dea i know! there’s presumably a whole Source -> AST -> SSA -> Multiple optimization passes -> Assembly pipeline going on here! what on earth is it even doing in there that the output is this embarrassingly bad?!

The output would be quite frankly embarrassing for a single-pass source -> assembly/machine code translator (which you can do for a half-reasonable subset of C in 2kB of C code, see e.g. OTCC) but there’s an entire optimization pipeline in there?!
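To make Erin's comparison concrete, here is a minimal sketch of what a single-pass translator looks like, written in Python for brevity. It is not OTCC or CCC: the function name `compile_expr`, the tiny expression-only grammar (integers, `+`, `-`, `*`, parentheses) and the push/pop stack discipline are all illustrative assumptions. It parses by recursive descent and emits x86-64 AT&T-syntax instructions as it goes, with no AST and no optimization.

```python
import re


def compile_expr(src):
    """Single-pass compiler for integer expressions with + - * and parentheses.

    Parses by recursive descent and emits x86-64 (AT&T syntax) instructions
    immediately while parsing; no AST is built and nothing is optimized.
    The value of every subexpression ends up in %rax.
    """
    if not re.fullmatch(r"[\d\s+*()-]*", src):
        raise SyntaxError("unsupported character")
    tokens = re.findall(r"\d+|[-+*()]", src)
    pos = 0
    asm = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def primary():
        tok = take()
        if tok == "(":
            expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            asm.append(f"mov ${tok}, %rax")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        primary()
        while peek() == "*":
            take()
            asm.append("push %rax")       # save left operand on the stack
            primary()                     # right operand -> %rax
            asm.append("pop %rcx")        # left operand -> %rcx
            asm.append("imul %rcx, %rax")

    def expr():
        term()
        while peek() in ("+", "-"):
            op = take()
            asm.append("push %rax")
            term()
            asm.append("pop %rcx")
            if op == "+":
                asm.append("add %rcx, %rax")
            else:
                asm.append("sub %rax, %rcx")  # %rcx = left - right
                asm.append("mov %rcx, %rax")

    expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return asm


asm = compile_expr("7-2*3")
print("\n".join(asm))
```

Every intermediate value takes a round trip through the stack, so the output is naive, but each source operation maps to a fixed, small number of instructions. That is the kind of constant-factor badness Erin is contrasting with CCC's results.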

John Regehr Feb 12

@erincandescent @0xabad1dea I took a very quick look at the code for some of the passes and they're at least superficially plausible. I think one would have to actually run the compiler to see what they're doing. perhaps working together to produce that amazingly slow code, like maybe each pass adds a bunch of copies and the stupid AI forgot copy propagation. something like that feels likely.
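Regehr's hypothesis (passes that each introduce copies, with no copy propagation to clean up after them) is easier to picture with a toy pass. This is a hypothetical sketch in Python, not CCC's IR or code: instructions are assumed to be `(dest, op, args)` tuples in straight-line three-address form, and only a single basic block is handled (no control flow).

```python
def copy_propagate(block):
    """Forward copy propagation over a single basic block.

    block: list of (dest, op, args) tuples; op == "copy" means dest = args[0].
    Uses of copied variables are rewritten to their original source. The
    copies themselves are kept; a dead-code pass would remove unused ones.
    """
    copy_of = {}  # variable -> the variable or constant it currently copies
    out = []
    for dest, op, args in block:
        # Rewrite operands using the copy facts that hold before this instruction.
        args = tuple(copy_of.get(a, a) for a in args)
        # dest is being redefined: drop facts about dest and facts whose source is dest.
        copy_of.pop(dest, None)
        for var in [v for v, src in copy_of.items() if src == dest]:
            del copy_of[var]
        if op == "copy" and args[0] != dest:
            copy_of[dest] = args[0]
        out.append((dest, op, args))
    return out


block = [
    ("t1", "copy", ("x",)),
    ("t2", "copy", ("t1",)),
    ("t3", "add", ("t2", "y")),
    ("x", "const", (5,)),
    ("t4", "mul", ("t1", "t3")),
]
result = copy_propagate(block)
for instr in result:
    print(instr)
```

The last instruction shows why the invalidation step matters: once `x` is redefined, `t1` can no longer be rewritten to `x`. The copies themselves are left in place for a separate dead-code pass to delete; if neither pass runs, a naive back end would plausibly give each of them its own load and store.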

abadidea Feb 12

@regehr @erincandescent the blogger's assessment is that the main issue in the SQL loop is it was shuttling every single variable read/write through one single register, because once there are more variables than registers it doesn't know what else to do.

John Regehr Feb 12

@0xabad1dea @erincandescent well that's technically a register allocator

Erin 💽✨ Feb 12

@regehr @0xabad1dea and it’s a bad one, but it’s like a 10x factor of bad at worst. and I say that only really because all of the mov big_offset(%rbp), %reg and back are probably huge and giving the instruction decoder indigestion.

Jason Orendorff Feb 12

@erincandescent @0xabad1dea @regehr I'd love to know if this is really the problem. The blog post itself shows signs of having been AI-generated, and it contains a whole section "Why Subqueries Are 158,000x Slower" that makes no sense to me

David Chisnall (*Now with 50% more sarcasm!*) Feb 12

@erincandescent @0xabad1dea

And we're talking about the kind of things that tcc can compile. TCC was originally an entry into the International Obfuscated C Code Contest, as a C compiler that fitted on one screen and could compile itself (the back end bit is in QEMU as the Tiny Code Generator, which QEMU uses for JITing small fragments of emulated code).

The full version is bigger, but still very small. And it can compile SQLite.

It's pretty naïve. It doesn't do anything more than peephole optimisation. In the worst case performance is usually around 25% of GCC (occasionally worse for vectorised hot loops), for some things it's closer to 90%.

TCC is not designed for generating fast code, it was designed to be simple and to generate code quickly (they did a demo about 20 years ago with tcc embedded in GRUB, compiling the Linux kernel and then booting it. It took 30s to compile the kernel in an x86 emulator on a 1.25GHz PowerPC host). So if you're generating slower code than TCC, that's really embarrassing.

@0xabad1dea and this is with having access to the full source code of multiple existing compilers lmao

@0xabad1dea This thing was always a show pony. Like in a way: it's impressive. But, at the same time it's completely useless. Hopefully the novelty will wear off, but I feel like it's going to take a while.

@0xabad1dea "Claude’s C Compiler is a remarkable achievement." Is it really? Presumably, it was trained on GCC's source. The fact it didn't do *much* better is the remarkable thing to me.

Jeremy Kun Feb 12

@troglet @0xabad1dea It was also trained on decades of course notes from compiler courses, thousands of student course projects sitting on GitHub, hundreds of textbooks about compilers, the mailing lists of discussions of all those compiler engineers, etc.