I won't say I'm totally proud of myself here, but once I saw that the Claude C compiler was super buggy according to YARPGen and Csmith, I had a hard time stopping myself from doing something about it.

https://john.regehr.org/writing/claude_c_compiler.html


@regehr One worry I'd have about this is whether using a reducer is "reasonable". I know it's "needed" for humans to analyse the problems, but I don't know whether the Claude compiler has a compositional enough structure that big programs hit the same problems small programs do (I know Clang etc. *do* have this structure, but that's because they were written by humans). This feels like the opposite of the "small model theory" we use in program synthesis to work out which candidates are likely to be most general.

I guess that's easy to verify: check that the fixes for the reduced bugs also fix all the original, pre-shrinking cases, and maybe all the intermediate shrunk programs too.
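A minimal sketch of that check, assuming each bug is stored with the generator's original program, the reducer's intermediate programs, and the final reduced case. `BugCase`, `residual_failures`, and the runner callables are hypothetical names; in a real harness a runner would compile and execute the program with the compiler under test or a reference compiler (e.g. GCC or Clang) and return its output. Any variant that still disagrees with the reference after the fix is a sign that the reduced case didn't capture the whole bug.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical runner: takes C source text, returns the program's observable output.
Runner = Callable[[str], str]


@dataclass
class BugCase:
    name: str
    original: str  # unreduced program from the generator (Csmith/YARPGen)
    reduced: str   # final small test case the fix was written against
    intermediates: List[str] = field(default_factory=list)  # reducer's steps


def _outcome(run: Runner, src: str) -> str:
    """Run a program; a compiler/runtime crash counts as its own outcome."""
    try:
        return run(src)
    except Exception as exc:
        return f"crash: {type(exc).__name__}"


def _variants(case: BugCase) -> Dict[str, str]:
    """Label every program in the reduction chain, in reduction order."""
    out = {"original": case.original}
    for i, src in enumerate(case.intermediates):
        out[f"intermediate[{i}]"] = src
    out["reduced"] = case.reduced
    return out


def residual_failures(
    cases: List[BugCase], under_test: Runner, reference: Runner
) -> Dict[str, List[str]]:
    """For each bug, list the variants whose output still differs from the reference."""
    failures: Dict[str, List[str]] = {}
    for case in cases:
        bad = [
            label
            for label, src in _variants(case).items()
            if _outcome(under_test, src) != _outcome(reference, src)
        ]
        if bad:
            failures[case.name] = bad
    return failures


if __name__ == "__main__":
    # Toy stand-ins for real compilers: the "patched" one is still wrong on
    # any program mentioning volatile, which the reduced case no longer contains.
    def reference(src: str) -> str:
        return str(len(src))

    def patched(src: str) -> str:
        return "0" if "volatile" in src else str(len(src))

    cases = [
        BugCase("bug1", original="int x; volatile int y;", reduced="int x;",
                intermediates=["int x; int y;"]),
        BugCase("bug2", original="int a; int b;", reduced="int a;"),
    ]
    print(residual_failures(cases, patched, reference))  # {'bug1': ['original']}
```

In this toy run the reduced case for `bug1` passes, but the original program still fails: exactly the situation where a fix to the reduced bug doesn't generalize back to the big program.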

@lenary I feel like many (but surely not all) of our assumptions about software hold for the vibe-coded stuff too. Perhaps a little because they're fairly universal observations, but more because LLMs live and die by their training data, which is us.