won't say I'm totally proud of myself here, but once I saw that the Claude C compiler was super buggy according to YARPGen and Csmith, I had a hard time preventing myself from doing something about it
Interesting, but nothing surprising, to me at least.
The problem we are seeing in LLVM, and from what I understand in other open source projects as well, is not that they can't produce patches that "work".
They can't seem to effectively address code review. They will fix some things correctly, fix others incorrectly, and on some comments go off and do something completely wrong.
Another issue is that they have a hard time following existing idioms in the code base. They often produce solutions that don't conform to the current idiom, and they seem unable to make the correction based on review feedback.
Another issue is shallow fixes that make a crash go away but don't address the real root problem: "it works", but it is wrong as well.
Combined w/ the fact that the average patch changes only a small number of lines, the kind of hand-holding required results in a large net negative return on value.
I think these flaws are inherent in the model and not really fixable long-term in the current LLM-based tooling. We need models that can actually "reason" and "understand", and LLMs can't do that.
This is what we get w/ statistical inference, and it is indeed impressive, but not sufficient.
This is, for all intents and purposes, a huge experiment, and Open Source gets a front row seat. But we can't get off the ride, nor are we supported or compensated for what is essentially a large burden on our resources.