Out of curiosity, I wanted to see if ChatGPT noticed the bug/mistake I mentioned here:

https://github.com/obsproject/obs-studio/pull/13198#issuecomment-4019545931

I pasted the relevant function in and... it nitpicked all sorts of irrelevant stuff about error handling (allocation failures etc), types, and API contracts, and completely missed the actual issue. It even mentioned "the use of CLOEXEC is reasonable" and the fact it conflicts with some logic branches was completely missed.

This is a function using standard POSIX APIs. This isn't even obscure code, it's doing something (spawning a process with a pipe) that has been done a million times before.

It just didn't catch this bug because this bug / a similar shaped bug wasn't in its training data. Because AI isn't intelligent, it's still just fancy autocomplete and can't think.

("But you didn't catch it either" - I wasn't reviewing the whole function when I wrote that PR, I was fixing a different bug and had some tunnel vision since I was just normalizing/extending/fixing a pattern that was already there, so I didn't notice the use of CLOEXEC earlier. But I did ask ChatGPT to review the whole function, not just the PR change.)

Fix spurious fd closing (os_process and v4l2) by hoshinolina · Pull Request #13198 · obsproject/obs-studio

Description Fix two bugs causing the wrong file descriptors to be closed. The os_process one is harmless but was discovered while debugging this. The v4l2loopback one is actively causing crashes. M...

GitHub

Honestly it feels like AI is... something like optimizing for nitpicking "best practices" while completely missing the real issues.

Nobody writes code according to all "best practices". Error handling in the example above is an anti-pattern. This isn't a kernel. Littering your code with NULL checks for memory allocation (and functions which can only fail due to memory allocation failures behind the scenes, like fdopen) is a net negative because it achieves nothing (if small mallocs are failing you code is going to crash anyway) while making the code more obtuse and complex and hard to read, which means real bugs will slip through more often.

If you're the kind of person that cares about having more "strictly correct" code... you shouldn't be using C to begin with, but rather a language where doing that doesn't hurt readability (like Rust). But AI isn't going to give you that advice...

@lina give claude or Gemini a try

@lina sort of similar in my experience when i put in some code at a client's request. a lot of very non-specific suggestions that are perhaps frequently good practice but not very relevant.

although in one instance it came close enough that if i hadn't figured out the relevant bug already, it perhaps could have nudged me in the right direction.

@lina Imagine writing printf() without checking the return value 🫨

@lina my devil's advocate response is that chatgpt has really fallen off. it would be more fair to get like a recent claude to try it.

my honest response is malloc_or_die

@dramforever Many POSIX APIs (like those it pointed out) only fail due to usage error or internal malloc failure. There's no "or die" variant for that kind of stuff.
@lina of course, as you said, the real solution is to use something with actual error handling support, not this distraction-only boilerplate
@lina you had me till the last paragraph

Not checking failures like that gives me such an overwhelming ick

@natty Checking for memory allocation failure in a general purpose application is a waste of time. If allocations are failing, the only thing you can do is abort. Modern languages like Rust recognize this and default to making allocations panic, instead of forcing the programmer to handle it like a normal error. Even C++ does this with exceptions. Only in contexts like kernels and critical systems does it make sense to always handle allocations as fallible.

In C, letting the NULL deref crash is equivalent (and safe, NULL derefs are **not** a memory safety violation in practice, outside array indexing operations but we're talking about small object allocs with fixed size here). Sure you can check and explicitly abort... but that ends up being the same thing, just more code for no reason.

Every error check the LLM pointed out could only practically happen due to allocation failure.

@lina yeah I'm aware I'm just autistic af about the way I write code
@lina @natty also due to overcommit the main time allocation fails is when you try to allocate stupendous amounts of memory, or if it is run under a specific, lower, memory limit
@charlotte @natty @lina unpopular oppinion: overcommit is a mistake and a security issue and shouldn't exist
@lina @natty lots of people make an xmalloc wrapper that just aborts on fail
@SRAZKVT @natty That doesn't help here, where the issue is standard POSIX functions that can only legitimately fail when internal memory allocations fail.
@lina @natty you make wrappers for those to ? sorry if i'm missing something
@SRAZKVT @natty I mean yeah but making infallible wrappers for every relevant C/posix function seems rather annoying
@lina @natty it's easier to handle than having everything abort by default when you need it to not
@lina @natty If your runtime uses a memory cache it would make sense to handle it by clearing the cache & freeing the memory.

I'm thinking in particular of libc and the malloc cache.

But a lot of other VM-based languages make this a relevant consideration depending on configuration.

@lina

If you're the kind of person that cares about having more "strictly correct" code... you shouldn't be using C to begin with, but rather a language where doing that doesn't hurt readability (like Rust). But AI isn't going to give you that advice...

LLVM is ensloppifying and Rust relies on it so that's not great. (GCCRust is not ready.)

One could say Ada but Ada is not properly bootstrappable yet.

So the main thing left is formally verified C or formal checkers/proofers/etc with C export like Low*.

The KaRaMeL user manual and documentation — The KaRaMeL user manual documentation

@lispi314 @lina gcc-rust would've been ready eons ago if Rust was a properly designed language with a proper spec, but instead they have to reverse engineer the original implementation to figure out how to implement stuff

@lina when Copilot reviews first became a thing on GitHub I reluctantly decided to click it just to see what it would say and it lines up **exactly** with what you’re describing here: irrelevant nitpicking about things that we don’t want to change anyway. It didn’t even catch the glaring error that a human could’ve spotted immediately.

And that was at least over a year ago, so these things aren’t getting any better either 😅

@lina My solution to the allocation problem is to use a wrapper function that crashes on failure.
@lina or tbh if it was in its training data, it occurred so few times that even if it was incorporated into its weights, it is not very significant in them

@lina sounds about right with my experiences.

Even better is if a bunch of systems use very similar looking code. The LLM will point out "bugs" that don't actually exist in the code under review. In my case it was saying (paraphrased) "check visibility before calling click" in code using the cypress framework. Cypress does that check internally by default.

@lina
this is a bit besides the point, but for the sake of completeness, Gemini 3 Pro in ai studio does manage to find it (if I understood the issue correctly, at least). that being said, it only found it for stdout, not stderr, and ofc amidst a bunch of nitpicks (none related to allocation failures though). nothing else I tried got this far.

@refi64 Neat, so that one is kinda useful for this.

Still don't have much hope of it helping with more niche stuff... ^^