The only code review agent I have ever seen be even remotely good is just Codex xhigh. All the review services (and I've seen at least a dozen at this point) suck so bad that I'm not sure how they make any money at all.
@nateberkopec We have the built-in Copilot review, which spews nonsense ~75% of the time, but in this specific context (code review) it's easy enough to ignore the noise for the value in the few comments left. We're about to migrate to Claude though, which I'm hoping is an improvement.
@tsvallender @nateberkopec I’d quit if 3/4 of comments on my PR were useless. And you’re paying for it. Why are you doing it to yourself?
@tsvallender @nateberkopec @pointlessone completely agree. I came here to say “do you hear yourself?”. This sounds like Stockholm syndrome.
@zenspider @tsvallender @pointlessone it's not an uncommon opinion/situation FWIW among my client base. Drives me absolutely insane, even as an LLM-augmentation booster myself
@nateberkopec @zenspider @pointlessone I really don’t see why. Objectively, it’s prevented bugs shipping and cut-down on overall review time by catching some issues before a human review. The cost is a minute or two of the author’s time to scan the comments and quickly resolve the ones that aren’t helpful. I’m not saying it’s perfect, I am saying it has value _in this context_.

@tsvallender @nateberkopec @zenspider You say it takes only a minute to scan and resolve the ones that aren’t helpful, but it still takes time. Ultimately you train yourself to ignore a big chunk of feedback. It’s similar to how you set up monitoring with an overzealous alert that you ignore 3 out of 4 times: it creates noise that you learn to ignore. You think it’s useful once in a while, but you still spend your mental bandwidth on filtering the noise. You may also think this only happens with AI reviews, but this training translates to all other feedback that looks similar, which is all feedback, because it’s in the same place and uses the same UI.

@pointlessone @nateberkopec @zenspider If there’s evidence of that I’d be interested, but I don’t think those things are analogous. Alerts are different in that they’re _push_, so they do need to be high value or you will tune them out entirely, agreed. But I _don’t_ think learning to tune out AI noise turns into tuning out humans here; you go into the process with a different headspace (I do, at least). The AI feedback feels like a CI step, the human feedback is a conversation.

@tsvallender @nateberkopec @zenspider I dunno. Flaky CI doesn’t sound very enticing to me either, even if the analogy might be better.

I’m curious what kind of feedback you get from AI. Could you give a few examples of feedback you typically ignore, and a few of the useful comments?

@pointlessone Heh, I guess maybe CI isn’t the right analogy either when you put it like that. I’ll try to remember to grab a couple next time I deal with some. I’ll just reinforce, though: I was only arguing that it has value, not that it’s fantastic!

@tsvallender I didn’t say it doesn’t. I’m just unconvinced the value is worthwhile.

Like there’s value in asbestos. It’s just that the negative aspects outweigh the benefits. Not saying that AI is full-on asbestos, it’s just a colorful illustration of the idea.