If you replace a junior with an #LLM and have the senior review the output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface, thanks to LLM "productivity."

That's a cognitively brutal task.

Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear power, and radiology all have extensive literature on exactly this failure mode.

I expect any productivity gains to be consumed by false-negative review failures.

@pseudonym This.

I do a lot of "computer science labs", where students learn to write code, and they wave me down when they have questions. When their code doesn't do what they expect, it's often easy to figure out what went wrong because you can spot a bit of code that looks funky. And usually, the problem is in those few lines.

LLM code is meant to look like good code, so you don't get these little shortcuts.
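A minimal, hypothetical sketch of the kind of failure this describes (the function names and the bug are my own illustration, not from the thread): two sliding-window average implementations that look equally clean at a glance, where one silently drops the final window.

```python
def window_means_subtle(xs, n):
    # Looks idiomatic, but range(len(xs) - n) stops one window early.
    return [sum(xs[i:i + n]) / n for i in range(len(xs) - n)]

def window_means_correct(xs, n):
    # The off-by-one fixed: there are len(xs) - n + 1 full windows.
    return [sum(xs[i:i + n]) / n for i in range(len(xs) - n + 1)]

xs = [1, 2, 3, 4, 5]
print(window_means_subtle(xs, 2))   # [1.5, 2.5, 3.5] -- last window missing
print(window_means_correct(xs, 2))  # [1.5, 2.5, 3.5, 4.5]
```

A novice's version of this bug usually comes with a visible tell nearby (a stray print, an odd variable name, a commented-out attempt); the polished variant offers the reviewer nothing to snag on.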

@Moutmout

Good example I hadn't thought of.

Yes, human novice code mistakes have a "shape" to them a teacher can recognize quickly, or suspect because of how the error manifests.

These are different classes of "good looking" failures.

@Moutmout @pseudonym

Dunning-Kruger as a Service 🙃

@Moutmout @pseudonym and not just code: it took me just as long to find all the crucial mistakes in an AI translation as it would have taken me to do the translation myself.

Evaluation of the risks: https://www.draketo.de/software/ai-translation-evaluated#completely-changed
