If you replace a junior with an #LLM and make the senior review the output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface, thanks to LLM "productivity."

That's a cognitively brutal task.

Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear power, and radiology all have extensive literature on exactly this failure mode.

I predict that any productivity gains will be consumed by false-negative review failures.

@pseudonym Especially since the sort of mistake that LLMs make is the sort of mistake that's hardest for humans to spot. They produce bad code that looks like good code, because they were trained on a lot of good code and told "Write code that looks like this".
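A toy illustration of "bad code that looks like good code" (my own hypothetical example, not from anyone's actual output) — a Python function that reads perfectly cleanly in review but hides a classic shared-state bug:

```python
def append_tag(tag, tags=[]):
    """Append a tag to a tag list, defaulting to an empty list."""
    # Looks idiomatic -- but the default list is created ONCE at
    # function definition, so every call that omits `tags` silently
    # shares and mutates the same list.
    tags.append(tag)
    return tags

first = append_tag("a")   # ["a"] -- fine so far
second = append_tag("b")  # ["a", "b"] -- "a" has leaked in
```

Nothing here looks wrong at a glance: it's the kind of pattern a reviewer's eye slides right past, which is exactly the vigilance problem.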
@robinadams @pseudonym It's even worse in some ways. The tools don't just write code, they also write tests, run the tests, fix any failures, clean up, and document it. The result probably runs and does something close to the intent. At this point a human has to understand what's happened and _then_, without the benefit of hands-on involvement, spot the problems. Not easy.