this spring I've been teaching undergrads to use LLM agents. my rationale for doing this was that it would give me a chance to covertly teach lots of real software engineering, which is what I've done.

meanwhile, I've been watching the students closely to try to figure out whether a coding agent is a leveling factor (reducing differences in effectiveness between different students) or an anti-leveling factor (amplifying differences). at this point I'm 99% sure it's the second thing.

@regehr do you have an example of why that is? I’ve been wondering that myself and leaning towards (2) as well, but it was just a gut feeling with no evidence.

@ryan so you know how one student will have a bug, form a wrong hypothesis about it, get on the wrong track, and take a very very long time to track down the bug, whereas another student will just sort of home in on the issue right away?

I feel like it's just more of the same. it's a difference in how effective people's mental models and hypotheses are, and the available tools simply amplify whatever effects are already there.

but I have no hard evidence for anything

@regehr @ryan This matches what I hear from friends in industry.

I hear a lot of stories of projects where someone outsourced thinking to an LLM and spent days or weeks debugging something without fixing it, when a bit of human judgment would have solved the problem. A friend of mine makes good money as a consultant who cleans up people's LLM-assisted messes.

On the flip side, the growth rate of startups is faster than ever because you can ship at previously impossible velocities.

@regehr I feel like I see this in every area, e.g., testing. If I want to test something "decently well", I can do that more quickly than ever. For example, I vibe coded some heinous multi-threaded shared memory code that I previously would only have written for the most critical use cases, since it would've taken a lot of time to get decent confidence in its correctness.

I then spent a couple hours getting a coding agent to produce a reasonable fuzzer, add deterministic replay, asserts, etc.

@regehr It used to take quite a bit of care and a fair amount of work to get deterministic replay out of a huge pile of gross multithreaded code, but that's a task coding agents are actually pretty good at, since they can just fuzz things in a loop and track down the source of non-determinism every time any is detected. And once you've done it once and have some instructions for the LLM, it takes very little time to do it the next time.
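[To make the "fuzz in a loop, flag non-determinism" idea concrete, here's a minimal sketch: run the same seeded workload repeatedly and treat any divergence in the observable result as a source of non-determinism to track down. Everything here, including run_workload(), is made up for illustration; it is not the actual harness described above.]

```cpp
// minimal sketch of a replay-divergence harness for multithreaded code.
// run_workload() is a hypothetical stand-in for the real code under test.
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

// toy workload: four threads add seeded pseudo-random values to a shared
// counter. since atomic addition commutes, the result is deterministic by
// construction and the harness should report no divergence; swap in the
// real code under test to hunt actual non-determinism.
static uint64_t run_workload(uint64_t seed) {
    std::atomic<uint64_t> shared{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; t++) {
        threads.emplace_back([&shared, seed, t] {
            std::mt19937_64 rng(seed ^ static_cast<uint64_t>(t));
            for (int i = 0; i < 1000; i++)
                shared.fetch_add(rng() & 0xff, std::memory_order_relaxed);
        });
    }
    for (auto& th : threads) th.join();
    return shared.load();
}

int main() {
    std::mt19937_64 seeds(42);
    for (int iter = 0; iter < 100; iter++) {
        const uint64_t seed = seeds();
        const uint64_t first = run_workload(seed);
        // replay the same seed a few times: if the observable result ever
        // differs, non-determinism has been detected and the seed is a
        // reproducer to hand to the agent (or a human) to track down.
        for (int rep = 0; rep < 5; rep++) {
            const uint64_t again = run_workload(seed);
            if (again != first) {
                std::printf("seed %llu diverged: %llu vs %llu\n",
                            (unsigned long long)seed,
                            (unsigned long long)first,
                            (unsigned long long)again);
                return 1;
            }
        }
    }
    std::puts("no divergence observed");
    return 0;
}
```

[A real version would presumably hash a richer trace (logs, shared-memory snapshots) rather than a single counter, but the loop structure is the same.]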

On the flip side, as a user, I'm seeing more bugs than ever.

@danluu @ryan I heard a story recently about a startup where the engineering team accomplished all of their goals for the first quarter of 2026 by mid-January
@regehr @danluu @ryan including 10 more years of technical debt.

@danluu @regehr Yeah that matches what I see at Mozilla. If you just want to write code quicker, you can do that.

But if your goal is to write better code, you can do that too. It’s so easy to write fuzzers or exhaustive test suites now. And LLMs are getting really good at finding defects too.

Which way you choose is a bit of a reflection of what you want. Unfortunately, most people in our industry just want more code, quicker.

@regehr Sounds like you have the makings of some kind of experiment that you can run on your students...
