Code generated by LLMs is going to need more testing than code written by developers. This seems self-evident to me, but I suspect a lot of people are going to learn it (or ignore it) the hard way.

Given that most existing codebases are not well tested, and most developers don't test, this does not bode well.

The practical consequence of using LLMs to generate code is that many developers will find they have unwittingly moved themselves into a role they were probably trying to avoid: they have automated the creation of legacy code and have redefined their job role as debugging and fixing such code.
@kevlin to be fair, this was already somewhat preluded by SODD (Stack Overflow Driven Development) 🤷
@DJGummikuh @kevlin Oh crikey, a system operating off 'ceteris paribus' style responses to specific troubleshooting could have some very peculiar effects on programming - as machines cannot internalise the distinction between holistic coding and firefighting.
@kevlin me, rewriting Copilot's class casts to instanceof pattern matching in Java 10 times a day: too real
@kevlin
I am forcibly reminded of the first chapter of Vernor Vinge's A Fire Upon The Deep, wherein computing system developers use apparently innocuous recipes from an archive created by a malevolent AI; they build clever systems whose inner workings they don't thoroughly understand, and those systems ultimately turn on them.
Looks like we can go the same way without even needing a sentient AI...
@kevlin *ponders the paradox of llm generated tests*
@kevlin If it's not already, this should be shared with some folks on that large social network for business-y people...

@kevlin
I've been told LLMs are not a threat to developer jobs (even by my recently laid-off developer friends) because LLMs do not generate good code.

My counter-argument is that the people who would decide to replace software engineers with LLMs are the people already unable to tell if their company's code is any good - OR if they are spending enough on QA.

@hagbard @kevlin

And pretty much all of those decision makers have totally bought into the assumption that there should be a separate QA -- instead of improving development practice so that it's not needed.

So, anyway, they'll have the LLMs "do QA" too. 🙄

@JeffGrigg @kevlin

Cool cool. QA has always gone faster and been more affordable with more glibness.

@kevlin

Yup. I don't code for money anymore, but yes, to all of that.

I #volunteer #teach #Python to #HighSchool students. While the school has an academic honesty policy, my version is practical.

I'm not your #conscience and if you didn't write the code, you may not even know that it's broken, much less how to fix it. This is what #devs who use #AI are setting themselves up for.

@vor @kevlin

I do volunteer tutoring.

And same on practicality:
If someone else writes and gives you perfect code, you still will not have learned how to write such code.

"But I'll read their code and know!"

"No. That's not how code gets written. You can't 'memorize all the answers' in this field."

@kevlin I'm kinda looking forward to this future.

1. Find a problem and write a test, maybe an example of what I want a function to do and its side-effects.

2. Let the LLM attempt (and probably cheat) to generate the code.

3. Write more tests, probably some docs too, to make it closer to what I want.

4. Keep iterating until other engineers using the code stop making "mistakes" with it too.
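The test-first loop above might start with something like this (the `slugify` function and its behavior are invented for illustration, not from the thread):

```python
import re

# Step 1: pin down the desired behavior with a test before any code exists.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces  ") == "spaces"
    assert slugify("") == ""

# Step 2: a first LLM attempt may "cheat" by only handling the cases shown,
# which is why steps 3 and 4 keep adding tests until the behavior generalizes.
def slugify(text: str) -> str:
    # Keep runs of alphanumerics, joined by single hyphens.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

test_slugify()
```

The point of the sketch is that the tests, not the generated body, become the part you actually author and maintain.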

My guess is that it's going to take me a lot more tests and docs for it to write good enough code, which is kinda true today anyway, except I still have to write the code too.

And folks still manage to find a way to break things 🤪

@kevlin It's going to be a hoot if the LLMs are propagating old buggy code around, because there is so much posted publicly, ha ha. More than half of the code posted publicly is "THIS CODE DOESN'T WORK, HELP ME!" -- not sure the LLMs are advanced enough to recognize they ingested broken code.
@kevlin @ai6yr That’s my new favourite argument, thank you.
@kevlin To me what is hard is defining the problem in complex systems. Not "I want to do X" but do X while saving Y for use in Z in such a way that it doesn't break B. Maybe not a real good analogy but being able to define all the parameters in a complex system is what makes me a good dev. I have never been able to lay out a complete problem, up front, in a way I could feed it into an AI. And the side problems are where the debugging is
@katana0823 @kevlin But what if you had some kind of formal language with a constrained grammar to specify your problem, then maybe the AI would ... oh. wait ...
@katana0823 @kevlin Somewhat ironically, non-AI systems have been great at this for decades (e.g. RDBMS query planners) and as an industry we have done almost nothing to pursue the potential in that area. I think this is because we don't agree on a good cost measurement for each subsystem and/or approach so aggregation of costs has seemed beyond reach. LLMs are showing the value of ignoring some minor level of error in order to see some valuable aggregate.

@kevlin @mogul I asked ChatGPT 4 to write some caching code. I saw some problems. Then, I had to follow up with, “identify problem areas in this”.

To give it credit, *when prompted* it knew what could go wrong. And, again, when prompted, could write mitigation strategies to avoid issues.

Overall, I give ChatGPT 4 “junior developer” status.
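As an illustration of the kind of problem such generated caching code tends to have (this snippet is invented, not the actual ChatGPT output), here is a naive cache with two classic flaws:

```python
# A naive cache of the sort an LLM might produce: it "works", but it never
# evicts entries (unbounded memory growth) and it returns the cached object
# itself, so one caller can corrupt it for everyone else.
_cache = {}

def expensive_lookup(user_id):      # stand-in for a slow query
    return ["admin"] if user_id == 0 else []

def get_user_tags(user_id):
    if user_id not in _cache:
        _cache[user_id] = expensive_lookup(user_id)
    return _cache[user_id]          # shared mutable value leaks out

# One caller mutating the result silently poisons the cache:
tags = get_user_tags(0)
tags.append("oops")
assert get_user_tags(0) == ["admin", "oops"]   # bug: cache was corrupted
```

The mitigations that have to be prompted for are the usual ones: return a copy, or bound the cache (e.g. `functools.lru_cache` for hashable arguments).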

@pixelscience @kevlin @mogul

It didn't know what could go wrong. It gave the illusion of knowing. Maybe there's no difference.

@kevlin @kenkousen

I am actually more worried about developers blindly trusting the code the LLM generates bc "It came out of an AI".

An LLM is guaranteed to be wrong a certain percentage of the time.

https://thesteve0.blog/2023/03/28/what-chatgpt-is-not/


@kevlin TBH, I enjoy that far more than the boilerplate web development stuff I do all the time

Let the computer do the boring stuff; I'll make it nice

@kevlin @pvaneynd I think this will not end well. Coding is a skill that must be maintained. The more you use an AI to generate the code, the more your coding skills will deteriorate and the harder it will be for you to evaluate whether the code produced actually does what it is supposed to. That is (as noted by John Siracusa) really hard, even if it is code you have written yourself. There is a reason that programmers spend a lot of time debugging their code.
@kevlin - like the joke goes:
Sr Dev: Where did you get this code?
Jr Dev: From a Stack Overflow question
Sr Dev: From the solution, or from the problem?
@kevlin Literally the end of Greenfield projects.
@kevlin Black hat hackers are going to have a field day (years, decades) with LLM-generated code. Mass-produced exploitable bugs will ensue, mark my words.
@cxj Yes, was discussing this just the other day. And, playing both sides of the exploitation, they'll also use LLMs to generate attacks.
@kevlin writing code is much overrated. Debugging and fixing superficially correct code is properly interesting and difficult. The digital creators have been self-aggrandising for a while: greenfield is where the glory and reward are, but actually brownfield is where the challenge is.
@kevlin Even without LLMs, that was 40% of my job.
@kevlin so they'll have to do the part that actually takes work...
@kevlin what about tests generated by LLMs, though? 😁
@kevlin All code is legacy code that needs to be debugged. There was never a time when people could just write code and choose to not make mistakes.
@mistersql @kevlin It is true that LLM-generated code is going to need more, and probably different, review than code written by the actual people on your team
@mistersql @kevlin It’s never been a better time for folks to get into property based testing, probably
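A minimal sketch of what property-based testing looks like (hand-rolled with random inputs here; in practice you would use a library such as Hypothesis): instead of reviewing generated code case by case, you assert properties that must hold for *any* input. The `dedupe` function stands in for a hypothetical LLM-generated implementation:

```python
import random

# Hypothetical LLM-generated function under test.
def dedupe(items):
    # Remove duplicates while preserving first-seen order.
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# Properties that must hold for any input, checked over random cases:
for _ in range(200):
    xs = [random.randint(-5, 5) for _ in range(random.randint(0, 20))]
    ys = dedupe(xs)
    assert len(ys) == len(set(ys))   # no duplicates remain
    assert set(ys) == set(xs)        # no elements lost or invented
    assert dedupe(ys) == ys          # idempotence
```

Properties like these catch the "cheating" failure mode above, where generated code handles only the examples it was shown.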
@mistersql @kevlin an LLM prompt that produces a piece of code that solves a real world problem according to a defined set of specifications is just a program written in a very high level language and the people who are trained in understanding real world problems in a way that translates to prompts like that are programmers
@mistersql @kevlin It’s not even a different skillset just a new programming paradigm
@mistersql @kevlin I’m super skeptical of these tools but generating certain types of code under certain conditions is probably going to be one of the use cases that ends up being worthwhile
@kevlin as a professional developer for over ten years I can confidently say the time I spend writing code is negligible when compared to the time I spend reasoning about the system and trying to make it more closely match the expectations and needs of the users while remaining performant and maintainable. Even more so when it's code I have not written myself.

I guess the idea of programmers spending hours typing away furiously to create something comes from the media. Writing the actual code is often the easiest part and usually doesn't even take a day.

Turning that part of my job over to an overgrown auto-complete would not improve much if anything at all.
@kevlin this sounds like hell.
@roguehireling The road there is paved with many (many) good intentions 😈
@kevlin May I have retired from programming by this time.
@kevlin @dreid When ChatGPT got publicly big and folks were searching for use cases, its code generation abilities were praised. Some folks were praising using such tools to generate unit tests, which they clearly saw as low value and easy. Of course, to me tests seemed the last thing you'd want to generate, because we're very bad at reviewing code for correctness! So we'll have autogenerated legacy code with very misleading and incorrect tests. Yay.

@kevlin

Indeed. LLMs can do, if at all, what has already been done.

Who is going to do the new things to keep training the LLMs, if everybody is out of a job because of LLMs?

@MartinEscardo Indeed. Model collapse is a genuine issue.