“One of the retreat's most shareable insights was that test-driven development produces
dramatically better results from AI coding agents. The mechanism is specific: TDD prevents a failure mode where agents write tests that verify broken behavior. When the tests exist before the code, agents cannot cheat by writing a test that simply confirms whatever incorrect implementation they produced.
This reframes TDD as a form of prompt engineering. The tests become deterministic validation for non-deterministic generation. Several practitioners described moving review efforts entirely to the test suite, treating the generated code as expendable. If the tests are correct and the code passes them, the code is acceptable regardless of how it looks.”

Maybe there is some hope for the last 10 years of my career before I retire…

https://www.thoughtworks.com/content/dam/thoughtworks/documents/report/tw_future%20_of_software_development_retreat_%20key_takeaways.pdf
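The discipline the quote describes can be sketched in a few lines (Rust, to match the Kani discussion downthread; the function and test cases are my own illustration, not from the report): the tests are frozen first, and a generated implementation is only accepted once it passes them unmodified.

```rust
// Implementation under test. In the agent workflow described above,
// this body is the expendable part: an agent regenerates it until
// the pre-written tests below pass.
fn slugify(title: &str) -> String {
    title
        .to_lowercase()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join("-")
}

// These tests exist *before* the implementation and the agent is not
// allowed to edit them, so it cannot "pass" by weakening a test to
// match broken behavior.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn lowercases_and_hyphenates() {
        assert_eq!(slugify("Hello World"), "hello-world");
    }

    #[test]
    fn collapses_repeated_whitespace() {
        assert_eq!(slugify("a   b"), "a-b");
    }
}

fn main() {
    println!("{}", slugify("Hello World"));
}
```

The review effort then lives entirely in the test module: if the assertions are right, any implementation that satisfies them is acceptable.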

@grmpyprogrammer Wait until you've seen LLMs in combination with formal verification:

https://model-checking.github.io/kani-verifier-blog/2023/05/01/writing-code-with-chatgpt-improve-it-with-kani.html

Kani is a model checker for Rust. In the blog post they:
1. Implement the proof first
2. Let the LLM implement the function until the proof passes
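As a rough sketch of what that looks like (the function and property here are my own illustration, not taken from the blog post): the Kani proof harness is written first, and the implementation is iterated on until `cargo kani` reports the proof passes.

```rust
// Implementation written to satisfy the harness below; in the
// workflow described, an LLM would keep revising this body until
// the proof passes.
fn saturating_double(x: u32) -> u32 {
    x.checked_mul(2).unwrap_or(u32::MAX)
}

// The proof comes first. Unlike a unit test, Kani checks the
// property for *every* possible u32 input, not a sampled few.
#[cfg(kani)]
#[kani::proof]
fn double_never_shrinks() {
    let x: u32 = kani::any(); // symbolic input covering all u32 values
    let y = saturating_double(x);
    assert!(y >= x); // doubling with saturation never decreases the value
}

fn main() {
    // Ordinary execution still works; the harness only compiles
    // under `cargo kani`, which sets the `kani` cfg flag.
    println!("{}", saturating_double(21));
}
```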

Mind-blowing stuff! And nobody is talking about it (meaning formal methods and automated reasoning).

See also here:
https://floss.social/@janriemer/115241985820433035


@grmpyprogrammer I’ve found this to be absolutely true. When testing AI workflows I got far better results by asking one agent to write tests from the spec, then giving the spec to another agent and having it run the tests without permission to modify them.

@grmpyprogrammer This presumes I know what the tests should be in advance. I frequently need to noodle with the code before I even know what the structure will be.

@Crell you generally know what you want the outcome to be. That means you have a spec. If you have a spec, you can give it to an agent that can turn it into functional tests. Then you have tests you can read, and code can be generated against them.

The hard part is writing a spec and considering edge cases. The more detail I give an AI, the more accurate it is.

To be clear, I’m not an apologist for AI. Just someone who has learned about it. @grmpyprogrammer

@sarah @grmpyprogrammer Often I don't. :-) Certainly not at anything beyond the most superficial level. Like for my serializer, I know it should, um, serialize and deserialize. But the way in which I wanted it to be extensible, I didn't know until I was writing extensions.