The pivotal session that is helping me see a path forward for AI-augmented test-driven development.
(Video is a bit rough 'n' ready, but packed with insights...er, once we got going...)

@RobMyers
Congratulations on the book! 🥳
I'm just at the point where it starts churning out code, but... predictably it goes sideways from the get-go.
You wouldn't write that coordinates story. If you wrote that test, that's not what you'd call it; you wouldn't write that implementation (with more code than the test demanded); and you definitely wouldn't have just a single test for that story...
EDIT: or maybe I'm completely wrong and it is doing a much better job off-screen... let's finish the video 😁
Now I'm seeing a lot of "LLMs are OK if you TDD and write the tests yourself," and I don't see how that works:
- Writing your own tests in the typical TDD loop is clearly incompatible with agentic "go do this and tell me when done" coding.
- Even when we exclude sub-agents, TDD typically needs micro-tests for quick feedback.
- If you are writing micro-tests, most of the time it will be faster to pass and refactor on your own than to get the LLM to do so.
- The minority of the time when it is productive to turn to the LLM doesn't remotely resemble the workflow that people call AI coding.

I suspect the common scenario out there is, at best, people writing acceptance tests and evaluating the results. That's not TDD; it's just the first part of outside-in TDD... Sooooo... 🤷 Seems like an effort to accommodate people who are already on the LLM bandwagon.
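For context, a "micro-test" in the sense above is a tiny, fast, behaviour-level test written just before the code it drives. A minimal sketch (the function and test names are hypothetical examples, not from the video):

```python
# A micro-test pair in the classic TDD loop: each test makes one small,
# fast assertion, written *before* the production code that satisfies it.
# wrap_coordinate is a hypothetical example function.

def wrap_coordinate(value, size):
    """Wrap a coordinate onto a grid of the given size (toroidal grid)."""
    return value % size

def test_coordinate_wraps_past_right_edge():
    assert wrap_coordinate(5, 5) == 0

def test_negative_coordinate_wraps_to_left_edge():
    assert wrap_coordinate(-1, 5) == 4

test_coordinate_wraps_past_right_edge()
test_negative_coordinate_wraps_to_left_edge()
```

The point of the argument above is the latency: a loop like this runs in milliseconds, and the "pass and refactor" step is often a one-line change, which is why round-tripping it through an LLM can be slower than just doing it.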
@orchun I might be coming to similar conclusions. Kind of a "spec what you know, investigate with dialogue and exploratory testing." Though I liked having CC write the unit tests, because I can look for all the expected edge cases; in the future I think Gherkin or something simple like that would suffice. We write (or suggest*) some, the AI writes some, we review those.
Ultimately we can’t give up test-driven *thinking*.
@RobMyers
What I want to start with is coding (vs. Gherkin) the acceptance tests myself (which would be subcutaneous), to retain a modicum of control over the code design. And of course with very incremental specs/deliverables.
But my curiosity is really about whether mutation testing would practically work for forcing the LLM to generate exhaustive unit tests, and whether it (and I) can deal with managing the over-specification of tests it would undoubtedly create... After all, if I don't review the tests, all I'm gaining is regression control, not correctness.
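To make the mutation-testing idea concrete: a mutation tool makes small changes ("mutants") to the production code and re-runs the tests; if no test fails, the mutant "survives," exposing a gap in the suite. A hand-rolled miniature of that loop (the `clamp` function and its lone test are hypothetical; real tools like mutmut or PIT automate the mutation and re-run):

```python
# Mutation testing in miniature: mutate the code, re-run the tests,
# and check whether any test "kills" the mutant. A surviving mutant
# means the suite isn't exhaustive.

def clamp(value, low, high):
    """Constrain value to the inclusive range [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value

# Suppose the LLM only generated this one test:
def test_clamp_below_range():
    assert clamp(-5, 0, 10) == 0

# Mutant: the upper-bound branch is mutated to return the input unchanged.
def clamp_mutant(value, low, high):
    if value < low:
        return low
    if value > high:
        return value      # mutated: was `return high`
    return value

# Re-run the suite against a candidate implementation:
def suite_kills(impl):
    try:
        assert impl(-5, 0, 10) == 0   # test_clamp_below_range
        return False                  # every test passed: mutant survived
    except AssertionError:
        return True                   # a test failed: mutant killed

# suite_kills(clamp_mutant) is False here: the mutant survives because
# no test exercises the upper bound, so the tool would demand (or the
# LLM would be pushed to write) a test like clamp(15, 0, 10) == 10.
```

The over-specification worry in the comment is the flip side: driving every mutant to zero tends to produce tests pinned to incidental implementation details, which is exactly what needs human review.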
To switch topics a bit... I'm no conspiracy theorist, but I think there is a very, very real danger of becoming LLM-dependent here.
Right now, all the fervor around LLM use assumes inference costs stay cheap. What happens when people/orgs are hooked and can't do without LLMs, but the costs are now real?