@adrianco @tastapod I took a look at tdd-guard recently and it's really useful if "strongly suggesting tdd process through prompting" isn't cutting it. Strongly recommend but non-trivial setup/token spend!

@adrianco I haven't been doing much with LLM tools tbh. Lots of lurking in chat groups where others are doing things though.
I haven't seen tdd-guard but it seems like it might help. In general I disagree with Claude's 'Phase 2'. I almost never use Gherkin feature files. I have a longstanding promise to myself to write up my reasons; this reply thread might be that write-up! 1/n
@adrianco Almost all the BDD I've ever done uses JUnit or PyTest. I always structure my code examples (tests) as:
# Given
# When
# Then
It isn't dogma so much as how I think about design. I often start in the Then or When section writing a 'model client'. This tends to surface domain terms which then find their way into the production code.
This tells me what I need to set up in the Given section. This is especially fun in Java or Kotlin because the IDE fills in a lot of the blanks. 2/n
@adrianco I just keep hitting Cmd-Enter on the red squiggles until there aren't any left, and assign some dummy values. I'm sure an LLM could automate this but I have never felt it slowing me down. I tend to appreciate the thinking time.
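A minimal PyTest sketch of that Given/When/Then comment structure. The domain here (an account transfer) is invented purely for illustration; the point is the shape of the test, not the example:

```python
# Hypothetical domain class; in practice this would already exist,
# or the test would drive it into existence via the IDE's quick-fixes.
class Account:
    def __init__(self, balance):
        self.balance = balance

    def transfer(self, amount, to):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        to.balance += amount


def test_transfer_moves_money_between_accounts():
    # Given two accounts with known balances
    source = Account(balance=100)
    target = Account(balance=0)

    # When we transfer part of the source balance
    source.transfer(30, to=target)

    # Then both balances reflect the transfer
    assert source.balance == 70
    assert target.balance == 30
```

Starting from the When/Then and working backwards to the Given is exactly the 'model client' move: the test is written as the client you wish you had, and the domain terms fall out of it.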
So Gherkin, then.
About 100 years ago, I wrote a scenario runner in Ruby using RSpec, called RBehave.[1] It had a neat internal DSL inspired by RSpec but for G/W/T and steps. It quickly found its way into the RSpec core.
3/n
"RBehave is a framework for defining and executing application requirements. Using the vocabulary of behaviour-driven development, you define a feature in terms of a Story with Scenarios that describe how the feature behaves. Using a minimum of syntax (a few “quotes” mostly), this becomes an executable and self-describing requirements document. BDD has been around in the Ruby world for a while now, in the form of the excellent RSpec framework, which describes the behaviour of objects at the code level. The RSpec team has focused on creating a simple, elegant syntax and playing nicely with other frameworks, in particular Rails and the Mocha mocking library."
@adrianco It had a setting that would render the scenario title and steps as plain text as they ran, which was pretty neat (this was pre-Markdown).
Some of the RSpec folks wondered whether you could round-trip this, and have plain text as an input. So in the spirit of 'we thought it would be easy', and several regexps later, a plain text scenario runner was born. Then Aslak Hellesøy rewrote it to use a proper grammar parser, and Cucumber was born.
4/n
@adrianco Writing and running plain text scenarios was a hit, and Cucumber became the most downloaded automation tool in the world... for testers!
Testers would write these scenario files in plain English (or Norwegian or whatever) and they, or usually developers, would write the plumbing to automate them. Once there was a critical mass of steps, the testers could compose new plain text scenarios with little or no developer involvement.
Which sounds great, right?
5/n
@adrianco From a raw engineering perspective, you just introduced several layers of indirection, each using different technologies, to do something that you could literally do in a single line of Java or Python.
Each Scenario is made up of Steps (Givens, Whens and Thens). These are mapped to methods or functions in some target language using annotations containing regular expressions. So you have 3 or 4 different languages before you even start.[1]
6/n
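To make the indirection concrete, here is a toy sketch of what Cucumber-style runners do under the hood: regex patterns registered against handler functions, with every plain-text line matched against every pattern. (This is my own simplified reconstruction, not Cucumber's actual code; the balance/deposit steps are invented.)

```python
import re

# Registry mapping compiled regex patterns to step functions,
# roughly what Cucumber-style tools build from step annotations.
STEPS = {}

def step(pattern):
    def register(fn):
        STEPS[re.compile(pattern)] = fn
        return fn
    return register

@step(r"the balance is (\d+)")
def given_balance(ctx, amount):
    ctx["balance"] = int(amount)

@step(r"I deposit (\d+)")
def when_deposit(ctx, amount):
    ctx["balance"] += int(amount)

@step(r"the balance should be (\d+)")
def then_balance(ctx, expected):
    assert ctx["balance"] == int(expected)

def run(scenario, ctx):
    # Each plain-text line is matched against the registered patterns
    for line in scenario:
        for pattern, fn in STEPS.items():
            match = pattern.search(line)
            if match:
                fn(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step matches: {line!r}")

ctx = {}
run([
    "Given the balance is 100",
    "When I deposit 50",
    "Then the balance should be 150",
], ctx)
```

So the behaviour lives in Python, the scenario lives in English, and a layer of regexes glues them together. Each of those three layers can drift independently, and only one of them has a compiler.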
@adrianco Remember, all of this could be a single line of Java in a // Given section!
The pitch is that steps are reusable. Yay! Except that you can't refactor plain text to rename an evolving domain concept across tens or hundreds of scenarios, say; and anyway, testers don't think in terms of refactoring.
(Note, none of this has anything to do with BDD as such, which is about how the team communicates and collaborates to get work done.)
7/n
@adrianco So there are a number of pathologies that play out over time, which are my issue with using Cucumber / Gherkin for anything other than a very specific circumstance.
1. Proliferation of near-identical copy-pasta scenarios.
I have worked with clients who had literally thousands of 'BDDs' (yes, BDD as a plural noun meaning feature files), in Cucumber, SpecFlow (now Reqnroll), and variants. These would take many hours to run so were either ignored or resented.
8/n
@adrianco With one client, we grouped and categorized several thousand BDDs and rewrote their core behaviour into PyTest tests using Requests (a lovely HTTP library), and some SQL and AMQP libraries we found.
The testers learned enough Python to be dangerous, including things like helper functions and basic refactoring.
We ended up with a few hundred scenarios in well-structured PyTest tests which would run in a few minutes and provided way more confidence than 'the BDDs' ever did.
9/n
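A hedged sketch of what that rewrite might have looked like. The endpoints, payloads, and order domain are all invented for illustration, and the session is a stub standing in for requests.Session so the example runs without a network:

```python
class StubSession:
    """Stands in for requests.Session in this sketch.

    The real tests would hold a configured requests.Session and hit
    actual service endpoints; only the test structure matters here.
    """
    def __init__(self, orders):
        self.orders = orders

    def post(self, path, json):
        order_id = len(self.orders) + 1
        self.orders[order_id] = {"status": "accepted", **json}
        return {"id": order_id, "status": "accepted"}

    def get(self, path):
        order_id = int(path.rsplit("/", 1)[1])
        return self.orders[order_id]


def test_accepted_order_is_retrievable():
    # Given an API session against a service with no orders
    api = StubSession(orders={})

    # When we place an order
    created = api.post("/orders", json={"sku": "ABC-123", "qty": 2})

    # Then it can be fetched back with the same details
    fetched = api.get(f"/orders/{created['id']}")
    assert fetched["status"] == "accepted"
    assert fetched["qty"] == 2
```

Same Given/When/Then shape as before, but it is all one language, one stack trace, and one IDE rename away from tracking an evolving domain term.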
2. Shocking performance
Cucumber and friends run orders of magnitude slower than Just Writing Tests. You end up with scaffolding in the plumbing and plumbing in the scaffolding, with a twisty maze of steps all alike. And boy those stack traces! Heaven help you tracking things down when something fails, especially intermittently.
10/n
3. Scenarios and steps serve different audiences.
The business stakeholders generally care about the scenario title, 'The one where...'. They want to know that this case is one we have considered, and that we will Do The Right Thing when that scenario occurs. They glaze over as soon as you start describing the steps, especially the amount of setup in the Givens or the detail of result-checking in the Thens.
Testers often care about the steps, but not the implementation.
11/n
@adrianco And the developers would rather be anywhere else than deep in the weeds of BDD step definitions or their dependencies, or trying to figure out why the parameters aren't mapping correctly (it's always a typo, but no linting of course, because English).
4. It is easier to read in code
No, honestly. Even (especially?) for non-developers. I have had this conversation so many times with testers. 'We can read your code because it is full of domain terms doing sensible things!'
12/n
@adrianco It turns out that using DDD, with domain-based, intention-revealing names and a consistent, well-curated domain model, is far more versatile than 'You can write your steps in English!'
---
I'll pause here. I have a lot of time for the Cucumber folks. They are smart and invested and they care about (and get!) BDD. Sadly, 99% of the time Cucumber is just a tool for test automation. The scenarios are not a joint collaboration (see 'Different audiences'), just a different syntax.
/fin
@tastapod @adrianco Yes please.
You seem to have the same view as I do, which is why trying to use LLMs to write BDD acceptance tests always makes me facepalm. It misses the entire point of *talking to the customer*. It’s just (yet) another layer of abstraction away from that. The Drucker quote is spot on IMO.
Don’t get me wrong, exploring the idea is fine, but to me it just feels wrong and, to a degree, pointless (and that’s without considering the ethical aspects of using AI).
@adrianco @tastapod A valid experiment. But I immediately wonder what is being lost in the interaction
LLMs do not have understanding. They do not have a mental model. They cannot pick up inconsistencies. They do not have instinct. They cannot pick up on subtle signals that something is not quite right, or that there's more to find. They won't query, or question, or delve deeper. They are simply stochastic parrots. Text extrusion machines. I think we need more.
I await the conclusions with interest
@adrianco ooh, I never mentioned my 'very specific circumstance'. That's for tomorrow, then.
Yes, I suspect you could persuade Claude to write sensible, intention-revealing scenario tests with a G/W/T structure. That might be cool.
@adrianco The real message with this ramble, though, is that while LLMs could make a pretty decent fist of all that plumbing/scaffolding, you end up in the Peter Drucker situation of doing with great efficiency that which should never have been done at all.
Just write the PyTest! (Also, I <3 PyTest's fixtures. Such a lovely framework affordance.)
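For anyone who hasn't met them: PyTest fixtures effectively give you the Given section by declaring it as a parameter. A minimal sketch (the account domain is invented for illustration):

```python
import pytest

@pytest.fixture
def funded_account():
    # Given: an account with a known starting balance.
    # PyTest injects this into any test that names the parameter.
    return {"balance": 100}

def test_withdrawal_reduces_balance(funded_account):
    # When we withdraw part of the balance
    funded_account["balance"] -= 40
    # Then the balance reflects the withdrawal
    assert funded_account["balance"] == 60
```

Fixtures compose, scope, and tear down cleanly, and they refactor like any other Python, which is rather the point of this whole thread.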
@adrianco I am convinced that the next significant shift in software development involves ML. I am equally convinced that LLMs are not the way and are just a massive shill, sucking all the oxygen (and investment dollars) out of the room.
Why not _start_ with the premise that we want to encode rules and heuristics, and build an ML solution from that, rather than trying to persuade a forgetful, stochastic token prediction engine to do the job, in this case by throwing lots of them at it?