When working on #goodenoughtesting, I keep three LLM tabs open: Claude, Codex, and Gemini.
I test each prompt against all three to catch where instructions break down. When they disagree, that's where I focus, because I want to generate tests reliably.
Adding Amp to the mix soon.
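
For anyone who wants to script the comparison instead of juggling tabs, here is a minimal sketch of the idea. Everything in it is an assumption for illustration: the model callables are stubs with canned answers (not real API clients), and `normalize`, `group_by_answer`, and `disagrees` are hypothetical helper names. Exact-match comparison after whitespace normalization is also a crude stand-in, since real model outputs rarely match character for character.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A "model" here is any function that takes a prompt and returns text.
# Hypothetical stand-in: swap in real client calls yourself.
Model = Callable[[str], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(text.lower().split())


def group_by_answer(prompt: str, models: Dict[str, Model]) -> Dict[str, List[str]]:
    """Send the same prompt to every model and group model names by normalized answer."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, ask in models.items():
        groups[normalize(ask(prompt))].append(name)
    return dict(groups)


def disagrees(groups: Dict[str, List[str]]) -> bool:
    """More than one distinct answer means the prompt deserves a closer look."""
    return len(groups) > 1


# Stubbed models with canned answers (assumption: purely illustrative).
models: Dict[str, Model] = {
    "claude": lambda p: "assert add(2, 2) == 4",
    "codex": lambda p: "assert add(2, 2) == 4",
    "gemini": lambda p: "assert  add(2, 2) == 5",
}

groups = group_by_answer("Write one test for add()", models)
if disagrees(groups):
    for answer, names in groups.items():
        print(f"{names}: {answer}")
```

With the stubs above, two models land in one group and the third stands alone, which is the signal to go back and tighten the prompt.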




