I just dreamed up an amusing way for things to fail using LLMs:
1. The deliverable is a test suite for some new feature. It gets validated by looking at the logs to see whether the tests passed.
2. The LLM generates the tests and also generates passing logs.
3. Everybody signs off on it. Job well done.
4. Nothing works, of course, because the logs were generated by the LLM instead of by the test suite.
It would be hard to make this mistake, but it's amusing to think about...
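
The obvious guard is for whoever signs off to run the suite themselves and check the exit status, rather than reading logs somebody (or something) handed over. A minimal sketch, assuming the suite can be started from a command line and signals failure with a nonzero exit code; `suite_passes` and the default `unittest discover` command are hypothetical, picked only for illustration:

```python
import subprocess
import sys


def suite_passes(command=(sys.executable, "-m", "unittest", "discover")):
    """Run the test command ourselves and report whether it exited with status 0.

    The default command is an assumption for illustration; substitute
    whatever actually launches the suite under review.
    """
    # The result comes from the process we just ran, not from any log file
    # supplied alongside the deliverable.
    result = subprocess.run(list(command), capture_output=True, text=True)
    return result.returncode == 0
```

Calling `suite_passes()` from the project root runs the suite fresh every time, so a fabricated log has nothing to contribute to the sign-off.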