“The LLM generated what was described, not what was needed.”

https://blog.katanaquant.com/p/your-llm-doesnt-write-correct-code

Your LLM Doesn't Write Correct Code. It Writes Plausible Code.

One of the simplest tests you can run on a database:

Vagabond Research
@jack My current favorite is “Generate a list of organizations of this kind (there are n of them) and their CEOs.”
“I created a list [well done], but I couldn’t find all the CEOs, so I guessed the rest based on the names I did find!”
The result looks plausible, of course, so the LLM did its job :-)
@chris LLMs are not grounded in world models, so poor performance on fact-heavy tasks is not very surprising 🤷‍♂️
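
The probe described in that exchange is easy to script. A minimal sketch in Python, assuming a hypothetical ask_llm helper (stubbed here so the sketch runs end to end) and a hand-curated reference table; every organization and name below is invented for illustration:

    # Hand-curated reference table; in practice you would pull this
    # from filings or a registry. All entries are made up.
    GROUND_TRUTH = {
        "Acme Corp": "Jane Doe",
        "Globex": "John Smith",
        "Initech": "Maria Garcia",
    }

    def ask_llm(prompt: str) -> dict[str, str]:
        # Hypothetical helper: call your model and parse the reply into
        # {organization: ceo}. Stubbed so the sketch runs; note the
        # deliberately wrong guess for Globex.
        return {"Acme Corp": "Jane Doe", "Globex": "Jane Smith"}

    def score_factuality(claimed: dict[str, str]) -> float:
        # Fraction of claimed (org, CEO) pairs that match the reference.
        if not claimed:
            return 0.0
        correct = sum(1 for org, ceo in claimed.items()
                      if GROUND_TRUTH.get(org) == ceo)
        return correct / len(claimed)

    claimed = ask_llm("List the current CEOs of: " + ", ".join(GROUND_TRUTH))
    print(f"coverage: {len(claimed)}/{len(GROUND_TRUTH)}, "
          f"accuracy among claims: {score_factuality(claimed):.0%}")

Splitting coverage from accuracy matters because of the failure mode in the thread: a model that guesses the CEOs it couldn’t find keeps coverage high while accuracy quietly drops.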