After hearing Sebastian Bubeck talk about the #SparksOfAGI paper today, I decided to give #GPT4 another chance.

If it can really reason, it should be able to solve very simple logic puzzles. So I made one up. Sebastian stressed the importance of asking the question right, so I stressed that this is a logic puzzle and didn't add anything confusing about knights and knaves.

Still, it gets the solution wrong.

Just for fun, here's a #KnightsAndKnaves version of the same puzzle.

It does no better.

(Actual solution: only a knight can say "at least one of us is a knave" (a knave saying it would be telling the truth, which a knave can't do), so Jailer 4 is a knight and there really is at least one knave. Since Jailer 2 says that Jailer 1 is a knight, Jailers 1 and 2 must be the same type. If both are knaves, Jailer 3 must be a knight (otherwise Jailer 1's statement would be true). If both are knights, Jailer 3 must be a knave. So the tiger is behind door 1 or 3. Open door 2.)
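The case analysis above is small enough to check by brute force. The puzzle text itself only appears in the screenshots, so the statements below are reconstructed from the walkthrough: Jailer 1 says "Jailer 3 is a knave" (inferred from the "otherwise 1's statement would be true" step), Jailer 2 says "Jailer 1 is a knight", Jailer 4 says "at least one of us is a knave", and Jailer 3 says nothing that matters here. A minimal sketch under those assumptions:

```python
from itertools import product

def consistent(j1, j2, j3, j4):
    """True = knight (always truthful), False = knave (always lying).
    An assignment is consistent iff each speaker's truthfulness
    matches the truth value of their statement."""
    s1 = not j3                          # Jailer 1: "Jailer 3 is a knave"
    s2 = j1                              # Jailer 2: "Jailer 1 is a knight"
    s4 = not (j1 and j2 and j3 and j4)   # Jailer 4: "at least one of us is a knave"
    return (j1 == s1) and (j2 == s2) and (j4 == s4)

models = [m for m in product([True, False], repeat=4) if consistent(*m)]
for m in models:
    print(["knight" if x else "knave" for x in m])
# → ['knight', 'knight', 'knave', 'knight']
#   ['knave', 'knave', 'knight', 'knight']
```

Exactly two assignments survive, and they match the two cases in the solution: Jailers 1 and 2 are the same type, Jailer 3 is the opposite type, and Jailer 4 is always a knight.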

OK, to be fair, it's not systematically wrong. I ran the original, simple-language problem six times. Four times it incorrectly told me to open doors 2 and 3. Once it incorrectly told me to open doors 1 and 2. And once it correctly told me to open doors 1 and 3.

Even in this final case, the logic itself was flawed.

Also to be fair, it does better than #GPT3.5, which concludes "In either case, you should not choose door 1, since we know that the tiger is not behind that door."