Apple did the research: LLMs cannot do formal reasoning. Results change by as much as 10% when something as basic as the names changes.

https://garymarcus.substack.com/p/llms-dont-do-formal-reasoning-and

LLMs don’t do formal reasoning - and that is a HUGE problem

Important new study from Apple

Marcus on AI

@ShadowJonathan not to sound anti-intellectual, but isn't it kinda obvious that a *text* generator, no matter how complex, can't do abstract reasoning?
@halva @ShadowJonathan yeah, I appreciate the demonstrations, but this feels a little like, "New study confirms bicycles cannot fly."
@graue @halva @ShadowJonathan The record for human-powered flight was set on what is basically a bicycle with wings and a propeller attached. Some AI researchers believe that they can add the equivalent of wings and a propeller to an LLM and accomplish the equivalent.
The technical term is multi-agent model.
@MartyFouts @halva @ShadowJonathan Ah yes, what a success story. What a useful and practical technology. Heading to the airport right now for my flight to Chicago on a bicycle with wings and propeller attached.

@graue @halva @ShadowJonathan You started with “can fly”. But sure, move the goalposts to “can carry commercial passenger traffic” to sidestep the point of extending the analogy. 😉

Have a safe flight and be sure to tip the pilot.

@MartyFouts Point taken: my analogy was too generous to LLMs.