Apple did the research; LLMs cannot do formal reasoning. Results change by as much as 10% if something as basic as the names change.

https://garymarcus.substack.com/p/llms-dont-do-formal-reasoning-and

@ShadowJonathan not to sound anti-intellectual, but isn't it kinda obvious that a *text* generator, no matter how complex, can't do abstract reasoning?
@halva @ShadowJonathan yeah, I appreciate the demonstrations, but this feels a little like, "New study confirms bicycles cannot fly."

@graue @halva @ShadowJonathan

Companies like OpenAI and their defenders claim generative AI can reason, learn, etc. We know it’s nonsense, but it’s still extremely important it gets called out.

@rubenerd @graue @halva @ShadowJonathan This is a huge problem. They are the experts. Hinton and Ilya are claiming a function can have understanding. Why are they lying? Seems counterproductive to scare the hell out of people. Well, I know why Ilya is lying: he just got $1bn...

@nf3xn @rubenerd @graue @halva @ShadowJonathan I doubt Hinton is lying although he’s probably wrong. There’s a problem in philosophy: is the mind separate from the body? If it’s not, then it should be possible to model the brain well enough to simulate thought processes (at least in principle.)

Computational physics tells us that there is a function that could perform the simulation, and Hinton has spent his career looking for it.

@MartyFouts @rubenerd @graue @halva @ShadowJonathan How can he be wrong? He does not understand what he has wrought? He is literally the pioneer. One must assume that they have a far better grasp of how it works than anyone. What you are talking about is a million times removed from these crude devices. We are quantum beings that collapse wave functions (allegedly).
@MartyFouts @nf3xn @rubenerd @graue @halva @ShadowJonathan there’s a huge gap between “possible in principle” and “this does it now”. A kite can fly, but being able to build a kite doesn’t mean you can build an airplane
@graue @halva @ShadowJonathan But imagine if the world's most awful people were pouring hundreds of billions of dollars into telling everyone bicycles can fly. Then you would need to spend resources refuting that. 🤬

@dalias @graue @halva @ShadowJonathan You’d think that people who own a bicycle can just check…

On a tangentially related note, flying bicycles are invented by future humanity in ‘The Dark Forest’: personal flying vehicles in the form of helicopter backpacks. They’re “bicycles” in the sense that they’re two counter-rotating coaxially-mounted propellers. That’s actually not a bad idea. If only we poured billions of dollars into making that work.

@enoch_exe_inc @dalias @graue @halva

> You’d think that people who own a bicycle can just check…

does the emperor have no clothes? would people call him out on it?

@ShadowJonathan @enoch_exe_inc @dalias @graue @halva I would. Whether from a government or a corporation, I will not reject the evidence in front of my eyes.

@ShadowJonathan @enoch_exe_inc @dalias @graue @halva

I'm an avid cyclist. I'll be out riding most of today and tomorrow.

Technically ...
Some bicycles do fly:

https://www.youtube.com/watch?app=desktop&v=v9KJwOZ3frk

And many "mountain bikers" are known for getting a lot of "air"

https://www.youtube.com/watch?v=6s6-O054SXg&ab_channel=BrendanFairclough

But yes, generally, bicycles do *not* fly.

@dalias @ShadowJonathan @halva @graue I'd prefer if the refutation was a bit more final/terminal, but yes.
@graue @halva @ShadowJonathan The record for human-powered flight was set on what is basically a bicycle with wings and a propeller attached. Some AI researchers believe that they can add the equivalent of wings and a propeller to an LLM and accomplish the equivalent.
The technical term is a multi-agent model.
@MartyFouts @halva @ShadowJonathan Ah yes, what a success story. What a useful and practical technology. Heading to the airport right now for my flight to Chicago on a bicycle with wings and propeller attached.

@graue @halva @ShadowJonathan You started with “can fly”. But sure, move the goalposts to “can carry commercial passenger traffic” to avoid the point of the analogy extension. 😉

Have a safe flight and be sure to tip the pilot.

@MartyFouts Point taken: my analogy was too generous to LLMs.

@MartyFouts @graue @halva @ShadowJonathan

You got the story wrong, exactly the way an LLM gets things wrong.

They had wings to fly, but needed speed, so they added the bicycle.

With LLMs we have text generation. When we have a reasoning AI, we will add an LLM so it can talk to us.

Like the bicycle that can't fly but can produce speed, an LLM can't reason but can talk.

@Aedius @graue @halva @ShadowJonathan I didn’t say anything about how the device evolved; I only described its eventual state. So no, I didn’t get the story wrong.

But I see you do understand the underlying point: there are researchers who are taking the bicycle with wings approach, making the assumption that multi-agent methods will work around LLM limitations.

@graue @halva @ShadowJonathan yeah, but there's a lot of people out there saying bicycles can fly. Sometimes you gotta do the science just to have something on paper to smack the idiots with.

@graue @halva @ShadowJonathan

This is in the context of massive companies spending billions hyping bicycles as viable replacements for aircraft.

It's blindingly obvious it's all a lie, but the hype keeps making it onto the front page and people keep investing in it as if it was true. Airlines are talking about replacing their planes with bikes etc etc.

There are serious discussions (by people who should really know better) about how plane makers are no longer needed because bicycles exist. It makes no sense but there's so much money invested that no one wants to be the one to admit it.

@FediThing @graue @halva exactly this, and research like what Apple just did is basically figuring out the lift performance of a bike, seeing that it doesn't exist, and pointing it out for what it is: not a plane

@ShadowJonathan @graue @halva

The question is, what happens when such research conflicts with share price-juicing hype?

Do companies try to damp down the hype for the sake of long term sanity? Or do they go with the hype to get maximum juicing and bury any sceptical voices?