"The bottom line from Apple’s research is stark: we’re not witnessing the birth of AI reasoning.

We’re seeing the limits of very expensive autocomplete that breaks when it matters most."

Damning proof from Apple researchers that the hype from big tech surrounding #AI is an expensive illusion.

https://medium.com/@ninza7/apple-just-pulled-the-plug-on-the-ai-hype-heres-what-their-shocking-study-found-24ad42c234a0

#tech

Apple Just Pulled the Plug on the AI Hype. Here’s What Their Shocking Study Found

New research reveals that today’s “reasoning” models aren’t thinking at all. They’re just sophisticated pattern-matchers that completely…

Medium
@eosfpodcast I'm actually glad that Apple is taking such a measured approach in this area, even if it's affecting their stock price negatively (full disclosure... I own a few shares). I still wonder exactly how it's going to really improve our lives. Apple seems to be compartmentalizing the capabilities into small, useful features, many of them not even obviously “AI”. Seems pretty sensible to me. I don't need deep fakes, or a machine writing things for me. Let me do the art and humanities stuff, and make an AI that will do my laundry, or clean my home.

“AI -- Fun until it's not”
@eosfpodcast uhm, told you so?
@mirabilos @eosfpodcast I read the summary and was immediately left thinking, "Shocking to whom, exactly?"
@eosfpodcast @danirabbit Since you have to log in to Medium to actually read the article, and the intro text and title seemed like clickbait-y hype, I was skeptical. Here's a link to the actual research: https://machinelearning.apple.com/research/illusion-of-thinking
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes…

Apple Machine Learning Research
A knockout blow for LLMs?

LLM “reasoning” is so cooked they turned my name into a verb

Marcus on AI
@danirabbit @ramsey I read the original paper this week. It's very well researched and confirms most of my views on LLMs and LRMs.

@ramsey @eosfpodcast @danirabbit Thank you.

I can’t stand it when people post email-required links.

@eosfpodcast
If someone would like to see the text without login: https://archive.ph/ASo9a
@eosfpodcast this is some king's-new-clothes stuff right here.

@eosfpodcast It's why I don't use auto-correction in messages, and it's not even AI. It made me lazy and made me stop caring about spelling, because I was expecting the system to fix it for me (I'm not a native English speaker, though given how some natives misuse "there" and "their", maybe they should disable it too).

AI chatbots are not much different: you ask them and they spit out some BS. People don't search and investigate things anymore, learning along the way; they just expect an answer, correct or not.

@eosfpodcast TIL that Apple is to AI what Toyota is to BEV cars.

@eosfpodcast I see we're now in the "of *course* you can't expect this skill from an LLM, that's not what they're good at" stage of the discourse. This sort of remark is also popping up a lot in comments to people posting links to that article about how an old Atari console chess game from almost 50 years ago stomped ChatGPT recently.

This observation is of course totally correct. LLMs are genuinely not good for this sort of thing. They can't play chess very well (certainly not as well as just about any off-the-shelf dedicated chess program from anytime in the last several decades), and they can't actually reason about anything, instead just printing text that *looks* like what a reasoning person might do or think. And, of course, people critical of "generative AI" have been pointing this out for years.

Pointing this out is neither trite nor useless.

When OpenAI literally advertises "Learn something new. Dive into a hobby. Answer complex questions." as a sensible use case for ChatGPT (https://openai.com/chatgpt/overview/), people will expect behavior like this from an LLM.

When Google uses its LLM, Gemini, not just as a search adjunct (where it might be at least *somewhat* useful, since it could plausibly associate your request with actual web pages' text) but as a tool for creating factual summaries of information on the web, people will expect behavior like this from an LLM.

When the entire first *year* of ChatGPT hype claimed things like showing "sparks of artificial general intelligence" (https://www.microsoft.com/en-us/research/publication/sparks-of-artificial-general-intelligence-early-experiments-with-gpt-4/), people will expect behavior like this from an LLM.

When every vendor out there (Anthropic, OpenAI, Microsoft, you name 'em) aggressively markets their models as "reasoning" models, people will expect behavior like this from an LLM.

There are lots of people who have no clue about the limitations of this flavor of "AI", precisely because it has been -- and still is -- hyped so aggressively in counterproductive and misleading ways.

@eosfpodcast

The goal is not to provide answers and information

The goal is to DESTROY information.

@eosfpodcast Unfortunately, humans are pattern matchers as well, and humans demonstrate a lack of reasoning ability constantly. Case in point: religion(s)! When the AI tells you it has faith in the nonsense it is spewing, case closed!

@eosfpodcast This is why the EU shouldn't try to keep up with US companies on AI, but should focus on sovereignty, security, and privacy, taking back control over our own data.

@kimvsparrentak

@eosfpodcast I mean if anyone knows expensive illusion, it’s Apple. 🤣🤣

@eosfpodcast

Yup, that Google Colab notebook I just wrote using a simple prompt and a math equation, to help a coworker understand a very difficult concept, and which would have probably taken me a whole day to write traditionally, is an illusion.

@BlueBee @eosfpodcast that's not reasoning, though, that's just another example of the machine filling in details extrapolated from its training data.

The point of this study is to illustrate that the model has no *comprehension* of what it's doing.

@xale @eosfpodcast

That's called moving the goalposts.

@BlueBee @eosfpodcast How so? The paper explicitly discusses "reasoning" models, not generative behavior. I'm not saying that the model didn't do what *you're* claiming it did, only that you're not describing the same thing the paper is.

@xale @eosfpodcast

Okay so I'm reading the paper now.

So far it seems like they are saying... When you use a language model wrong, it performs poorly.

Which... Duh.

For example, you can't ask the language model itself to do a math equation. It will get it wrong; that's just not within its abilities. But you can ask a language model to write a program that solves the equation.

The model itself cannot do the calculation. This is known.

This is why modern models do some trickery on the back end where they use traditional tools to validate conclusions. Like having a calculator or a Python interpreter.
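That pattern can be sketched in a few lines of Python. Note this is a toy illustration, not any vendor's actual API: the `CALC(...)` convention here is invented for the example, whereas real systems use structured function-calling protocols. The idea is the same, though: the host evaluates the model's arithmetic with a real interpreter instead of trusting the model's own output.

```python
import ast
import operator

# Toy "calculator tool": safely evaluate an arithmetic expression the
# model emits, instead of trusting the model's own arithmetic.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str):
    """Evaluate +, -, *, /, ** over numeric literals; reject anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def answer_with_tool(model_reply: str) -> str:
    # Hypothetical convention: the model wraps math in CALC(...), and the
    # host evaluates it with a real interpreter before replying.
    if model_reply.startswith("CALC(") and model_reply.endswith(")"):
        return str(safe_eval(model_reply[5:-1]))
    return model_reply  # plain text passes through unchanged

print(answer_with_tool("CALC(12345 * 6789)"))  # → 83810205, exact every time
```

The division of labor is the point: the model only has to produce a well-formed expression (something pattern-matching is good at), and the deterministic tool guarantees the arithmetic is correct.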

I'm just annoyed at everyone still thinking this stuff is a fad and that it's going to go away. It's blindingly obvious it's not going away. I think we should be more focused on supporting those who will make models we can use, instead of letting two giant companies keep the whole thing to themselves and use it to brainwash us all.

Because I would really love to run local models and build cool workflows for people instead of talking directly to Google and OpenAI.

Allowing two companies to have a monopoly on the thinking device you plug into your brain (one they can shut off) seems like a bad idea.

@eosfpodcast @BlueBee Great use case! And you were right there in case it went off track. But a very limited scope.

@eosfpodcast

This is clearly the beginning of a new era... Good. I'm just waiting to see all the negative comments that Apple is cooked, has lost it, and will fade, while the AI hypers are the ones headed for the dustbin of history. I need a new phone and I might consider Apple now.

@eosfpodcast

"New research reveals that today’s “reasoning” models aren’t thinking at all. They’re just sophisticated pattern-matchers"

No. No, new research does not show that. Everyone who had the least bit of understanding already knew that this was what was going on. But I'm glad to hear that Apple has "realized it"

If anything, I would say this is how they frame it so as not to seem like idiots for having bet so much on this horse for so long.

@eosfpodcast what is their commercial interest to post this? Do they imply that their own vomit emitters are better?
@eosfpodcast And yet they're using it for everything but autocorrect!
@eosfpodcast thankfully they didn't decide to try and bilk a few billion dollars from investors. Guess they figured they would lose more in the fallout.
@eosfpodcast But will they let me delete “Image Playground.app”? No.
@eosfpodcast I rarely agree with Apple, but well..... yes.
I hope this reaches more common folk as well, other than just us nerds :D