When billion-dollar AIs break ...
Recently Apple published a paper on LRMs (Large Reasoning Models) reporting that “LRMs have limitations in exact computation” and that “they fail to use explicit algorithms and reason inconsistently across puzzles.” I would consider this a death-blow paper for the current push to use LLMs and LRMs as the basis for AGI; Subbarao Kambhampati and Yann LeCun seem to agree. You could say the paper knocked out LLMs. More recently, a comment paper appeared on arXiv and was shared around X as a rebuttal to Apple’s paper. Putting aside the stunt of listing Claude Opus as a co-author (yes, I’m not kidding), the paper itself is a poor rebuttal for many reasons, which we shall explore, but mainly because it misses the entire point of the Apple paper and of prior research by AI researchers such as Professor Kambhampati.
"So, our argument is NOT "humans don't have any limits, but #LRMs do, and that's why they aren't intelligent". But based on what we observe from their thoughts, their process is not logical and intelligent."
https://garymarcus.substack.com/p/a-knockout-blow-for-llms?r=lw58&utm_medium=ios&triedRedirect=true
I know it feels like the #Singularity is imminent. The more I use #LLMs and #LRMs, the more I’m convinced it’s much farther away. A simulation of #AGI is in the near future, but that doesn’t mean it’s truly the Singularity.
What happens the day after humanity creates AGI?
https://bigthink.com/the-future/what-happens-the-day-after-humans-create-agi/
Apple’s LLM study draws important distinction on reasoning models
It systematically probes so-called Large Reasoning Models (#LRMs) like Claude 3.7 and DeepSeek-R1 using controlled puzzles (Tower of Hanoi, Blocks World, etc.) instead of standard math benchmarks, which often suffer from data contamination. (A sketch of the Tower of Hanoi algorithm follows below.)
💻 **The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity**
“_We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles._”
🔗 https://machinelearning.apple.com/research/illusion-of-thinking
#AI #ArtificialIntelligence #LRMS #Technology #Tech #Thinking #Reasoning @ai
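For readers who haven’t seen the puzzle: Tower of Hanoi has a short, exact recursive solution whose move trace grows as 2^n − 1. Here is a minimal Python sketch of that classic algorithm (my own illustration, not code from the paper; the function name `hanoi` is just for this example):

```python
# Classic recursive Tower of Hanoi -- a minimal illustration (not from the
# paper) of the kind of explicit, exact algorithm the Apple study reports
# LRMs fail to execute reliably as n (and the 2**n - 1 move trace) grows.

def hanoi(n, source, target, spare, moves):
    """Append the moves that transfer n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear n-1 disks out of the way
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # restack the n-1 disks on top

moves = []
hanoi(10, "A", "C", "B", moves)
print(len(moves))  # 1023 == 2**10 - 1: exact, deterministic, mechanically checkable
```

This is exactly why these puzzles make contamination-resistant benchmarks: every move in the trace can be verified against the rules, so there is no partial credit for plausible-sounding steps.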
"In LRMs, the term “reasoning” seems to be equated with generating plausible-sounding natural-language steps to solving a problem, and the extent to which this provides general and interpretable problem-solving abilities is still an open question. The performance of these models on math, science, and coding benchmarks is undeniably impressive. However, the overall robustness of their performance remains largely untested, especially for reasoning tasks that, unlike those the models were tested on, don’t have clear answers or cleanly defined solution steps, which is the case for many, if not most, real-world problems, not to mention “ fixing the climate, establishing a space colony, and the discovery of all of physics,” which are achievements OpenAI’s Sam Altman expects from AI in the future. And although LRMs’ chains of thought are touted for their “human interpretability,” it remains to be determined how faithfully these generated natural-language “thoughts” represent what is actually going on inside the neural network in the process of solving a problem. Multiple studies (carried out before the advent of LRMs) have shown that when LLMs generate explanations for their reasoning, the explanations are not always faithful to what the model is actually doing.
Moreover, the anthropomorphic language used in these models may mislead users into trusting them too much. The problem-solving steps that LRMs generate are often referred to as “thoughts”; the models themselves tell us that they are “thinking” (...) According to an OpenAI spokesperson, “Users have told us that understanding how the model reasons through a response not only supports more informed decision-making but also helps build trust in its answers.” But the question is, are users building trust based mainly on these humanlike touches, when the underlying model is less than trustworthy?"
📗 AI
🔴 Four Predictions for AI in 2025
🪧 Plummeting Costs: AI inference costs are dropping sharply, making advanced models more affordable for enterprises to deploy at scale.
🪧 Large Reasoning Models (LRMs): Models like OpenAI’s o1 enable deep reasoning and generate synthetic training data, accelerating innovation. 🧵