We knew, but the proof is nice.

"Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves"

The guess-the-next-words machines don’t actually understand anything.

https://nitter.poast.org/heynavtoor/status/2041243558833987600#m

#math #ai

@davidaugust Ecosia AI gets it right. It looks like the paper referenced was published in 2025, so the research was conducted before that. The models are all much better now. I’m no AI apologist, but I think any argument of “AI sucks because it’s not good at _____” is on tenuous ground and will be proven wrong as the models continue to improve. @Ecosia
@audioflyer79 @davidaugust I mean, it's worth noting that the LLMs have ingested that paper by now. : /

@alisynthesis @davidaugust fair enough. I changed up the problem completely and added some reasoning, and it did pretty well. It appears to be generating code to solve the math. The only thing it missed is that very unripe bananas are green, not yellow.

James picks 40 apples on Monday. Then he picks 35 lemons on Tuesday. On Wednesday, he picks half as many bananas as he did apples, but five of them were very unripe. How many yellow fruits does James have?
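
For reference, here’s my own sketch of the intended arithmetic in code form (a reconstruction, not the model’s actual output; the variable names are mine):

apples = 40              # picked Monday; the problem never states their color
lemons = 35              # picked Tuesday; yellow when ripe
bananas = apples // 2    # Wednesday: half as many bananas as apples = 20
unripe_bananas = 5       # very unripe bananas are green, not yellow

# Yellow fruits: the lemons plus the bananas that aren't unripe.
# Apples are left out, since the problem never gives their color.
yellow_fruits = lemons + (bananas - unripe_bananas)
print(yellow_fruits)     # 50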

@audioflyer79 @alisynthesis @davidaugust
The correct mathematical response to your question is either a statement of uncertainty (I can't answer that because I don't know what color the apples are, or how ripe the lemons and the non-unripe bananas are) or a request for clarification (what kind of apples? are the lemons ripe? how ripe are the non-unripe bananas?).

The fact that it provides a guess indicates that it has correctly understood what *you want it to say*.

It's not doing math. It's playing "what does the user expect?"

@Robotistry @audioflyer79 @alisynthesis @davidaugust

"What does the user expect" implies that it is aware of a user. This is also not really true. It emits a response matching the best fit to it's training data and query.

@blterrible @Robotistry @audioflyer79 @alisynthesis quite right.

If we include the human designers/programmers as part of the artificial intelligence system in question, then business goals or interface concerns of “what does the user expect” do come into the process. But if we don’t logically or schematically incorporate the humans’ business goals into the AI system in question, then yes: the AI system does not in fact know or expect anything on behalf of the user.

@davidaugust @Robotistry @alisynthesis @blterrible What’s really interesting to me about these conversations is not what we can say about what AI “knows” or “awareness” or “understanding,” but rather what it says about humans and our need to “other” any intelligence competing with our own. We have no real understanding of what awareness, understanding, or consciousness is; we just know we have it. 1/
@Robotistry @blterrible @alisynthesis @davidaugust …and anything non-human doesn’t have it because *reasons*. I believe consciousness/awareness/understanding is a continuum, not a binary, and that all of the failures and mistakes made by LLMs could just as easily be attributed to humans in another context. Or to put it another way, that the failures of LLMs are *human* failures, mostly because they are trained on human data. 2/
@alisynthesis @davidaugust @blterrible @Robotistry and that the faults we attribute to LLMs (they’re only matching patterns to their training data, they’re only replying what the user expects) are really not all that different to how humans operate. Our brains are pretty much giant pattern matching association machines. Emergent properties we feel are there, like consciousness, have no provable basis 3/
@Robotistry @alisynthesis @davidaugust @blterrible nor is there any way to prove that any other creature, natural or synthetic, doesn’t have them. The Turing Test goalposts will keep getting pushed back until we realize we’re not as special as we think we are. 4/
@davidaugust @blterrible @alisynthesis @Robotistry Also, big tech sucks, the way AI is being developed and accelerated is ethically wrong, and AI may well do more harm to the world than good. 5/5

@audioflyer79 @blterrible @alisynthesis @Robotistry “We have no real understanding of what awareness, understanding, or consciousness is…” Philosophy of Mind disagrees. So does semiotics.

“…brains are pretty much giant pattern matching association machines,” nope. There is no evidence that human intelligence is based on statistical frequency matching, nor that any other organic intelligence is.

The Turing test is a very specific litmus test, not a general-purpose test for the presence of intelligence.

@davidaugust @Robotistry @alisynthesis @blterrible showing the limits of my ignorance. No, the brain is not statistical, but it does focus pretty heavily on pattern matching from a neuronal connection perspective, no? I mentioned the Turing test as a placeholder for any test we use to prove AI is “other.”

@audioflyer79 @Robotistry @alisynthesis @blterrible ah, that makes some sense, the Turing Test as a placeholder for any test we might apply.

Organic intelligences, like humans, do tend to excel at pattern recognition, and are often stronger at it than many synthetic intelligences. But I’m not sure pattern recognition is the core of what makes them intelligent; I think it’s only part of what constitutes intelligence.

@davidaugust @audioflyer79 @alisynthesis @blterrible One of the things humans are really, really good at is adapting to tools. (I suspect tool use and invention are more fundamental to our intelligence than pattern matching.)

This is one reason research into human-robot interaction is so challenging - the human will adapt their actions and expectations to the tools after just a handful of uses and won't be able to give good feedback about how easy or difficult it is to use or what change in performance they would expect.

Which means that because we have been trained in this particular form of call and response, the mere fact that the system treats a question like an elementary-school math test may predispose the user to assume they *intended* the model to treat it that way, and not to realize that their original intention was to gather a different kind of information about the system.

That's one thing that makes the Apple paper particularly nice - they managed to intentionally avoid humans' built-in post hoc rationalizations and focus on the specific question they wanted answered.

@alisynthesis @davidaugust @Robotistry @blterrible except that the intention of the prompt isn’t at issue; I was talking about the format of the prompt. I wouldn’t expect an LLM to know my intention.
Very good and interesting point about tool use. I’d be curious to see studies comparing how ML might “play” with a tool to learn how it works vs. how humans do.
@davidaugust @blterrible @alisynthesis @Robotistry I appreciate you guys humoring me while I try to sound like I know anything. Very interesting topic and lots to think about.
@audioflyer79 @alisynthesis @davidaugust @blterrible You might like Dreamships and Dreaming Metal by Melissa Scott, if you can find copies. They explore both sides of the "is it sentient" question.
@Robotistry @audioflyer79 @alisynthesis @blterrible “This is one reason research into human-robot interaction is so challenging - the human will adapt their actions and expectations to the tools after just a handful of uses and won't be able to give good feedback about how easy or difficult it is to use or what change in performance they would expect.” *chef’s kiss*