“Potemkin Understanding in Large Language Models”

A detailed analysis of the incoherent application of concepts by LLMs, showing how benchmarks that reliably establish domain competence in humans can be passed by LLMs lacking similar competence.

H/T @acowley

Link: https://arxiv.org/abs/2506.21521

@gregeganSF @acowley

"incoherent application of concepts"

Reminder that no concepts are involved in a random (Markov) walk through word space. Shannon 1948.

From Pogo: "We could eat this picture of a chicken, if we had a picture of some salt."
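
For readers without the reference handy: Shannon's 1948 picture is a finite-order Markov chain over words, where the next word is drawn according to which words followed the current one in some corpus. A minimal sketch of that kind of random walk through word space, using a made-up toy corpus (nothing here is from the paper or the thread, just an illustration):

```python
import random
from collections import defaultdict

# Toy corpus -- any text would do; this one is made up for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Count which words follow which (first-order / bigram Markov chain).
successors = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    successors[current].append(following)

def shannon_walk(start, steps=8):
    """Random walk through word space: each next word is drawn
    from the words observed to follow the current one."""
    word, output = start, [start]
    for _ in range(steps):
        choices = successors.get(word)
        if not choices:
            break
        word = random.choice(choices)
        output.append(word)
    return " ".join(output)

print(shannon_walk("the"))
```

Running it yields locally plausible but aimless strings of corpus words, which is the sense in which "no concepts are involved" in such a walk.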

@glc @gregeganSF @acowley But that is not the word space they're walking...

@dpwiz @gregeganSF @acowley

I suppose you must be referring to Pogo, which is not, for the present purposes, even a word space (or: not fruitfully treated as such).

@glc @gregeganSF @acowley no, the LLMs aren't operating in **word**-space.

@dpwiz @gregeganSF @acowley

Are you trying to distinguish tokens and words?

Or do you have a point? If so, what is it?

@glc @gregeganSF @acowley No, bytes/tokens/words/whatever is irrelevant. What's wrong with the "word-space" model is that it misses the context. The "language" part is a red herring. What's really going on is a tangle of suspended code that's getting executed step by step. And yes, there are concepts, entities, and all that stuff in there.
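
One way to make the "misses the context" point concrete: a fixed-order Markov chain conditions only on the last n words, whereas a transformer recomputes its next-token distribution from the entire preceding sequence via attention. A minimal sketch of causally masked scaled dot-product attention in numpy, with random toy vectors standing in for real embeddings and weights (ignoring multiple heads, layers, and everything else):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "context": 5 token embeddings of dimension 8.
# Random stand-ins, not anything from a real model.
d = 8
context = rng.normal(size=(5, d))

# Random projection matrices for queries, keys, values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def causal_attention(x):
    """Scaled dot-product self-attention with a causal mask.
    Each position's output mixes information from all earlier
    positions, so step t depends on the whole prefix, not just
    a fixed window of the last n words."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(d)
    n = scores.shape[0]
    # Position i may only attend to positions <= i.
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

out = causal_attention(context)
print(out.shape)  # (5, 8): one context-dependent vector per position
```

The only point of the sketch is that the attention weights, and hence the output at every position, are recomputed from the whole prefix rather than read off a fixed window.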

@dpwiz

I'd say there is syntax without semantics (in the traditional sense of formal logic, that is).

You have some other view evidently.
That much is now clear.

I don't see much difference from Markov and Shannon, apart from the compression tricks needed to get a working system.

@glc Perhaps. I just hope this is not another "X is/has/... Y" claim.
What's your favorite or most important consequence of this distinction?

@dpwiz

That no concepts are involved, and the numerous corollaries of that, I suppose. At least, that's what I find myself harping on now and then.

I have no strong interest in the details, though considerable interest in watching this play out.

Someone like Cosma Shalizi is going to actually get into the weeds a bit more:
http://bactra.org/notebooks/nn-attention-and-transformers.html

You'll probably find much to agree with and much to disagree with there. And at adequate length.

"Attention", "Transformers", in Neural Network "Large Language Models"

@glc > I find this literature irritating and opaque.

That's a promising start! (8