I'm looking for an article showing that LLMs don't know how they work internally

https://feddit.it/post/18191686

I'm looking for an article showing that LLMs don't know how they work internally - Feddit.it

I found the article in a post on the fediverse, and I can’t find it anymore. The researchers asked an LLM a simple mathematical question (like 7+4) and then could see how it worked internally: it found similar paths, but did nothing like performing mathematical reasoning, even though the final answer was correct. Then they asked the LLM to explain how it found the result, what its internal reasoning was. The answer was detailed step-by-step mathematical logic, like a human explaining how to perform an addition. This showed two things: - LLMs don’t “know” how they work - the second answer was a rephrasing of original text used for training that explains how math works, so the LLM just used that as an explanation. I think it was a very interesting and meaningful analysis. Can anyone help me find this?
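The mechanism described here, a correct answer reached through similarity to memorized examples rather than actual arithmetic, can be caricatured in a tiny toy sketch (my own illustration, not the study’s method; all names are made up):

```python
# Toy "model": answers addition questions by retrieving the most similar
# memorized example and nudging its stored answer. It never runs an
# addition algorithm, yet its outputs come out correct.

# "Training data": memorized question -> answer pairs, with (7, 4) left out
# so the model has never seen the exact question it will be asked.
memory = {(a, b): a + b for a in range(10) for b in range(10)
          if (a, b) != (7, 4)}

def answer(a, b):
    # Find the nearest memorized question (similarity-based retrieval)...
    (na, nb), stored = min(
        memory.items(),
        key=lambda kv: abs(kv[0][0] - a) + abs(kv[0][1] - b),
    )
    # ...and interpolate from its stored answer. This is pattern-matching
    # over neighbors, not step-by-step arithmetic reasoning.
    return stored + (a - na) + (b - nb)

print(answer(7, 4))  # correct result, produced without "doing addition"
```

If you then asked this toy model to explain itself and it answered with a textbook description of column addition, the explanation would describe a procedure it never executed, which is the gap the study apparently highlighted.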

I don’t know how I work. I couldn’t tell you much about neuroscience beyond “neurons are linked together and somehow that creates thoughts”. And even when it comes to complex thoughts, I sometimes can’t explain why. At my job, I often lean on intuition I’ve developed over a decade. I can look at a system and get an immediate sense if it’s going to work well, but actually explaining why or why not takes a lot more time and energy. Am I an LLM?

I agree. This is the exact problem I think people need to face with neural network AIs. They work the exact same way we do. Even if we analysed the human brain it would look like wires connected to wires with different resistances all over the place, with some other chemical influences.

I think everyone forgets that neural networks were used in AI to replicate how animal brains work, and clearly, if it worked for us to get smart, then it should work for something synthetic. Well, we’ve certainly answered that now.

Everyone saying “oh it’s just a predictive model and it’s all math and math can’t be intelligent” is questioning exactly how their own brains work. We are just prediction machines: the brain releases dopamine when it correctly predicts things, and it teaches itself by correctly predicting how things work. We modelled AI off of ourselves. And if we don’t understand how we work, of course we’re not gonna understand how it works.

They work the exact same way we do.

Two things being difficult to understand does not mean that they are the exact same.

Maybe “work” is the wrong word; same output. Just as a belt and a chain drive do the same thing, or how fluorescent, incandescent and LED lights all produce light even though they’re completely different mechanisms.

What I was saying is that one is based on the other, so similar problems, like irrational thought even when the right answer is conjured, shouldn’t be surprising. Although an animal brain and a neural network are not the same, the broad concept of how they work is.

What I was saying is that one is based on the other

Not in any direct way, no.