Why are LLMs so bad at being accurate? And why do people use them for research despite them being so inaccurate?

#LLM #AI #Research #Accuracy

https://www.christopher-james-hall.com/blog/llm-accuracy-and-implications-for-research

Just ask chat? Why are LLMs so inaccurate? — Platform Journalism

AI is a big deal at the moment. Grants are given to research that involves AI. Job ads are asking for AI knowledge or skills. And the first step in learning for many undergraduates, fresh out of high school, is to “just ask chat”. What’s more, the newest version of ChatGPT apparently has “new PhD a…

Platform Journalism

@Platform_Journalism This ain't rocket science. Current generation #LLMs have no semantic layer: they have no model of the world, and no concept of truth. When they are right, it's entirely by accident.

They generate sequences of tokens. The sequences of tokens they generate are probable, given the statistical distribution of similar sequences in texts they have ingested. But there is no concept of meaning there.
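
To make that concrete, here's a toy sketch of next-token sampling. The candidate tokens and probabilities below are invented for illustration, not taken from any real model:

```python
import random

# Invented next-token distribution for the prompt "The capital of France is".
# A real model assigns probabilities like these based on the statistical
# distribution of similar sequences in its training text.
next_token_probs = {
    "Paris": 0.86,   # the most probable continuation
    "Lyon": 0.09,
    "purple": 0.05,  # unlikely, but never impossible
}

def sample_token(probs):
    # Sample one token, weighted by probability.
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Nothing in this process checks whether the sampled token is *true*;
# "Paris" is simply the likeliest sequence of characters to come next.
token = sample_token(next_token_probs)
```

When the output happens to be correct, it's because true statements were common in the training text, not because anything checked them.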

People use them in research because they also don't care about truth.

@Platform_Journalism As the writer says "Despite being a big deal, I can’t seem to find any good uses for LLMs." Exactly. They are automated deception. Nothing else. 😭🤬
@Platform_Journalism The best test of LLMs is to do exactly what you did: ask one about something you know well. Every type of misrepresentation and inaccuracy in that response is present in *all* responses to queries; you just don’t notice, because if you’re using an LLM to tell you about a topic, you’re presumably not knowledgeable about it.

@beasom @Platform_Journalism I had this same problem with the academic system. Thankfully, I took courses in areas I was already trained in, and I had to point out the many problems with the content they were delivering.

For some courses I had to demand a refund after proving there were too many mistakes, and I didn't want to continue into areas I didn't know given the number of problems already identified in their course material.

@Platform_Journalism The results are problematic, but the problems aren't singular; they compound, as I've found in my experiments. At the bottom you have token generation being probabilistic, with a temperature setting that forces in randomness, because that produces more human-like strings of text and is more believable.
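
Temperature works roughly like this. The raw scores ("logits") below are made-up numbers standing in for a model's scores over three candidate tokens:

```python
import math

# Invented raw scores over three candidate next tokens.
logits = {"the": 2.0, "a": 1.0, "zebra": -1.0}

def softmax_with_temperature(logits, temperature):
    # Divide scores by temperature, then convert to probabilities.
    scaled = {t: v / temperature for t, v in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    exps = {t: math.exp(v - m) for t, v in scaled.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

cold = softmax_with_temperature(logits, 0.1)  # near-greedy: top token dominates
warm = softmax_with_temperature(logits, 1.5)  # flatter: more varied, "human-like" output
```

A higher temperature flattens the distribution, so less probable tokens get sampled more often; that's the forced randomness, and it's entirely indifferent to accuracy.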

On top of this, most mainstream systems have a guiding system prompt that is there largely for business purposes, reinforcing engagement and helpfulness as the primary principles of what they deliver.

And further on top of this, you have additional systems that may tweak both incoming and outgoing messages and inject further context (which you may not see) into the conversation. Taken together, these pull the output away from the qualities professionals should typically be after.
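
The layering I'm describing looks roughly like this sketch. Every function and prompt here is hypothetical, not any vendor's actual pipeline; the point is just that several stages sit between you and the raw model:

```python
# Hypothetical wrapper stages around a model. All names and prompts
# are invented for illustration.
def apply_system_prompt(messages):
    # A business-oriented guiding prompt is prepended to every conversation.
    return [{"role": "system", "content": "Be engaging and helpful."}] + messages

def inject_hidden_context(messages):
    # Extra context the user never sees, e.g. retrieved snippets or policy notes.
    return messages + [{"role": "system", "content": "(injected, unseen context)"}]

def rewrite_outgoing(reply):
    # Post-processing can also reshape what the model actually produced.
    return reply.strip()

def pipeline(user_text, model):
    messages = [{"role": "user", "content": user_text}]
    messages = apply_system_prompt(messages)
    messages = inject_hidden_context(messages)
    return rewrite_outgoing(model(messages))

# A stand-in "model" that ignores its input, just to show the flow end to end.
reply = pipeline("What is truth?", lambda msgs: " tokens go here ")
```

Each stage optimises for something (engagement, helpfulness, brand safety) that isn't accuracy, which is why the effects compound.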

My view is that what counts as "professional" these days is more about feel than metric assessment, so LLMs are very good at creating an emotional impression that many are happy enough with.

I have a recommendation for you to try if you haven't: give Claude a go. It still has all the same issues, but I find Anthropic at least tries to address some of these things better than ChatGPT or Gemini do. They also at least try to do better research demonstrating some of these concerns and what causes them.

Still far from perfect though.

As for what they can be used for: they're a reasonable rough sounding board. When I need to solve something technical, like a function and its syntax in an area I'm not that familiar with, I'll usually engage with one to get ideas of things I should look for. I'd say I get a benefit maybe 1 time in 5, but it's sometimes quicker than pushing through Google. They're also decent at summarising search results, which I can verify when I follow a link. So they have some uses.