ELI5. Limit of current gen AI/LLMs

https://lemmy.world/post/43760792

I have some data science background, and I roughly understand how LLM parameter tuning works and how a model generates text. Simplifying my understanding: given a prompt like "Write a program to check if the input is an odd number", the LLM converts the prompt to embeddings, then plays a dice game/probability game: given the prompt so far, generate a set of new tokens. Now my question is, how are current LLMs able to parse through a bunch of search results and play the above dice game? At times one reads through, say, 10 URLs and generates results; how are they able to achieve this? What's the engineering behind generating such huge volumes of text? I always argue about the theoretical limitations of LLMs, but now that these "agents" are able to manage huge volumes of text I don't seem to have a good argument. So what exactly is happening? And what is the limit of AI?
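The "dice game" described above can be sketched with a toy example. Everything here (the tiny vocabulary, the logit values) is invented for illustration; a real model computes its logits from billions of learned weights:

```python
import math
import random

# Toy next-token "dice game": score every vocabulary token (logits),
# turn the scores into probabilities with softmax, then roll the dice.
# This vocabulary and these logits are made up for illustration.
VOCAB = ["def", "is_odd", "(", "n", ")", ":", "return", "%", "2"]

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, rng):
    # Weighted random draw over the vocabulary: the probability game.
    probs = softmax(logits)
    return rng.choices(VOCAB, weights=probs, k=1)[0]

rng = random.Random(0)
fake_logits = [2.0, 0.5, 0.1, 3.0, 0.1, 0.1, 1.0, 0.1, 0.1]
token = sample_next_token(fake_logits, rng)
print(token)  # some token from VOCAB, biased toward "n" and "def"
```

Real generation just repeats this loop: append the sampled token to the context, recompute logits, roll again.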

An LLM reads the previous prompts and replies, plus any base prompts. This is considered the context window. Don't ask me why it's not infinite.
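A rough sketch of what a finite context window means in practice. The messages are invented, and word counts stand in for real tokens (actual tokenizers split text differently):

```python
# With a finite budget, the model only ever sees the most recent
# tokens; older turns silently fall off the front of the context.
CONTEXT_LIMIT = 8  # deliberately tiny so the effect is visible

def build_context(messages, limit=CONTEXT_LIMIT):
    kept = []
    used = 0
    # Walk backwards from the newest message, keeping what still fits.
    for msg in reversed(messages):
        cost = len(msg.split())  # crude stand-in for a token count
        if used + cost > limit:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    "system: you are a helpful assistant",
    "user: write a program to check odd numbers",
    "assistant: here it is",
    "user: now explain it",
]
# Only the last two messages fit; the system prompt and the original
# request have been truncated away.
print(build_context(history))
```

This is the mechanical reason long conversations "forget" their beginnings: the dropped text is simply never shown to the model again.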

The machine will then generate text following the previous text that continues the spirit and intent of the previous text, based on other texts previously digested into weights.

It's the same thing as your phone's autocomplete, but with a few gigabytes of weights instead of a few kilobytes.

If the data it's working with is larger than the context, it will lose some of it. There's a chance it'll hallucinate anyway, because the text generator is non-deterministic. Say you're working with insurance data. Maybe your data looks familiar enough to data it previously ingested. So now it starts using the wrong data, but it "feels" right as far as the LLM is concerned, because it's a text generator, not a truth checker.
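The non-determinism mentioned above can be shown with a toy sampler. The candidate tokens and their probabilities are invented; the point is just that the same context and the same probability table can yield different continuations on different runs:

```python
import random

# Three plausible continuations for some prompt, with made-up
# probabilities. None of them is checked against reality: the
# generator only knows which continuation "sounds" likely.
next_token_probs = {"2023": 0.5, "2019": 0.3, "approximately": 0.2}

def generate(rng):
    tokens = list(next_token_probs)
    weights = list(next_token_probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Different random seeds stand in for different runs of the model.
runs = {generate(random.Random(seed)) for seed in range(50)}
print(runs)  # more than one distinct continuation across the runs
```

A wrong-but-plausible token, once sampled, becomes part of the context and conditions everything generated after it.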

You can ask it to look again, but it's just generating fresh tokens while the context gets more polluted.

Just look at the volume of non-trivial pseudo-information it generates, and try to verify some of the facts it states about your data.

It's fundamentally not the same thing as autocomplete. Give autocomplete all the data an LLM has, every gig, every terabyte of it, and it still won't be an LLM. Autocomplete lacks the semantic meaning layer, as well as some other parts. People say it's nothing but autocomplete out of a misunderstanding of what the training objective (the loss minimized by backpropagation) actually does: saying "the objective is to predict the next word" is not even close to equivalent to "it's doing the same thing as autocomplete".
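A hedged toy contrast, not any real system: phone-style autocomplete is roughly an exact-match frequency table over previous words, while an embedding layer maps words to vectors where related words land near each other. The vectors below are invented to make the point:

```python
import math

# Autocomplete as a lookup table: it only knows bigrams it has seen.
ngram_table = {"the boat": "sailed"}

# Invented 3-d embeddings: "ship" is placed near "boat", far from "tax".
# Real embedding spaces have hundreds of dimensions and are learned.
embeddings = {
    "boat": [0.90, 0.80, 0.10],
    "ship": [0.85, 0.75, 0.20],
    "tax":  [0.10, 0.05, 0.90],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# The lookup table has nothing for an unseen bigram.
print(ngram_table.get("the ship"))  # None

# The embedding space "knows" ship is more like boat than like tax.
print(cosine(embeddings["ship"], embeddings["boat"]) >
      cosine(embeddings["ship"], embeddings["tax"]))  # True
```

That geometric notion of similarity, which exact-match lookup cannot provide no matter how much data it stores, is one concrete sense in which the semantic layer differs.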

I'm writing this short reply in the hope that when I have more time in the next two days or so, I'll come back with a more complete explanation (including why context windows have to be limited).