A jargon-free explanation of how AI large language models work

Want to really understand large language models? Here’s a gentle primer.

https://arstechnica.com/science/2023/07/a-jargon-free-explanation-of-how-ai-large-language-models-work/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social

Ars Technica
@arstechnica it's autocorrect, but more trained.
It doesn't "know", it doesn't "understand", it doesn't "think".
It's just using probability (i.e., statistics) to string sentences together, the way autocorrect suggests the next word as you type.
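That "autocorrect with probabilities" idea can be sketched in a few lines. This is a toy illustration, not how any real LLM is implemented: the vocabulary, the two-word context window, and the probabilities below are all made up for the example.

```python
import random

# Toy next-word table: probabilities are invented for illustration,
# not taken from any real model.
next_word_probs = {
    ("the", "cat"): {"sat": 0.6, "ran": 0.3, "slept": 0.1},
    ("cat", "sat"): {"on": 0.8, "down": 0.2},
}

def pick_next(context, greedy=True):
    """Choose the next word given the previous two words."""
    probs = next_word_probs[context]
    if greedy:
        # Always take the single most likely continuation.
        return max(probs, key=probs.get)
    # Or sample in proportion to probability, so the output varies.
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights)[0]

print(pick_next(("the", "cat")))  # "sat"
```

Real LLMs do the same kind of "pick the next word from a probability distribution" step, just with a distribution computed by a neural network over a vocabulary of tens of thousands of tokens rather than a hand-written lookup table.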
@rebeccafinn @arstechnica several times the article says that researchers don't understand how the LLM "does" something. I find that very strange. How could they not know what it is doing? Is this just the author not differentiating between outside researchers that don't have access to proprietary information and internal researchers working to build the LLM?
@cetan @rebeccafinn @arstechnica It's mathemagics created by neural networks, probably. They are known to create code which baffles humans, but somehow works :)
@arstechnica
This article about how Large Language models work is well worth reading.
@arstechnica @dangillmor I would respectfully suggest that if an article uses “vector”, it’s not jargon-free (at least for most audiences).
@arstechnica
The article is 90% usable if one is trying to understand LLMs. But by the end, when it starts to discuss GPT-3's performance, the author radically pivots to examples of users anthropomorphizing the model. One example even compares GPT-3 to 3-to-7-year-old kids. Kinda killed the whole thing for me.
A logical explanation of LLM structures suddenly morphs into witchcraft.