Mastodawn

Hackerman_uwu May 24, 2024

Is this real though? Does ChatGPT just literally take whole snippets of text like that? I thought it used some aggregate or probability based on the whole corpus of text it was trained on.

bionicjoey May 24, 2024

It does, but the thing with the probability is that it doesn’t always pick the most likely next bit of text, it basically rolls dice and picks maybe the second or third or in rare cases hundredth most likely continuation. This chaotic behaviour is part of what makes it feel “intelligent” and why it’s possible to reroll responses to the same prompt.

sugar_in_your_tea May 24, 2024

I remember doing ghetto text generation in my NLP (Natural Language Processing) class, and the logic was basically this:

Associate words with a probability number - e.g. given the word “math”: “homework” has 25% chance, “class” has 20% chance, etc; these probabilities are generated from the training data

Generate a random number to decide which word to pick next - average roll gives likely response, less likely roll gives less likely response

Repeat for as long as you need to generate text
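
The steps above can be sketched roughly like this. It is a toy illustration, not how any real system stores its probabilities: only the "homework" (25%) and "class" (20%) numbers come from the example above, and the other entries and the names `NEXT_WORD_PROBS` and `pick_next` are made up so the table adds up to 100%.

```python
import random

# Hypothetical next-word table for the word "math".
# Only "homework" and "class" come from the example; the rest is filler.
NEXT_WORD_PROBS = {
    "math": {
        "homework": 0.25,
        "class": 0.20,
        "test": 0.15,
        "teacher": 0.10,
        "<other>": 0.30,
    },
}


def pick_next(word, rng):
    """Roll a random number and walk the cumulative probabilities."""
    roll = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    candidate = None
    for candidate, prob in NEXT_WORD_PROBS[word].items():
        cumulative += prob
        if roll < cumulative:
            return candidate
    return candidate  # guard against floating-point rounding at the top end


rng = random.Random()  # pass a seed here to make the rolls repeatable
print(pick_next("math", rng))
```

Repeating the process just means calling `pick_next` again with the word it returned, which needs a table entry for every word you might land on.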

This is a rough explanation of Bayesian nets, which I think are what’s used in LLMs. We used a very simple n-gram model (i.e. n words are considered for the statistics, so “to my math” is much more likely to generate “class” than “homework”), but they’re probably doing fancy things with text categorization and whatnot to generate more relevant text.
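
A minimal sketch of that kind of n-gram generator, assuming a tiny made-up corpus: the corpus text and the names `train_ngram` and `generate` are invented for illustration, and `n` here means the number of context words. The "probabilities generated from the training data" are just follower counts.

```python
import random
from collections import Counter, defaultdict


def train_ngram(tokens, n):
    """Count which word follows each run of n context words."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n):
        context = tuple(tokens[i:i + n])
        counts[context][tokens[i + n]] += 1
    return counts


def generate(counts, seed_words, length, rng):
    """Extend seed_words by sampling followers in proportion to their counts."""
    n = len(seed_words)
    out = list(seed_words)
    for _ in range(length):
        followers = counts.get(tuple(out[-n:]))
        if not followers:
            break  # this context never appeared in the training data
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return out


# Toy training data, made up for this example.
corpus = (
    "i walked to my math class . i ran to my math class . "
    "i went to my math class . i forgot my math homework . "
    "i finished my math homework . i hate math homework ."
).split()

model3 = train_ngram(corpus, 3)  # three words of context
model1 = train_ngram(corpus, 1)  # one word of context

print(model1[("math",)])              # "class" and "homework" are tied
print(model3[("to", "my", "math")])   # only "class" ever follows
print(" ".join(generate(model3, ["to", "my", "math"], 5, random.Random())))
```

With one word of context, "math" is followed equally often by "class" and "homework" in this corpus. With three words, "to my math" has only ever been followed by "class", which is the effect described above.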

The LLM isn’t really “thinking” here, it’s just associating input text and the training data to generate output text.

Karyoplasma

Most LLMs are transformers; in fact, GPT stands for Generative Pre-trained Transformer. They are different from Bayesian networks, as transformers are not state machines but rather assign importance according to attention weights learned during training. The main upside of this approach is scalability: because it does not rely on state, it can be easily parallelized.
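
To make "assign importance according to attention" a bit more concrete, here is a bare-bones sketch of scaled dot-product attention in plain Python. It is heavily simplified: the vectors are hand-written toy values, whereas a real transformer produces queries, keys and values from learned projection matrices and adds multiple heads, masking and many stacked layers. The function names are my own.

```python
import math


def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    # Each query is handled independently of the others. Nothing is carried
    # over from one position to the next, so this loop could run in parallel.
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)  # how much importance each position gets
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy example: the query points the same way as the first key,
# so the first value gets the larger share of the output.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
print(attention([[1.0, 0.0]], keys, values))
```

The point about parallelism shows up in the loop: the result for one query does not depend on the result for any other, unlike a model that has to update a running state word by word.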

Transformer (deep learning architecture) - Wikipedia