Is this real though? Does ChatGPT literally lift whole snippets of text like that? I thought it used some aggregate or probability based on the whole corpus of text it was trained on.
It does, but the thing with the probability is that it doesn’t always pick the most likely next bit of text; it basically rolls dice and picks maybe the second or third or, in rare cases, the hundredth most likely continuation. This randomness is part of what makes it feel “intelligent” and why it’s possible to reroll responses to the same prompt.
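A quick sketch of that dice-rolling idea (the candidate words and percentages here are made up for illustration, not from any real model):

```python
import random

# Hypothetical next-word distribution after some prompt
# (invented numbers, purely for illustration)
candidates = ["class", "homework", "teacher", "anxiety"]
weights = [0.40, 0.25, 0.20, 0.15]

random.seed(0)  # fixed seed so the run is repeatable
picks = [random.choices(candidates, weights=weights, k=1)[0] for _ in range(1000)]

# "class" wins most often, but the others still show up --
# that is why rerolling the same prompt can give different text
for word in candidates:
    print(word, picks.count(word))
```

Real models also sharpen or flatten these weights with a “temperature” setting before rolling, which tunes how adventurous the dice are.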

I remember doing quick-and-dirty text generation in my NLP (Natural Language Processing) class, and the logic was basically this:

  • Associate words with a probability - e.g. given the word “math”, “homework” has a 25% chance, “class” has a 20% chance, etc.; these probabilities are derived from the training data
  • Generate a random number to decide which word to pick next - an average roll gives a likely continuation, a rarer roll gives a less likely one
  • Repeat for as long as you need to generate text
  • This is a rough explanation of Bayesian nets, which I think are what’s used in LLMs. We used a very simple n-gram model (the previous n words feed the statistics, so “to my math” is much more likely to generate “class” than “homework”), but they’re probably doing fancier things with text categorization and whatnot to generate more relevant text.
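The steps above can be sketched as a toy bigram (n = 2) generator; the corpus here is invented for illustration:

```python
import random
from collections import defaultdict

corpus = ("i did my math homework then i went to my math class "
          "and after class i did more homework").split()

# Step 1: record which word follows which (a bigram model);
# duplicates in each list act as the probability weighting
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

# Steps 2 and 3: roll dice to pick each next word, then repeat
random.seed(1)
word = "my"
out = [word]
for _ in range(8):
    word = random.choice(follows[word])
    out.append(word)
print(" ".join(out))
```

Picking uniformly from a list with repeats gives the frequency weighting for free; with a real corpus the lists get huge and the output starts to resemble the source text.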

    The LLM isn’t really “thinking” here; it’s just associating the input text with the training data to generate output text.

    Yeah, I’m not an AI expert, or even really someone who studies it as my primary role. But my understanding is that part of the “innovation” of modern LLMs is that they generate tokens, which are not necessarily full words but smaller linguistic units. So with enough training the model learns to predict the most likely next couple of characters, and the words just generate themselves.
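A rough sketch of how subword tokens can emerge: a toy byte-pair-encoding loop that repeatedly fuses the most frequent adjacent pair of symbols into one token. This is only an illustration of the idea, not any real tokenizer’s exact algorithm:

```python
from collections import Counter

def learn_merges(text, num_merges):
    """Toy BPE: start from characters, fuse the most common adjacent pair."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Rewrite the token stream with the pair fused into one token
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = learn_merges("low lower lowest", 4)
print(tokens)  # frequent fragments like "low" become single tokens
```

Frequent character runs end up as single tokens, so the vocabulary adapts to the training text instead of being a fixed word list.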

    I haven’t looked too much into it either, but from that very brief description, it sounds like it would mostly make the output sound more natural by abstracting a bit over word roots and picking up grammar structures, without actually baking those into the model as logic.

    AI text does read pretty naturally, so hopefully my interpretation is correct. But it’s also very verbose, and can repeat itself a lot.

    Sounds quite similar to Markov chains which made me think of this story:

    thedailywtf.com/…/the-automated-curse-generator

    Still gets a snort out of me every time Markov chains are mentioned.

    The Automated Curse Generator

    It was 1999, and Brian's company's new online marketing venture was finally off the ground and making a profit using an off-the-shelf conglomeration of bits and pieces of various content management, affiliate program, and ad servers. Brian's team had hit all of the goals for the first funding tranche, and the next step was to use those millions of dollars to grow the staff from twelve to fifty, half of whom would be software developers working directly for Brian. The project was an $8 million, nine-month development effort to build, from the ground up, the best 21st-century marketing/e-commerce/community/ad network/reporting system mousetrap possible. Leading a team of twenty people was a big step up for Brian, so he buckled down, read management theory books, re-read The Mythical Man Month, learned the ins-and-outs of project management software, invested in UML and process training, and carefully pored over resumes to find the best candidates.

    The Daily WTF
    Yup, and I’m guessing LLMs use Markov chains, which are also a really old concept (the idea is over 100 years old, and it’s used in compression algorithms like LZMA).
    Most LLMs are actually transformers; in fact, GPT stands for Generative Pre-trained Transformer. They differ from Bayesian networks in that transformers are not state machines: they assign importance across the whole input using attention weights learned during training. The main upside of this approach is scalability, since not relying on sequential state makes it easy to parallelize.
    Transformer (deep learning architecture) - Wikipedia
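For the curious, the attention step can be sketched in plain Python: each query scores every key, the scores become weights via softmax, and the output is a weighted blend of the values. The vectors here are tiny and made up; real transformers use learned projection matrices and many attention heads:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    d = len(keys[0])
    result = []
    for q in queries:
        # Score each position by query-key similarity, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # learned "importance", no carried state
        # Output is a weight-blended mix of the value vectors
        result.append([sum(w * v[j] for w, v in zip(weights, values))
                       for j in range(len(values[0]))])
    return result

q = [[1.0, 0.0]]                    # one query vector
k = [[1.0, 0.0], [0.0, 1.0]]        # keys for two positions
v = [[10.0, 0.0], [0.0, 10.0]]      # values for two positions
out = attention(q, k, v)
print(out)  # leans toward the first value, since q matches the first key
```

Because every query’s weights are computed independently of the others, all positions can be processed at once, which is the parallelizability mentioned above.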

    Back in my day, we called that “hard-mode plagiarism.” They can’t punish you if they can’t find a specific plagiarized source!
    This is not the model by itself, though; it’s the model looking through Google search results to give you an answer.