The Long Context
In "You Exist In The Long Context," Steven Johnson explores the advancements in large language models (LLMs), particularly the significant impact of long context windows. Johnson illustrates this progress by creating an interactive game based on his book, showcasing the LLM's ability to handle complex narratives and maintain factual accuracy. He draws a parallel between LLMs' short-term memory improvements and the case of Henry Molaison, a patient with severe memory impairment, highlighting how expanded context windows have overcome previous limitations. He ultimately argues that this enhanced contextual understanding allows for more sophisticated applications, including personalised learning and collaborative decision-making. Johnson concludes by discussing the potential for LLMs to become invaluable tools for accessing and integrating expert knowledge.
Limitations of Early Language Models like GPT-3
Early language models like GPT-3, while impressive for their time, had a significant constraint: a small context window. This meant they had a restricted short-term memory, analogous to the condition of patient H.M., who was unable to form new memories after a specific brain surgery.
GPT-3, introduced in 2020, had a context window of 2,048 “tokens”, equivalent to about 1,500 words. This was the maximum amount of new information that could be shared with the model in a single exchange. Exceeding this limit caused the model to "forget" information presented earlier in the conversation. It could follow short instructions by drawing on its vast long-term (parametric) memory, but it struggled with extended narratives or explanations that required retaining information over a longer stretch of text. Essentially, interacting with GPT-3 was like conversing with someone who had to be constantly reintroduced to the topic because they couldn't retain information beyond a few sentences.
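The relationship between a token budget and everyday text can be sketched in a few lines. The heuristic below (roughly 4/3 tokens per word) is only an illustrative approximation; real models use subword tokenizers such as BPE, and the function names here are hypothetical:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4/3 tokens per word.

    Real models use subword tokenizers (e.g. BPE); this is only an
    illustrative approximation for reasoning about budgets.
    """
    return round(len(text.split()) * 4 / 3)

CONTEXT_LIMIT = 2048  # GPT-3's window, in tokens

def fits_in_context(prompt: str, limit: int = CONTEXT_LIMIT) -> bool:
    """Check whether a prompt fits within the model's context window."""
    return estimate_tokens(prompt) <= limit

short_prompt = "Summarise the plot of the novel in two sentences."
print(fits_in_context(short_prompt))    # True: well under 2,048 tokens
print(fits_in_context("word " * 2000))  # False: ~2,667 estimated tokens
```

Anything beyond the limit simply cannot be presented to the model at once, which is why early GPT-3 applications had to chop long documents into fragments.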
This limited context window resulted in several shortcomings:
- Conversational Incoherence: The inability to remember previous turns in a conversation made interactions with GPT-3 feel disjointed and repetitive. Users had to repeatedly provide context, leading to an unnatural flow.
- Increased Hallucinations: While GPT-3 possessed a vast knowledge base, its limited short-term memory made it prone to fabricating information, especially when the required information was not part of the immediate context.
- Inability to Handle Complex Narratives or Arguments: GPT-3 struggled to follow narratives or arguments that spanned beyond its limited context window. Understanding relationships between events and concepts spread across a large text was impossible, limiting its analytical capabilities.
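The "forgetting" behind these shortcomings can be sketched as a simple sliding window over the conversation: once the token budget is exhausted, the oldest turns are silently dropped. This is a minimal illustration, not any vendor's actual implementation, and it reuses the same rough token estimate as above:

```python
def truncate_history(turns: list[str], budget: int) -> list[str]:
    """Keep only the most recent turns whose combined (approximate) token
    count fits in the budget. Older turns are dropped, which is what makes
    the model appear to 'forget' the start of the conversation."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk newest-first
        cost = round(len(turn.split()) * 4 / 3)  # rough token estimate
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = ["My name is Ada.", "I live in London.", "What is my name?"]
# With a tiny budget, the turn containing the name falls out of the window,
# so the model can no longer answer the question:
print(truncate_history(history, budget=10))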
The subsequent expansion of context windows, such as GPT-4's initial 8K-token window (four times larger than GPT-3's), marked a significant advancement in AI capabilities. These larger windows facilitated more coherent conversations, reduced hallucinations, and allowed for a deeper understanding of complex narratives. Even with these advancements, however, AI models still do not possess human-like consciousness or sentience.
Impacts of Expanding AI Context Windows
The expansion of AI context windows has been a pivotal factor in the advancement of AI capabilities, going beyond simply increasing the size of training data or model parameters. It has led to significant improvements across various aspects of AI functionality, from sustaining coherent conversations to analysing book-length documents in a single pass.
Long Context Windows vs. RAG
The advancements in long context windows have sparked a debate on the necessity of techniques like Retrieval Augmented Generation (RAG). While long context windows allow models to process and utilize vast amounts of context directly, RAG combines the retrieval of relevant information from external sources with the generative capabilities of LLMs. Even so, RAG retains some clear practical advantages.
The choice between long context windows and RAG strongly influences the overall performance of deep learning models in various real-world applications. RAG is significantly more scalable and cost-effective than long context windows because it retrieves and processes only the most relevant pieces of information, reducing the number of tokens the model must handle. This minimizes computational cost and latency, making it suitable for high-volume queries and real-time applications.
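The retrieval step at the heart of RAG can be sketched with a toy relevance score. Production systems use dense vector embeddings and a vector store; the keyword-overlap scoring and function names below are simplifications invented for illustration:

```python
def score(query: str, doc: str) -> int:
    """Crude relevance score: number of shared lower-cased words.
    Real RAG systems compare dense vector embeddings instead."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k most relevant documents for the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Stuff only the retrieved snippets (not the whole corpus) into the
    prompt, keeping the token count small regardless of corpus size."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Henry Molaison could not form new long-term memories.",
    "GPT-3 had a context window of roughly 2,048 tokens.",
    "RAG retrieves relevant documents before generation.",
]
print(build_prompt("How many tokens was the GPT-3 context window?", corpus))
```

Because the prompt contains only the top-scoring snippets, its size stays roughly constant even as the corpus grows to millions of documents, which is the source of RAG's cost advantage.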
Summary
In summary, long context windows improve LLM performance by allowing the model to process and retain more internal context without external retrieval. In contrast, RAG is an algorithmic retrieval technique that enhances LLMs by fetching relevant information from external sources. While long context windows cannot replicate the exact functionality of RAG, they can be used in conjunction with RAG to create a more powerful system. This combination allows the model to leverage the strengths of both approaches: the ability to process extensive internal context and the efficiency of selective external information retrieval.