Do LLMs Benefit from Their Own Words?🤔

In multi-turn chats, models are typically given their own past responses as context.
But do their own words always help…
Or are they more often a waste of compute and a distraction?
đź§µ
arxiv.org/abs/2602.24287
#AI

We compare two context setups on real-world multi-turn chats (WildChat, ShareLM): Full Context (FC: user + assistant turns) vs Assistant-Omitted (AO: user turns only).
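Concretely, the two setups can be sketched like this (a minimal illustration in the usual role/content chat format, not the paper's actual code; function names are ours):

```python
# Full Context (FC) vs Assistant-Omitted (AO), given a chat history as a
# list of {"role": ..., "content": ...} message dicts.

def full_context(history):
    """FC: keep user and assistant turns, unchanged."""
    return list(history)

def assistant_omitted(history):
    """AO: drop every past assistant turn, keep user turns only."""
    return [m for m in history if m["role"] == "user"]

chat = [
    {"role": "user", "content": "Give me a recipe for shepherd's pie."},
    {"role": "assistant", "content": "[long recipe...]"},
    {"role": "user", "content": "Now make it vegetarian."},
]

print(len(full_context(chat)))       # 3 messages
print(len(assistant_omitted(chat)))  # 2 messages: user turns only
```

When assistant responses are long (recipes, reasoning traces), dropping them is where the up-to-10× context reduction comes from.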

Surprisingly, removing past assistant turns largely preserves answer quality while reducing context lengths by up to 10×.

Why might this be?

We find that many “multi-turn” conversations aren’t truly dependent across turns.

In our data, 36.4% of user turns are self-contained prompts, even when they appear mid-conversation.

Even follow-up prompts can often be answered from the user turns alone.

ex:
User: “Give me a recipe for shepherd’s pie.”
Assistant: [recipe]
User: “Now make it vegetarian.”

Combined with the first user turn, the second prompt gives the LLM enough instruction to answer from scratch, with no assistant history needed.

Conditioning on past model outputs can introduce *context pollution*.

When earlier responses contain long, noisy reasoning traces, models may anchor to their own words or become confused, leading to bugs, hallucinations, incorrect formulas, and stylistic artifacts.

Motivated by these findings, we design a context-filtering strategy that predicts when prior assistant responses are helpful. In settings where FC outperforms AO, selectively retaining assistant history preserves ~95% of full-context performance with only ~70% of the tokens.
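The filtering idea can be sketched as a per-turn decision (a hypothetical stand-in: the paper's actual predictor is learned, and `keep_assistant_history` here is just a placeholder callable):

```python
# Context filtering: keep assistant turns only when a predictor says the
# current user turn depends on them; otherwise fall back to user-only (AO).

def filter_context(history, keep_assistant_history):
    """Return FC when the predictor fires, AO otherwise."""
    if keep_assistant_history(history):
        return list(history)                            # Full Context
    return [m for m in history if m["role"] == "user"]  # Assistant-Omitted

chat = [
    {"role": "user", "content": "Explain quicksort."},
    {"role": "assistant", "content": "[long explanation...]"},
    {"role": "user", "content": "Now summarize it in one sentence."},
]

# Demo with a trivial predictor that always declines assistant history.
kept = filter_context(chat, lambda h: False)
print(len(kept))  # 2: user turns only
```

The interesting design question is the predictor itself: it must fire on the minority of turns where the assistant's prior output genuinely carries needed state, which is what lets selective retention keep ~95% of FC performance at ~70% of the tokens.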

We hope the findings motivate context management systems that more carefully weigh the consequences of storing past model outputs.

📍paper: http://arxiv.org/abs/2602.24287
@lchoshen.bsky.social @ramon-astudillo.bsky.social Tamara Broderick, Jacob Andreas

🇧🇷 To appear at the ICLR 2026 MemAgents Workshop

Do LLMs Benefit From Their Own Words?

Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether large language models benefit from conditioning on their own prior responses. Using in-the-wild, multi-turn conversations, we compare standard (full-context) prompting with a user-turn-only prompting approach that omits all previous assistant responses, across three open reasoning models and one state-of-the-art model. To our surprise, we find that removing prior assistant responses does not affect response quality on a large fraction of turns. Omitting assistant-side history can reduce cumulative context lengths by up to 10x. To explain this result, we find that multi-turn conversations consist of a substantial proportion (36.4%) of self-contained prompts, and that many follow-up prompts provide sufficient instruction to be answered using only the current user turn and prior user turns. When analyzing cases where user-turn-only prompting substantially outperforms full context, we identify instances of context pollution, in which models over-condition on their previous responses, introducing errors, hallucinations, or stylistic artifacts that propagate across turns. Motivated by these findings, we design a context-filtering approach that selectively omits assistant-side context. Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.
