"Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%"

I've suspected this all along. Folks spending mucho-plenty time curating project-level .md files have been deluding themselves that it helps.

https://arxiv.org/abs/2602.11988

Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md, by either manually or automatically generating them. Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks. In this work, we study this question and evaluate coding agents' task completion performance in two complementary settings: established SWE-bench tasks from popular repositories, with LLM-generated context files following agent-developer recommendations, and a novel collection of issues from repositories containing developer-committed context files. Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%. Behaviorally, both LLM-generated and developer-provided context files encourage broader exploration (e.g., more thorough testing and file traversal), and coding agents tend to respect their instructions. Ultimately, we conclude that unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.


@jasongorman @joe The paper’s conclusion is subtly different from that. It says that auto-generated AGENTS.md files provide no value, but a manually crafted one provides positive marginal returns.

The real takeaway should be:
- You should have project files like an AGENTS.md.
- You should use it to address real issues like “always compile using command xyz” to have the agent work the way you want, rather than auto-gen slop.
- If you do that your cost of inference also won’t shoot up 20%.
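A minimal AGENTS.md in that spirit might look like the sketch below. This is a hypothetical example, not from the paper or the thread; the commands and project layout are made up to illustrate the “real issues only” approach.

```markdown
# AGENTS.md

## Build & test
- Build with `make build`; never invoke the compiler directly.
- Run tests with `make test` before proposing any change.

## Conventions
- Source lives in `src/`; do not edit files under `generated/`.
- Keep commits small and run the formatter (`make fmt`) first.
```

Each line encodes a requirement the agent would otherwise have to rediscover (or get wrong), which is the kind of minimal, task-relevant context the paper's conclusion endorses.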

@mergesort @joe It's the "minimal requirements" doing the heavy lifting here. Only what the LLM needs for the task at hand.
@jasongorman @joe Fair enough! I generally agree with that, though I will say my AGENTS.md is like 200-300 lines which doesn’t feel very minimal but is appropriately what I’ve needed to add as I’ve been working with AI in my codebase for the last 1-2 years.
@mergesort @joe Can you modularise it into task-specific files?
@jasongorman @joe It depends. Most of these are around how I want the project itself to build and run, but I do have Skills to handle more granular things now. (Which I would describe as task-specific files.)
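For readers unfamiliar with Skills: per the agentskills.io format, a Skill is a directory containing a SKILL.md whose YAML frontmatter (`name`, `description`) tells the agent when to load it, with the instructions in the markdown body. The skill name and steps below are purely illustrative, not taken from the thread.

```markdown
---
name: release-notes
description: Draft release notes from merged changes. Use when the user asks to write or update release notes.
---

# Release notes

1. List changes since the last tag: `git log --oneline <last-tag>..HEAD`
2. Group the changes by area and summarize each group in one line.
3. Write the result to `CHANGELOG.md` under a new version heading.
```

Because the body is only loaded when the description matches the task, this keeps granular instructions out of the always-loaded global context.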
@mergesort @joe That's exactly what they are. It's all just context to an LLM :-)
@jasongorman @joe I think we agree! I write a bunch about this over at build.ms (like here: https://build.ms/2025/10/17/your-first-claude-skill), I was just noting a subtle distinction and earnestly wasn’t trying to start a debate over small differences. 😄

@mergesort @joe That's exactly what they are. Anthropic presumably acknowledging here that big global contexts are not a good idea?
@jasongorman @joe I don’t think there’s ever been much debate about that in the AI community; people have been trying to minimize token load since the earliest days of the ChatGPT API. There have been many intermediate solutions (MCP, RAG, etc.), but this is a core reason why agentskills.io has become a pretty much de facto standard from Claude to Codex to even OpenClaw.
@mergesort @joe You can't beat entropy 🙂