Designing agentic loops
https://simonwillison.net/2025/Sep/30/designing-agentic-loops/
I think this is a strictly worse name than "agentic harness", which is already a term used by open-source agentic IDEs (https://github.com/search?q=repo%3Aopenai%2Fcodex%20harness&... or https://github.com/openai/codex/discussions/1174)
Any reason why you want to rename it?
Edit: to say more about my opinions, "agentic loop" could mean a few things -- it could mean the thing you say, or it could mean calling multiple individual agents in a loop ... whereas "agentic harness" evokes a sort of interface between the LLM and the digital outside world which mediates how the LLM embodies itself in that world. That latter thing is exactly what you're describing, as far as I can tell.
I like "agentic harness" too, but that's not the name of a skill.
"Designing agentic loops" describes a skill people need to develop. "Designing agentic harnesses" sounds more to me like you're designing a tool like Claude Code from scratch.
Plus "designing agentic loops" includes a reference to my preferred definition of the term "agent" itself - a thing that runs tools in a loop to achieve a goal.
I think that's actually quite different.
Context engineering is about making sure you've stuffed the context with all of the necessary information - relevant library documentation and examples and suchlike.
Designing the agentic loop is about picking the right tools to provide to the model. The tool descriptions may go in the context, but you also need to provide the right implementations of them.
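Concretely, with OpenAI-style tool calling it might look something like this (a sketch only; the tool name and its implementation are made up). The description is what lands in the model's context; the implementation lives in the harness.

```python
import subprocess

# The description below is sent to the model as part of its context.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return its output.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

# The implementation stays in the harness; the model only sees its results.
def run_tests() -> str:
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.stdout + result.stderr
```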
Yeah, "connected" feels right to me.
Those decisions feel to me like problems for the agent harness to solve - Anthropic released a new cookbook about that yesterday: https://github.com/anthropics/claude-cookbooks/blob/main/too...
It boils down to information loss in LLM-driven compaction. Either you carefully design tools that return only compacted output with high information density, so models need to auto-compact or organize information only once in a while, which is eventually going to be lossy.
Or you just give them loads of information without thinking much about it, assume the models will have to do frequent compaction and memory organization, and hope it's not super lossy.
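The first option, sketched (the head/tail heuristic and the character budget are invented for illustration): a wrapper that compacts a tool's raw output before it ever reaches the model.

```python
# Illustrative only: compact verbose tool output to a fixed budget by
# keeping the head and tail, which tends to preserve errors and summaries.
def compact(output: str, limit: int = 4_000) -> str:
    if len(output) <= limit:
        return output
    half = limit // 2
    dropped = len(output) - limit
    return output[:half] + f"\n... [{dropped} chars omitted] ...\n" + output[-half:]
```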
I recently built my own coding agent, due to dissatisfaction with the ones that are out there (though the Claude Code UI is very nice). It works as suggested in the article. It starts a custom Docker container and asks the model, GPT-5 in this case, to send shell scripts down the wire which are then run in the container. The container is augmented with some extra CLI tools to make the agent's life easier.
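A minimal sketch of that shape (not the commenter's actual code; the container name, system prompt, and DONE convention are invented, and error handling is omitted):

```python
import subprocess
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are a coding agent. Reply with a single shell script to run in "
    "your sandbox, or with exactly DONE when the mission is complete."
)

def run_in_container(script: str, container: str) -> str:
    # Execute the model's script inside the Docker sandbox; capture all output.
    result = subprocess.run(
        ["docker", "exec", container, "bash", "-c", script],
        capture_output=True, text=True, timeout=300,
    )
    return result.stdout + result.stderr

def run_mission(mission: str, container: str = "agent-sandbox",
                max_steps: int = 50) -> None:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": mission}]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-5", messages=messages,
        ).choices[0].message.content
        if reply.strip() == "DONE":
            break
        messages.append({"role": "assistant", "content": reply})
        # Feed the script's output back so the model can plan its next step.
        messages.append(
            {"role": "user", "content": run_in_container(reply, container)})
```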
My agent has a few other tricks up its sleeve and it's very new, so I'm still experimenting with lots of different ideas, but there are a few things I noticed.
One is that GPT-5 is extremely willing to speculate. This is partly because of how I prompt it, but it will write a single script that tries five or six things at once, including reading files that might not exist. This level of speculative execution speeds things up dramatically, especially as GPT-5 is otherwise a very slow model that likes to think about things a lot.
Another is that you can give it very complex "missions" and it will drive things to completion using tactics that I've not seen from other agents. For example, if it needs to check something that's buried in a library dependency, it'll just clone the upstream repository into its home directory and explore that to find what it needs before going back to working on the user's project.
None of this triggers any user interaction, because it all runs in the container. In fact, no user interaction is possible: you set it going and do something else until it finishes. The workflow is very much to queue up "missions" that can then run in parallel, and you merge them together at the end. The agent also has a mode where it takes the mission, writes a spec, reviews the spec, updates the spec given the review, codes, reviews the code, and so on.
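That staged mode might look something like this (hypothetical; the stage wording and file paths are invented, run_mission is the loop from the earlier sketch, and state is carried between stages through files in the container):

```python
STAGES = [
    "Write a spec for the mission below to /work/SPEC.md.\n\nMission: {m}",
    "Review /work/SPEC.md and write your critique to /work/REVIEW.md.",
    "Update /work/SPEC.md to address /work/REVIEW.md.",
    "Implement /work/SPEC.md in the project under /work/project.",
    "Review the changes under /work/project and fix any problems you find.",
]

def run_staged_mission(mission: str) -> None:
    for stage in STAGES:
        run_mission(stage.format(m=mission))
```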
Even though it's early days, I've given this agent missions that it spent 20 minutes of continuous, uninterrupted inference time on, and it succeeded excellently. I think this UI paradigm is the way to go. You can't scale up AI-assisted coding if you constantly need to interact with the agent. Getting the most out of models requires maximally exploiting parallelism, so sandboxing is a must.
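Queuing parallel missions then comes down to giving each one its own container (again a sketch, reusing run_mission from above; the container names and missions are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

missions = {
    "sandbox-1": "Add retry logic to the HTTP client.",
    "sandbox-2": "Write property-based tests for the parser.",
}

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_mission, text, container=name)
               for name, text in missions.items()]
    for future in futures:
        future.result()  # then review and merge the results by hand
```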
This also feels like the structure that Sketch.dev uses -- it's asynchronous, running in a YOLO mode in a container on a cloud instance, with very little interaction (the expectation is you give it tasks and walk away). I have friends who queue up lots of tasks in the morning and prune down to just a couple of successes in the afternoon. I'd do this too, but at the scale I work at, merge conflicts are too problematic.
I'm working on my own dumb agent for my own dumb problems (engineering, but not software development) and I'd love to hear more about the tricks you're spotting.
This is great. I feel like most of the oxygen is going to go to the sandboxing question (fair enough). But I'm kind of obsessed with what agent loops look like for engineering tasks that aren't coding, and also with the tweaks you need for agent loops that handle large amounts of anything (source code lines, raw metrics, OTel span data, whatever).
There was an interval where the notion of "context engineering" came into fashion, and we quickly dunked all over it (I don't blame anybody for that; "prompt engineering" seemed pretty cringe-y to me), but there's definitely something to the engineering problem of managing a fixed-size context window while iterating indefinitely through a complex problem, and there are all sorts of tricks for handling it.
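One such trick, as a sketch (the character budget, the keep-the-last-ten-turns heuristic, and the summary prompt are all invented): when the transcript approaches the window, replace the oldest messages with a model-written summary.

```python
from openai import OpenAI

client = OpenAI()

def maybe_compact(messages: list[dict], budget_chars: int = 200_000) -> list[dict]:
    # Character count is a rough proxy; a real harness would count tokens.
    if sum(len(m["content"]) for m in messages) < budget_chars or len(messages) <= 12:
        return messages
    head, tail = messages[1:-10], messages[-10:]  # keep system msg + recent turns
    summary = client.chat.completions.create(
        model="gpt-5",
        messages=[{"role": "user", "content":
                   "Summarize this agent transcript, preserving file paths, "
                   "decisions made, and open tasks:\n\n"
                   + "\n".join(m["content"] for m in head)}],
    ).choices[0].message.content
    return [messages[0], {"role": "user", "content": summary}, *tail]
```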