Designing agentic loops
https://simonwillison.net/2025/Sep/30/designing-agentic-loops/
I recently built my own coding agent, out of dissatisfaction with the ones already out there (though the Claude Code UI is very nice). It works as the article suggests: it starts a custom Docker container and asks the model (GPT-5, in this case) to send shell scripts down the wire, which are then run in the container. The container is augmented with some extra CLI tools to make the agent's life easier.
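The core loop is roughly this shape. This is a minimal sketch, not my actual code: `ask_model` stands in for whatever model API call you use, and the `DONE` completion signal is an invented convention.

```python
import subprocess

def run_in_container(container: str, script: str, timeout: int = 300) -> str:
    """Execute a model-supplied shell script inside the sandbox container."""
    proc = subprocess.run(
        ["docker", "exec", container, "bash", "-lc", script],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.stdout + proc.stderr

def agent_loop(ask_model, container: str, mission: str, max_turns: int = 50) -> str:
    """Alternate between asking the model for a script and feeding the
    script's output back, until the model signals completion."""
    transcript = [f"Mission: {mission}"]
    for _ in range(max_turns):
        reply = ask_model("\n".join(transcript))
        if reply.strip() == "DONE":  # completion signal is an assumption here
            break
        transcript.append("Script:\n" + reply)
        transcript.append("Output:\n" + run_in_container(container, reply))
    return "\n".join(transcript)
```

The important design choice is that the *only* tool is "run a shell script in the sandbox"; everything else the agent does is composed out of that.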
My agent has a few other tricks up its sleeve, and it's very new, so I'm still experimenting with lots of different ideas, but there are a few things I've noticed.
One is that GPT-5 is extremely willing to speculate. This is partly down to how I prompt it, but it will write a single script that tries five or six things at once, including reading files that might not exist. This level of speculative execution speeds things up dramatically, especially since GPT-5 is otherwise a very slow model that likes to think about things a lot.
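To make that concrete, here is the kind of speculative script I mean (an invented example, not one of GPT-5's actual outputs): every probe tolerates a miss, so a single round trip gathers whatever happens to exist.

```python
import subprocess

# Invented example of a speculative probe script: redirections and
# `|| true` mean no single missing file or empty match aborts the batch.
SPECULATIVE_SCRIPT = """
cat README.md pyproject.toml setup.py 2>/dev/null
ls src/ lib/ tests/ 2>/dev/null
grep -rn "parse_config" src/ 2>/dev/null || true
echo "probe complete"
"""

result = subprocess.run(["bash", "-c", SPECULATIVE_SCRIPT],
                        capture_output=True, text=True)
# The script exits 0 even if every single probe came up empty.
```

One script like this replaces five or six sequential model turns, which matters a lot when each turn costs a long thinking pause.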
Another is that you can give it very complex "missions" and it will drive them to completion using tactics I've not seen from other agents. For example, if it needs to check something buried in a library dependency, it will simply clone the upstream repository into its home directory and explore it to find what it needs before returning to the user's project.
Because everything runs in the container, none of this triggers any user interaction; in fact, no user interaction is possible. You set it going and do something else until it finishes. The workflow is very much to queue up "missions" that run in parallel, then merge the results together at the end. The agent also has a mode where it takes the mission, writes a spec, reviews the spec, updates the spec based on the review, writes the code, reviews the code, and so on.
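That spec-review-code mode is, in essence, a fixed chain of model calls. A minimal sketch (the prompts and the function name here are invented, not my actual implementation):

```python
def run_staged_mission(ask_model, mission: str) -> str:
    """Spec, review the spec, revise it, code, review the code, revise it.
    Each stage is one model call whose output feeds the next; no user input."""
    spec = ask_model(f"Write a spec for this mission:\n{mission}")
    spec_review = ask_model(f"Review this spec:\n{spec}")
    spec = ask_model(f"Update the spec given the review:\n{spec}\n{spec_review}")
    code = ask_model(f"Implement this spec:\n{spec}")
    code_review = ask_model(f"Review this code:\n{code}")
    return ask_model(f"Revise the code given the review:\n{code}\n{code_review}")
```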
Even though it's early days, I've set this agent missions on which it spent 20 minutes of continuous, uninterrupted inference time and succeeded excellently. I think this UI paradigm is the way to go: you can't scale up AI-assisted coding if you constantly need to interact with the agent. Getting the most out of these models means maximally exploiting parallelism, so sandboxing is a must.
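Queueing missions in parallel falls out naturally once each one is fully sandboxed. A sketch, assuming every mission gets its own container and `run_one` wraps the whole mission lifecycle:

```python
from concurrent.futures import ThreadPoolExecutor

def run_missions(run_one, missions):
    """Run independent missions concurrently, one sandbox each, and
    return the finished results (in input order) to merge afterwards."""
    with ThreadPoolExecutor(max_workers=len(missions)) as pool:
        return list(pool.map(run_one, missions))
```

Threads are fine here because each worker spends its time blocked on model API calls and `docker exec`, not on Python-side compute.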
This also feels like the structure Sketch.dev uses: it runs asynchronously in YOLO mode in a container on a cloud instance, with very little interaction (the expectation is that you give it tasks and walk away). I have friends who queue up lots of tasks in the morning and prune them down to a couple of successes in the afternoon. I'd do this too, but at the scale I work on, merge conflicts are too problematic.
I'm working on my own dumb agent for my own dumb problems (engineering, but not software development) and I'd love to hear more about the tricks you're spotting.