Mastodawn

MindGods 5d ago

Components of a Coding Agent

https://magazine.sebastianraschka.com/p/components-of-a-coding-agent

How coding agents use tools, memory, and repo context to make LLMs work better in practice

Ahead of AI

beshrkayali 5d ago

> long contexts are still expensive and can also introduce additional noise (if there is a lot of irrelevant info)

I think spec-driven generation is the antithesis of chat-style coding for this reason. With tools like Claude Code, you are the one tracking what was already built, what interfaces exist, and why something was generated a certain way.

I built Ossature[1] around the opposite model. You write specs describing behavior, it audits them for gaps and contradictions before any code is written, then it produces a TOML build plan in which each task declares exactly which spec sections and upstream files it needs. The LLM never sees more than that, and there is no accumulated conversation history to drift from. Every prompt and response is saved to disk, so traceability is built in rather than something you reconstruct by scrolling back through a chat. I used it over the last couple of days to build a CHIP-8 emulator entirely from specs[2]. I have some more example projects on GitHub[3].

1: https://github.com/ossature/ossature

2: https://github.com/beshrkayali/chomp8

3: https://github.com/ossature/ossature-examples
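
To make the "the LLM never sees more than that" idea concrete, here is a minimal sketch of per-task context assembly. It is an illustration under assumptions, not Ossature's actual schema or code: the task fields (`id`, `spec_sections`, `inputs`), the section names, the file paths, and `build_prompt` are all hypothetical, and the plan is written as plain Python data standing in for a parsed TOML file.

```python
# Hypothetical plan data standing in for a parsed TOML build plan.
# Field names are assumptions for illustration, not Ossature's real schema.
PLAN = [
    {"id": "cpu-decode", "spec_sections": ["cpu.opcodes"], "inputs": ["src/memory.py"]},
    {"id": "display", "spec_sections": ["display.draw"], "inputs": []},
]

# Everything that exists in the project (hypothetical names and contents).
SPEC_SECTIONS = {
    "cpu.opcodes": "Each opcode is two bytes, stored big-endian.",
    "display.draw": "Sprites are XORed onto a 64x32 display.",
    "timers.delay": "The delay timer counts down at 60 Hz.",
}
UPSTREAM_FILES = {
    "src/memory.py": "class Memory: ...",
    "src/timers.py": "class Timers: ...",
}


def build_prompt(task, sections, files):
    """Build a prompt from only the sections and files the task declares."""
    parts = [f"# Task: {task['id']}"]
    for name in task["spec_sections"]:
        parts.append(f"## Spec: {name}\n{sections[name]}")
    for path in task["inputs"]:
        parts.append(f"## File: {path}\n{files[path]}")
    return "\n\n".join(parts)


# One independent prompt per task; no shared conversation history.
prompts = {task["id"]: build_prompt(task, SPEC_SECTIONS, UPSTREAM_FILES) for task in PLAN}
```

Each prompt is built from scratch, so an undeclared section such as `timers.delay` can never leak into the `cpu-decode` task, and there is no earlier conversation for the model to drift from.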

Yokohiii 5d ago

I like it a lot. I find the chat-driven workflow very tiring, and a lot of information gets lost in translation until LLMs just refuse to be useful.

How does the human intervention work out? Do you use a mix of spec and audit editing to get into the ready to generate state? How high is the success/error rate when you generate code from tasks? Do LLMs forget or mess things up, or does it feel better?

The spec driven approach is potentially better for writing things from scratch, do you have any plans for existing code?

beshrkayali

Thanks!

> How does the human intervention work out? Do you use a mix of spec and audit editing to get into the ready to generate state?

Yes, the flow is: you write specs, then you validate them with `ossature validate`, which parses them and checks that they are structurally sound (no LLM involved). Then you run `ossature audit`, which flags gaps or contradictions in the content, and from that it produces a TOML build plan that you can read and edit directly before anything is generated. You can reorder tasks, add notes for the LLM, adjust verification commands, or skip steps entirely. So when you run `ossature build` to generate, the structure is already something you have signed off on.

> The spec driven approach is potentially better for writing things from scratch, do you have any plans for existing code?

Right now it is best for greenfield, as you said. I have been thinking about a workflow where you generate specs from existing code and then let Ossature work from those, but I am honestly not sure that is the right model either. The harder case is when engineers want to touch both the code and the specs. Keeping those in sync through that back and forth is something I want to support, but I have not figured out a clean answer for it yet. It's on the list, and if you have any thoughts please feel free to open an issue! I want to get through some of the issues I am seeing with the spec-only editing workflow (and re-audit/re-planning) first, specifically around how changes cascade through dependent tasks.

Regarding success rate, each task requires a verification command that must run and pass after generation. If it fails, a separate fixer agent tries to repair the code using the error output. The number of retry attempts is configurable. I did notice that the more concise and clear the spec is, the more likely it is for capable models to generate code that works (obviously), but that's what auditing is supposed to help with. One interesting case from the CHIP-8 emulator I mentioned above is that even mentioning the correct name of the solution to a specific problem was not enough; I had to spell out the concrete algorithm in the spec (wrote more details here[1]). But the full prompt and response for every task is saved to disk, so when something does go wrong, you can read the exact prompt/response and the fix-attempt prompts/responses for each task.
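
The generate, verify, fix loop described above can be sketched roughly as follows. This is a toy illustration under assumptions, not Ossature's implementation: `run_task`, `verify_by_running`, and the two fake model functions are hypothetical names, the "verification command" simply runs the generated snippet and checks the exit status, and the history is kept in memory rather than written to disk.

```python
import subprocess
import sys


def verify_by_running(code):
    """Stand-in verification command: run the code, pass if it exits with 0."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.returncode == 0, result.stderr


def run_task(generate, fix, verify, max_fix_attempts=2):
    """Generate once, verify, and hand failures to a separate fixer.

    Returns (passed, history); history records every generated or fixed
    version so each attempt can be inspected afterwards.
    """
    history = [("generate", generate())]
    for attempt in range(max_fix_attempts + 1):
        code = history[-1][1]
        ok, error_output = verify(code)
        if ok:
            return True, history
        if attempt < max_fix_attempts:
            # The fixer only sees the failing code and the error output.
            history.append(("fix", fix(code, error_output)))
    return False, history


# Toy stand-ins for the two model calls (hypothetical, no LLM involved).
def fake_generate():
    return "assert 1 + 1 == 3"


def fake_fix(code, error_output):
    return code.replace("== 3", "== 2")


ok, history = run_task(fake_generate, fake_fix, verify_by_running)
```

Passing the generator and fixer in as separate callables mirrors the idea of a distinct fixer agent, and `max_fix_attempts` plays the role of the configurable retry count.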

I wrote more details in an intro post[2] about Ossature, if useful.

1: https://log.beshr.com/chip8-emulator-from-spec/

2: https://ossature.dev/blog/introducing-ossature/
