Components of a Coding Agent
https://magazine.sebastianraschka.com/p/components-of-a-coding-agent
> This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code.
Unless I'm misunderstanding what's being described here, running Claude Code with different backend models is pretty common.
https://docs.z.ai/scenario-example/develop-tools/claude
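The setup in those docs amounts to pointing Claude Code at an Anthropic-compatible endpoint via environment variables, roughly like this (the endpoint URL and key placeholder below are illustrative; check the linked page for the exact values):

```shell
# Route Claude Code to a third-party Anthropic-compatible API
# serving a different backend model (e.g. a GLM model via z.ai).
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"  # assumed endpoint, per the linked docs
export ANTHROPIC_AUTH_TOKEN="your-api-key-here"             # placeholder

# Launch Claude Code as usual; requests now go to the configured endpoint.
claude
```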
It doesn't perform on par with Anthropic's models in my experience.
> It doesn't perform on par with Anthropic's models in my experience.
Why do you think that is the case? Are Anthropic's models just better, or do they train the models to somehow work better with the harness?
It is more common now to improve models in agentic systems "in the loop" with reinforcement learning. Anthropic is [very likely] doing this on the backend to systematically improve their models' performance specifically with their own tools. I did this with Goose at Block using more classic post-training approaches, because it was before RL really hit the mainstream as an approach for this.
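To make "in the loop" concrete: the agent is rolled out in its harness, each trace is scored by a verifier, and the policy is nudged toward higher-scoring behavior. Here is a deliberately toy sketch of that loop (a one-parameter Bernoulli "policy" with a REINFORCE-style update; every name is illustrative, nothing here comes from a real RL library):

```python
import math
import random

random.seed(0)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def train(episodes: int = 500, steps: int = 5, lr: float = 0.5) -> float:
    """Toy RL-in-the-loop: roll out, score with a 'verifier', update policy."""
    theta = -1.0      # stand-in for policy parameters; p = sigmoid(theta)
    baseline = 0.0    # moving-average reward baseline to reduce variance
    for _ in range(episodes):
        p = sigmoid(theta)
        # (1) Roll out the agent: each step "succeeds" with probability p.
        actions = [1 if random.random() < p else 0 for _ in range(steps)]
        # (2) Verifier scores the trace: fraction of accepted steps.
        reward = sum(actions) / steps
        # (3) REINFORCE update: for a Bernoulli policy, grad log p(a) = a - p.
        grad = sum(a - p for a in actions)
        theta += lr * (reward - baseline) * grad
        baseline = 0.9 * baseline + 0.1 * reward
    return sigmoid(theta)

final_p = train()
```

Real setups replace the toy policy with an LLM, the coin-flip rollout with full agent traces in the harness, and the reward with task-level verifiers, but the shape of the loop is the same.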
If you want to look at some of the tooling and process for this, check out verifiers (https://github.com/PrimeIntellect-ai/verifiers), hermes (https://github.com/nousresearch/hermes-agent) and accompanying trace datasets (https://huggingface.co/datasets/kai-os/carnice-glm5-hermes-t...), and other open source tools and harnesses.