What does software engineering become when no one writes the code?

An OpenAI team shipped close to a million lines of code in five months, none of it hand-written; Codex wrote all of it. It isn't model magic but engineering that moved up a layer: from writing code to building the environment the agent runs in, with legible logs, the repo as the system of record, and linter-enforced architecture. They warn it took heavy investment and won't generalize for free.

https://benjaminhan.net/posts/20260626-harness-engineering/?utm_source=mastodon&utm_medium=social

#AgenticSystems #OpenAI #AI

Harness Engineering – synesis

OpenAI’s Ryan Lopopolo reports on building a million-line product with zero hand-written code, where the engineering work moved from writing code to building the environment agents run in.

synesis
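The post doesn't say what OpenAI's architecture lints actually check. Below is a minimal sketch of one common kind, a layered-dependency rule, assuming a hypothetical layer order (`LAYERS`) and a precomputed map of which module imports which. A module may import from its own layer or a lower one, never from a higher one.

```python
from dataclasses import dataclass

# Hypothetical layer order, lowest first. A module may import only from
# its own layer or from layers that appear earlier in this list.
LAYERS = ["types", "config", "repo", "service", "ui"]


@dataclass(frozen=True)
class Violation:
    module: str    # the importing module
    imported: str  # the module it should not depend on


def check_layers(layer_of: dict[str, str], imports: dict[str, list[str]]) -> list[Violation]:
    """Flag every import that points from a lower layer up to a higher one."""
    rank = {name: i for i, name in enumerate(LAYERS)}
    violations = []
    for module, targets in imports.items():
        for target in targets:
            if rank[layer_of[target]] > rank[layer_of[module]]:
                violations.append(Violation(module, target))
    return violations


layer_of = {"models": "types", "db": "repo", "api": "service", "views": "ui"}
imports = {"db": ["models"], "api": ["db", "views"], "views": ["api"]}
print(check_layers(layer_of, imports))  # api -> views goes upward, so it is flagged
```

The point of making a rule like this mechanical is that an agent gets the same error message every time it crosses a boundary, instead of relying on a reviewer to notice.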

What if you searched over an agent's code, not just its prompts? Automated Design of Agentic Systems does exactly that: a meta agent writes new agent scaffolds in Python and keeps the ones that score well on a task. The discovered designs beat hand-built baselines, and they keep working on new tasks and new base models that the search never optimized for.

https://benjaminhan.net/posts/20260625-adas/?utm_source=mastodon&utm_medium=social

#AgenticSystems #Metacognition #LLMs #AI

Automated Design of Agentic Systems – synesis

ADAS casts the design of agent scaffolds as a search problem, and a meta agent programming candidates in Python finds designs that beat hand-built baselines across reading, math, and abstract reasoning.

synesis
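The core of the search can be sketched as a loop over an archive of scored designs. This is a simplified sketch, not ADAS's exact procedure: `propose` stands in for the meta agent that writes a new scaffold as Python code, `evaluate` stands in for running that scaffold on the task, and keeping only the top `keep` designs is an assumed archive policy.

```python
import heapq
from typing import Callable

Design = str  # stands in for the source code of an agent scaffold
Archive = list[tuple[float, Design]]


def meta_search(
    propose: Callable[[Archive], Design],  # hypothetical: the meta agent writing new code
    evaluate: Callable[[Design], float],   # hypothetical: task score for a design
    seeds: list[Design],
    iterations: int,
    keep: int,
) -> Archive:
    """Repeatedly propose a design conditioned on the archive, score it, and keep the best."""
    archive = heapq.nlargest(keep, [(evaluate(d), d) for d in seeds])
    for _ in range(iterations):
        candidate = propose(archive)  # sees the current best designs and their scores
        archive.append((evaluate(candidate), candidate))
        archive = heapq.nlargest(keep, archive)  # sorted best-first, truncated to `keep`
    return archive
```

Because the archive is passed back into `propose`, each new design can build on the best ones found so far, which is what lets the search go beyond tuning prompts.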

How much of a coding agent is the loop, and how much is the machinery around it? A source-level read of Claude Code's TypeScript finds the loop almost trivial: call model, run tools, repeat. Almost all the code is around it: a deny-first permission gate, a five-layer compaction pipeline, four graduated extension mechanisms, isolated subagents. The open-source OpenClaw gateway answers the same questions differently.

https://benjaminhan.net/posts/20260623-claude-code-design-space/?utm_source=mastodon&utm_medium=social

#Paper #AgenticSystems #Claude #Coding #AI

Dive into Claude Code: The Design Space of Today’s and Future AI Agent Systems – synesis

A close reading of Claude Code’s published TypeScript maps its architecture: a simple agent loop wrapped in permission, compaction, extensibility, and subagent systems.

synesis
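To make "call model, run tools, repeat" concrete, here is a minimal sketch of such a loop with a deny-first permission check in front of every tool call. It is not Claude Code's implementation: the `ModelTurn`/`ToolCall` shapes, the string-based history, and the set-based allow and deny rules are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class ModelTurn:
    text: str
    tool_calls: list[ToolCall] = field(default_factory=list)


def permitted(call: ToolCall, deny: set[str], allow: set[str]) -> bool:
    # Deny rules are checked first; anything not explicitly allowed is refused.
    if call.name in deny:
        return False
    return call.name in allow


def agent_loop(
    model: Callable[[list[tuple[str, str]]], ModelTurn],  # hypothetical model interface
    tools: dict[str, Callable[..., str]],
    deny: set[str],
    allow: set[str],
    prompt: str,
    max_turns: int = 10,
) -> str:
    history = [("user", prompt)]
    for _ in range(max_turns):
        turn = model(history)                  # 1. call the model
        history.append(("assistant", turn.text))
        if not turn.tool_calls:                # no tool requests: the task is done
            return turn.text
        for call in turn.tool_calls:           # 2. run the requested tools
            if not permitted(call, deny, allow):
                result = f"denied: {call.name}"
            elif call.name not in tools:
                result = f"unknown tool: {call.name}"
            else:
                result = tools[call.name](**call.args)
            history.append(("tool", result))   # 3. feed results back and repeat
    return "stopped: turn limit reached"
```

The loop itself is a dozen lines; the post's point is that the real work lives in what surrounds it, such as the permission gate here and the compaction that would keep `history` from growing without bound.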

Where does an AI math agent get its ability, the model or the orchestration around it? In the first large-scale test of formal proof search on open problems, an agent closed 9 of 353 Erdős problems in Lean. In its own ablation, a plain generate-and-verify loop solved all nine, while smaller models and the specialized prover alone solved nothing.

https://benjaminhan.net/posts/20260606-ai-formal-proof-search/?utm_source=mastodon&utm_medium=social

#AI #AIforScience #Mathematics #AgenticSystems

Advancing Mathematics Research with AI-Driven Formal Proof Search – synesis

The first large-scale test of LLM-driven formal proof search on open problems, with an AlphaProof-equipped agent closing 9 of 353 Erdős problems and 44 of 492 OEIS conjectures in Lean.

synesis
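A plain generate-and-verify loop is simple enough to write out in full. In this sketch `generate` stands in for the model proposing a Lean proof and `verify` for the Lean checker; both are hypothetical callables, and passing the checker's last error message back to the generator is an assumption about what "plain" includes, not something the post specifies.

```python
from typing import Callable, Optional


def generate_and_verify(
    statement: str,
    generate: Callable[[str, Optional[str]], str],     # hypothetical: model proposes a proof
    verify: Callable[[str, str], tuple[bool, str]],    # hypothetical: checker returns (ok, message)
    budget: int,
) -> Optional[str]:
    """Sample candidate proofs until one passes the checker or the budget runs out."""
    feedback: Optional[str] = None
    for _ in range(budget):
        proof = generate(statement, feedback)
        ok, message = verify(statement, proof)
        if ok:
            return proof       # the checker, not the model, decides what counts as solved
        feedback = message     # assumed: the last error is shown to the next attempt
    return None
```

Since the verifier is exact, every returned proof is correct by construction; the open question the ablation answers is how much the generator's strength, rather than the scaffolding, drives the solve rate.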

Where should an agent run relative to its sandbox? LangChain shipped Deep Agents, a model-agnostic take on the harness pattern behind Claude Code. Its key difference from the Claude Agent SDK is exactly that question: Deep Agents can run outside the sandbox and drive it as a tool, so credentials stay off the sandbox and the security boundary wraps execution, not the agent. That decoupled shape is where production is heading.

https://benjaminhan.net/posts/20260529-langchain-deep-agents/?utm_source=mastodon&utm_medium=social

#AgenticSystems #AIEngineering #AI

LangChain’s Deep Agents: A Batteries-Included Agent Harness – synesis

LangChain packages the planning-plus-subagents-plus-filesystem pattern behind Claude Code into a model-agnostic library, and names a new layer of the agent stack to place it in.

synesis
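The "agent outside the sandbox" shape can be sketched as an agent process that holds its own credentials and hands the model only a tool that forwards commands to a sandbox. This does not reflect the Deep Agents API: `Sandbox` is a stand-in that records commands instead of executing them, and the `SECRET_PREFIXES` list is a hypothetical scrubbing policy.

```python
from dataclasses import dataclass, field

# Hypothetical policy: environment variables with these prefixes never leave the host.
SECRET_PREFIXES = ("OPENAI_", "ANTHROPIC_", "AWS_")


@dataclass
class Sandbox:
    """Stand-in for a remote sandbox; it only ever sees what is passed to it."""
    env: dict[str, str]
    log: list[str] = field(default_factory=list)

    def run(self, command: str) -> str:
        self.log.append(command)
        return f"ran {command!r} with {sorted(self.env)}"


def scrubbed_env(env: dict[str, str]) -> dict[str, str]:
    return {k: v for k, v in env.items() if not k.startswith(SECRET_PREFIXES)}


def make_sandbox_tool(host_env: dict[str, str]):
    # The agent keeps host_env (including API keys) for its own model calls;
    # the sandbox is created with a scrubbed copy and is reachable only via execute().
    sandbox = Sandbox(env=scrubbed_env(host_env))

    def execute(command: str) -> str:
        return sandbox.run(command)

    return execute, sandbox
```

If code running inside the sandbox is compromised, there is nothing in its environment worth stealing, because the boundary wraps execution rather than the agent.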

If hundreds of subagents check each other and agree, is the answer right? Claude Code can now write an orchestration script that fans a task across them in parallel, with other agents refuting each finding until they agree. But convergence is not correctness. They share one base model and context, so they share blind spots, and wrong answers converge as cleanly as right ones. It only holds if the refuters are genuinely adversarial.

https://benjaminhan.net/posts/20260529-dynamic-workflows-claude-code/?utm_source=mastodon&utm_medium=social

#AgenticSystems #AIEngineering #AI

Dynamic Workflows in Claude Code – synesis

Claude Code can now write orchestration scripts that fan a task out across tens to hundreds of parallel subagents, check the findings against each other, and resume an interrupted run where it left off.

synesis
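The "convergence is not correctness" point shows up even in a toy version of fan-out-and-refute. In this sketch, threads stand in for subagents, and a finding survives if no refuter rejects it. The worker and refuters are hypothetical: the worker gets one case wrong, and three copies of the same refuter share a blind spot for exactly that case.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
F = TypeVar("F")


def fan_out_and_refute(
    tasks: list[T],
    worker: Callable[[T], F],
    refuters: list[Callable[[F], bool]],  # each returns True if it rejects the finding
    max_workers: int = 8,
) -> list[F]:
    """Run the worker on every task in parallel, then keep findings no refuter rejects."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        findings = list(pool.map(worker, tasks))  # map preserves task order
    return [f for f in findings if not any(refute(f) for refute in refuters)]


# Hypothetical worker: squares numbers but gets 3 wrong.
def worker(n: int) -> tuple[int, int]:
    return (n, 10 if n == 3 else n * n)


# Three identical refuters sharing one blind spot: none of them checks n == 3.
def shared_refuter(f: tuple[int, int]) -> bool:
    n, value = f
    return n != 3 and value != n * n


# An independent refuter that checks every case.
def independent_refuter(f: tuple[int, int]) -> bool:
    n, value = f
    return value != n * n


print(fan_out_and_refute([1, 2, 3], worker, [shared_refuter] * 3))  # (3, 10) passes unanimously
print(fan_out_and_refute([1, 2, 3], worker, [independent_refuter]))  # (3, 10) is caught
```

Adding more copies of the same refuter changes nothing; only a refuter whose errors are uncorrelated with the worker's filters out the wrong answer.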

Should you pick a boring language like Go so coding agents write more reliable code? A widely shared argument says low-variance ecosystems beat fragmented ones like JavaScript or Python. The likelier driver is corpus size, not variance: models are strongest on the most-represented languages, which are the fragmented ones. And betting on the median buys reliability where it was already cheapest, not for the unfamiliar work that actually costs time.

https://benjaminhan.net/posts/20260529-use-boring-languages-with-llms/?utm_source=mastodon&utm_medium=social

#AgenticSystems #SoftwareEngineering #AI

Use Boring Languages with LLMs – synesis

Coding agents produce more reliable output in cohesive language ecosystems like Go than in fragmented ones like JavaScript or Python.

synesis

What are the effective ways to use Claude Code in large codebases? Anthropic's recommended order for building up the harness is CLAUDE.md, hooks, skills, plugins, LSP, then MCP. The most common failure they see is reaching for MCP first, before the basics work. For outcomes, getting the harness right matters more than swapping models.

https://benjaminhan.net/posts/20260528-claude-code-large-codebases/?utm_source=mastodon&utm_medium=social

#AI #AgenticSystems #SoftwareEngineering #Anthropic

How Claude Code Works in Large Codebases: Best Practices and Where to Start – synesis

Anthropic opens its “Claude Code at scale” series with patterns from successful enterprise rollouts: five harness extension points, the order to add them, and three configuration patterns for monorepos and multi-repo orgs.

synesis
🚀 Oh great, just what we needed: yet another "revolutionary" software stack from a self-proclaimed tech messiah. 🎉 Wes McKinney leads a "small team of veterans" to reinvent the wheel, again, but this time with *agentic systems* and lots of 🚀 emojis. Get ready for a wild ride of #buzzwords and imaginary breakthroughs! 🙄
https://kenn.io/ #techinnovation #softwaredevelopment #agenticsystems #WesMcKinney #HackerNews #ngated
Kenn Software

Agentic systems for what comes next.

Singapore Researchers Harmonize Diverse SIEMs with Agentic Rule Translation

Imagine having multiple Security Information and Event Management (SIEM) platforms working in perfect harmony. Singapore researchers have made this a reality by developing a game-changing approach called agentic rule translation, enabling seamless interoperability between diverse SIEMs.…

https://osintsights.com/singapore-researchers-harmonize-diverse-siems-with-agentic-rule-translation?utm_source=mastodon&utm_medium=social

#SiemInteroperability #AgenticSystems #SecurityInformationAndEventManagement #Singapore #ResearchAndDevelopment

Discover how Singapore researchers achieve SIEM interoperability with agentic rule translation and learn how to harmonize your security systems now.

OSINTSights