"Does this mean that complex pieces of software can now be built cheaply by AI? Not so fast:
The “single prompt” claim is misleading. The blog post says the operating system was built from a single prompt. But halfway through the post, Google discloses that the prompt “ended up being many thousands of lines” long. How many attempts did it take to generate the prompt? How specific were the instructions to the agent? Without these critical details, it is hard to know if the secret sauce is a better model or just more effort put into prompting the model. Moreover, the run was carried out on a scaffold with specialized roles, delegation to subagents, and an agent to detect and prevent cheating. In the launch post, Google views the scaffold as a product feature. But we don’t know whether the scaffold was overfit to this task of building an operating system from scratch, or whether it would perform as well on other complex software engineering tasks.
Google’s writeup is not explicit about what counted as human intervention. The post mentions that the final run to develop the operating system required “no additional guidance or corrections from a human.” But it does not define that standard. It describes infrastructure to kill and restart stuck agents. The post mentions an earlier run in which the agents appeared to cheat, after which the team added anti-cheating measures and re-ran the task. But it does not report dry runs as part of the methodology. Nor does it clearly say whether any agents escalated to a human, whether the final run required any manual restarts, approvals, or fixes, or how many retries it took until the agent was successful."
https://www.normaltech.ai/p/did-googles-ai-agents-really-build








