Today I published CAST (Coordinated Agent Swarm Testing): a methodology that uses 100 coordinated LLM agents to substitute for human beta testing in pre-launch quality assurance.

Full paper on Zenodo, code open source on GitHub.

Thread with details ↓
#AIAgents #LLMs #OpenScience
Case study: applied CAST to GingerPen, a 12,500-line React/TypeScript app across 54 source files.

● 100 agents in 5 specialized squads
● 63 minutes runtime
● $7.97 total cost
● 772 raw findings, 687 unique after deduplication
● Approximately 1% of the typical human beta testing cost ($2,000 to $10,000)

#SoftwareTesting #MultiAgentSystems
CAST: Coordinated Agent Swarm Testing as a Substitute for Human Beta Testing in Web Application Development

Traditional human beta testing remains a costly, time-intensive, and logistically complex phase of the software development lifecycle. Recruiting 10 to 30 testers, managing feedback cycles over two to four weeks, and triaging unstructured reports typically costs between $2,000 and $10,000 per release while still leaving significant defect classes undiscovered. Meanwhile, large language models (LLMs) have demonstrated remarkable capabilities in code comprehension, reasoning about software behavior, and generating structured analytical output. This paper introduces CAST (Coordinated Agent Swarm Testing), a methodology that substitutes coordinated swarms of LLM-powered agents for human beta testers. CAST organizes 100 independent AI agents into five specialized squads of 20, each agent instantiated with a unique persona that defines its expertise, behavioral archetype, and testing focus. Agents perform static code analysis against the full source of a target application and return structured findings in a controlled JSON schema. An aggregation layer deduplicates findings using content fingerprinting and computes cross-squad confidence scores. We present a case study applying CAST to GingerPen, a 12,500-line React/TypeScript book formatting platform spanning 54 source files. The framework completed a full 100-agent analysis in approximately 63 minutes at a cost of $7.97, producing 772 raw findings that reduced to 687 unique issues after deduplication. Findings spanned accessibility, error handling, state management, security, performance, and data integrity categories. We compare CAST against estimated human beta testing costs and timelines and find that it achieves comparable defect coverage for code-level issues at approximately 1% of the cost. We discuss the methodology's strengths in systematic coverage and reproducibility, its limitations in visual and runtime testing, and future directions including browser-automated visual agents, multi-model ensembles, and CI/CD integration. CAST is proposed not as a full replacement for human testing but as a cost-effective first pass that catches a substantial fraction of pre-launch defects, fundamentally shifting how development teams approach quality assurance.

Zenodo
Built as an independent researcher under Scaffold Studios, using Claude for both the agent runtime and the development workflow.

Open to discussion, feedback, and collaboration on AI agent systems and AI Trust and Safety more broadly.

#IndieResearch #AITrustAndSafety