We tested Claude, Gemini, Codex, Qwen, and MiniMax on detecting real bugs. The best model caught 53% of them. After adversarial debate, detection jumped to 80%.