StepFun 3.5 Flash is the #1 cost-effective model for OpenClaw tasks (300 battles)

https://app.uniclaw.ai/arena?tab=costEffectiveness&via=hn

OpenClaw Arena | UniClaw

A public benchmark for evaluating whether AI agents can complete real workflows. Compare model performance and cost-effectiveness on real agent tasks.

Yet when I tried it, it performed abysmally compared to Gemini 2.5 Flash.
What kind of tasks did you try?