what kind of tasks did you try?
I should have clarified I didn't use the free version...
Thanks for the info. Before running the bench I had only tried it on arena.ai-style tasks, and it was not impressive. I didn't expect it to be that good at agentic tasks.
The really surprising part to me is that, despite being the cheapest model on the board, StepFun often scores high on raw performance too. Other models in the same price range (e.g. Kimi) fail to do that.
StepFun 3.5 Flash is the #1 most cost-effective model for OpenClaw tasks (300 battles)
https://app.uniclaw.ai/arena?tab=costEffectiveness&via=hn
OpenClaw Arena | UniClaw
A public benchmark for evaluating whether AI agents can complete real workflows. Compare model performance and cost-effectiveness on real agent tasks.