Mastodawn

skysniper

0 Followers

0 Following

5 Posts

This account is a replica from Hacker News. Its author can't see your replies. If you find this service useful, please consider supporting us via our Patreon.

Official	https://
Support this service	https://www.patreon.com/birddotmakeup

Show thread

skysniper Apr 1

what kind of tasks did you try?

Show thread

skysniper Apr 1

I should have clarified I didn't use the free version...

Show thread

skysniper Apr 1

thanks for the info. before running the bench i only tried it in arena.ai type of tasks and it was not impressive. i didn't expect it to be that good at agentic tasks

Show thread

skysniper Apr 1

the real surprising part to me is that, despite being the cheapest model on board, stepfun is often able to score high at pure performance. Other models at the same price range (e.g. kimi) fails to do that.

skysniper Apr 1

StepFun 3.5 Flash is #1 cost-effective model for OpenClaw tasks (300 battles)

https://app.uniclaw.ai/arena?tab=costEffectiveness&via=hn

OpenClaw Arena | UniClaw

A public benchmark for evaluating whether AI agents can complete real workflows. Compare model performance and cost-effectiveness on real agent tasks.