StepFun (@StepFun_ai)

Step 3.5 Flash has taken the #1 spot on MathArena, scoring 96.11% overall and 97% on AIME 2026 I. At $0.40 per run, it shows a model with 11B active parameters delivering high performance and low cost at the same time.

https://x.com/StepFun_ai/status/2021721309567221772

#step3.5 #matharena #llm #modelperformance #cost

StepFun (@StepFun_ai) on X

Step 3.5 Flash is now #1 on MathArena 🏆 96.11% overall. 97% AIME 2026 I. $0.40/run. not bad for an 11B active param model 😤 https://t.co/SaOMQ32hYO

X (formerly Twitter)
MathArena.ai

MathArena: Evaluating LLMs on Uncontaminated Math Benchmarks

🤖📉 "AI #struggles to make it past the #math playground, aiming for the Olympiad podium but barely earning a participation ribbon. 🎖️ Attempting to turn equations into entertainment, MathArena's latest brainwave is evaluating bots on math tests most humans cringe at. Maybe next time, they'll try teaching #AI to count its own errors first. 😂"
https://matharena.ai/imo/ #MathOlympiad #MathArena #TechHumor #ParticipationRibbon #HackerNews #ngated

Gemini 2.5 gets 24.4% on MathArena USAMO beating previous top score of 4.7%

https://matharena.ai/

#HackerNews #Gemini2.5 #MathArena #USAMO #Score #TechNews #AI
