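Xiaomi has released MiMo-V2-Pro. With an Intelligence Index score of 49, it sits between GLM-5 (50) and Kimi K2.5 (47) and improves on the earlier release, MiMo-V2-Flash (41). Weights are not released and the model is available only through Xiaomi's own API; it has a 1M-token context window and is text only. It scores a GDPval-AA Elo of 1426 and +5 on AA-Omniscience, indicating a low hallucination rate, and its token efficiency and cost (about $348) are also competitive.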

https://x.com/ArtificialAnlys/status/2034239267052896516

#xiaomi #mimov2pro #ai #reasoning #benchmarks

Artificial Analysis (@ArtificialAnlys) on X

Xiaomi has released MiMo-V2-Pro, which scores 49 on the Artificial Analysis Intelligence Index, placing it between Kimi K2.5 and GLM-5. @Xiaomi's MiMo-V2-Pro is a new reasoning model and a significant upgrade over their prior open-weights release, MiMo-V2-Flash (309B total / 15B

X (formerly Twitter)

Brie Wensleydale (@SlipperyGem)

News that Xiaomi's MiMo models have been announced. The author has not tested them personally but expects them to be competitive, based on the benchmark results and Xiaomi's strong product track record.

https://x.com/SlipperyGem/status/2034469797069779244

#xiaomi #mimo #aimodels #benchmarks

Brie Wensleydale🧀🐭 (@SlipperyGem) on X

The Xiaomi MiMo models have dropped. I don't have the time to test them, but judging from their benchmarks and Xiaomi's impeccable record of delivering amazing products (I'm Xiaomi Gang), I'm sure these models will be a contender in some way

🔮 Behold, a revolutionary tome unveiling the mystical art of... splitting data sets! 🎩✨ Dive into a world where machine learning geniuses compete in a bizarre contest of who-can-overfit-the-best, and where #benchmarks are the sacred cow 🐄 that everyone loves to hate but won't stop worshipping. Spoiler: it's #groundbreaking, like discovering water is wet. 💧🤯
https://mlbenchmarks.org/00-preface.html #dataScience #machineLearning #overfitting #techHumor #HackerNews #ngated
Preface - The Emerging Science of Machine Learning Benchmarks

Ivan Fioravanti ᯅ (@ivanfioravanti)

Points out that output quality varies enormously across code generation models such as Claude Code, Codex, OpenCode, and Droid, and calls for more systematic benchmarks to verify these "night and day" differences.

https://x.com/ivanfioravanti/status/2034279147090870323

#codegeneration #benchmarks #claude #codex #droid

OpenAI's GPT-5.4 mini reaches 94% of flagship performance at 70% lower cost, while nano cuts prices by 92%. The gap between premium and budget AI models continues to shrink: seven months ago mini trailed by 12 points on coding tests, and it is now just 3 points behind.

Meanwhile, Google removed the paywall from Personal Intelligence, giving Gemini access to Gmail and other personal data for free US users.

The race toward cheaper intelligence accelerates.

https://www.implicator.ai/openai-cuts-prices-70-google-reads-your-gmail-for-free-2/

#AI #pricing #benchmarks

OpenAI Mini Cuts Prices 70%; Google Opens Personal AI Free

GPT-5.4 mini hits 94% of flagship benchmarks at 70% less. Google drops paywall on Personal Intelligence for all free US users.

Implicator.ai
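
The GPT-5.4 mini figures quoted above (94% of flagship performance at 70% lower cost, and a coding gap that went from 12 points to 3) can be turned into a rough value comparison. This is a minimal sketch, not a metric used by the article: the function name `value_ratio` and the idea of dividing relative performance by relative cost are assumptions made here for illustration.

```python
def value_ratio(relative_performance: float, cost_reduction: float) -> float:
    """Performance per unit of cost, relative to the flagship (flagship = 1.0).

    Hypothetical helper: assumes "70% lower cost" means the model costs
    30% of the flagship price for the same workload.
    """
    relative_cost = 1.0 - cost_reduction
    return relative_performance / relative_cost


# Figures taken from the post: 94% of flagship performance, 70% lower cost.
mini_ratio = value_ratio(relative_performance=0.94, cost_reduction=0.70)

# Coding gap quoted in the post: 12 points seven months ago, 3 points now.
gap_closed = 12 - 3

if __name__ == "__main__":
    print(f"mini delivers about {mini_ratio:.2f}x the performance per dollar")
    print(f"coding gap narrowed by {gap_closed} points in seven months")
```

Under that assumption, mini works out to roughly three times the flagship's performance per dollar. The nano model is left out because the post gives its price cut but not its relative performance.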

Logan Kilpatrick (@OfficialLoganK)

Explains that AI is saturating existing benchmarks, so more rigorous ones are needed, and proposes developing new benchmarks that evaluate models along several cognitive dimensions: learning, metacognition, attention, executive functions, and social cognition.

https://x.com/OfficialLoganK/status/2033978256454504915

#benchmarks #evaluation #cognition #agi

Logan Kilpatrick (@OfficialLoganK) on X

AI continues to saturate most benchmarks, so we need new ones which hold a rigorous bar. Help us measure models along the following dimensions: learning, metacognition, attention, executive functions, and social cognition. https://t.co/81pWpVgfmL

Logan Kilpatrick (@OfficialLoganK)

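Announces a benchmark competition on @kaggle for measuring progress toward AGI (specifically cognitive capabilities), with $200K in total prizes. The campaign invites participants to submit benchmarks on Kaggle that evaluate AGI-related cognitive capabilities, so that models' cognitive progress can be measured objectively.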

https://x.com/OfficialLoganK/status/2033978254344786351

#kaggle #agi #benchmarks #evaluation

Logan Kilpatrick (@OfficialLoganK) on X

Help us measure the progress towards AGI (specifically cognitive capabilities) by building benchmarks on @kaggle, with $200K in prizes available! Details in 🧵

Hey Pythonistas!

I'm looking for tips on handmade benchmarks. Of course I'm thinking of using time() to measure execution time, but do you have other suggestions for methods or statements to use?

#python #software #benchmarks #engineer
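
One possible answer to the question above, as a minimal sketch using only the standard library: `time.perf_counter()` is generally a better fit than `time.time()` for measuring short durations, and `timeit.repeat()` runs the code many times so that one noisy run does not dominate. The workload `build_squares` and the helpers `time_once` and `time_repeated` are hypothetical names made up for this example.

```python
import statistics
import timeit
from time import perf_counter


def build_squares(n: int = 10_000) -> list[int]:
    """Hypothetical workload; replace with the code you want to measure."""
    return [i * i for i in range(n)]


def time_once(func) -> float:
    """Time a single call with perf_counter, a monotonic high-resolution clock."""
    start = perf_counter()
    func()
    return perf_counter() - start


def time_repeated(func, number: int = 100, repeat: int = 5) -> dict[str, float]:
    """Take `repeat` samples of `number` calls each; report seconds per call."""
    samples = timeit.repeat(func, number=number, repeat=repeat)
    per_call = [sample / number for sample in samples]
    return {"best": min(per_call), "median": statistics.median(per_call)}


if __name__ == "__main__":
    print(f"single call: {time_once(build_squares):.6f} s")
    stats = time_repeated(build_squares)
    print(f"best:        {stats['best']:.6f} s per call")
    print(f"median:      {stats['median']:.6f} s per call")
```

The best (minimum) sample is usually the least disturbed by other processes, while the median gives a sense of typical behaviour. For finding where the time goes rather than how much there is, the standard-library `cProfile` module is another option.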