Mastodawn

AI Leaks and News (@AILeaksAndNews)

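The ARC-AGI-3 benchmark has been released. It evaluates agentic intelligence, and scores for Gemini 3.1 Pro, GPT-5.4, Claude Opus 4.6, and Grok 4.2 were published alongside it. It looks set to be an important yardstick for the reasoning and agentic performance of next-generation AI models.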

https://x.com/AILeaksAndNews/status/2036884229133410723

#benchmark #agi #agenticai #gpt5 #gemini

AI Leaks and News (@AILeaksAndNews) on X

ARC-AGI-3 has been released. The benchmark scores models on agentic intelligence. Currently the labs score:
Google’s Gemini 3.1 Pro: 0.37%
OpenAI’s GPT-5.4: 0.26%
Anthropic’s Claude Opus 4.6: 0.25%
xAI’s Grok 4.2: 0%
How many months until an AI model saturates this benchmark?

X (formerly Twitter)