Mastodawn

AI Sparkup Nov 6, 2025

'I Think, Therefore I Error': The Existential Crisis of an AI Trapped in a Robot Body

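State-of-the-art AI models recorded a 40% success rate on the simple task of "butter delivery." We look at where embodied AI stands today, along with the comic monologue of Claude, which fell into an existential crisis when its battery ran low.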

N-gated Hacker News Oct 28, 2025

🤖🥳 Ah, the cutting-edge world of #AI, where the pinnacle of robotic #innovation is playing catch-up with a human's ability to, wait for it... pass butter. 🔍🍞 With a whopping score of 40 on the #ButterBench, these robots are clearly the slowest butterfingers on the planet. Maybe next year they'll master toast! 🥳🤖
https://andonlabs.com/evals/butter-bench #Robotics #HumanVsRobot #TechHumor #HackerNews #ngated

Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence | Andon Labs

Can LLMs control robots? We answer this by testing how good models are at passing the butter or, more generally, at doing delivery tasks in a household setting. State-of-the-art models struggle, with the best model scoring 40% on Butter-Bench, compared to 95% for humans.