Mastodawn

Yuchen Jin (@Yuchenj_UW)

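PewDiePie says he has trained a model that he claims beats Llama-4, DeepSeek v2.5, and GPT-4o on coding performance. The model is a fine-tune of Qwen2.5-32B, and the claimed advantage comes from a single benchmark (Aider Polyglot), so the post points out that the result may be overstated or a product of benchmark optimization ("benchmaxxing").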

https://x.com/Yuchenj_UW/status/2027408009912357174

#pewdiepie #qwen2.5 #benchmark #gpt4o #llama4

Yuchen Jin (@Yuchenj_UW) on X

PewDiePie: “I trained a model that beats Llama-4, DeepSeek v2.5, and GPT-4o on coding.” Looking into it. It’s a fine-tuned Qwen2.5-32B, evaluated on ONE benchmark: Aider Polyglot. Peak benchmaxxing lol.