More! More! More! Tech Workers Max Out Their A.I. Use.

At a number of companies, employees compete on leaderboards to show how much A.I. they’re using. They’re racking up big bills along the way.

The New York Times

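Vision AI Checkup is a leaderboard that evaluates 51 vision-language models on dozens of real-world tasks (89 prompts in total). Gemini 3.1 Pro ranks first (84.1%), with Gemini 3 Flash, Gemini 3.1, and OpenAI O4 Mini/GPT-5.4 among the top entries. The benchmark and evaluation code are open source, and anyone can contribute by adding prompts.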

https://visioncheckup.com/

#ai #vision #multimodal #models #leaderboard

Vision AI Checkup

See how LLMs, foundation models, and VLMs do on vision tasks.

How I Topped the HuggingFace Open LLM Leaderboard on Two Gaming GPUs

https://dnhkng.github.io/posts/rys/

#HackerNews #HuggingFace #LLM #Leaderboard #Gaming #GPUs #AI #Research #Machine #Learning

LLM Neuroanatomy: How I Topped the LLM Leaderboard Without Changing a Single Weight

ML, Biotech, Hardware, and Coordination Problems. Sometimes I write about hard problems and how to solve them.

David Noel Ng