How We Broke Top AI Agent Benchmarks: And What Comes Next

https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/

Center for Responsible, Decentralized Intelligence at Berkeley

If only the blog itself weren't written by AI.

>No reasoning. No capability. Just exploitation of how the score is computed.

shudder

I wonder what freshman-level college writing classes are teaching about writing voice and AI. The telltale patterns are pretty frustrating to read.
Whatever classes these guys took, they skipped the one on scientific misconduct.