Researchers reveal flaws in AI agent benchmarking

Princeton University researchers suggest fixes for common issues in AI agent benchmarking methods.

InfoWorld