📰 OpenAI Discloses That Flawed Test Cases Undermine the SWE-bench Verified Benchmark
OpenAI has officially withdrawn support for the SWE-bench Verified benchmark after discovering that at least 16.4% of its test cases contain critical flaws, rendering them unreliable for evaluating advanced AI coding systems. The revelation has sparked widespre...





