📰 OpenAI Discloses That Flawed Test Cases Undermine SWE-bench Verified Benchmark

OpenAI has officially withdrawn support for the SWE-bench Verified benchmark after discovering that at least 16.4% of its test cases contain critical flaws, rendering them unreliable for evaluating advanced AI coding systems. The revelation has sparked widespre...

#AINews #AI #Technology #MachineLearning #News

🔗 https://aihaberleri.org/en/news/openai-discloses-flawed-test-cases-undermine-swe-bench-verified-benchmark