AVeriTeC (NeurIPS 2023): 4,568 real-world fact-checked claims, web-retrieved evidence, four-way labels, temporal-leak-free split.

Two structural gaps. First, the gold answers are frozen but the retrieval surface isn't: two systems run a year apart get different Google results. Second, the not-enough-evidence (NEI) class rewards weak retrievers: predicting NEI when retrieval fails matches gold by coincidence.
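
A toy sketch of the NEI gap, under loud assumptions: plain per-class label accuracy (not the benchmark's official scoring), made-up gold labels, and a hypothetical `predict` helper that falls back to NEI whenever retrieval returns nothing.

```python
from collections import Counter

NEI = "Not Enough Evidence"


def predict(evidence_found: bool, verdict_if_found: str) -> str:
    """Hypothetical system: falls back to NEI whenever retrieval finds nothing."""
    return verdict_if_found if evidence_found else NEI


def per_class_accuracy(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Fraction of claims in each gold class whose predicted label matches."""
    total = Counter(gold)
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    return {label: correct[label] / total[label] for label in total}


# Illustrative gold labels only, not real AVeriTeC data.
gold = ["Supported", "Refuted", NEI, NEI]

# A retriever that never finds anything: every prediction collapses to NEI.
blind_pred = [predict(False, g) for g in gold]

scores = per_class_accuracy(gold, blind_pred)
print(scores)
```

In this toy setup the retriever that finds nothing is scored as perfect on the NEI class while getting every other class wrong, so a label match on NEI alone cannot separate "correctly judged the evidence insufficient" from "failed to retrieve".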

https://benjaminhan.net/posts/20260507-averitec/?utm_source=mastodon&utm_medium=social

#Paper #Benchmark #FactVerification #NeurIPS #AI

AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web – synesis

A 4,568-claim fact-checking benchmark sourced from 50 real fact-checking organizations, with web-retrieved evidence, a 4-way verdict label including not-enough-evidence, and a temporal-leak-free split.
