This https://aclanthology.org/2024.eacl-long.5/ is a very important paper, published at #EACL2024. But the sad truth is that this would have been avoidable if people had followed well-known best practices in science: avoid the hype, use local #llms in defined and controlled states (a minimal sketch follows the citation below). Reminds me of "Googleology is Bad Science" from 2007: https://aclanthology.org/J07-1010/
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

Simone Balloccu, Patrícia Schmidtová, Mateusz Lango, Ondřej Dušek. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2024.

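A minimal sketch of what "defined and controlled states" can look like in practice, assuming the Hugging Face transformers API (the model name is illustrative, not a recommendation): pin the exact weights and use deterministic decoding, so an evaluation can be re-run exactly.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "mistralai/Mistral-7B-v0.1"  # any locally hosted checkpoint
    REVISION = "main"  # pin to a specific commit hash for real runs

    # Pinning the revision fixes the weights; do_sample=False fixes the output.
    tokenizer = AutoTokenizer.from_pretrained(MODEL, revision=REVISION)
    model = AutoModelForCausalLM.from_pretrained(MODEL, revision=REVISION)

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))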
In brief: The problem is that #ChatGPT has seen most of the samples in most of the benchmarks used. I.e., many evaluations involve testing on the training data.
Consequence: We have no idea whether OpenAI models really are as good as their leaderboard positions suggest.
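To make the contamination problem concrete, here is a hypothetical sketch of the kind of check you can only run when the training data is inspectable (which is exactly what closed-source models prevent): flag a benchmark sample if most of its word n-grams appear verbatim in the training corpus. Function names, n, and the threshold are my own choices, not from the paper.

    def ngrams(text, n=8):
        # All verbatim n-word sequences in a text, lowercased.
        toks = text.lower().split()
        return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    def looks_contaminated(sample, training_text, n=8, threshold=0.5):
        """Flag a benchmark sample if a large share of its n-grams
        occurs verbatim in the (inspectable) training data."""
        sample_grams = ngrams(sample, n)
        if not sample_grams:
            return False
        overlap = len(sample_grams & ngrams(training_text, n)) / len(sample_grams)
        return overlap >= threshold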
@nilsreiter
Maybe we do. IMO this paper is strong evidence that OpenAI's models are incapable of any semblance of #reasoning, #math, or #planning. GPT-4 got ~10% accuracy on the easiest class of reasoning tests for humans (using LLMs' native language for representing reasoning: code), and 0% accuracy on medium and hard problems from post-2021 tests, which would have been difficult for OpenAI to incorporate into their training data to squeeze out other memorized solutions: https://arxiv.org/pdf/2312.02143.pdf
@hobs the paper I originally posted wasn't about GPT performance, but about our ability to evaluate it. I think the only sure way to control for this with ChatGPT / GPT-4 is to use each benchmark only once (a toy version of that discipline below).
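A toy illustration of the "each benchmark only once" discipline, with hypothetical names and a local JSON ledger: record which eval sets have ever been sent to a given closed model, and check the ledger before reusing one.

    import json
    import pathlib

    LEDGER = pathlib.Path("benchmark_ledger.json")  # hypothetical bookkeeping file

    def _load():
        return json.loads(LEDGER.read_text()) if LEDGER.exists() else {}

    def already_exposed(benchmark: str, model: str) -> bool:
        # A benchmark sent to a closed API once may end up in its training data.
        return benchmark in _load().get(model, [])

    def record_exposure(benchmark: str, model: str) -> None:
        ledger = _load()
        ledger.setdefault(model, []).append(benchmark)
        LEDGER.write_text(json.dumps(ledger, indent=2))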