Remember seeing something about GPT-4 doing well on standardized tests? It turns out it may have memorized the answers.
https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
#gpt4 #AIHype #ThisIsWhyWeDontTestOnTheTrainingData
GPT-4 and professional benchmarks: the wrong answer to the wrong question

OpenAI may have tested on the training data. Besides, human benchmarks are meaningless for bots.

AI Snake Oil

@janellecshane @KevinMarks

ehhh memorize isn't even the right word. still anthropomorphizing too much, dagnamit

@quinn @janellecshane @KevinMarks talk of storage as computer memory, as in RAM/ROM, is standard
@mapto @janellecshane @KevinMarks when was the last time you said that your computer memorized that file?
@quinn @janellecshane I might have said it memory-mapped a file, but that's a different metaphor.
You could say that the LLM had already read the answers to those questions (again using the existing metaphor of reading data).
@KevinMarks @janellecshane It's just important to remember that these amazing new AIs are still closer to being very elaborate magic 8-balls than they are to being children, or even cats.