Mastodawn

Janelle Shane Mar 29, 2023

Remember seeing something about GPT-4 doing well on standardized tests? It turns out it may have memorized the answers.
https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
#gpt4 #AIHype #ThisIsWhyWeDontTestOnTheTrainingData

GPT-4 and professional benchmarks: the wrong answer to the wrong question

OpenAI may have tested on the training data. Besides, human benchmarks are meaningless for bots.

AI Snake Oil

Show thread

Quinn Norton

@janellecshane @KevinMarks

ehhh memorize isn't even the right word. still anthropormophizing too much, dagnamit

Show thread

Martin Ruskov Mar 30, 2023

@quinn @janellecshane @KevinMarks talk of storage as computer memory, as in RAM/ROM, is standard

Show thread

Quinn Norton Mar 31, 2023

@mapto @janellecshane @KevinMarks when is the last time you said that your computer memorized that file?

Show thread

Kevin Marks Mar 31, 2023

@quinn @janellecshane I might have said it memory mapped a file, but that's a different metaphor.
You could say that the LLM had already read the answers to those questions (again with the existing metaphor of reading data)

Show thread

Quinn Norton Mar 31, 2023

@KevinMarks @janellecshane It's just important to remember that these amazing new AIs are still closer to being very elaborate magic 8-balls than they are to being children, or even cats.