I am really confused by the field of machine learning right now. Like really. This is the field that likes to gatekeep other fields that are not "technical" or whatever, and we're seeing papers advertising evaluation datasets that are the outputs of other models.

Like machine translation training and evaluation datasets that are the outputs of other machine translation systems.

What happened to the BASIC concept of not testing on your training set?

Or anything related to learning theory?

@timnitGebru I have been seeing this too! And then when someone calls them on it, they reply with a bunch of complicated-looking formulas to explain why "statistically, if you take all this fancy math into account, it's totally OK to do this". Ughhhh

Definitely makes me think of that law (Brandolini's law) about how "the amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it."

@timnitGebru I’m seeing this lack of rigor in industry as well. Reproducibility used to be top of the list. Now it's not even a concern.

I hope it changes soon.

@timnitGebru What is the potential for building AI-generated Potemkin villages?

@timnitGebru and now we come to the root of the matter — and what other disciplines have been pointing out for years. Unlike mathematics, biology, chemistry, or physics, “computer science” isn’t actually science — not on the applied side anyway. It’s a bunch of code we throw together that we loosely model on established science (see neuroscience) and then claim total expertise as though “computer science” has some actual scientific method.

“AI” may finally break the illusion.

@timnitGebru

I'm pretty sure this is Testing in Prod.

@timnitGebru Yep, data leakage is everywhere. Personal data is everywhere.

I keep a set of tasks/prompts that I think are very unlikely to come from anybody else (not saying it's impossible, but I find it unlikely for several reasons), all JUST for the purpose of heuristically validating (language) models myself. Not gonna share or publish them, neither to brag nor to hype.

Simply because I know that these days everything on the web ends up polluting the models.

@timnitGebru it's funny how money changes situations