I am really confused in the field of machine learning right now. Like really. This is the field that likes to gate keep other fields who are not "technical" or whatever, and we're seeing papers advertising evaluation datasets that are the outputs of other models.

Like machine translation training and evaluation datasets that are the outputs of other machine translation systems.

What happened to the BASIC concept of not testing on your training set?

Or anything related to learning theory?

@timnitGebru I’m seeing this lack of rigor in the industry, as well. Reproducibility used to be top of the list. Not even a concern, anymore.

I hope it changes, soon.