Mastodawn

David Karam Apr 12, 2023

I am really confused by the field of machine learning right now. Like really. This is the field that likes to gatekeep other fields that are not "technical" or whatever, and we're seeing papers advertising evaluation datasets that are the outputs of other models.

Like machine translation training and evaluation datasets that are the outputs of other machine translation systems.

What happened to the BASIC concept of not testing on your training set?

Or anything related to learning theory?

Abstract Tesseract Apr 12, 2023

@timnitGebru I have been seeing this too! And then when someone calls them on it, they reply with a bunch of complicated-looking formulas to explain why "statistically, if you take all this fancy math into account, it's totally OK to do this". Ughhhh

Definitely makes me think of that law about how "the amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it."

jack the nonabrasive Apr 12, 2023

@timnitGebru I’m seeing this lack of rigor in industry as well. Reproducibility used to be top of the list. It's not even a concern anymore.

I hope it changes soon.

Gershon Bialer Apr 12, 2023

@timnitGebru What is the potential for building AI-generated Potemkin villages?

Dave Spector Apr 12, 2023

@timnitGebru and now we come to the root of the matter — and what other disciplines have been pointing out for years. Unlike mathematics, biology, chemistry, or physics, “computer science” isn’t actually science — not on the applied side anyway. It’s a bunch of code we throw together that we loosely model on established science (see neuroscience) and then claim total expertise as though “computer science” has some actual scientific method.

“AI” may finally break the illusion.

TracingVRL by A.J. Fish Apr 12, 2023

@timnitGebru Beware peer-review rings https://retractionwatch.com/2022/09/28/exclusive-hindawi-and-wiley-to-retract-over-500-papers-linked-to-peer-review-rings/

SpaceLifeForm Apr 12, 2023

@timnitGebru

I'm pretty sure this is Testing in Prod.

Fabian Transchel Apr 12, 2023

@timnitGebru Yep, data leakage is everywhere. Personal data is everywhere.

I keep a set of tasks / prompts I'm sure would be very unlikely to come up from somebody else (not saying it's impossible, but I find it unlikely for several reasons) - all JUST for the purpose of heuristically validating (language) models myself. Not gonna share or publish. Neither to brag, nor to hype.

Simply because I know that these days everything on the web ends up polluting the models.

Jacob Kramer-Duffield Apr 12, 2023

@timnitGebru it's funny how money changes situations

paul Apr 15, 2023

@timnitGebru