New blog post on the NeurIPS'21 experiment re authors' perceptions of their own papers!

https://blog.ml.cmu.edu/2022/11/22/neurips2021-author-perception-experiment/

Key findings:

1) Authors significantly overestimate their papers' chances of acceptance. By like a LOT. (A toy illustration of what such a gap looks like follows below.)

"How do Authors' Perceptions about their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?"
Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan (NeurIPS 2021 Program Chairs), Charvi Rastogi, Ivan Stelmakh, Zhenyu Xue, Hal Daumé III, Emma Pierson, and Nihar B. Shah
Machine Learning Blog | ML@CMU | Carnegie Mellon University
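One way to make "overestimate" concrete: compare the mean self-reported acceptance probability with the realized acceptance rate. A toy illustration with entirely invented numbers (the study's actual figures are in the blog post):

```python
# Authors' self-reported P(accept) and actual decisions (1 = accepted).
# All values below are invented for illustration; see the post for real data.
predicted = [0.80, 0.60, 0.75, 0.50, 0.90, 0.40, 0.70, 0.65]
accepted  = [1,    0,    0,    0,    1,    0,    0,    1]

mean_predicted = sum(predicted) / len(predicted)
actual_rate = sum(accepted) / len(accepted)

print(f"mean predicted P(accept): {mean_predicted:.2f}")  # 0.66
print(f"actual acceptance rate:   {actual_rate:.2f}")     # 0.38
print(f"overconfidence gap:       {mean_predicted - actual_rate:+.2f}")
```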
@hal Alternative interpretation: above a relatively low threshold, acceptance is effectively randomized due to lack of space, leading to a loss of predictive power in that regime.

@ted_dunning I may be missing something (correct me!), but I think that in order to get that, most respondents would've had to interpret the question as being about the QUALITY of the paper, rather than its CHANCE of acceptance.

It's entirely possible that what you're saying is true - in which case, if one believed their paper was "good enough", they should have answered ~30% - but that's not what happened, which at least suggests people don't *think* that's the case.

@hal Who knows what lurks in the hearts of authors?

I have never been sure about how people truly interpret questions. My users have confounded me far too many times for me to have illusions that the question asked is the question answered.

@ted_dunning Yup, that's entirely possible. We hoped that giving them the past acceptance rate would help the interpretation, but it's definitely possible they misinterpreted.

If that's the case, there's still a big gap, because if we really believe everything over a threshold is random, then no one should be saying anything over, say, 50% - but clearly a lot of people are.
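A quick toy version of that "random above a threshold" model helps pin down the numbers. This is a sketch with invented parameters (ACCEPT_RATE and FRACTION_ABOVE_BAR are assumptions, not figures from the study): papers below a quality bar are always rejected, and the limited space is spread uniformly over the rest, so even an author who is certain their paper clears the bar should report only ACCEPT_RATE / FRACTION_ABOVE_BAR.

```python
import random

random.seed(0)

N_PAPERS = 10_000
ACCEPT_RATE = 0.26         # hypothetical overall acceptance rate
FRACTION_ABOVE_BAR = 0.50  # hypothetical fraction of submissions clearing the bar

# Under the threshold model, acceptance above the bar is a uniform lottery
# over the available space; below the bar it is impossible.
p_accept_given_above_bar = min(1.0, ACCEPT_RATE / FRACTION_ABOVE_BAR)

accepted = sum(
    1
    for _ in range(N_PAPERS)
    if random.random() < FRACTION_ABOVE_BAR         # clears the quality bar
    and random.random() < p_accept_given_above_bar  # wins the space lottery
)

print(f"P(accept | above bar) = {p_accept_given_above_bar:.2f}")  # 0.52
print(f"Overall acceptance rate ~= {accepted / N_PAPERS:.3f}")    # ~0.26
```

Under these made-up numbers, a perfectly calibrated author who is sure their paper is "good enough" would answer ~52%, and anyone less sure would answer below that - which is roughly the ceiling being discussed above.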

@hal There is definitely a gap, but my first interpretation is that people are complicated, and they assume that any questioner is complicated. And then they estimate what you really meant by your question in some complicated way, based on their estimate of your estimate of their mental state.

I still love y'all's work here, and the graph speaks volumes. It also makes me rethink what I think about publications. That's probably true of others as well.

@ted_dunning "people are complicated" --- something about truer words... :)

But yes, I agree there should be a lot of room for interpretation, given how these darned complicated people interpreted things :)

@hal My coming-of-age moment in this respect was when I was first analyzing people's behavior around music.

I found that if you compared how much of a song people let play before hitting skip against our estimate of how much they liked the song, the behavior was very non-intuitive.

Skipping after less than 15 seconds generally seemed to indicate radical dislike of an entire genre. Country music for a heavy metal fan. Or metal for a classical music listener.

1/2

@hal That made sense. People can determine rough genre in a few hundred milliseconds.

But people frequently skipped their absolute favorite songs after about 30-60 seconds had played. Quizzing users about this indicated that they weren't even quite aware of doing it, but it seemed that they knew the songs well enough that 30-60 seconds was enough to get the high.

This behavior had clear ramifications for building a recommender (see the sketch below).

And none of it much carried over to video watching.

2/2
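One way to read the recommender point concretely: turn each (seconds played, skipped or not) event into a signed implicit-feedback score. This is a hypothetical sketch; the function skip_signal, its thresholds, and its weights are invented to mirror the behavior described in the two posts above, not anyone's production logic.

```python
def skip_signal(seconds_played: float, skipped: bool) -> float:
    """Map one listening event to a preference score in [-1, 1] (toy values)."""
    if not skipped:
        return 0.5    # played to the end: mildly positive
    if seconds_played < 15:
        return -1.0   # near-instant skip: strong, likely genre-level dislike
    if 30 <= seconds_played <= 60:
        return 0.8    # skipping after the familiar opening: often a positive signal
    return -0.2       # other skips: weak negative evidence

# Example events: (seconds played, skipped?)
for secs, skipped in [(5, True), (45, True), (210, False)]:
    print(f"{secs:>3}s skipped={skipped}: {skip_signal(secs, skipped):+.1f}")
```

And per the last point above, a scorer like this would have to be re-fit per medium, since the song thresholds apparently don't transfer to video.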


@ted_dunning that’s an amazing example! it’s both surprising, and yet i can totally see how it’s true