New blog post on the NeurIPS'21 experiment re authors' perceptions of their own papers!

https://blog.ml.cmu.edu/2022/11/22/neurips2021-author-perception-experiment/

Key findings:

1) Authors significantly overestimate their papers' chances of acceptance. By like a LOT.

2) Miscalibration is lower for more "senior" authors ("seniority" measured by their role in the conference), and slightly higher for women (note also that women are less likely to be senior in this data/definition, but we controlled for this in the analysis).

3) For authors who submit more than one paper, when asked to RANK their papers by scientific merit, most of the time (93%) their ranking agrees with their estimated probabilities of acceptance. In the other 7% of cases, the paper they rank as having higher merit is the one they give a lower chance of acceptance.

4) (AND I FOUND THIS MOST FASCINATING!) The amount of disagreement between CO-AUTHORS in terms of the perceived relative scientific contribution of their papers is SIMILAR to the amount of disagreement between authors and reviewers.

That is - even though we worry a lot about REVIEWER disagreement, there seems to be just about as much AUTHOR disagreement about the same paper!

5) About half of authors report that their perception of their own paper changed after seeing the initial reviews. Additionally, among both accepted and rejected papers, over 30% of authors report that their perception became more positive.

Conclusions:

Vast overestimates of the probability of acceptance suggest we should recalibrate expectations (one way or the other).
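(If it helps to make "overestimate by a LOT" concrete, here's a minimal sketch in Python, using made-up numbers rather than the actual survey data, of the kind of comparison behind this finding: average self-reported probability of acceptance vs. the actual acceptance rate.)

```python
# Hypothetical illustration (NOT the NeurIPS'21 data): compare authors'
# self-reported acceptance probabilities against actual decisions.

# Made-up numbers: each entry is one paper's self-reported P(accept)
# and whether it was actually accepted (1) or rejected (0).
predicted = [0.80, 0.65, 0.90, 0.50, 0.70, 0.40, 0.85, 0.60]
accepted  = [1,    0,    1,    0,    0,    0,    1,    0]

mean_prediction = sum(predicted) / len(predicted)   # what authors expected
acceptance_rate = sum(accepted) / len(accepted)     # what actually happened

# The "calibration gap": how far expectations exceed reality on average.
print(f"mean self-reported P(accept): {mean_prediction:.2f}")
print(f"actual acceptance rate:       {acceptance_rate:.2f}")
print(f"calibration gap:              {mean_prediction - acceptance_rate:+.2f}")
```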

Disagreements around paper quality suggest that assessing paper quality is not only extremely noisy, but also lacks an objective right answer.

It was super fun working with a host of great people on this experiment:

NeurIPS 2021 Program Chairs Alina Beygelzimer, Yann N. Dauphin, Percy Liang and Jennifer Wortman Vaughan...

... and colleagues Charvi Rastogi, Ivan Stelmakh, Zhenyu Xue, Emma Pierson, and Nihar Shah.

/end

@hal Reading the blog post, this point really struck me. I would never assume there could be an objectively right answer for assessing paper quality, but are there those in the NeurIPS community who think there could be?
@JoFrhwld Honestly IDK. To me, it's kinda obvious that there's no right answer. (We iterated MANY times even on how to ask the question and it's certainly imperfect.) BUT at the same time when I hear "grumble reviewers grumble", and when I myself grumble, I often skim over this point, perhaps too much.
@hal In other words -- graduate students shouldn't be discouraged because, on average, they aren't going to get a paper into NeurIPS until their fourth try... It's a crapshoot. A game of chance for all but the worst papers, which are properly filtered out by quality.
@tedpavlic i definitely agree that students (and others) shouldn’t be discouraged. the cited neurips'2020 experiment talks explicitly about randomness in the review process, and i think it's a much better study to base this type of conclusion on - it specifically looked at randomness in reviews. i don’t quite conclude this from ours, which is much more about author *impressions* of the process than about how random the process itself is.