This week, Science published a stunningly irresponsible news story entitled "Fake scientific papers are alarmingly common" and claiming that upward of 30% of the scientific literature is fake.

https://www.science.org/content/article/fake-scientific-papers-are-alarmingly-common

Below, the headline and subheading of the story.

Headline and intro notwithstanding, the story itself later notes that the detector doesn't actually work and flags nearly half of real papers as fake. Does the reporter just not understand that?

h/t @Hoch

Fake scientific papers are alarmingly common

But new tools show promise in tackling growing symptom of academia’s “publish or perish” culture

The numbers from this story are based on a laughable "fake paper detector" that literally consists of the following two checks, and ONLY these. Do the authors:

1) use private (non-institutional) email addresses and/or have a hospital affiliation,

and

2) have no international coauthors.

That's it.

If these criteria are met, the paper is deemed a "potential red-flag fake publication" and counted toward that 30% tally.
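
For concreteness, here is the entire decision rule as described. A minimal sketch, assuming hypothetical field names; this is my reading of the stated criteria, not the authors' actual code:

```python
# A minimal sketch of the two-rule "detector" described above.
# Field names are hypothetical; this is one reading of the stated
# criteria, not the authors' actual implementation.

def is_red_flagged(paper: dict) -> bool:
    """True if a paper counts as a 'potential red-flag fake publication'."""
    suspicious_contact = (
        paper["uses_private_email"]          # non-institutional address
        or paper["has_hospital_affiliation"]
    )
    no_international_coauthors = len(set(paper["author_countries"])) <= 1
    return suspicious_contact and no_international_coauthors
```

That is the whole model: two metadata fields and a Boolean AND.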

Spin notwithstanding, the technical details within the preprint itself make it abundantly clear that the method doesn't work.

In a "juiced" test set with as many fake papers as real ones, the indicators that they use have a sensitivity of 86% and a false alarm rate of 44%.

Yes, they flag 44% of the known real papers as fake.

That's not a detector, it's a coinflip.
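
To see what those operating characteristics mean in practice, here's a quick sketch. The 86% and 44% figures are from the preprint; the base rates are illustrative assumptions of mine:

```python
# Precision (PPV) implied by 86% sensitivity and a 44% false-alarm rate.
# Sensitivity and false-alarm rate are as reported in the preprint;
# the base rates below are illustrative assumptions.

sensitivity = 0.86  # P(flag | fake)
fpr = 0.44          # P(flag | real)

for base_rate in (0.02, 0.10, 0.50):
    flagged = base_rate * sensitivity + (1 - base_rate) * fpr
    ppv = base_rate * sensitivity / flagged
    print(f"base rate {base_rate:4.0%}: {flagged:.0%} flagged, "
          f"{ppv:.0%} of flags actually fake")

# base rate   2%: 45% flagged,  4% of flags actually fake
# base rate  10%: 48% flagged, 18% of flags actually fake
# base rate  50%: 65% flagged, 66% of flags actually fake
```

Even in the 50/50 "juiced" set, a third of the flags land on real papers; at any realistic base rate, the flags are overwhelmingly false alarms.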

This should be a profound embarrassment to everyone involved with the preprint and Science story alike.

https://www.medrxiv.org/content/10.1101/2023.05.06.23289563v1.full.pdf

To test their indicator, the authors conjecture that a valid indicator should meet three criteria, based in part on a questionnaire sent to authors:

(i) Authors of fake publications are reluctant to provide critical information as revealed by their response – or non-response – to the questionnaire by the editor,

(ii) the number of fake publications increases steadily over time, and

(iii) journals with a low to medium impact factor are most affected.

There's a huge problem here.

If non-US/EU authors are more likely to use non-institutional email addresses, the detector will pick up these authors disproportionately.

And indeed we expect non-US/EU authors to

i) have lower response rates, due to language issues, not to mention (deserved, apparently!) distrust,

ii) make up an increasing fraction of publication share, and

iii) publish at higher rates in low- and mid-impact journals.

So all three test hypotheses fail to distinguish between fake papers and non-US/EU authorship.

And that's the worst part of it.

While I don't want to imply anything about the motivations of the authors, *their paper has racist consequences.*

Their paper implements a detector that they themselves show doesn't work, and that we have every reason to expect disproportionately flags papers from Asia and the Global South, and then concludes that these regions contribute the most fake papers.

Figure 3 is a disgrace.

I'm astonished that Science credulously boosted this rubbish, and even more surprised that Gerd Gigerenzer put his name on it.

/fin

Further thoughts in the continued thread here: https://fediscience.org/@ct_bergstrom/110359692279086139
Continued: it's striking that the authors didn't even use conventional machine learning procedures to develop their classifier. Rather, they chose features that made sense to them as indicators. These were not even indicators of fake papers, but rather indicators of non-response to a survey, which of course is a very different thing from authorship of a fake paper.
@ct_bergstrom One thing I still don’t get: How can a detector with 44% false positives end up reporting some 30% hits? Am I missing something or is the dissonance really just that profound?
@jpelckolsen I suppose the 44% was on the small test set?
@ct_bergstrom shouldn’t that still raise a flag? Say I make a “human-or-cat detector” that flags an individual as a cat if it weighs less than 30 kg. With my household as the test set that would give a sensitivity of 100% and a specificity of 50% (like the detector in the article) meaning I expect to flag half of all humans as cats. If I then apply that detector to my workplace I’d find 0% cats, but how could that be? I expect at least 50% of my colleagues to be false positives
@jpelckolsen I agree it's inconsistent and I was guessing there's some difference between test sets. But who knows....
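
For what it's worth, the arithmetic bears out the dissonance: with the quoted rates, the overall flag rate has a hard floor of 44%, whatever the true fraction of fakes. A quick check (my algebra, not anything from the preprint):

```python
# Overall flag rate = fpr + (sensitivity - fpr) * p, where p is the
# fraction of fakes. With sensitivity 0.86 and fpr 0.44 this is at
# least 44% for any p in [0, 1]. Solving for the p that would give
# the reported ~30% flag rate:

sensitivity, fpr = 0.86, 0.44
flag_rate = 0.30

p = (flag_rate - fpr) / (sensitivity - fpr)
print(f"implied fraction of fakes: {p:.2f}")  # -0.33, i.e. impossible
```

So a 30% hit rate is arithmetically incompatible with a 44% false-alarm rate, unless the detector behaves differently on the screened corpus than on the test set.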

@ct_bergstrom I suspect that since Sabel works in *clinical* neuropsychology, he may only be applying this algorithm to papers in his own field, where a hospital affiliation would be the norm rather than a red flag?

Rest of this sus af.

@ct_bergstrom

A colleague, who should know better when it comes to data analysis, circulated this article across our internal science network.

I would direct them to your thread above, but work blocks much of the Fediverse (but not Twitter).

This and other misuses of statistics in practice annoy me somewhat, as some of it directly impacts my work (and I'm in no way a statistician, nor a scientist).

@IceNine
> I would direct them to your thread above, but work blocks much of the Fediverse (but not Twitter)

WTF?!? Have you raised a ruckus with them about how ridiculous that is?

@ct_bergstrom

@strypey

It's probably a good thing, largely keeps me off during work. 😆

@IceNine
Not off Titter...

@strypey

Yeh, but I don't use that account any more. Was unamused to see the likes of who got unbanned, and what is now considered acceptable.

@IceNine
Good for you :) I only ever used my Titter account as a sockpuppet, echoing my public posts here.
@IceNine @strypey Pretty much. No desire to supply free content to a network that caters to convicted seditionists and wannabe Oswald Moselys.
Oliver D. Reithmaier (@[email protected]):

If your tool produces high rates of false positives, and other tools produce the same results as yours, the takeaway should not be that your tool is "as good as any". It's that your tool and its concept fucking suck. What a disgrace of an article & what a waste of money. https://www.science.org/content/article/fake-scientific-papers-are-alarmingly-common
@ct_bergstrom And the headline implies that neuroscience is the totality of science. Many geologists and astronomers and ecologists will be surprised to learn that they are not doing science.
@ct_bergstrom that applies in the EU as well. Everything you listed is true for Poland and likely most countries in the Union.

@ct_bergstrom this conflation of US/EU authorship with "Real Science" deserving of publication

is strikingly well-aligned with current GOP/Tory talking points about "Real Americans/Britons" deserving of government attention and support

@ct_bergstrom The criteria are just silly. They are trying to use non-content, easily observed criteria without ever answering the question of sufficiency. I would question the very idea of non-content criteria, although perhaps with some careful work one might find some that are useful. The amazing thing here is that this is basically a fake paper about fake papers that suffers from many of the content deficiencies of fake papers...

@ct_bergstrom

Weak sensitivity paired with no specificity equals meaningless results.

Edit: just looked at the paper. It is worse than I imagined. Doesn't even have face validity, no confidence intervals, etc. I think you could get similar results with, I don't know, a detector that uses the length of the first author's name.

@ct_bergstrom

So basically a d’ of 1.18? That’s it? Not a very robust classifier.
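
For anyone checking that arithmetic: under the standard equal-variance signal-detection model, d' is the difference of the normal quantiles of the hit and false-alarm rates. With the 86%/44% figures quoted above it comes out around 1.2, in the same ballpark:

```python
# d' = z(hit rate) - z(false-alarm rate), equal-variance Gaussian model.
from statistics import NormalDist

z = NormalDist().inv_cdf
print(f"d' = {z(0.86) - z(0.44):.2f}")  # d' = 1.23 -- a weak classifier
```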

@ct_bergstrom The method makes me very unhappy.

There's some congruence between the results and our expectations, but ... I do not believe this task can be responsibly attempted without *analyzing the text and data*. No citation analysis is attempted either. This is nothing like enough.

I would also like an explanation as to why this figure is so much goddamn higher than other estimates, because I can think of several other attempts to quantify this problem, and while many of them are far higher than we should be comfortable with, they're also far *lower* than this.

This is not a responsible document. It is one thing to accuse lots of people of faking studies (I, uh, have done this) but it's another to accuse half of some nation states of doing it on the basis of a method I would diplomatically describe as 'thin'.

@jamesheathers 100% agreement with all of that.

(Well, they do try a post-hoc test to see if suspected papers by their methods cite other suspected papers by their methods, but it's generous to call that citation analysis.)

@ct_bergstrom Absolutely, and that represents a lot of money left on the table. We have a reasonable idea of what citation patterns can be observed if mills are involved, AND the data is quite accessible. I think if you're going to sling big timber around like this, there's no excuse not to use it.

@ct_bergstrom

Not even a LITTLE AI bullshit? So disappointing.

@ct_bergstrom

I always use a personal email address because institutional email addresses are inherently short-term. And I only have international co-authors on about a third of my papers.

According to this "detector", two-thirds of my papers are fake 🤡

@ct_bergstrom not gonna lie that my favorite thing about this post is its "two things...1)... and 3)" structure
@ct_bergstrom I will move to Santorini and work as an International Coauthor for anyone who wants to pass the detector. For a totally reasonable fee of course.
@ct_bergstrom so any papers where I used a personal email address because they crossed between PhD/postdocs where I knew I’d lose access/didn’t yet have a new email weren’t real?
@ct_bergstrom I like to use my private email because that's going to live with me indefinitely. Guess I'm fake.
@ct_bergstrom Love to know that using my gmail address after graduation could be seen as a signal that our paper was fake 🙄
@ct_bergstrom There goes whatever remaining shred of credibility _Science_ might have had.

@ct_bergstrom I just had a look at the paper and I'm wondering if you (or someone!) understand what "international coauthor/partner" actually means to them. Are they basing this on specific countries? Like there has to be a *western* co-author? Or would their tool suggest that a paper with all U.S. authors is more likely to be fake? (Their methods are also just... not well described. :-\ )

Anyway yeah this is baaaaad, thank you for the thread.

@ct_bergstrom Doesn’t the use of private mail addresses discriminate against junior faculty? Which postdoc on a two-year contract lists an institutional mail address when the publication might take three years? And then you’re surprised you didn’t get an answer? 🤨
@ct_bergstrom wow I'm red flagged! I started using my Gmail because I wasn't about to use my postdoc addresses that only lasted a year or two each