The strain on scientific publishing 📄:

The publishing sector has a problem. Scientists are overwhelmed, editors are overworked, special issue invitations are constant, paper mills are churning, articles are being retracted, journals are being delisted… JUST WHAT IS GOING ON!?

Pablo, @paolocrosetto, Dan and I have spent the last few months investigating just that.
https://arxiv.org/abs/2309.15884

A thread 🧵 1/n

#AcademicChatter #PublishOrPerish #Elsevier #Springer #MDPI #Wiley #Frontiers #PhDAdvice #PhDChat #SciComm

The strain on scientific publishing

Scientists are increasingly overwhelmed by the volume of articles being published. Total articles indexed in Scopus and Web of Science have grown exponentially in recent years; in 2022 the article total was ~47% higher than in 2016, which has outpaced the limited growth - if any - in the number of practising scientists. Thus, publication workload per scientist (writing, reviewing, editing) has increased dramatically. We define this problem as the strain on scientific publishing. To analyse this strain, we present five data-driven metrics showing publisher growth, processing times, and citation behaviours. We draw these data from web scrapes, requests for data from publishers, and material that is freely available through publisher websites. Our findings are based on millions of papers produced by leading academic publishers. We find specific groups have disproportionately grown in their articles published per year, contributing to this strain. Given pressures on researchers to publish or perish to be competitive for funding applications, this strain was likely amplified by publishers adopting a strategy of hosting special issues, which publish articles with reduced turnaround times. We also observed widespread year-over-year inflation of journal impact factors coinciding with this strain, which risks confusing quality signals. Such exponential growth cannot be sustained. The metrics we define here should enable this evolving conversation to reach actionable solutions to address the strain on scientific publishing.

First things first: growth in articles published each year has outpaced the scientists doing the publishing. With #PublishOrPerish, we all face an ever-increasing workload (writing, reviewing, editing…). It’s been rough.

Strain itself is neutral: this could be a welcome change! Are we becoming more efficient? Are we combatting biases (academic racism, positive result bias)?

If that’s all it were, the solution to strain would be to build a better infrastructure.

But… well… it’s not. 2/n

We see that certain groups are major drivers of this article growth, in some cases seemingly out of nowhere. This includes your classic publishers like #Elsevier and #Springer, but also the upstarts #Frontiers and… most significantly, #MDPI.

In numbers, there were nearly 1 million more articles per year published in 2022 (2.8m) compared to 2016 (1.9m). MDPI takes the lion’s share at 27% of that growth, with Elsevier (16%) a distant 2nd.
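Back-of-envelope, using the rounded figures above (my arithmetic, not a number from the paper):

```latex
2.8\,\text{m} - 1.9\,\text{m} \approx 0.9\,\text{m extra articles/yr}; \quad
0.27 \times 0.9\,\text{m} \approx 243{,}000\ \text{of those from MDPI alone}
```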

How did we get to this point? 3/n

I could be nuanced (it's in the preprint!). But let’s be frank: it’s special issues.

“Dear Dr ___, your preeminent work in [FIELDYOUDONTWORKIN] drew our attention to your [COPYPASTEPAPERTITLE] and we were thoroughly aroused. We invite you to submit to special issue with us, who love your preeminence. Yours faithfully, [AROUSED].”

The figure speaks for itself. With my leftover characters, I instead wanna ask y’all to send me screenshots of your favourite SI invitations! Hit me! 😀 4/n

So still… is it worth it? Strain itself is neutral. Maybe these special issues are just giving a voice to authors with less privilege?

Or maybe not. The publishers hosting special issues drastically reduced their turnaround times (TATs: submission to acceptance) - and let’s be clear, that’s INCLUDING revisions. 5/n

Now, it’s not our place to judge what an average TAT is supposed to be, but we’re very confident it’s not 37 days across all research fields. Requested revision experiments take weeks in fruit flies, but months in mice.

TATs are also supposed to vary from article to article: some articles are great on 1st draft, some need a little TLC, and some need… a lot… Yet #MDPI journals in particular, across the board, accept everything in a blistering 37 days with almost no variation. 6/n

But it’s not just #MDPI: #Frontiers and #Hindawi also grew their share of special issues. One might argue: “These are just labels publishers use. The peer review process is the same.”

Au contraire, mon ami: no it’s not. Special issues have lower TATs. They’re intended to be lax. They’re for authors to voice ideas that could turn out to be wrong, but advance the conversation in the field. That’s what they used to be, at least… and what made them “special.” But I digress… 7/n

We also looked at rejection rates (RRs), with some caveats: we took publishers at their word on what their RRs were, and don’t know the underlying methods. But we figured RRs would at least be calc’d consistently within groups. We compared relative RRs over time, and RRs against proportions of special issues.

Again, #MDPI was the maverick, with a unique decline in RRs over time. Not only that, but in both #Hindawi & MDPI, more special issues means lower RRs. The review process *is not* the same. 8/n

Lastly, let’s talk #ImpactFactor (IF). Reminder: IF is the average citations per document that a journal’s articles receive within their first 2 years. IF values total cites.
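Spelled out (this is the standard two-year IF definition; the notation is mine, not the paper’s):

```latex
\mathrm{IF}_{Y} =
\frac{\text{citations in year } Y \text{ to items published in } Y-1 \text{ and } Y-2}
     {\text{citable items published in } Y-1 \text{ and } Y-2}
```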

IFs are going up 📈: they’re literally being inflated like a currency. So if you see a journal celebrating its year-over-year increase in IF, you’ve gotta normalize for inflation. This inflation accompanies the huge crush of special issues from earlier. But(!) a citation network-adjusted rank (Scimago Journal Rank, SJR) hasn’t changed accordingly. What gives? 9/n

Well, SJR is complex, but the main thing is it doesn’t reward self-citations, or circular citations from so-called “citation cartels.”

In other words:

** IF just cares about total citations, but doesn’t pay attention to where they come from.
** SJR pays attention, and doesn’t reward you or your buddies for reciprocal back scritchies.

10/n

Then there’s Goodhart’s law: “when a measure becomes a target, it ceases to be a good measure.”

We use IFs and publications as a measure, but now they’re targets. There are many studies on the consequences, such as @abalkinaanna’s work on paper mills:
https://onlinelibrary.wiley.com/doi/pdf/10.1002/leap.1574

And then there’s this: https://fediscience.org/@MarkHanson/111104919139171425

That’s what you get from #PublishOrPerish 🤷‍♂️ 11/n

We developed a new metric that we call “Impact Inflation”: the ratio of Impact Factor to Scimago Journal Rank (IF/SJR). Because IF values total cites (no matter the source) while SJR does not reward aggressive self/co-citation, IF can become extremely inflated compared to SJR for journals hosting citation cartels.
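As a minimal sketch of the calculation (the file and column names here are hypothetical, not our actual pipeline):

```python
# Minimal sketch of the Impact Inflation ratio (IF/SJR).
# Assumes a hypothetical CSV with one row per journal and columns
# "journal", "impact_factor", and "sjr" for a given year.
import pandas as pd

df = pd.read_csv("journal_metrics.csv")  # hypothetical input file
df["impact_inflation"] = df["impact_factor"] / df["sjr"]

# Journals whose IF is most inflated relative to their citation-network rank:
print(df.sort_values("impact_inflation", ascending=False).head(10))
```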

Key point: Impact Inflation is a metric that shows to what extent a journal has succumbed to Goodhart’s law. And well… once again #MDPI leads the pack. 12/n

Talking about within-journal self-cites: once again, #MDPI has the highest rates.

What’s more, we also see groups like #Hindawi with higher Impact Inflation but normal self-cite levels. What gives?

Well, SJR also weights a citation based on where it comes from, and because MDPI journals aren’t well-cited (except by themselves), their citations aren’t worth much. And because MDPI growth came out of nowhere, they’re now exporting huge numbers of citations to others, with a particular penchant for Hindawi. 13/n
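To give a flavour of that source-weighting idea, here’s a toy PageRank-style rank with made-up numbers. It is NOT the real SJR algorithm (which is more involved), just an illustration of why self-cites and citations from poorly-cited sources count for little:

```python
# Toy prestige-weighted journal rank: a citation is worth more when it comes
# from a highly-ranked journal, and within-journal self-citations are dropped.
# NOT the real SJR algorithm; the citation counts are invented.
import numpy as np

# cites[i, j] = citations from journal i to journal j (hypothetical counts)
cites = np.array([[50.0, 2.0, 1.0],
                  [3.0, 10.0, 2.0],
                  [1.0, 1.0, 40.0]])

np.fill_diagonal(cites, 0)  # ignore within-journal self-citations
out_totals = cites.sum(axis=1, keepdims=True)
transfer = cites / np.where(out_totals == 0, 1, out_totals)  # row-normalise

rank = np.full(3, 1 / 3)  # start everyone with equal prestige
for _ in range(100):      # power iteration until the ranks settle
    rank = 0.15 / 3 + 0.85 * (rank @ transfer)

# Journal 0 cited itself heavily; with self-cites dropped and sources
# weighted, its rank no longer reflects those raw counts.
print(rank)
```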

So where does that leave us? Well, it’s easy to talk about #MDPI because… scroll up. But fundamentally we need to address strain. We’re all overworked, and we can’t let this go on.

Our metrics tell us this growth isn’t rigorous science. Special issues are lowering standards, which nets groups like MDPI more articles, and more money 💱. We don’t have revenue data, but for-profit gold OA ties revenues to articles published. So it’s no surprise that some groups are gonna spam engines of growth. 14/n

Science needs accountability. The public needs to trust “peer-reviewed” papers have some minimum standard. These crazy-prolific special issues are damaging the authority and integrity of science.

It’s also costly: millions of scientists writing, reviewing, editing, and for what? These extra ~1m annual articles aren’t necessary. What’s more: we’re under-describing the strain because we’re only using journals indexed in both Scopus and Web of Science. Surprise! It’s actually even worse 🙃 15/n

That said: we’re just four white guys who all got fascinated with the craziness of the publishing sector. But you, the reader, can help. Publishing scientific articles can’t be like ordering fast food: “I’d like one special issue article please, hold the critiques.”

Special issues need to be a rare treat. A “sometimes” food. And when you’re invited to publish in one, or host one, that invite shouldn’t come from an algorithm. We should try to establish this basic #ResearchCulture 16/n

You know who CAN make a difference though? Funders, Universities, Academies of Science, @wellcometrust, @ukrio, @snsf_ch @DORAssessment etc… we need your help!

We need policies that treat special issues differently, because they are different. We need guidelines from #COPE on a reasonable minimum rigour for #peerreview. We need standard reporting of key metrics like RRs, profit margins, etc… We need leadership, and thank you for all you’ve already done and all you’re going to do. We’re up for a chat! 17/n

Now the mushy stuff: Pablo, @paolocrosetto, Dan, it’s been an incredible privilege to work with you all on this. I learned a ton through this project about coding and reproducible research practices.

Also: thanks for putting up with me, I know I’m a lot. As we’ve heard many times from folks over the last months: “this work you guys are doing is really important.” I believe it. Still banned from the text though. 18/n

A last point: we really hummed and hawed about whether and how we could release scripts and data, but we just can’t right now. Lawyers told us not to. We’re like… 99% sure we didn’t do anything risky, but these things can’t be rushed. We’ll update the preprint if/once we’ve confirmed everything. Sorry about that, but hope it’s understandable. 19/n

That’s it. If you’ve made it this far, thanks for reading. I hope this work can help you start some much-needed conversations at your local level.

If you want to chat, I’ll personally be happy to try and carve out some time. Best to send me an email. Let’s work together to be the change we want to see in #Science.
20/20 - end

Thread on article: “The strain on scientific publishing” - out now.

Tagging @TheConversationUK @dingemansemark @OverlyHonestEditor @galtiernicolas @DORAssessment @petersuber @ElisabethBik @brembs @mattjhodgkinson @danielbolnick @deevybee @ct_bergstrom

Boosts much appreciated!

@MarkHanson awesome work and great thinking. Love the insights and will probably think about this topic a lot now
@MarkHanson This was a great read! However, I would be very careful about this chart in particular: using double y axes to show an association can be very fraught, and especially the "strain" annotation here sets off all kinds of alarm bells in my data viz brain. Here's some more reading on the topic: http://daydreamingnumbers.com/blog/dual-axis-charts/
@MarkHanson great stuff! Every blue moon I give a talk on the dark side of science, and try to keep the information updated. This will be a great addition to the library!

@MarkHanson Super thread! Thank you!!

I would just be careful in interpreting the Papers vs PhDs “strain” figure. This interaction seems pretty strongly influenced by COVID (papers). What if this section 👇 were the norm (like before COVID)?

@MarkHanson really fascinating!

Is Copernicus (EGU) a journal you’ve looked at?

@MarkHanson I mean, I guess? But somehow ironically, this also ties neatly into a whole different issue of what's borked about publishing as well. The lack of transparency and reproducibility. 😞
@adamhsparks Oh I know. We will make the data available to peer reviewers, and we really hope to update the preprint as soon as we've got the all-clear. But we web scraped a ton of data, which makes this very tricky across international boundaries. We're confident we'll be protected under UK fair dealing (it has an explicit exception for text mining for non-commercial research), but as our author group is spread out, we didn't have definite protections elsewhere, and lawyers told us "wait."
@MarkHanson I'm impressed by the effort and the work. I understand you can't do it all. I just happen to be focused on openness and reproducibility. https://doi.org/10.1094/PHYTO-10-21-0430-PER

@MarkHanson you should not have any legal issue with releasing the code. If you do, you can probably modify the code to exclude the data from it.

For the data, you can probably push it to Zenodo without making it public? (There is a special field for that; people then need to request access.)

Very interesting points, will try to find time to read the preprint!

@jcolomb It's because we web scraped. Two things: 1) we won't put out web scraping scripts because, quite literally, someone asked us for them to adapt to steal private user info. 2) the data we collected are protected under UK fair dealing, but we are multinational and it's not 100% clear whether or not we compiled a database using 'copyrighted' material.

Might make dummy datasets & release analysis scripts with them so you can at least run the code. But fwoo... lots of extra work.

@MarkHanson Scimago ranking is open already, no?
@andrei_chiffa yep, it's great! Although we found out that the data download they provide doesn't include the within-journal self-cite data. We had to web scrape those separately, as Scimago didn't provide them even on request. But still, Scimago is great👌

@MarkHanson
Appreciate the work here, but not sure the prescriptions follow. Special issues are just one manifestation of a general pattern in for-profit publishing - the obvious incentive to push as many articles through for as much of a processing charge as the prestige of the brand can bear. Why treat a symptom as a target for intervention? And why abandon the idea of infrastructure building? That seems to take the underlying landscape as fixed, out of our control, which lends itself to the perennial appeal to funders, which to me is an abdication of our responsibility and agency, and a misattribution of the root causes and power dynamics in play.

One could see the same results and conclude faculty need militant unions to push back against the evaluation systems that make these metrics meaningful, or to organize alternative means of publication. Bibliometrics, as you note, are intrinsically gamed, but treating the very structure of bibliometrics (citations in traditional journal articles) as inevitable seems to do little to challenge that game, even with a new derived metric that metricizes how gamed one metric is vs another.

@jonny @MarkHanson To add to Jonny's comment, this is good empirical work but more needs to be done to situate the recent shifts, both historically & within the larger institutional dynamics of science. Publishers reflect/refract the incentive structures of their customers (both authors & institutions), which means publishers aren't best analyzed as the only entities with agency.

This is a useful place to start contextualizing what has led us to the current conjuncture: https://doi.org/10.1007/s12108-016-9315-z

@timelfen @jonny on this point, it's perfect.

We hummed and hawed as well about how much we wanted to contextualize, and how much interpretation we wanted to give. And maybe it's due to spending the last few years in Lausanne, but we tried to be neutral Switzerland here.

Our goal is to enable this conversation to be evidence-based, providing data it sorely needed. We're happy to support others writing opinions/reviews on the topic, and can even help out with figures. Email if so!

@MarkHanson @jonny I appreciate this Mark. It's hard to argue again having more evidence!

What I think I'm reacting to is not the evidence, it's the frame in which it becomes meaningful. For me, you can't frame this by asking What are publishers doing to science? because they don't operate as external to science. And w/ an internalist framing, the range of forces & actors in play is much greater.

In short, there's no neutral frame & evidence becomes meaningful only in relation to its frame.

@jonny perhaps I'll respond by saying: read the paper! The Mastodon thread was written for public excitement, but we've got the nuances in the proper paper.

Also, SIs are the plurality of the strain by a large margin, and our data suggest they are treated differently (across publishers, no less, even the 'classic' ones), so the model of "as many articles as possible through SIs" is just not sustainable, nor expected to produce articles similar to standard issues.

@MarkHanson @ukrio @snsf_ch @DORAssessment Don’t remember once having been on a committee where a publication from said publishers ever mattered, but then I also don’t sit on every panel, of course…
(and I completely ignore their requests to publish or review).
@MarkHanson interesting thread. Is it reasonable to say that peer-based publish or perish pressures (jobs, grants, annual reviews, etc.) are the cause and the predatory journal practices are the effect (they simply feed because there’s something to feed on)?
@AllenNeuroLab yes for sure. There is a market 'want' for access to publications, and so the market provides. That's what #PublishOrPerish does to the system, and why the conversation this #PeerReviewWeek needs to drive real change to lessen that pressure!

@MarkHanson

Your IF calculation is slightly off: you don't divide just by 'articles', but by the articles the publisher has negotiated with WoS as 'citable':
https://bjoern.brembs.net/2016/01/just-how-widespread-are-impact-factor-negotiations/

This means there is IF inflation built in and it has been known for decades (need to dig out references) that IF tends to scale with journal size.

@brembs Thanks! On those refs, IF has also been shown to decline following rapid growth (e.g. for PLOS ONE), so it's not so clear-cut. One of our colleagues commented that our IF inflation observation doesn't make sense given past literature (refs), and it sounds like you're saying the opposite.

In response to both sides, see the Fig. 5 supplementary figures, where we took great care to break down the factors leading to the inflation we're seeing between 2016-2022.

@MarkHanson

The references I was referring to are actually older, and I also noticed the mega-journals not working as a perfect replicate of the older references 😆

So, yes, it is probably not completely clear-cut. Given the median/mean situation, the scaling with article number makes some intuitive sense, but there may be (nonlinear?) situations where this does not hold?

@brembs I mean, my honest take is just that journal citations/document increased from 42 to 45 over 2018-2021, and then on top of that all these mega-journals are pumping out articles like crazy; the combination has created an imperfect storm where more citations output + more documents = an inflation of IFs over literally all journals.

We also broke down journals by size and that's not part of it. Not sure we ever did medians... but we probably should!

@MarkHanson

In biomedicine, one can see huge inflation due to the pandemic. It may be a big enough signal that you see it generally? Can you tease that out? My suspicion is that it should jump out at you?

@MarkHanson

Phrased differently, if you account for pandemic publications and citations, how much of an effect is left?

@brembs See Fig. 5supp3 where we discuss the impact of COVID-19!

Long story short: 2023 data will give a better view.

The growth in references per article began with the rise in SIs in 2018, which predates the pandemic. And then in 2022, despite many relaxations and people getting back to wet lab work, we still saw growth (at a reduced rate). If it were ALL COVID, we'd have expected a return to normalcy, not a further increase (even if at a reduced rate). So quantifying it best awaits 2023 data...

@MarkHanson

Interesting! Yes, makes a lot of sense to wait. Would be interesting to tease out how much of the growth is coming from where!

@MarkHanson

Indeed, there's a flagrant conflict of interest: each paper accepted pays APCs, each paper rejected costs time and money to handle and pays nothing.

For journals to be honest, they should charge for submission to acknowledge the processing cost – presumably a smaller sum than current APCs – or not charge at all: decouple income from accepting papers.

#ScientificPublishing

@albertcardona @MarkHanson that would be even worse! It would just reverse the incentive, not get rid of it, and lead to even more wasted time from scientists (an externality as far as publisher economics are concerned).

@neuralreckoning @albertcardona @MarkHanson I still think science funders are the only people who can stop this mess. If HHMI or Wellcome or ERC said "you can have this grant money on one condition: you publish all resulting research on our OA platform".

They are the only player in this mess who can't claim "it's not my fault, the system made me do it"

@cian @neuralreckoning @MarkHanson

That is an ideal approach, but one that requires courage – a commodity in short supply among academic circles.

(Linked: Lorna Finlayson, “The Sycophant”, Sidecar. On academics.)

@kofanchen @cian @neuralreckoning @MarkHanson

The question is always: what can we do that will matter long-term without excessive short-term suffering. I don't know.

@neuralreckoning @MarkHanson

Was talking about community journals. For-profit journals shouldn't exist to begin with.

@albertcardona @MarkHanson society journals have the same problem. They often fund the operations of the society including the annual meeting, etc.

@neuralreckoning @MarkHanson

That is indeed true. Hiding costs of other events and activities in the article processing charge.