The strain on scientific publishing šŸ“„:

The publishing sector has a problem. Scientists are overwhelmed, editors are overworked, special-issue invitations are constant, and paper mills, article retractions, and journal delistings keep piling up… JUST WHAT IS GOING ON!?

Pablo, @paolocrosetto, Dan, and I have spent the last few months investigating just that.
https://arxiv.org/abs/2309.15884

A thread 🧵 1/n

#AcademicChatter #PublishOrPerish #Elsevier #Springer #MDPI #Wiley #Frontiers #PhDAdvice #PhDChat #SciComm

The strain on scientific publishing

Scientists are increasingly overwhelmed by the volume of articles being published. Total articles indexed in Scopus and Web of Science have grown exponentially in recent years; the 2022 total was approximately 47% higher than in 2016, outpacing the limited growth, if any, in the number of practising scientists. The publication workload per scientist (writing, reviewing, editing) has therefore increased dramatically. We define this problem as the strain on scientific publishing. To analyse this strain, we present five data-driven metrics showing publisher growth, processing times, and citation behaviours. We draw these data from web scrapes, requests to publishers, and material freely available on publisher websites. Our findings are based on millions of papers produced by leading academic publishers. We find that specific groups have disproportionately grown their articles published per year, contributing to this strain. Some publishers enabled this growth by adopting a strategy of hosting special issues, which publish articles with reduced turnaround times. Given the publish-or-perish pressure on researchers competing for funding, this strain was likely amplified by these offers to publish more articles. We also observed widespread year-over-year inflation of journal impact factors coinciding with this strain, which risks confusing quality signals. Such exponential growth cannot be sustained. The metrics we define here should enable this evolving conversation to reach actionable solutions to the strain on scientific publishing.


First things first: growth in articles published each year has outpaced growth in the number of scientists doing the publishing. With #PublishOrPerish, we all face an ever-increasing workload (writing, reviewing, editing…). It’s been rough.

Strain itself is neutral: this could be a welcome change! Are we becoming more efficient? Are we combatting biases (academic racism, positive result bias)?

If that’s all it were, the solution to strain would be to build a better infrastructure.

But… well… it’s not. 2/n

We see that certain groups are major drivers of this article growth, in some cases seemingly out of nowhere. This includes classic publishers like #Elsevier and #Springer, but also the upstarts #Frontiers and… most significantly, #MDPI.

In numbers: nearly 1 million more articles were published per year in 2022 (2.8m) than in 2016 (1.9m). MDPI takes the lion’s share at 27% of that growth, with Elsevier (16%) a distant 2nd.
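For the back-of-envelope inclined, here’s that arithmetic as a quick sketch (rounded figures from above; illustrative only, not our analysis code):

```python
# Rough arithmetic behind the growth shares above (rounded figures).
articles_2016 = 1.9e6
articles_2022 = 2.8e6
growth = articles_2022 - articles_2016  # ~0.9m extra articles per year

mdpi_share = 0.27 * growth      # ~243,000 of those extra articles/year
elsevier_share = 0.16 * growth  # ~144,000 of those extra articles/year
print(f"{growth:.0f} extra/year; MDPI ~{mdpi_share:.0f}, Elsevier ~{elsevier_share:.0f}")
```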

How did we get to this point? 3/n

I could be nuanced (it's in the preprint!). But let’s be frank: it’s special issues.

ā€œDear Dr ___, your preeminent work in [FIELDYOUDONTWORKIN] drew our attention to your [COPYPASTEPAPERTITLE] and we were thoroughly aroused. We invite you to submit to special issue with us, who love your preeminence. Yours faithfully, [AROUSED].ā€

The figure speaks for itself. With my leftover characters, I instead wanna ask y’all to send me screenshots of your favourite SI invitations! Hit me! šŸ˜€ 4/n

So still… is it worth it? Strain itself is neutral. Maybe these special issues are just giving a voice to authors with less privilege?

Or maybe not. The publishers hosting special issues drastically reduced their turnaround times (TATs: submission to acceptance). And let’s be clear: that’s INCLUDING revisions. 5/n

Now, it’s not our place to judge what an average TAT is supposed to be, but we’re very confident it’s not 37 days across all research fields. Reviewer-requested experiments take weeks in fruit flies, whereas in mice they take months.

TATs are also supposed to vary from article to article: some articles are great on 1st draft, some need a little TLC, and some need… a lot… Yet #MDPI journals in particular, across the board, accept everything in a blistering 37 days with almost no variation. 6/n

But it’s not just #MDPI: #Frontiers and #Hindawi also grew their share of special issues. One might argue: ā€œThese are just labels publishers use. The peer review process is the same.ā€

Au contraire, mon ami: no, it’s not. Special issues have lower TATs. They’re intended to be lax. They’re for authors to voice ideas that could turn out to be wrong but that advance the conversation in the field. That’s what they used to be, at least… and what made them ā€œspecial.ā€ But I digress… 7/n

We also looked at rejection rates (RRs), with some caveats: we took publishers at their word on what their RRs were, and don’t know the underlying methods. But we figured RRs would at least be calc’d consistently within groups. We compared relative RRs over time, and RRs against proportions of special issues.

Again, #MDPI was the maverick, with a unique decline in RRs over time. Not only that, but at both #Hindawi & MDPI, more special issues mean lower RRs. The review process *is not* the same. 8/n

Lastly, let’s talk #ImpactFactor (IF). Reminder: IF = the average cites/doc that a journal’s articles receive within their first 2 years. IF values total cites.
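To make that definition concrete, here’s a minimal sketch of the standard 2-year IF calculation (illustrative numbers, not data from our paper):

```python
# Standard 2-year Impact Factor: citations received this year by articles
# published in the previous two years, divided by those citable items.
def impact_factor(cites_this_year: int, citable_items_prev_2y: int) -> float:
    return cites_this_year / citable_items_prev_2y

# e.g. 500 citations in 2022 to 100 articles published in 2020-2021:
print(impact_factor(500, 100))  # 5.0 -- every cite counts, wherever it's from
```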

IFs are going up šŸ“ˆ: they’re literally being inflated like a currency. So if you see a journal celebrating its year-over-year increase in IF, you’ve gotta normalize for inflation. This inflation accompanies the huge crush of special issues from earlier. But(!) a citation network-adjusted rank (Scimago Journal Rank, SJR) hasn’t changed accordingly. What gives? 9/n

Well, SJR is complex, but the main thing is it doesn’t reward self-citations, or circular citations from so-called ā€œcitation cartels.ā€

In other words:

- IF just cares about total citations, and doesn’t pay attention to where they come from.
- SJR pays attention, and doesn’t reward you or your buddies for reciprocal back scritchies.

10/n

Then there’s Goodhart’s law: ā€œwhen a measure becomes a target, it ceases to be a good measure.ā€

We use IFs and publications as measures, but now they’re targets. There are many studies on the consequences, such as @abalkinaanna’s work on paper mills:
https://onlinelibrary.wiley.com/doi/pdf/10.1002/leap.1574

And then there’s this: https://fediscience.org/@MarkHanson/111104919139171425

That’s what you get from #PublishOrPerish šŸ¤·ā€ā™‚ļø 11/n

We developed a new metric that we call ā€œImpact Inflation.ā€ Impact Inflation is the ratio of Impact Factor to Scimago Journal Rank (IF/SJR). Because IF values total cites (no matter the source), while SJR does not reward aggressive self-/co-citation, IF can become extremely inflated relative to SJR for journals hosting citation cartels.

Key point: Impact Inflation is a metric that shows to what extent a journal has succumbed to Goodhart’s law. And well… once again #MDPI leads the pack. 12/n
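It’s simple enough to compute yourself; a minimal sketch (the function name is ours for illustration, and the numbers are placeholders):

```python
# "Impact Inflation" as defined above: the 2-year IF (or Scimago's
# cites/doc 2y) divided by the SJR score. High values flag journals whose
# raw citation counts far outstrip their source-weighted,
# self-citation-capped network rank.
def impact_inflation(impact_factor: float, sjr: float) -> float:
    return impact_factor / sjr

print(impact_inflation(4.0, 2.0))  # illustrative journal -> 2.0
```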

Talking within-journal self-cites, once again #MDPI has the highest rates.

What’s more, we also see groups like #Hindawi with higher Impact Inflation but normal self-cite levels. What gives?

Well, SJR also weights a citation by where it comes from, and because MDPI journals aren’t well-cited (except by themselves), their citations aren’t worth much. And because MDPI’s growth came out of nowhere, they’re now exporting huge numbers of citations to others, with a particular penchant for Hindawi. 13/n

So where does that leave us? Well, it’s easy to talk about #MDPI because… scroll up. But fundamentally we need to address strain. We’re all overworked, and we can’t let this go on.

Our metrics tell us this growth isn’t rigorous science. Special issues are lowering standards, which nets groups like MDPI more articles, and more money šŸ’±. We don’t have revenue data, but for-profit gold OA ties revenue to articles published. So it’s no surprise that some groups are gonna spam engines of growth. 14/n

Science needs accountability. The public needs to trust ā€œpeer-reviewedā€ papers have some minimum standard. These crazy-prolific special issues are damaging the authority and integrity of science.

It’s also costly: millions of scientists writing, reviewing, editing, and for what? These extra ~1m annual articles aren’t necessary. What’s more: we’re under-describing the strain because we’re only using journals indexed in both Scopus and Web of Science. Surprise! It’s actually even worse šŸ™ƒ 15/n

That said: we’re just four white guys who all got fascinated with the craziness of the publishing sector. But you, the reader, can help. Publishing scientific articles can’t be like ordering fast food: ā€œI’d like one special issue article please, hold the critiques.ā€

Special issues need to be a rare treat. A ā€œsometimesā€ food. And when you’re invited to publish in one, or host one, that invite shouldn’t come from an algorithm. We should try to establish this basic #ResearchCulture 16/n

You know who CAN make a difference though? Funders, Universities, Academies of Science, @wellcometrust, @ukrio, @snsf_ch @DORAssessment etc… we need your help!

We need policies that treat special issues differently, because they are different. We need guidelines from #COPE on a reasonable minimum rigour for #peerreview. We need standard reporting of key metrics like RRs, profit margins, etc… We need leadership, and thank you for all you’ve already done and all you’re going to do. We’re up for a chat! 17/n

Now the mushy stuff: Pablo, @paolocrosetto, Dan, it’s been an incredible privilege to work with you all on this. I learned a ton through this project about coding and reproducible research practices.

Also: thanks for putting up with me, I know I’m a lot. As we’ve heard many times from folks over the last months: ā€œthis work you guys are doing is really important.ā€ I believe it. Still banned from the text though. 18/n

A last point: we really hummed and hawed about if and how we could release scripts and data, but we just can’t right now. Lawyers told us not to. We’re like… 99% sure we didn’t do anything risky, but these things can’t be rushed. We’ll update the preprint if/once we’ve confirmed everything. Sorry about that, but I hope it’s understandable. 19/n

That’s it. If you’ve made it this far, thanks for reading. I hope this work can help you start some much-needed conversations at your local level.

If you want to chat, I’ll personally be happy to try and carve out some time to do so. Best to send me an email. Let’s work together to be the change we want to see in #Science
20/20 - end

Thread on article: ā€œThe strain on scientific publishingā€ - out now.

Tagging @TheConversationUK @dingemansemark @OverlyHonestEditor @galtiernicolas @DORAssessment @petersuber @ElisabethBik @brembs @mattjhodgkinson @danielbolnick @deevybee @ct_bergstrom

Boosts much appreciated!


@gpollara here's the unrolled thread: https://mastoreader.io?url=https%3A%2F%2Fmed-mastodon.com%2F%40gpollara%2F111150485824430373

Next time, kindly set the visibility to 'Mentioned people only' and mention only me (@mastoreaderio). This ensures we avoid spamming others' timelines and threads unless you intend for others to see the unrolled thread link as well.

Thank you!


@MarkHanson AH - that did not work well! Hmph. Try again....
@gpollara I didn't even know that was a thing you could try. Ah well...
@MarkHanson second attempt failed too - somewhere along the way the thread must have got broken (not blaming you!). Example of where it worked very well... https://mastoreader.io/?url=https%3A%2F%2Fmed-mastodon.com%2F%40gpollara%2F111085487379770529

@gpollara Honestly it's the first thread I posted in a looong while, and it was a huge thread. But also I do think each post is a reply to the previous post... I wonder if instance version etc... could play a role?
@MarkHanson out of curiosity, how did you make your thread? I have come across https://fedica.com/dash/ this week - v impressed with it - can plan and schedule whole threads

@gpollara painstakingly replying to each one of my previous posts. Thanks for the tips!

@MarkHanson that feels painful. Fedica looks promising; it even auto-numbers threads. All I need is something worth saying that needs a thread.... Feels a bit like staring at a long blank form = empty mind syndrome!!!

Anyway, if you use it, let me know how you get on

@gpollara Will do! I bookmarked it. Because of the plan to post across three social media sites, I did a word doc with posts split at least...
@MarkHanson @dingemansemark @petersuber @ct_bergstrom When I was a mature age psychology student, 1980s, ā€œSpecial Issuesā€ genuinely were special. This paper-mill + predatory publication situation is nightmarish.
@MarkHanson @dingemansemark @petersuber @ct_bergstrom Thanks for this very nice thread that puts data and words to a feeling! I suggest looking toward @PeerCommunityIn and @PeerCommunityJournal for a non-profit, alternative way of publishing šŸ™‚ I would have been curious to see the ratio between article production and revenue from the publisher’s perspective.

@AudreyBras I'm a recommender for PCI Infections! Indeed, I'm having niggling doubts now about what we do with this preprint:

We thought we had a plan (send to a big publisher for the credibility/visibility we need), but the response has been awesome. I, personally, am wondering if we could in fact send it to a Diamond OA venue like PCI without handicapping our ultimate goal of visibility/conversation-driving. The message it would send vs. the potential audience we’d lose… it’s such a weird dance right now.

@MarkHanson I totally understand. I am an early-career researcher and it's very difficult to turn away from the big publishing system, even knowing all the problems associated with it.
I suppose one way could be trying the PCI-friendly journals as an alternative option 🤔
@MarkHanson awesome work and great thinking. Love the insights and will probably think about this topic a lot now
@MarkHanson This was a great read! However, I would be very careful about this chart in particular: using double y axes to show an association can be very fraught, and especially the "strain" annotation here sets off all kinds of alarm bells in my data viz brain. Here's some more reading on the topic: http://daydreamingnumbers.com/blog/dual-axis-charts/
@janeadams for sure! The only idea here is to show that at the same time article growth was ramping up, PhDs awarded were plateauing. In Fig1supp1 we present it as a single-line ratio using a couple of datasets. #DataViz 1/2

@janeadams

The double axis in this one figure is important, though, in a data viz sense: it allows a direct overlay with the years. The charts in Fig B/D below are too abstracted to be meaningful without first showing the double-axis figures. So we validate (B/D above), but the double axis is key here IMO #DataViz

Also funny you mention this, as depending on who you ask, this is a great example of a useful dual axis line chart! xD
https://bsky.app/profile/steveharoz.com/post/3kajjfgk4vs2x

2/2



@MarkHanson great stuff! Once in a blue moon I give a talk on the dark side of science, and try to keep the information updated. This will be a great addition to the library!

@MarkHanson Super thread! Thank you!!

I would just be careful in interpreting the Papers vs PhDs ā€œstrainā€ figure. This interaction seems pretty strongly influenced by COVID (papers). What if this section šŸ‘‡ were the norm (like before COVID)?

@MKiebs Thanks! In the supplement, we discuss this in a couple of ways. Fig1supp1 shows this using OECD + India + China, or vs. UNESCO researchers-per-million, and the trends remain.

Fig5supp3 discusses the effect of COVID, and while it's probably part of it, the growth in articles (and the trends in Fig5supp3) began in 2018->2019 and continued in 2021->2022 despite relaxations and returns to work. So COVID may have contributed somewhat (how much, 2023 data will tell!), but it's certainly not the whole story šŸ™‚

@MarkHanson really fascinating!

Is Copernicus (EGU) a journal you’ve looked at?

@atthenius thanks! I've never heard of them, but I guess Copernicus publishes many journals? I'd check journals you're interested in on Scimagojr.com

There you can get the numbers for within-journal self-cites, or the ratio of IF (2-yr cites/doc) vs SJR, which is our "Impact Inflation" score. And from there, you can compare to the Fig 5 numbers to see where your journal sits!
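If you want to run that check by hand, here's the gist (a sketch; plug in whatever values you read off Scimago for your journal. The example values below are the 2022 Climate of the Past numbers discussed downthread):

```python
# Hand-computing Impact Inflation from numbers read off scimagojr.com.
cites_per_doc_2y = 4.357  # Scimago "Cites / Doc. (2 years)", example journal
sjr = 2.025               # Scimago SJR score, same journal
print(f"Impact Inflation: {cites_per_doc_2y / sjr:.2f}")  # ~2.15
```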

@MarkHanson

They are a pretty big publisher in Europe for all their climate and environmental things. Copernicus is where Europe distributes their climate info.

And certainly of much higher renown than MDPI or Frontiers. Similar to Wiley (AGU) in the US, except they have an open-access model.

CPD: total cites to self-cites looks like 9 or 10:1, eyeballing it. H-index of 90.

GMD: total to self-cites of 30:1. H-index 115.

ACP: total to self-cites 20:1. H-index 23.

@atthenius those all seem like reasonable levels of within-journal self-citation, though maybe CPD is a bit on the high side. Has it increased in recent years?

Beyond self-cites, to better know the full citation behaviour (including from cliques of journals), it'd be interesting to know SJR and the Scimago cites/doc (2 years) stats, or Clarivate IF instead of cites/doc (same thing). Not sure what these journals are from their acronyms, otherwise I'd check their IF/SJR myself.

@MarkHanson

Ran out of space:
Climate of the Past
Geoscientific Model Development
Atmospheric Chemistry and Physics

https://publications.copernicus.org/open-access_journals/open_access_journals_a_z.html


@atthenius
For 2022

COP cites/doc (2yr) / SJR = 4.357 / 2.025 = 2.15

That's great. COP is published by the European Geosciences Union and hosted by Copernicus? We wouldn't have collected it under Copernicus, as we used publisher designations per Scimago and treated any journal with a Publisher listing distinct from the host company as editorially independent.

Rinse and repeat! Feel free to take the same approach with whatever journals you're interested in. šŸ™‚

https://www.scimagojr.com/journalsearch.php?q=4400151418&tip=sid&clean=0


@MarkHanson I mean, I guess? But somewhat ironically, this also ties neatly into a whole different issue of what's borked about publishing: the lack of transparency and reproducibility. šŸ˜ž
@adamhsparks Oh I know. We will make the data available to peer reviewers, and we really hope to update the preprint as soon as we've got the all-clear. But we web-scraped a ton of data, which makes this very tricky across international boundaries. We're confident we'll be protected under UK fair dealing (it has an explicit exception for text mining for non-commercial research), but as our author group is spread out, we didn't have definite protections elsewhere, and the lawyers told us "wait."
@MarkHanson I'm impressed by the effort and the work. I understand you can't do it all. I just happen to be focused on openness and reproducibility. https://doi.org/10.1094/PHYTO-10-21-0430-PER

@adamhsparks it's worth it!

It was a weird question too: our other data come from Scimago (Scopus), and even that has, in legalese, "... you can't republish your own database out of our data."

So I'm not sure we can directly share even the stuff that's freely available! We have our scripts to assemble the collection of .csv files, but the public data are copyrighted.

That said: https://www.scimagojr.com/comparejournals.php

You can take a gander on Scimago to see what we mean about CitesPerDoc/SJR and self-cite rates.



@MarkHanson you should not have any legal issue with releasing the code. If you do, you can probably modify the code to exclude the data from it.

For the data, you can probably push it to Zenodo without making it public? (There is a special field for that; people then need to request access.)

Very interesting points! I'll try to find time to read the preprint!

@jcolomb It's because we web scraped. Two things: 1) we won't put out the web-scraping scripts because, quite literally, someone asked us for them so they could adapt them to steal users' private info. 2) The data we collected are protected under UK fair dealing, but we are multinational and it's not 100% clear whether or not we compiled a database using "copyrighted" material.

We might make dummy datasets & release the analysis scripts with them so you can at least run the code. But fwoo... lots of extra work.

@MarkHanson Scimago ranking is open already, no?
@andrei_chiffa yep, it's great! Although we found out that the data download they provide doesn't include the within-journal self-cite data. We had to web scrape those separately, as Scimago didn't provide them even on request. But still, Scimago is great šŸ‘Œ
@MarkHanson hm, I see, so the per-journal IF/SJR is still not reachable. It's a pity: there are a couple of publications I would really like to argue we pull off our list of "venues of interest," and I genuinely expect this value to be a good argument in favor of that.
@andrei_chiffa this much I could share privately if you'd like to send an email (it's on the 1st page of the preprint). Anything Scimago, we're confident we can at least share upon request.

@MarkHanson
Appreciate the work here, but not sure the prescriptions follow. Special issues are just one manifestation of a general pattern in for-profit publishing: the obvious incentive to push as many articles through, for as much of a processing charge, as the prestige of the brand can bear. Why treat a symptom as a target for intervention? And why abandon the idea of infrastructure building? That seems to take the underlying landscape as fixed, out of our control, which lends itself to the perennial appeal to funders; to me that is an abdication of our responsibility and agency, and a misattribution of the root causes and power dynamics in play.

One could see the same results and conclude that faculty need militant unions to push back against the evaluation systems that make these metrics meaningful, or to organize alternative means of publication. Bibliometrics, as you note, are intrinsically gamed, but treating the very structure of bibliometrics (citations in traditional journal articles) as inevitable seems to do little to challenge that game, even with a new derived metric that metricizes how gamed one metric is vs. another.

@jonny @MarkHanson To add to Jonny's comment, this is good empirical work, but more needs to be done to situate the recent shifts, both historically & within the larger institutional dynamics of science. Publishers reflect/refract the incentive structures of their customers (both authors & institutions), which means publishers aren't best analyzed as the only entities with agency.

This is a useful place to start contextualizing what has led us to the current conjuncture: https://doi.org/10.1007/s12108-016-9315-z

@timelfen @jonny on this point, it's perfect.

We hummed and hawed as well about how much we wanted to contextualize, and how much interpretation we wanted to give. And maybe it's due to spending the last few years in Lausanne, but we tried to be neutral Switzerland here.

Our goal is to enable this conversation to be evidence-based, providing data it sorely needed. We're happy to support others writing opinions/reviews on the topic, and can even help with figures. Email us if so!

@MarkHanson @jonny I appreciate this Mark. It's hard to argue against having more evidence!

What I think I'm reacting to is not the evidence, it's the frame in which it becomes meaningful. For me, you can't frame this by asking "What are publishers doing to science?" because they don't operate as external to science. And w/ an internalist framing, the range of forces & actors in play is much greater.

In short, there's no neutral frame & evidence becomes meaningful only in relation to its frame.

@timelfen @jonny ah, we might both agree/disagree then. I think the idea here is that peer-reviewed science is some nebulous thing with a set of boundaries beyond which it's no longer "peer-reviewed." And maybe the core of the paper is that publisher policies, actively or passively, are stretching those boundaries, and as a community we need to define rigorously enough where something is no longer "peer-reviewed."

We'll happily talk ears off on utopian publishing futures. Spoilers: this ain't it.

@MarkHanson Let me float an idea (coming from an editor of a social-science journal consisting of nothing but special issues): The nature of SIs *can* be different from regular journal issues, w/ their nod to *collective* modes of inquiry, w/ work brought together & (potentially) reflecting or building on other issue contributions. If done conscientiously, articles in an SI go through a different form of PR, from contributors/editors. Should these articles always also need external reviews?
@MarkHanson What I'm trying to open up w/ this example, from a different publishing culture, is that there can't really be 1 standard of rigorous PR, because (a) articles aren't all striving for the same goal (i.e., discovery- vs. validation-oriented); (b) they aren't all embedded in the same communicative situation; (c) PR itself now pursues various forms & goals. The plurality already exists, for justifiable & unjustifiable reasons. Purification through standards can be a dangerous strategy.
@MarkHanson Your group has done great work to show the growth of SIs as a business strategy amongst APC-based publishers, in concert w/ the publication strategies of certain academics. For me, coming from a different corner of scholarship, where the publishing cultures, infrastructures, & pathologies are quite different, my main utopian longing is for a greater recognition of the plurality at the heart of a complex human system like science+publishing, which militates against the binary: PR = yes/no.