Delving into #LLM-assisted writing in #biomedicalpublications through excess vocabulary Dmitry Kobak orcid.org/0000-0002-56..., Rita González-Márquez orcid.org/0009-0005-68..., [...] , and Jan Lause www.science.org/doi/10.1126/... Science.org

#Abstract Large language models (#LLMs) like #ChatGPT can generate & revise text with human-level performance. These models come with clear limitations: they can produce inaccurate information & reinforce existing biases.
Yet, many scientists use them for their scholarly writing. But how widespread is such #LLM usage in the #academicliterature? To answer this question for the field of #biomedicalresearch, we present an #unbiased, large-scale approach: We study vocabulary changes in more than 15 million biomedical
abstracts from 2010 to 2024 indexed by #PubMed & show how the appearance of #LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs.
This lower bound differed across disciplines, countries, and journals, reaching 40% for some #subcorpora. We show that #LLMs have had an unprecedented impact on #scientificwriting in #biomedicalresearch, surpassing the effect of major world events such as the COVID pandemic.
#RESULTS: Excess words indicate widespread LLM usage
For comparison, using the set of four excess content words from 2021 (covid, pandemic, coronavirus, sars; any scientific paper on COVID-19 likely contained at least one of these four words in its abstract) yielded a frequency gap of Δ = 0.069.
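The frequency-gap idea above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's code: the function names, the whitespace tokenization, and the toy abstracts are all assumptions.

```python
def contains_any(abstract, markers):
    """True if the abstract mentions at least one marker word."""
    words = set(abstract.lower().split())
    return any(m in words for m in markers)

def observed_frequency(abstracts, markers):
    """Fraction of abstracts containing at least one marker word."""
    return sum(contains_any(a, markers) for a in abstracts) / len(abstracts)

def frequency_gap(p_observed, p_counterfactual):
    """Excess gap: observed frequency minus the expected (baseline) frequency."""
    return p_observed - p_counterfactual

# Toy data using the four 2021 COVID marker words from the text.
markers = {"covid", "pandemic", "coronavirus", "sars"}
abstracts_2021 = [
    "the covid pandemic strained hospital capacity",
    "we study protein folding kinetics in yeast",
    "coronavirus transmission dynamics in households",
]

p = observed_frequency(abstracts_2021, markers)  # 2 of 3 toy abstracts match
```

On the real corpus, the counterfactual frequency would be extrapolated from pre-2021 trends, and the quoted Δ = 0.069 is the observed frequency minus that baseline.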
This shows that #LLM usage in 2024 was at least two times higher than the size of the COVID-related literature at its peak in 2021.
Lower bounds differed between subcorpora
Given these potential explanations for the heterogeneity in the lower bound of #LLM use for #scientificediting,
our results indicate widespread usage in most #PubMed-indexed fields, countries, and journals, including the most prestigious ones. We argue that the true #LLM usage in #biomedicalpublishing may be closer to the highest lower bounds we observed, as those may be the corpora where #LLM usage is the most naïve & the easiest to detect. These estimates are above 30%, which is in line with recent surveys on researchers’ use of #LLMs for manuscript writing.
Our results show how those self-reported behaviors translate into real-world #LLMusage in final publications.
We hypothesize that this effect is much smaller & much slower. Similarly, we cannot distinguish the influence of different LLMs.
Related work: Our results go beyond other studies on detecting #LLMfingerprints in #academicwriting. Gray described a twofold increase in frequency for the words intricate and meticulously in 2023, while Liang et al. identified pivotal, intricate, showcasing
& realm as the top #LLM-preferred words based on a corpus of #LLM-generated text. In contrast, our study performed a systematic search for LLM marker words based on excess usage in published #scientifictexts. We found 379 style words with highly elevated frequencies in 2024.
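The systematic search for excess style words can be sketched as thresholding each word's 2024 frequency against a counterfactual projected from earlier years. Everything below is an illustrative assumption, not the paper's exact procedure: the linear-extrapolation counterfactual, the thresholds, and the toy frequencies are made up for the sketch.

```python
def counterfactual(freq_by_year, year):
    """Expected frequency: linear extrapolation from the two preceding years."""
    f1, f2 = freq_by_year[year - 2], freq_by_year[year - 1]
    return max(f2 + (f2 - f1), 0.0)

def is_excess(freq_by_year, year, min_ratio=2.0, min_gap=1e-4):
    """Flag a word whose observed frequency far exceeds the counterfactual."""
    p = freq_by_year[year]                 # observed frequency
    q = counterfactual(freq_by_year, year) # expected frequency
    ratio = p / q if q > 0 else float("inf")
    return ratio >= min_ratio and (p - q) >= min_gap

# Toy per-year frequencies for a hypothetical style word.
delves = {2022: 0.0001, 2023: 0.0002, 2024: 0.0015}
```

Running every vocabulary word through such a filter, with thresholds chosen to control for normal year-to-year drift, is one way to arrive at a marker-word list like the 379 style words reported in the text.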
Our work shows that LLM usage for scientific writing is on the rise despite these substantial limitations. How should the academic community deal with this development?