Delving into #LLM-assisted writing in #biomedicalpublications through excess vocabulary Dmitry Kobak orcid.org/0000-0002-56..., Rita González-Márquez orcid.org/0009-0005-68..., [...] , and Jan Lause www.science.org/doi/10.1126/... Science.org

#Abstract Large language models (#LLMs) like #ChatGPT can generate & revise text with human-level performance. These models come with clear limitations: they can produce inaccurate information & reinforce existing biases.
Yet, many scientists use them for their scholarly writing. But how widespread is such #LLM usage in the #academicliterature? To answer this question for the field of #biomedicalresearch, we present an #unbiased, large-scale approach: We study vocabulary changes in more than 15 million biomedical
abstracts from 2010 to 2024 indexed by #PubMed & show how the appearance of #LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs.
This lower bound differed across disciplines, countries, and journals, reaching 40% for some #subcorpora. We show that #LLMs have had an unprecedented impact on #scientificwriting in #biomedicalresearch, surpassing the effect of major world events such as the COVID pandemic.
#RESULTS: Excess words indicate widespread LLM usage
For comparison, using the set of four excess content words from 2021 (covid, pandemic, coronavirus, sars; any scientific paper on COVID-19 likely contained at least one of these four words in its abstract) yielded a frequency gap of Δ = 0.069.
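The frequency-gap idea above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's code: the function names, the whitespace tokenization, and the toy abstracts are all assumptions.

```python
def contains_any(abstract, markers):
    """True if the abstract mentions at least one marker word."""
    words = set(abstract.lower().split())
    return any(m in words for m in markers)

def observed_frequency(abstracts, markers):
    """Fraction of abstracts containing at least one marker word."""
    return sum(contains_any(a, markers) for a in abstracts) / len(abstracts)

def frequency_gap(p_observed, p_counterfactual):
    """Excess gap: observed frequency minus the expected (baseline) frequency."""
    return p_observed - p_counterfactual

# Toy data using the four 2021 COVID marker words from the text.
markers = {"covid", "pandemic", "coronavirus", "sars"}
abstracts_2021 = [
    "the covid pandemic strained hospital capacity",
    "we study protein folding kinetics in yeast",
    "coronavirus transmission dynamics in households",
]

p = observed_frequency(abstracts_2021, markers)  # 2 of 3 toy abstracts match
```

On the real corpus, the counterfactual frequency would be extrapolated from pre-2021 trends, and the quoted Δ = 0.069 is the observed frequency minus that baseline.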
This shows that #LLM usage in 2024 was at least two times higher than the size of the COVID-related literature at its peak in 2021.
Lower bounds differed between subcorpora
Given these potential explanations for the heterogeneity in the lower bound of #LLM use for #scientificediting,
our results indicate widespread usage in most #PubMed-indexed fields, countries, and journals, including the most prestigious ones. We argue that the true #LLM usage in #biomedicalpublishing may be closer to the highest lower bounds we observed, as those may be the corpora where #LLM usage is the most naïve & the easiest to detect. These estimates are above 30%, which is in line with recent surveys on researchers’ use of #LLMs for manuscript writing.
Our results show how those self-reported behaviors translate into real-world #LLMusage in final publications.
We hypothesize that this effect is much smaller & much slower. Similarly, we cannot distinguish the influence of different LLMs.
Related work: Our results go beyond other studies on detecting #LLMfingerprints in #academicwriting. Gray described a twofold increase in frequency for the words intricate and meticulously in 2023, while Liang et al. identified pivotal, intricate, showcasing
& realm as the top #LLM-preferred words based on a corpus of #LLM-generated text. In contrast, our study performed a systematic search for LLM marker words based on excess usage in published #scientifictexts. We found 379 style words with highly elevated frequencies in 2024.
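The systematic search for excess style words can be sketched as thresholding each word's 2024 frequency against a counterfactual projected from earlier years. Everything below is an illustrative assumption, not the paper's exact procedure: the linear-extrapolation counterfactual, the thresholds, and the toy frequencies are made up for the sketch.

```python
def counterfactual(freq_by_year, year):
    """Expected frequency: linear extrapolation from the two preceding years."""
    f1, f2 = freq_by_year[year - 2], freq_by_year[year - 1]
    return max(f2 + (f2 - f1), 0.0)

def is_excess(freq_by_year, year, min_ratio=2.0, min_gap=1e-4):
    """Flag a word whose observed frequency far exceeds the counterfactual."""
    p = freq_by_year[year]                 # observed frequency
    q = counterfactual(freq_by_year, year) # expected frequency
    ratio = p / q if q > 0 else float("inf")
    return ratio >= min_ratio and (p - q) >= min_gap

# Toy per-year frequencies for a hypothetical style word.
delves = {2022: 0.0001, 2023: 0.0002, 2024: 0.0015}
```

Running every vocabulary word through such a filter, with thresholds chosen to control for normal year-to-year drift, is one way to arrive at a marker-word list like the 379 style words reported in the text.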
Our work shows that LLM usage for scientific writing is on the rise despite these substantial limitations. How should the academic community deal with this development?