Mastodawn

Andrew Mar 26, 2024

I have a preprint out estimating how many scholarly papers are written using ChatGPT etc. I estimate upwards of 60k articles (>1% of global output) published in 2023. https://arxiv.org/abs/2403.16887

How can we identify this? Simple: there are certain words that LLMs love, and they suddenly started showing up *a lot* last year. Twice as many papers call something "intricate", with big rises for "commendable" and "meticulous".

#bibliometrics #scholcomm #chatgpt

ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature

The use of ChatGPT and similar Large Language Model (LLM) tools in scholarly communication and academic publishing has been widely discussed since they became easily accessible to a general audience in late 2022. This study uses keywords known to be disproportionately present in LLM-generated text to provide an overall estimate for the prevalence of LLM-assisted writing in the scholarly literature. For the publishing year 2023, it is found that several of those keywords show a distinctive and disproportionate increase in their prevalence, individually and in combination. It is estimated that at least 60,000 papers (slightly over 1% of all articles) were LLM-assisted, though this number could be extended and refined by analysis of other characteristics of the papers or by identification of further indicative keywords.

arXiv.org
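
The keyword approach described in the post can be illustrated with a minimal sketch. This is not the preprint's actual pipeline: the `records` data below is made up, and the function names `keyword_share_by_year` and `excess_papers` are hypothetical. It only shows the general idea of comparing the share of texts containing a marker word across two years and turning the difference into a rough excess count.

```python
import re
from collections import defaultdict


def keyword_share_by_year(records, keyword):
    """Return {year: fraction of texts that contain `keyword` as a whole word}."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, text in records:
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in totals}


def excess_papers(share_baseline, share_observed, total_papers):
    """Papers beyond what the baseline share would predict (never negative)."""
    return max(0.0, (share_observed - share_baseline) * total_papers)


# Toy records, invented purely for illustration (not real bibliographic data).
records = [
    (2022, "We study a simple model."),
    (2022, "An intricate mechanism is described."),
    (2022, "Results are reported."),
    (2022, "A survey of methods."),
    (2023, "This intricate and meticulous study covers three cases."),
    (2023, "An intricate interplay of factors."),
    (2023, "Results are reported."),
    (2023, "A survey of methods."),
]

shares = keyword_share_by_year(records, "intricate")
ratio = shares[2023] / shares[2022]
print(shares, ratio, excess_papers(shares[2022], shares[2023], 4))
```

A real analysis would need a proper baseline (several earlier years rather than one) and some care over words whose use is rising for unrelated reasons.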

Steffen Christensen Mar 26, 2024

@generalising Fantastic work, Andrew!
Thank you so much. Now I can search web data for posts, searches, and media using the same token words. :)

Steffen Christensen Mar 26, 2024

@generalising I'm number 2!!

Andrew Mar 26, 2024

@Wikisteff Credit where it's due - I took the sample list from an earlier study! https://arxiv.org/abs/2403.07183 (p 15, 16) I think this is a bit of an idiosyncratic list due to the peer-review context (hence it's all adjectives/adverbs, almost all positive) and there will definitely be other distinctive terms, some unpredictable - it would be quite interesting to do some larger analysis to try and find them.

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.

arXiv.org
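
The corpus-level idea in that abstract (estimating what fraction of text comes from an LLM, given human-written and AI-generated reference texts) can be sketched as a two-component mixture fitted by maximum likelihood. This is a simplified illustration, not the paper's implementation: the token probabilities `p_human` and `p_ai`, the toy corpus, and the function name `estimate_ai_fraction` are all assumptions made for the example.

```python
import math
from collections import Counter


def estimate_ai_fraction(tokens, p_human, p_ai, steps=1000):
    """Grid-search the mixture weight alpha in [0, 1] that maximises the
    log-likelihood of `tokens` under (1 - alpha) * p_human + alpha * p_ai."""
    counts = Counter(tokens)
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        alpha = i / steps
        ll = 0.0
        for tok, n in counts.items():
            mix = (1 - alpha) * p_human[tok] + alpha * p_ai[tok]
            if mix <= 0:
                ll = -math.inf  # this alpha cannot explain the observed token
                break
            ll += n * math.log(mix)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


# Hypothetical reference distributions over two token types:
# "marker" stands in for words that are over-represented in LLM output.
p_human = {"plain": 0.9, "marker": 0.1}
p_ai = {"plain": 0.5, "marker": 0.5}

# Made-up corpus in which 20% of tokens are marker words.
corpus = ["plain"] * 80 + ["marker"] * 20

alpha_hat = estimate_ai_fraction(corpus, p_human, p_ai)
print(alpha_hat)
```

In this toy setup the observed marker share (0.2) sits between the human rate (0.1) and the AI rate (0.5), so the fitted weight lands near 0.25. The estimate describes the corpus as a whole and says nothing reliable about any single document.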

Steffen Christensen Mar 26, 2024

@generalising It's a fantastic idea!
I used fine-grained stylometrics to identify the unique-ish "fists" of posters and their proxy accounts in Twitter posts in 2022 to do some hypothesis testing of co-authorship amongst accounts in the aftermath of the 2022 Convoy Protest here in Ottawa, but I hadn't thought of using them for bibliometrics and AI!
It's a genius move! :)

Miranda Gray Mar 27, 2024

@Wikisteff Why hadn't I heard of that work before? Fascinating!

Steffen Christensen Mar 27, 2024

@mirgray There's a LOT you can do with stylometrics. I'm still kind of hoping that LLMs can be used to identify the fists of individual authors in their training data reliably, as clearly the data are in there ("please write a sonnet about how bias in decision AIs is a challenging issue in the style of William Shakespeare").

ACCOUNT MOVED

@Wikisteff @mirgray Could you please consider creating a programming language named “Shakespeare”?