Rik van der Lingen writes about chemistry

Molecular machines are in the New Yorker.

A case for why the 2016 #chemistry Nobel Prize (Feringa, who gets a mention here, Stoddart, Sauvage), then considered more speculative than most, may actually be useful

ft. some descriptions of chemistry & scientists that I'm on the fence about

https://www.newyorker.com/magazine/2024/06/24/rise-of-the-nanomachines

#Science

How Will Nanomachines Change the World?

Dhruv Khullar writes about nanotechnology, which can already puncture cancer cells and drug-resistant bacteria. What will it do next?

The New Yorker
Part of the secret to curating organic reactions is maintaining a large list of synonyms. You can write down iron in a surprising number of ways.
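A minimal sketch of the synonym idea, assuming a plain lookup table keyed on a normalized name. The synonym entries and the helper names (`normalize`, `to_smiles`) are illustrative assumptions, not the actual curated list.

```python
# Hypothetical synonym set for elemental iron; the real curated list is not shown in the post.
IRON_SYNONYMS = {
    "iron", "Fe", "iron powder", "Fe powder",
    "iron dust", "iron filings", "iron metal", "Fe(0)",
}

def normalize(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())

# Map every normalized synonym to one canonical SMILES string.
SYNONYM_TO_SMILES = {normalize(s): "[Fe]" for s in IRON_SYNONYMS}

def to_smiles(name: str):
    """Return the canonical SMILES for a known synonym, or None if unknown."""
    return SYNONYM_TO_SMILES.get(normalize(name))

print(to_smiles("  Iron   Powder "))
print(to_smiles("copper"))
```

Normalizing before lookup keeps the table small: spelling variants that differ only in case or spacing do not need their own entries.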
Not sure if it is good or bad the way the graph is going.

I have a preprint out estimating how many scholarly papers are written using chatGPT etc? I estimate upwards of 60k articles (>1% of global output) published in 2023. https://arxiv.org/abs/2403.16887

How can we identify this? Simple: there are certain words that LLMs love, and they suddenly start showing up *a lot* last year. Twice as many papers call something "intricate", big rises for "commendable" and "meticulous".

#bibliometrics #scholcomm #chatgpt

ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature

The use of ChatGPT and similar Large Language Model (LLM) tools in scholarly communication and academic publishing has been widely discussed since they became easily accessible to a general audience in late 2022. This study uses keywords known to be disproportionately present in LLM-generated text to provide an overall estimate for the prevalence of LLM-assisted writing in the scholarly literature. For the publishing year 2023, it is found that several of those keywords show a distinctive and disproportionate increase in their prevalence, individually and in combination. It is estimated that at least 60,000 papers (slightly over 1% of all articles) were LLM-assisted, though this number could be extended and refined by analysis of other characteristics of the papers or by identification of further indicative keywords.

arXiv.org
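The keyword approach described above can be sketched on toy data: count the share of texts per year that contain a marker word, then compare years. The records below are made up for illustration, and the functions `keyword_share` and `growth` are assumptions, not the preprint's actual pipeline.

```python
import re
from collections import defaultdict

def keyword_share(records, keyword):
    """Return {year: fraction of texts containing `keyword` as a whole word}."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, text in records:
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in totals}

def growth(shares, before, after):
    """Ratio of the keyword share in `after` to the share in `before`."""
    return shares[after] / shares[before]

# Invented toy abstracts, only to exercise the functions.
records = [
    (2022, "An intricate mechanism is proposed."),
    (2022, "We report a new catalyst."),
    (2022, "Yields were moderate."),
    (2022, "The ligand was intricately designed."),  # not a whole-word match
    (2023, "This intricate interplay is explored."),
    (2023, "An Intricate network of interactions."),
    (2023, "We report a new ligand."),
    (2023, "Yields were high."),
]

shares = keyword_share(records, "intricate")
print(shares, growth(shares, 2022, 2023))
```

A ratio well above 1 for a word with no obvious topical reason to surge is the kind of signal the post describes; `growth` will fail if the earlier year has no hits at all, so a real analysis would need to handle that case.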
this is the very first version of the "Europe" Wikipedia article from 23 years ago, back when you could just write "some more" when you got sick of typing
https://reagle.org/joseph/2010/wp/redux/EuropE/980109295.html
EuropE

My new journal article, "Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles" has now been published in JLSC. If our findings are correct, there is a serious preservation deficit in the digital scholarly record.

https://www.iastatedigitalpress.com/jlsc/article/id/16288/

Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles

Introduction: Digital preservation underpins the persistence of scholarly links and citations through the digital object identifier (DOI) system. We do not currently know, at scale, the extent to which articles assigned a DOI are adequately preserved.
Methods: We construct a database of preservation information from original archival sources and then examine the preservation statuses of 7,438,037 DOIs in a random sample.
Results: Of the 7,438,037 works examined, there were 5.9 million copies spread over the archives used in this work. Furthermore, a total of 4,342,368 of the works that we studied (58.38%) were present in at least one archive. However, this left 2,056,492 works in our sample (27.64%) that are seemingly unpreserved. The remaining 13.98% of works in the sample were excluded either for being too recent (published in the current year), not being journal articles, or having insufficient date metadata for us to identify the source.
Discussion: Our study is limited by design in several ways. Among these are the facts that it uses only a subset of archives, it only tracks articles with DOIs, and it does not account for institutional repository coverage. Nonetheless, as an initial attempt to gauge the landscape, our results will still be of interest to libraries, publishers, and researchers.
Conclusion: This work reveals an alarming preservation deficit. Only 0.96% of Crossref members (n = 204) can be confirmed to digitally preserve over 75% of their content in three or more of the archives that we studied. (Note that when, in this article, we write “preserved,” we mean “that we were able to confirm as preserved,” as per the specified limitations of this study.) A slightly larger proportion, i.e., 8.5% (n = 1,797), preserved over 50% of their content in two or more archives. However, many members, i.e., 57.7% (n = 12,257), only met the threshold of having 25% of their material in a single archive. Most worryingly, 32.9% (n = 6,982) of Crossref members seem not to have any adequate digital preservation in place, which is against the recommendations of the Digital Preservation Coalition.

Journal of Librarianship and Scholarly Communication
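As a quick sanity check, the headline shares in the abstract can be recomputed from the counts it quotes. This is only a sketch: the excluded count is derived here as the remainder, which follows the abstract's wording ("the remaining 13.98%") but is not a figure the abstract states directly.

```python
# Counts quoted in the abstract.
TOTAL = 7_438_037        # works in the random sample
PRESERVED = 4_342_368    # present in at least one archive
UNPRESERVED = 2_056_492  # seemingly unpreserved

def pct(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole

# Assumption: everything not counted above was excluded from the analysis.
excluded = TOTAL - PRESERVED - UNPRESERVED

preserved_pct = pct(PRESERVED, TOTAL)
unpreserved_pct = pct(UNPRESERVED, TOTAL)
excluded_pct = pct(excluded, TOTAL)

print(f"preserved:   {preserved_pct:.2f}%")
print(f"unpreserved: {unpreserved_pct:.2f}%")
print(f"excluded:    {excluded_pct:.2f}%")
```

The recomputed shares agree with the quoted percentages to within roughly 0.01 percentage points, which is consistent with rounding in the abstract.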
Database of chemical reactions

The site offers an open-access database of chemical reactions, specified with reagents, catalysts, ligands, reactants and products

NNNS chemical reaction database

When it comes to removing a silyl group from an alkyne chemists are addicted to TBAF? In the blog! #chemistry #chemiverse

https://kmt.vander-lingen.nl/article/1022/Alkyne_protiodesilylation_by_the_numbers

It bothers me so much that most used fonts have no easily visible differences between "I" and "l" -- there's quite a difference between "Weird Al" and "Weird AI" as only one gets permission for and pays to reuse the work from a talented artist.

Now available on Figshare: the USPTO 2023 organic reaction SMILES batch! 137K records! #chemiverse #chemistry

https://doi.org/10.6084/m9.figshare.24921555

Reaction SMILES USPTO year 2023

Collection of reaction SMILES (reactants, reagents, solvents, products) from USPTO as published in 2023. 137K lines total. Data scraping by custom design. Data extraction by OSCAR (semantic) and ChatGPT (LLM), molecule identification by OPSIN and custom synonym list. All SMILES are RDKit-safe. Please note that the data have been collected in an automated process, so the dataset is certainly not without errors.

figshare
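For anyone who wants to poke at such a file, here is a minimal sketch of splitting one reaction SMILES into its parts. It assumes the standard three-field form `reactants>agents>products` with `.` between molecules and one bare reaction per line; the dataset's actual file layout is not described in the post, and the example reaction and the name `split_reaction_smiles` are illustrative.

```python
from typing import List, NamedTuple

class Reaction(NamedTuple):
    reactants: List[str]
    agents: List[str]
    products: List[str]

def split_reaction_smiles(line: str) -> Reaction:
    """Split 'reactants>agents>products' into lists of molecule SMILES."""
    parts = line.strip().split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators, got: {line!r}")

    def molecules(field: str) -> List[str]:
        # '.' separates disconnected components; empty fields yield [].
        return [m for m in field.split(".") if m]

    return Reaction(*(molecules(p) for p in parts))

# Illustrative example (not taken from the dataset): acid-catalysed esterification.
rxn = split_reaction_smiles("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
print(rxn.reactants, rxn.agents, rxn.products)
```

Splitting on `.` treats each ion of a salt as its own component, so a real pipeline would likely regroup those, and would validate every fragment with a cheminformatics toolkit such as RDKit before use.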