Also, the map doesn't show #Crimea as part of #Ukraine; instead, Crimea is coloured the same as russia. If they didn't want to include Crimean data, they could have coloured it grey, for example... Was this #ISSI2023 paper reviewed by anyone? Apart from Gerasimov? #CrimeaIsUkraine #shameontheoccupiers

Strange methodology: "We noted that only Ukrainians had surnames ending in chuk, iuk or skyi, and only Russians had surnames ending in (e.g., Lenin, Putin, Stalin). These lists of names with definite heritage enabled us to mark many of the names of the researchers in each oblast as either Russian (RU) or Ukrainian (UA)."

👉 https://doi.org/10.5281/zenodo.8280543

For decades, the Soviets forced Ukrainians to change their surnames to Russian ones. Excuse me, but this #ISSI2023 paper reeks of Kremlin narratives. 🤬

Researchers in Ukraine: output, international collaboration, impact, heritage and sex, 1995 to 2021

The invasion of Ukraine by the Russian army in February 2022 caused us to look back at Ukrainian science before the war and appraise its strengths and weaknesses, so that after peace and reconstruction it may flourish again and be better attuned to the new era. We examined its publications recorded in the Web of Science (WoS) from 1995 to 2021, and evaluated their quantity and impact, their major fields and Ukraine’s international partners. We also looked at the outputs of the 26 individual oblasts or regions, and the heritage of the researchers in each, Ukrainian or Russian, based on their surnames and given names. Ukrainian scientific output, at about 0.5% of that of the world, was high relative to its wealth, but its impact was rather low, probably because of a lack of contestable funding. It was skewed towards the physical sciences and away from biology and medicine, and so may have led to poorer health in Ukraine than in its geographical neighbours. Its researchers were more likely to be of Ukrainian than Russian heritage the further west they were located. Women were well represented in Ukrainian science and formed more than half the total in most oblasts.

Despite this copy of Carpeaux's Ugolino near the entrance to #issi2023, we were fed really well

An interesting talk at #issi2023 by David Schindler, Erjia Yan, Sascha Spors, and Frank Krüger. They studied the software used in retracted papers and compared it with the software used in a control group of similar non-retracted papers.

Retracted papers:

A) More often than the controls, use commercial, closed-source software instead of free and open-source software.

B) More often fail to cite the software formally, giving only an informal mention.

My take: good software practices might correlate with good scientific practices overall.

Ana-Maria Istrate presented our work https://arxiv.org/abs/2209.00693 at #issi2023. A wonderful talk by a great coauthor.
A large dataset of software mentions in the biomedical literature

We describe the CZ Software Mentions dataset, a new dataset of software mentions in biomedical papers. Plain-text software mentions are extracted with a trained SciBERT model from several sources: the NIH PubMed Central collection and papers provided by various publishers to the Chan Zuckerberg Initiative. The dataset provides sources, context and metadata, and, for a number of mentions, the disambiguated software entities and links. We extract 1.12 million unique string software mentions from 2.4 million papers in the NIH PMC-OA Commercial subset, 481k unique mentions from the NIH PMC-OA Non-Commercial subset (both gathered in October 2021) and 934k unique mentions from 3 million papers in the Publishers' collection. There is variation in how software is mentioned in papers and extracted by the NER algorithm. We propose a clustering-based disambiguation algorithm to map plain-text software mentions into distinct software entities and apply it to the NIH PubMed Central Commercial collection. Through this methodology, we disambiguate 1.12 million unique strings extracted by the NER model into 97600 unique software entities, covering 78% of all software-paper links. We link 185000 of the mentions to a repository, covering about 55% of all software-paper links. We describe in detail the process of building the datasets, disambiguating and linking the software mentions, as well as opportunities and challenges that come with a dataset of this size. We make all data and code publicly available as a new resource to help assess the impact of software (in particular scientific open source projects) on science.

A very insightful presentation by Jian Qin at #issi2023 about the bibliometrics of datasets. Datasets are a novel kind of publication, separate from papers, and require adjustments to our approach.
Caroline Wagner at #issi2023 discussed citation patterns for single-authored papers by males and females. It looks like the fraction of single-authored papers is higher for females in the early years of their careers. Citations are higher for males in "hard" sciences, but not in "soft" ones.
@nalsi's excellent study presented at the #ISSI2023 conference shows that one-third of STEM research output in China is published in languages OTHER than English.
Derek de Solla Price Award talk at #issi2023 by Kevin Boyack & Richard Klavans stressed the need for a global view of local problems. If we cluster only a subset, we get different and worse results than if we cluster the whole set and then project the clusters onto the subset.
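A toy sketch of one reason the two approaches differ, assuming connected components of a citation graph as a stand-in for the clustering method (the talk's actual algorithm isn't described in this post). Two papers in the subset that are linked only through a paper outside it end up in separate clusters when the subset is clustered alone, but share a cluster when the whole set is clustered first and then projected. The paper IDs, links and function names are hypothetical.

```python
def components(nodes, edges):
    """Group nodes into connected components with union-find (stand-in for a real clustering)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        if a in parent and b in parent:  # links to papers outside the node set are ignored
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def local_clustering(edges, subset):
    """Cluster only the subset: connections that pass through outside papers are lost."""
    return components(subset, edges)


def global_then_project(all_nodes, edges, subset):
    """Cluster the whole set, then keep only each cluster's members that lie in the subset."""
    projected = [frozenset(c & subset) for c in components(all_nodes, edges) if c & subset]
    return sorted(projected, key=sorted)


# Hypothetical citation links: A and B (in the subset) are connected only via X (outside it).
ALL_NODES = {"A", "B", "C", "X"}
EDGES = [("A", "X"), ("X", "B")]
SUBSET = {"A", "B", "C"}

print([sorted(c) for c in local_clustering(EDGES, SUBSET)])                # [['A'], ['B'], ['C']]
print([sorted(c) for c in global_then_project(ALL_NODES, EDGES, SUBSET)])  # [['A', 'B'], ['C']]
```

The toy only shows that the results differ; whether the projected clusters are better in a given case depends on the real method and data.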
Congratulations to Kevin Boyack and Richard Klavans for receiving this year's Derek J. de Solla Price Medal at #issi2023