Maneage -- Managing data lineage

The #Maneage #reproducibility system for scientific research papers, which starts from a minimal POSIX-like host OS, does not yet build [1] the #GNUCLibrary = #GLibC. We have a draft implementation that builds glibc *after* #GCC [2], and an alternative proposal arguing that building glibc *first* and gcc second would be more sustainable in the long term [[1] comment18].

Should GLibC be built first? Why (or why not)?

[1] https://savannah.nongnu.org/task/?15390
[2] https://gitlab.com/maneage/project-dev/-/blob/glibc/reproduce/software/make/core-gnu.mk#L718

Poll results:
* build glibc first: 50%
* build gcc first: 50%

Is a peer-reviewed #Maneage paper [1] (software + full results) reproducible from scratch?

Same author+machine; OS Debian stable updated 2021...2025.

Reproduced through to the final pdf by merging in the current Maneage 'software/' directory, plus minor hacks and disabling a few verifications [2].

Result: the final pdf has small but scientifically negligible diffs [3] (due to python/numpy int or float changes?).

#Reproducibility

[1] https://peerj.com/articles/11856
[2] https://codeberg.org/boud/subpoisson/src/branch/202503_reproduce_merge_maneage commit f554c7e9
[3] https://codeberg.org/boud/subpoisson/commit/f554c7e9d5fc224d01b9db1126427afe7fb37784

Anti-clustering in the national SARS-CoV-2 daily infection counts

The noise in daily infection counts of an epidemic should be super-Poissonian due to intrinsic epidemiological and administrative clustering. Here, we use this clustering to classify the official national SARS-CoV-2 daily infection counts and check for infection counts that are unusually anti-clustered. We adopt a one-parameter model of $\phi_i^{\prime}$ infections per cluster, dividing any daily count $n_i$ into $n_i/\phi_i^{\prime}$ ‘clusters’, for ‘country’ $i$. We assume that $n_i/\phi_i^{\prime}$ on a given day $j$ is drawn from a Poisson distribution whose mean is robustly estimated from the four neighbouring days, and calculate the inferred Poisson probability $P_{ij}^{\prime}$ of the observation. The $P_{ij}^{\prime}$ values should be uniformly distributed. We find the value $\phi_i$ that minimises the Kolmogorov–Smirnov distance from a uniform distribution. We investigate the ($\phi_i$, $N_i$) distribution, for total infection count $N_i$. We consider consecutive count sequences above a threshold of 50 daily infections. We find that most of the daily infection count sequences are inconsistent with a Poissonian model. Most are found to be consistent with the $\phi_i$ model. The 28-, 14- and 7-day least noisy sequences for several countries are best modelled as sub-Poissonian, suggesting a distinct epidemiological family. The 28-day least noisy sequence of Algeria has a preferred model that is strongly sub-Poissonian, with $\phi_i^{28} < 0.1$. Tajikistan, Turkey, Russia, Belarus, Albania, United Arab Emirates and Nicaragua have preferred models that are also sub-Poissonian, with $\phi_i^{28} < 0.5$. A statistically significant (Pτ < 0.05) correlation was found between the lack of media freedom in a country, as represented by a high Reporters sans frontieres Press Freedom Index (PFI2020), and the lack of statistical noise in the country’s daily counts.
The $\phi_i$ model appears to be an effective detector of suspiciously low statistical noise in the national SARS-CoV-2 daily infection counts.
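
The procedure described in the abstract can be illustrated with a rough Python sketch. This is not the authors' code. It assumes that the "robust" mean is the median of the two days before and the two days after day j, that the scaled count is rounded to an integer, and that the Poisson probability is the cumulative probability; the function names and the list of trial values are hypothetical. Because the Poisson distribution is discrete, the probabilities are only approximately uniform even for a well-matched model.

```python
import math
import statistics


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed in log space to avoid underflow."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    total = sum(
        math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))
        for x in range(k + 1)
    )
    return min(total, 1.0)


def ks_distance_uniform(values):
    """Kolmogorov-Smirnov distance between a sample and the uniform [0, 1] CDF."""
    s = sorted(values)
    n = len(s)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(s))


def cluster_probabilities(counts, phi):
    """Poisson probabilities of the scaled counts for one trial value of phi."""
    probs = []
    for j in range(2, len(counts) - 2):
        # Assumption: the robust mean is the median of the four neighbouring days.
        neighbours = [counts[j - 2], counts[j - 1], counts[j + 1], counts[j + 2]]
        mu = statistics.median(neighbours) / phi
        # Assumption: the number of 'clusters' is rounded to an integer.
        k = round(counts[j] / phi)
        probs.append(poisson_cdf(k, mu))
    return probs


def best_phi(counts, candidates):
    """Trial phi whose probabilities are closest to uniform in the KS sense."""
    return min(
        candidates,
        key=lambda phi: ks_distance_uniform(cluster_probabilities(counts, phi)),
    )
```

For a list of daily counts `counts`, a call such as `best_phi(counts, [0.1, 0.5, 1.0, 2.0, 5.0])` would return the trial value closest to a uniform distribution; a value below 1 would correspond to the sub-Poissonian case discussed in the abstract.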

PeerJ

@leibnizopenscience

The paper seems to have missed a powerful workflow language: #Make [1], with #GNUMake [2] being the canonical example. It's stable and nearly half a century old. Learn and use it now and your scientific grandchildren will be able to reproduce your workflow in 2075 [3]. #Maneage [3][4] uses Make for *both* reproducible software + reproducible workflows.

[1] https://en.wikipedia.org/wiki/Make_%28software%29

[2] https://gnu.org/software/make

[3] https://maneage.org

[4] https://doi.org/10.1109/MCSE.2021.3072860
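
As an illustration of why Make works as a workflow language: its core rule is that a target is rebuilt when it is missing or older than any of its prerequisites. A minimal Python sketch of that rule follows. The function name `needs_rebuild` is hypothetical, and real Make does far more (pattern rules, phony targets, parallel jobs).

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if the target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # A prerequisite modified after the target means the target is out of date.
    return any(os.path.getmtime(p) > target_time for p in prerequisites)
```

In a workflow, each analysis product (a table, a plot, the final pdf) plays the role of a target, so only the steps whose inputs changed are rerun.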

Make (software) - Wikipedia

An official PhD course in #ReproducibleResearchPapers will start next week [1]. Unofficial participation is welcome in the #ManeageCommunity room [2] (curated homeservers [3]), where many of the practical sessions will take place (days/times TBD). The focus is on #ReproducibleAstronomy, but #Maneage is (in principle) usable in any field of science.

@academicchatter #OpenScience #Reproducibility #Astronomy

[1] https://usosweb.umk.pl/kontroler.php?_action=katalog2%2Fprzedmioty%2FpokazPrzedmiot&kod=7404-REPASTR&lang=en

[2] https://matrix.to/#/#maneage_community:matrix.org

[3] https://servers.joinmatrix.org

Reproducibility in astronomy research papers - Uniwersytet Mikołaja Kopernika w Toruniu

@Pol

FOSS is criterion 8 of the eight #Maneage criteria for long-term archivable reproducibility [1][2].

Proprietary software is not reproducible because it "typically cannot be distributed, inspected, or modified by others. [It is], thus, reliant on a single supplier (even without payments) and prone to proprietary obsolescence. [f]"

@zimoun

[1] https://maneage.org

[2] https://doi.org/10.1109/MCSE.2021.3072860 = https://arxiv.org/abs/2006.03018 = https://zenodo.org/records/6533902

[f] https://www.gnu.org/proprietary/proprietary-obsolescence.html

Can #cosmology research papers satisfy all 8 #Maneage reproducibility criteria?

At least 3 cosmology papers have been published using the #Maneage shell/make template.

All welcome in my online+f2f seminar (BBB) [1] tomorrow (CET) at 10:15 UTC = 11:15 CET Monday 11 March 2024.

* pdf [2]
* Matrix [3]

#OpenScience #Reproducibility
@cosmology

[1] https://astro.umk.pl/en/institute/general-seminar - https://vc.umk.pl/b/mar-byg-8yu-z15

[2] https://cosmo.torun.pl/~boud/Roukema20240311IANCU.pdf

[3] https://matrix.to/#/#maneage_community:matrix.org

General Seminar - Institute of Astronomy - Nicolaus Copernicus University in Toruń

@ethanwhite @tpoisot

In #Maneage [1], level 4+ is different:

* in analysis/ we use 'make' for the higher-level workflow, encouraging bash scripts for details;

* in software/ we use 'make' to build all the software with sha512sum checks on the downloads, starting from a minimal unix-like system;

* the makefiles initialize.mk and paper.mk are the workflow for the paper.

Fully reproduce:
./project configure
./project make

Example: [2]

[1] https://maneage.org
[2] https://zenodo.org/record/7792910
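
The sha512sum check on downloads mentioned above can be sketched in Python. This is only an illustration of the idea, not Maneage's own implementation (which is written in Make and shell); the function names are hypothetical.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Raise ValueError if the file's SHA-512 differs from the recorded value."""
    actual = sha512_of_file(path)
    if actual != expected_hex.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return True
```

Recording the expected checksum alongside the workflow means that a tarball that has been altered or corrupted upstream stops the build instead of silently changing the results.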

@civodul @khinsen

We use #CosmicVoids in [1][2], which in N-body sims are traced by low number densities of particles => high noise. Full #Maneage controls + fixed-seed RNGs. We still have intramachine + (higher) intermachine randomness. Statistical upper limits to results OK. But there are still untraced sources of randomness.

Any clues for remaining randomness [2]?

#Reproducibility #ArXiv_2304_00591 #OpenScience

[1] Frozen record: https://zenodo.org/record/7792910

[2] Live git: https://codeberg.org/mpeper/lensing
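
A minimal sketch of how intramachine repeatability with a fixed seed can be checked: repeat the same seeded computation and compare digests of the output. The "analysis" here is a toy stand-in (a list of pseudo-random numbers), not the lensing pipeline, and the function names are hypothetical.

```python
import hashlib
import random


def run_digest(seed, n=1000):
    """Run a toy seeded computation and return a SHA-256 digest of its output."""
    rng = random.Random(seed)  # private generator, independent of global state
    samples = [rng.random() for _ in range(n)]
    text = "\n".join(repr(x) for x in samples)
    return hashlib.sha256(text.encode("ascii")).hexdigest()


def is_repeatable(seed, runs=3):
    """True if repeated runs with the same seed give identical digests."""
    digests = {run_digest(seed) for _ in range(runs)}
    return len(digests) == 1
```

Applying the same comparison to intermediate products of a real workflow, stage by stage, is one way to narrow down where repeated runs first diverge.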

Detecting cosmic voids via maps of geometric optics parameters

* lensing-e4f7af0.pdf - article in pdf format
* void_matches*.dat - plain text results files corresponding to Table 3 and Figures 2, 4, 6, 8.
* lensing-e4f7af0-journal.tar.gz - source package for producing the article pdf, together with the reproducibility package, but without the git history; appropriate for ArXiv
* lensing-e4f7af0-git.bundle - git source package that can be unbundled with 'git clone lensing-e4f7af0-git.bundle' and used for reproducibility: to download data, do calculations, analyse them, plot them and produce the article pdf
* software-e4f7af0.tar.gz - this should contain all the software, apart from a minimal POSIX-compatible system and LaTeX packages, needed for compiling and installing the software used in producing this paper
* lensing-e4f7af0-snapshot.tar.gz - source files of the project; these should be enough, provided that external software packages can be downloaded, to reproduce the full project

The authors grant a perpetual, non-exclusive licence to distribute this pdf preprint. All the other materials here are free-licensed, as stated in the individual files and packages.

Zenodo