If you use #stocat to randomly sample large files without having to materialize permutations (as shuf does), I've just added a new flag, -s/--seed, so that you can make your sampling repeatable, e.g., in replication packages: https://gitlab.com/zacchiro/stocat
Stefano Zacchiroli / stocat · GitLab

stochastic cat (as in: the UNIX filter), selecting lines with uniform probability

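For anyone curious what seeded, permutation-free sampling looks like, here is a minimal Python sketch. It is not stocat's actual code: the function name `sample_lines` and its `probability` and `seed` parameters are made up for illustration, and it simply assumes each line is kept independently with the same probability.

```python
import random
from typing import Iterable, Iterator, Optional


def sample_lines(lines: Iterable[str], probability: float,
                 seed: Optional[int] = None) -> Iterator[str]:
    """Yield each input line independently with the given probability.

    Hypothetical helper for illustration only; not stocat's implementation.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    # A private generator: the same seed over the same input gives the same picks.
    rng = random.Random(seed)
    for line in lines:
        # One pass, one line in memory at a time; no permutation is ever built.
        if rng.random() < probability:
            yield line
```

Feeding it `sys.stdin` and writing the results to `sys.stdout` would turn it into a filter. With a fixed seed and unchanged input the selected lines are the same on every run, which is the property a replication package needs.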

@mike nope, shuf outputs all lines in a randomized order, which is not what the spec required here.

BTW, since that message I've implemented #stocat to address this and submitted it for inclusion into #moreutils: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=961638

#961638 - proposal: stocat - probabilistic cat - Debian Bug report logs

My fellow Italian-speaking hackers will further appreciate the name. « Lo sai chi ti printa tanto? #STOCAT ! » (quasi-cit.; roughly: "You know who's printing so much for you? #STOCAT!")