As a prototype, I have built a SciDataFlow Asset for the NYGC high-coverage 1000 Genomes data. You can see it here: https://github.com/scidataflow-assets/nygc_gatk_1000G_highcov
In just two lines, you can fetch a Data Manifest from SciDataFlow-Assets and download all of the 1000 Genomes data concurrently.

Writing and sharing a Data Manifest = making your scientific data an asset.
Please contribute, and I welcome any feedback!
Since SciDataFlow's Data Manifest serves as a minimal recipe for easy data retrieval & sharing, it makes it effortless to download and incorporate data into your work.
SciDataFlow-Assets is a community-led effort to build these recipes for core datasets.
https://github.com/scidataflow-assets

The data produced by a project is in essence a scientific "asset". Yet, all too often these data assets are lost and/or cannot be easily reused by others. We need to change this!
Effective science isn't about a final publication; it's about the availability of data generated by research for reanalysis and reuse.
A healthy scientific workflow should make it trivial to incorporate prior data into your work.
Enter SciDataFlow's new simple feature: Assets⬇️
However, the classic BGS theory (black line) is quite inaccurate (points are true values) when mutations are only weakly selected against. This has potential impacts on our model estimates. We use a whole new theoretical approach that works under weak selection (colored lines).
Previous work established that BGS is the dominant process generating large-scale patterns in genetic variability across chromosomes in humans. This signal is shaped by the spatial distribution of conserved regions and recombination rates along the genome.
Our simulations show our approach to interference does lead to more accurate predictions of genetic diversity. This is suggestive evidence that interference could be occurring in humans, but further work is needed. Overall, we still have a lot to learn about selection in humans!
By extending our method to approximate how selection in one region can impact selection in others ("selective interference") and refitting everything, we find this model fits as well and brings substitution rates into agreement with divergence levels (blue range in image above).
But, there is a problem: since our method also predicts substitution rates, we can compare these to observed divergence across features (teal and green ranges). We find our method (and previous BGS approaches) predicts far too low a substitution rate for very conserved regions.