Effective science isn't about a final publication; it's about the availability of data generated by research for reanalysis and reuse.

A healthy scientific workflow should make it trivial to incorporate prior data into your work.

Enter SciDataFlow's new simple feature: Assets⬇️

The data a project produces are, in essence, a scientific "asset". Yet all too often these data assets are lost or cannot be easily reused by others. We need to change this!

Because SciDataFlow's Data Manifest serves as a minimal recipe for retrieving and sharing data, downloading and incorporating data into your work becomes effortless.

SciDataFlow-Assets is a community-led effort to build these recipes for core datasets.

https://github.com/scidataflow-assets

As a prototype, I have built a SciDataFlow Asset for the NYGC high-coverage 1000 Genomes data. You can see it here: https://github.com/scidataflow-assets/nygc_gatk_1000G_highcov

In just 2 lines, you can retrieve a Data Manifest from SciDataFlow-Assets and download all of the 1000 Genomes data concurrently.
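A sketch of what those two lines might look like, assuming the `sdf` command-line tool is installed and that `sdf asset` and `sdf pull --all` behave as described in the SciDataFlow README (check `sdf --help` for your version). The `fetch_asset` wrapper is a hypothetical helper added here for illustration, not part of SciDataFlow:

```shell
# Sketch only: assumes the SciDataFlow CLI (`sdf`) is installed and on PATH.
# `fetch_asset` is a hypothetical helper name, not part of SciDataFlow.
fetch_asset() {
    sdf asset "$1" &&   # line 1: retrieve the asset's Data Manifest
    sdf pull --all      # line 2: download the data listed in the manifest
}

# Usage (this downloads a large dataset, so run it deliberately):
# fetch_asset nygc_gatk_1000G_highcov
```

Without the wrapper, the two commands are simply `sdf asset nygc_gatk_1000G_highcov` followed by `sdf pull --all`.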

Writing and sharing a Data Manifest = making your scientific data an asset.

Please contribute, and I welcome any feedback!