I’m looking for some *really big* (ideally millions of rows) biological datasets for a “Data Science in Biology” course.

Ideally they should:

* be archived with a DOI
* have an associated paper or two, with some cool questions
* be messy observational data, or collated across many studies

If you have any pointers, I’d be extremely grateful! Please boost!

@RobLanfear
You might check out the Drosophila Evolution over Space and Time (DEST) dataset that we put together. It is a large population genomic dataset of pool-seq data for flies. It contains spatial and temporal samples and organized metadata, and is easily accessible in a variety of formats.

https://academic.oup.com/mbe/article/38/12/5782/6361628
https://dest.bio

Drosophila Evolution over Space and Time (DEST): A New Population Genomics Resource
