I’m looking for some *really big* (ideally millions of rows) biological datasets for a “Data Science in Biology” course.

Ideally they should:

* be archived with a DOI
* have an associated paper or two, with some cool questions
* be messy observational data, or collated across many studies

If you have any pointers, I’d be extremely grateful! Please boost!

@RobLanfear Flow cytometry data typically has millions of rows. Here's a website with lots of public datasets, including manuscript links:

https://flowrepository.org/public_experiment_representations

FlowRepository is a public database of flow cytometry experiments where you can query and download data collected and annotated according to the MIFlowCyt standard. It supports storage, annotation, analysis, and sharing of flow cytometry datasets.