I’m looking for some *really big* (ideally millions of rows) biological datasets for a “Data Science in Biology” course.

Ideally they should:

* be archived with a DOI
* have an associated paper or two, with some cool questions
* be messy observational data, or collated across many studies

If you have any pointers, I’d be extremely grateful! Please boost!

@RobLanfear There are massive amounts of species distribution data in GBIF (the Global Biodiversity Information Facility), which also assigns DOIs to the datasets behind the thousands of published papers that use GBIF data: https://www.gbif.org/