I’m looking for some *really big* (ideally millions of rows) biological datasets for a “Data Science in Biology” course.

Ideally they should:

* be archived with a DOI
* have an associated paper or two, with some cool questions
* be messy observational data, or collated across many studies

If you have any pointers, I’d be extremely grateful! Please boost!

@RobLanfear

I don't know what kind of biology data you are looking for, but there are two large ecology repositories that may be of interest.

Check https://www.movebank.org, and specifically its data repository for datasets published alongside papers: https://www.movebank.org/cms/movebank-content/data-repository.

Another one is https://www.gbif.org/data produced by @gbif.
