I’m looking for some *really big* (ideally millions of rows) biological datasets for a “Data Science in Biology” course.

Ideally they should:

* be archived with a DOI
* have an associated paper or two, with some cool questions
* be messy observational data, or collated across many studies

If you have any pointers, I’d be extremely grateful! Please boost!

@RobLanfear You might try iNaturalist. It's huge (tens of millions of entries), has a few thousand papers based on it, is really messy in a lot of ways, and your students can add their own data points.
https://dx.doi.org/10.15468/ab3s5x
iNaturalist Research-grade Observations

Observations from iNaturalist.org, an online social network of people sharing biodiversity information to help each other learn about nature. Observations included in this archive met the following requirements:

* Published under one of the following licenses or waivers: 1) http://creativecommons.org/publicdomain/zero/1.0/, 2) http://creativecommons.org/licenses/by/4.0/, 3) http://creativecommons.org/licenses/by-nc/4.0/
* Achieved one of following iNaturalist quality grades: Research
* Created on or before 2025-06-03 15:00:33 -0700

You can view observations meeting these requirements at https:…