I’m looking for some *really big* (ideally millions of rows) biological datasets for a “Data Science in Biology” course.

Ideally they should:

* be archived with a DOI
* have an associated paper or two, with some cool questions
* be messy observational data, or collated across many studies

If you have any pointers, I’d be extremely grateful! Please boost!

@RobLanfear
Another vote for GBIF (AKA the Global #Biodiversity Information Facility) - find them in the fediverse at @gbif
If you're particularly looking for messy data, you can examine the issues attached to each record, which flag problems such as flipped coordinates. All datasets are assigned DOIs, and custom downloads get DOIs too. Cited uses are available to explore here: https://www.gbif.org/resource/search?contentType=literature&literatureType=journal&relevance=GBIF_USED&peerReview=true
There's also an #OpenData Ambassadors scheme; the people involved are listed here: https://www.gbif.org/composition/6iHKXo8pUyRPJ2Ut0683Z8/ambassadors
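
For a course exercise, the per-record issue flags mentioned above could be tallied with something like the rough sketch below. It assumes a tab-delimited occurrence download with an `issue` column holding semicolon-separated flag names; check the header of your own download, as the column name and separator are assumptions here. The function name `count_issue_flags` and the three sample rows are made up for illustration and are not real GBIF records.

```python
import csv
import io
from collections import Counter


def count_issue_flags(tsv_file, column="issue", sep=";"):
    """Count how often each issue flag appears in a tab-delimited file.

    Assumes `column` holds zero or more flag names joined by `sep`.
    """
    counts = Counter()
    # QUOTE_NONE: treat quote characters as ordinary text, since
    # free-text fields in occurrence data often contain stray quotes.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        raw = row.get(column) or ""
        counts.update(flag.strip() for flag in raw.split(sep) if flag.strip())
    return counts


# Tiny made-up sample standing in for a real download (hypothetical rows).
SAMPLE = (
    "gbifID\tspecies\tissue\n"
    "1\tPuma concolor\tCOORDINATE_ROUNDED;PRESUMED_SWAPPED_COORDINATE\n"
    "2\tPuma concolor\t\n"
    "3\tLynx rufus\tCOORDINATE_ROUNDED\n"
)

flag_counts = count_issue_flags(io.StringIO(SAMPLE))
for flag, n in flag_counts.most_common():
    print(flag, n)
```

With a real download you would pass an open file handle instead of the `io.StringIO` sample, e.g. `open("occurrences.csv", encoding="utf-8", newline="")`, where the filename is a placeholder. Reading row by row keeps memory use flat, which matters once the file runs to millions of rows.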