I’m slowly and politely getting a large dataset (on the order of hundreds of millions of rows) via its API, so I wrote a script that logs progress and helps restart after errors. That logfile is going to end up being larger than the cross-national survey dataset I analyzed for my first empirical paper.
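For anyone curious what a script like that might look like, here is a minimal sketch, not the actual script. It assumes a paginated API and a hypothetical `fetch_page(page)` callable that returns a list of rows (empty once the data runs out). The file names, the `delay`, and the `max_pages` cap are all illustrative assumptions.

```python
import json
import logging
import os
import time

logger = logging.getLogger("polite_fetch")


def load_checkpoint(path):
    """Return the next page to fetch, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_page"]


def save_checkpoint(path, next_page):
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_page": next_page}, f)
    os.replace(tmp, path)


def collect(fetch_page, out_path, checkpoint_path, delay=1.0, max_pages=None):
    """Append rows to out_path, resuming from the checkpoint after a failure.

    fetch_page(page) is a hypothetical callable returning a list of dict rows.
    max_pages is an optional cap on pages fetched in this run.
    """
    page = load_checkpoint(checkpoint_path)
    fetched = 0
    with open(out_path, "a") as out:
        while max_pages is None or fetched < max_pages:
            rows = fetch_page(page)  # may raise; the checkpoint stays at this page
            if not rows:
                logger.info("page %d empty, done", page)
                break
            for row in rows:
                out.write(json.dumps(row) + "\n")
            out.flush()
            page += 1
            fetched += 1
            save_checkpoint(checkpoint_path, page)
            logger.info("page %d done (%d rows)", page - 1, len(rows))
            time.sleep(delay)  # be polite to the API
    return page
```

If `fetch_page` raises, rerunning `collect` picks up at the last page that did not complete. A crash between writing rows and saving the checkpoint could duplicate one page, so deduplicating downstream is a reasonable precaution. Setting `max_pages` is one way to keep a forgotten job from collecting more than intended.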
@kjhealy At least you are logging. I had one dataset I was collecting (I think on Mastodon clients?) that broke, and I had no idea.
I had another one (using an API I “found” by inspecting a website) that I forgot was running, and it collected way more data than I intended. 🙃