“We just launched a 16TB archive of every dataset that has been available on data.gov since November. This will be updated day by day as new datasets appear. It can be freely copied, and we're sharing the code behind it to help others make their own archives of data they depend on.” Harvard Library Innovation Lab (via BlueSky)

https://lil.law.harvard.edu/blog/2025/02/06/announcing-data-gov-archive/

https://bsky.app/profile/harvardlil.bsky.social/post/3lhjzh7f54226

#archival

@molly0xfff hey, thanks for doing this! I’m super interested to see what you’ve collected. I’ll just need to increase my storage capacity by a factor of thirty.
@molly0xfff make it a torrent so we can all share!!
@notanonymous26 @molly0xfff As someone who thinks BitTorrent is underrated and should be used more... I don't think it's a good fit for a giant dataset updated daily.

@nicolas17 @notanonymous26 @molly0xfff

The people who invented IPFS ~10 years ago were aiming at exactly that problem: https://en.wikipedia.org/wiki/InterPlanetary_File_System

@nicolas17 @notanonymous26 @molly0xfff I wish there was something like BitTorrent that could handle this, like Resilio Sync but open.
@molly0xfff I’m depending on libraries to save us from totally slipping into a new dark age.

@molly0xfff

great contribution --pro democracy

@molly0xfff Thank you for securing vital information. 👏🏻 
@molly0xfff I've been wondering where Harvard would come down in this coup. Glad to see them doing at least some of the right thing.
@molly0xfff
Thank you, and it was great to hear you on Fast Politics too, really interesting chat 👍🏻
@molly0xfff
A modern equivalent of the famous Library of Alexandria.
@molly0xfff language is funny. “Launched” an archive? Thx for sharing

@molly0xfff

I wonder whether @lavaeolus has seen this yet?

@sinensetin @molly0xfff Guess who has already organized mirroring these 14TB to the UK and Australia :)
(already completed)
@molly0xfff Just wanted to say … I love that logo. And their vital work, too!
@molly0xfff Don't you mean to assist Dump & Musken to collect MORE information on the people of this country?
@molly0xfff fuck yes, totally fucking mirroring this

@freya @molly0xfff have you succeeded in doing a full mirror?

What's the best method to get *everything* in one go?

@jfbucas @molly0xfff nope. No clue how to do a full mirror: the docs on the page reference aws(1), which doesn't exist / work, and the domain they reference doesn't resolve
@molly0xfff Is Harvard saving any copies/backups of these to storage locations within the EU?
@molly0xfff can't seem to mirror it with rclone, us-west-2. thing doesn't resolve?
@freya recommend getting in touch with the LIL folks! (I was briefly affiliated but no longer am after my fellowship ended) https://lil.law.harvard.edu/contact/

@molly0xfff
Drive by shout out to the @datasette community for all they do to help #data #journalism