Exactly one year ago, a certain part of the #Fediverse was involved in crucial data rescue operations. The political data cleansing in the US had started to unfold, and a group of dedicated people chose to form a guerrilla data rescue collective, which we now know as @SafeguardingResearch and #SciOp.

Today, I took the opportunity to look back at 2025's events while also covering present and future developments of this movement, in the form of a lecture and hands-on session within my #LIS studies course on e-Publishing.

It was such a relief to once again talk about positive movements that use decentralized technologies for social impact in these dark times, and to pay tribute to the marginalized groups who keep up their fights every day.

In the discussion afterwards, among master's students of Library and Information Sciences (most of us already work in libraries), we talked about how one can publish scientific works so that they cannot be taken down through ideological data cleansing.

We all agreed that FAIR #OpenAccess should be mandatory for all publications. I proposed a three-way approach for storing the publication data in several places simultaneously: 1. in a repository of a trustworthy organization, 2. on a publicly accessible website (ideally one's own), and 3. in a repository powered by a decentralized technology of one's choice. I drew on a thread I read here back in April 2025, in which @jonny and @nichtich, among others, had discussed this topic: https://neuromatch.social/@jonny/114310419885059486. Another takeaway was to publish files in as raw a form as possible, e.g. in plain text, Markdown or HTML. I would add that files (and metadata) should be stored in a tamper-resistant way to ensure data integrity, and that PDF files should use the PDF/A variant that #DigiPres organizations currently recommend, to be at least a bit more future-proof.

To take myself as an example: a couple of years ago, starting with my bachelor thesis, I began to mirror all my publications to the decentralized storage network #IPFS, which also gives me data integrity and tamper resistance through content-addressing. Depending on the type of work, I also upload my stuff to my own website or Zenodo. At a minimum, I dual-store my work as raw text and PDF. The first setting I adjust in my word processing software is the PDF/A export option, so that it saves files as PDF/A by default.
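
For readers unfamiliar with content-addressing, here is a minimal Python sketch of the idea. It is a simplification and not how IPFS actually computes its identifiers (real CIDs involve chunking and multihash encoding); the function names `content_address` and `verify` and the sample text are hypothetical, chosen only for illustration. The point is that the address is derived from the bytes themselves, so any change to the file yields a different address.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return the hex SHA-256 digest of the data.

    Used here as a stand-in for a content address; real IPFS CIDs
    are built differently (assumption: simplified for illustration).
    """
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, address: str) -> bool:
    """Check that the data still matches the address it was published under."""
    return content_address(data) == address


# Hypothetical example document.
original = b"My thesis, plain-text edition.\n"
address = content_address(original)

# A single changed word produces a different digest.
tampered = original.replace(b"thesis", b"theses")

print(verify(original, address))  # True
print(verify(tampered, address))  # False
```

Anyone who knows the address can run the same check on a copy obtained from any mirror, which is why storing the same file in several places does not weaken integrity.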

#SafeguardingResearch #DataRescue #DigitalPreservation #OpenScience #Science #GLAM #Libraries #DigitalHumanities #AcademicChatter #Research
jonny (good kind) (@jonny@neuromatch.social)

to any academics who want their work to outlast their current hype cycle, the next publisher downsizing, or government book burning: - make your work available for free. - as a PDF or HTML document (or, fine, JATS) - from a URL that requires no login or javascript to access. having your work only available behind some heinous publisher paywall or 'enhanced reader' is a straightforward statement that you don't care about your work being part of the shared pursuit of human understanding. it is, in fact, your job to make sure that your work is available. sincerely, someone trying to preserve your work.

neurospace.live
Have any #DataRescue burning questions?? We have just the thing... We're hosting an #AMA. Join us on January 30, 6-8pm Eastern through bluesky! Please submit your questions through our form. If you can't make it, there is an option to receive a response via email!

DRP AMA 2026 Questions

This form is to collect questions for the Data Rescue Project's first "Ask Me Anything!" to be held on Bluesky January 30, 6-8pm Eastern. If you can't attend or aren't on Bluesky, don't worry-- you can receive a response via email! Please note: All questions will be reviewed, and may be edited for clarity and length. Depending on volume received, we may not answer every question on the AMA. If you aren't already, don't forget to follow us on Bluesky. See you online!

Google Docs

The Record: Spotify disables accounts after open-source group scrapes 86 million songs from platform. “The spokesperson added that Anna’s Archive did not contact them before publishing the files. They also said it did not consider the incident a ‘hack’ of Spotify. The people behind the leaked database systematically violated Spotify’s terms by stream-ripping some of the music from the […]

https://rbfirehose.com/2025/12/25/the-record-spotify-disables-accounts-after-open-source-group-scrapes-86-million-songs-from-platform/

ResearchBuzz: Firehose | Individual posts from ResearchBuzz

Anna’s Archive: Backing up Spotify. “Anna’s Archive normally focuses on text (e.g. books and papers). We explained in ‘The critical window of shadow libraries’ that we do this because text has the highest information density. But our mission (preserving humanity’s knowledge and culture) doesn’t distinguish among media types. Sometimes an opportunity comes along outside of text. This is […]

https://rbfirehose.com/2025/12/21/annas-archive-backing-up-spotify/

Flickr Blog: Building Flickr Archives with Data Lifeboat. “With Data Lifeboat, you can create an archive to document a specific time and place, share memories of an event, or curate a collection of perspectives from around the globe. Simply put, conscious archiving with Data Lifeboat can allow you to create and share your own slice of history with future viewers from this vast collection. Here […]

https://rbfirehose.com/2025/12/20/flickr-blog-building-flickr-archives-with-data-lifeboat/

St. Louis Magazine: Moonlighting librarians save the RFT’s online archive from its post-porn purge. “A newly available digital archive that encompasses much of the recent history of the Riverfront Times went live yesterday. It is the brainchild of Joshua Lawrence and Jaclyn Crow, two St. Louisans with a passion for local history…. The database currently has about 2,000 articles from the […]

https://rbfirehose.com/2025/12/06/st-louis-magazine-moonlighting-librarians-save-the-rfts-online-archive-from-its-post-porn-purge/

Slaw: The Data Rescue Project: Preserving Government Data Is a Tech & Community Issue. “Precursors to the Data Rescue Project such as the End of Term Web Archive, which captures federal government data after presidential administration transitions, the 2017 Data Refuge Project, and the Environmental Data & Governance Initiative (EDGI), laid the groundwork for 2025 preservation efforts, but […]

https://rbfirehose.com/2025/12/01/the-data-rescue-project-preserving-government-data-is-a-tech-community-issue-slaw/

Making 10M government PDF documents searchable https://flowingdata.com/2025/11/26/making-10m-government-pdf-documents-searchable/

"The code for GovScape is open source and available on GitHub."

#OpenData #OpenGov #OCR #DataRescue #GovDocs

Government organizations love to distribute documents as PDF files. They are easy to forward and to print. The problem is when you want to find and access them later among millions of other files. …

FlowingData
We often discuss how public data influences our everyday lives whether we acknowledge it or not. This week's guest article highlights your daily interactions with public data: www.datarescueproject.org/guest-post-a... #PublicData #DataRescue

Guest Post: A Day in the Life with Federal Government Data

Today, we have the fourth post in the series from Claire McKay Bowen and Aaron R. Williams to help diverse audiences understand and support the federal statistical system. Everyone living in the United States is part of this vast statistical ecosystem and benefits from it—both directly and indirectly. Check

Data Rescue Project