Ars Technica: Wikipedia blacklists Archive.today, starts removing 695,000 archive links. “In the course of discussing whether Archive.today should be deprecated because of the DDoS, Wikipedia editors discovered that the archive site altered snapshots of webpages to insert the name of the blogger who was targeted by the DDoS. The alterations were apparently fueled by a grudge against the blogger […]

https://rbfirehose.com/2026/02/21/ars-technica-wikipedia-blacklists-archive-today-starts-removing-695000-archive-links/

ResearchBuzz: Firehose
Many researchers use #ArchiveToday to archive the social media posts they cite in their articles... That now seems to be in jeopardy.
#Wikipedia blacklists Archive.today, starts removing 695,000 archive links
https://arstechnica.com/tech-policy/2026/02/wikipedia-bans-archive-today-after-site-executed-ddos-and-altered-web-captures
#iloveinternetarchive
#webarchiving
#archives

If DDoSing a blog wasn't bad enough, archive site also tampered with web snapshots.

Ars Technica
Also worth reading on the question of #AI and the #waybackmachine, from Mark Graham:
Generative #AI presents real challenges in today’s information ecosystem. But preserving the time-honored role of #libraries and #archives in society has never been more important. We’ve worked alongside news organizations for decades. Let’s continue working together in service of an open, referenceable, and enduring #web
#webarchiving
https://www.techdirt.com/2026/02/17/preserving-the-web-is-not-the-problem-losing-it-is
#iloveinternetarchive
Preserving The Web Is Not The Problem. Losing It Is.

Recent reporting by Nieman Lab describes how some major news organizations—including The Guardian, The New York Times, and Reddit—are limiting or blocking access to their content in the Internet Ar…

Techdirt
#WaybackMachine Director Pushes Back on AI Scraping Fears Driving Archive Blocks
https://blog.archive.org/2026/02/18/wayback-machine-director-pushes-back
As reported by Nieman Lab last month, some major media organizations—including The #NewYorkTimes, #TheGuardian, and #Reddit—have started blocking the Wayback Machine from archiving their sites over unfounded concerns about AI scraping.
Mike Masnick in #Techdirt explained why this is “a mistake we’re going to regret for generations.”
Limiting #webarchiving threatens our shared #digitalhistory.

Hmm, HTTP response headers are still encoded in latin-1

https://github.com/Kludex/starlette/pull/1236

#TIL #WebDevelopment #Unicode #WebArchiving
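
A minimal sketch of what that latin-1 behaviour means in practice, using only the Python standard library. This is not Starlette's code: the helper names `fits_latin1`, `mangle` and `recover` are hypothetical and exist only for illustration. It shows which header values survive a latin-1 encode, and how a UTF-8 value read back as latin-1 turns into mojibake that can be reversed.

```python
def fits_latin1(value: str) -> bool:
    """Return True if the value can be encoded as latin-1 without error."""
    try:
        value.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def mangle(value: str) -> str:
    """Simulate UTF-8 header bytes being decoded as latin-1 by the receiver."""
    return value.encode("utf-8").decode("latin-1")


def recover(value: str) -> str:
    """Undo the mix-up: re-encode as latin-1, then decode the bytes as UTF-8."""
    return value.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    title = "café"
    garbled = mangle(title)
    print(garbled)            # the accented character becomes two characters
    print(recover(garbled))   # round-trips back to the original string
    print(fits_latin1("日本語"))  # characters outside latin-1 cannot be encoded
```

The round trip works because latin-1 maps every byte value to exactly one code point, so no information is lost when bytes are decoded with it.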

Hi, I’ll be covering this #workflow for backing up WARCs from Archive-It to a state-run LOCKSS program at the upcoming @dpc_chat workflows webinar. I also have info about running these websites offline. Have a look! https://docs.google.com/document/d/14FZzbfICaddW1wJP8N1CQE6YZOHXjtT_ouAy1-YvsM0/edit?usp=sharing #digipres #webarchiving #warc #digitalpreservation
Process for backing up WARCs from Archive-it to MDPN

Process for backing up WARCs from Archive-it to MiDPN. Backup from Archive-It, partially based on this article: "How to find and download your WARC files with WASAPI" (Archive-It Help Center). Basic process: get crawl ID(s) for particular seed; get WARC.gz file and page count; use "curl" to get list...

Google Docs

RE: https://mastodon.social/@cutterkom/115926148559409105

Update on the dataset that contains PII of trans people living in the US: @SafeguardingResearch stopped distributing it via BitTorrent after I reported it: https://sciop.net/datasets/nyc-trans-oral-history

Why? "Resilience makes p2p file sharing such a compelling technology not only for pirated content, but also for scientific data and public records. But is it suitable for the life stories of marginalized people living in a country whose own government is persecuting them?"

https://katharinabrunner.de/2026/01/archival-demiground-thoughts-on-preserving-trans-oral-history

#webarchiving

Library of Congress: From Print Volumes to Digital Scholarship: The Handbook of Latin American Studies Web Archive. “Since the 1930s, the Handbook of Latin American Studies has documented scholarship on Latin America and the Caribbean. In this interview, Tracy North describes how that long-standing mission now extends to web archiving, ensuring long-term access to web-based research materials. […]

https://rbfirehose.com/2026/02/09/from-print-volumes-to-digital-scholarship-the-handbook-of-latin-american-studies-web-archive-library-of-congress/

Internet Archive and Partners Select Local Newsrooms from Across the US to Participate in the Today’s News for Tomorrow Program

https://fed.brid.gy/r/https://blog.archive.org/2026/02/06/internet-archive-and-partners-select-local-newsrooms-from-across-the-us-to-participate-in-the-todays-news-for-tomorrow-program/

"In a bizarre act of cultural vandalism they've not just removed the entire site (including the archives of previous versions) but they've also set every single page to be a 302 redirect to their closure announcement."
https://fedi.simonwillison.net/@simon/116015180016712361

#webarchiving is an act of resistance against cultural vandalism.

Simon Willison (@simon@simonwillison.net)

The CIA just stopped publishing their World Factbook and took down every page, including the archived copies of previous versions! This sucks. It was public domain, so I recovered the 2020 edition (the last one published as a zip file) and shared it to GitHub https://simonwillison.net/2026/Feb/5/the-world-factbook/

Mastodon