Your periodic reminder that just because a URL is saved at archive.org doesn't mean it's going to stay there.

Last year, I wrote a series about proxy services marketed to cybercriminals, and it relied heavily on Archive.org links to document various connections. After my story ran, the person those links concerned asked the Archive to remove those pages from its database, which it did. The person in question then came back and said, in effect: hey, what you said in your story is wrong because there's no supporting evidence, and you must remove it. Archive.org confirmed it removed all of the pages at the request of the domain holder, and that was that.

If you stumble upon a page on archive.org and want to make sure there's a record of it that won't be deleted at some point, consider also saving the page to archive.today (archive.ph).

Alternatively, of course, you could save the page locally, using something like Firefox's built-in full-page screenshot (right-click on the page). Better yet, save the Archive.org pages you want locally.

@briankrebs I wonder if the Internet Archive would be willing to host *hashes* of removed content.

@varx @briankrebs that would still present the issue of having a central point of attack (if you control the archive, you can change any hash at any time).

The only solution is a distributed blockchain.

@ligma @briankrebs Nothing so fancy needed. The attack scenario here is legal, not technical; if the Archive can defend holding hashes, then you yourself can host the hashed content.

@varx @briankrebs the problem you're missing with a centralized database of hashes is that it can simply be tampered with by anyone who has admin access. So you're still trusting a very small group of people to safeguard the proof of authenticity for potentially valuable evidence, and that presents an obvious attack vector for anyone with the means to exploit it.

@ligma I think you're trying to solve a different problem.

The problem I'm trying to solve is this: The Internet Archive is a party I already trust; I trust them both as a timestamping service ("this document existed in this form at this time", essentially like a notary) and as competent administrators. They are presumably being legally coerced via copyright claims into removing content, not into *altering* it. The removal is the problem to solve.
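To make the "host hashes, not content" idea concrete, here is a minimal sketch, assuming a notary publishes only a SHA-256 digest plus a capture time per URL, while anyone who kept a local copy can later check it against that record. `NotaryRecord`, `make_record`, and `matches` are hypothetical names for illustration, not anything the Internet Archive actually offers.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class NotaryRecord:
    """Hypothetical record a trusted timestamping party might publish:
    'a document with this digest existed at this time'."""
    url: str
    sha256: str
    captured_at: str  # ISO 8601 UTC timestamp


def make_record(url: str, content: bytes, when: Optional[datetime] = None) -> NotaryRecord:
    """Hash the captured bytes and pair the digest with a capture time."""
    when = when or datetime.now(timezone.utc)
    return NotaryRecord(url, hashlib.sha256(content).hexdigest(), when.isoformat())


def matches(record: NotaryRecord, local_copy: bytes) -> bool:
    """True only if the locally saved copy is byte-identical to what was hashed."""
    return hashlib.sha256(local_copy).hexdigest() == record.sha256


if __name__ == "__main__":
    page = b"<html>hello</html>"
    rec = make_record("http://example.com/", page, datetime(2023, 1, 1, tzinfo=timezone.utc))
    print(matches(rec, page))         # True
    print(matches(rec, page + b" "))  # False
```

Note that the check is byte-for-byte: a copy saved through a different tool (or a screenshot) will not match, so the digest only helps if everyone hashes the same saved representation of the page.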

Blockchains are unnecessarily complicated solutions to the deletion problem and still don't solve the fundamental issue of "who will be the notary". They can say "this hash existed at this time" but they can't say "this hash was a true representation of X".

@varx I understand that, but what if the Internet Archive WASN'T trustworthy?

What if you had a digital document that ended up becoming evidence in a high profile court case, and the defendant managed to scrub the hash that authenticates it from the Archive (via corruption, bribery, extortion, whatever)?

Your document would be considered a potential forgery and the defendant walks free. THAT's the problem I was trying to solve in addition to yours.

@varx

> Blockchains are unnecessarily complicated solutions to the deletion problem and still don't solve the fundamental issue of "who will be the notary".

Wrong. A blockchain IS a necessary ingredient in solving the deletion problem IF you want to make deletions, whether purposeful or accidental, as difficult as possible to discourage them from happening.

A *distributed* blockchain additionally solves the problem of who will be notary: whoever gets to produce the next block.
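The "deletions become difficult" part rests on hash-linking: each block commits to the hash of the one before it, so removing or editing a block in the middle breaks every later link. A minimal sketch, assuming blocks are just lists of `{"url", "sha256"}` entries; `append_block` and `verify_chain` are hypothetical helpers, and this covers only the linking, not consensus or who gets to produce blocks.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash: str, entries: List[Dict[str, str]]) -> str:
    """Hash a block's entries together with its predecessor's hash."""
    payload = json.dumps({"prev": prev_hash, "entries": entries}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], entries: List[Dict[str, str]]) -> None:
    """Add a block that commits to the current tip of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "entries": entries, "hash": block_hash(prev, entries)})


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; a removed or edited block breaks one."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block["hash"] != block_hash(prev, block["entries"]):
            return False
        prev = block["hash"]
    return True


if __name__ == "__main__":
    chain: List[Dict[str, Any]] = []
    for i in range(3):
        append_block(chain, [{"url": f"http://example.com/{i}", "sha256": "00" * 32}])
    print(verify_chain(chain))  # True
    del chain[1]                # "delete" a record from the middle
    print(verify_chain(chain))  # False
```

On its own this only makes tampering *evident*: someone controlling the only copy could drop the newest block or recompute every hash after an edit. It takes independent copies held by other parties to catch that, which is the argument for distributing the chain.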

@varx

> They can say "this hash existed at this time" but they can't say "this hash was a true representation of X".

Not true for the system I proposed. In my proposal, the block producer(s) would see only pairs of URLs and their alleged hashes as input, and would have to download and checksum the content independently before signing a block. Hence the requirement for a common archive format if those URLs represent web pages.
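The producer-side check described above could look roughly like this sketch, assuming submissions arrive as `(url, alleged_sha256)` pairs and the fetcher is pluggable so it can be swapped for something that returns the agreed archive format. `vet_submissions` and `http_fetch` are hypothetical names; the default fetcher just downloads raw bytes with the standard library.

```python
import hashlib
import urllib.request
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]          # (url, alleged_sha256)
Fetcher = Callable[[str], bytes]


def http_fetch(url: str, timeout: float = 30.0) -> bytes:
    """Default fetcher: download the raw response body (no normalization)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def vet_submissions(
    submissions: Iterable[Pair], fetch: Fetcher = http_fetch
) -> Tuple[List[Pair], List[Pair]]:
    """Split submitted pairs into (accepted, rejected).

    A pair is accepted only if this producer's own download hashes to the
    claimed digest; network or URL errors count as rejections.
    """
    accepted: List[Pair] = []
    rejected: List[Pair] = []
    for url, alleged in submissions:
        try:
            digest = hashlib.sha256(fetch(url)).hexdigest()
        except (OSError, ValueError):  # URLError is an OSError; bad URLs raise ValueError
            rejected.append((url, alleged))
            continue
        (accepted if digest == alleged.lower() else rejected).append((url, alleged))
    return accepted, rejected
```

Hashing raw bytes like this would reject most live pages, since ads, timestamps, and session tokens change between downloads. That is why the fetcher is a parameter: in practice it would have to produce the common archive format the proposal calls for, so that independent producers end up hashing identical bytes.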

Block producers would be incentivized to keep each other honest.