Revisiting bsdiff as a tool for digital preservation
by @beet_keeper
I introduced bsdiff in a blog post in 2014. bsdiff compares two files, e.g. broken_file_a and corrected_file_b, and creates a patch that can be applied to broken_file_a to generate a byte-for-byte match for corrected_file_b.
On the face of it, in an archive, we probably only care about corrected_file_b, so why would we care about a technology that patches a broken file?
In all of the use-cases we can imagine, the primary reasons are cost savings and removing redundancy in file storage or in the transmission of digital information. In one very special case, we can record the difference between broken_file_a and corrected_file_b and give users a totally objective method of recreating corrected_file_b from broken_file_a, providing 100% verifiable proof of the migration pathway taken between the two files.
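To make the patch-and-verify idea concrete, here is a minimal sketch in Python. It is not bsdiff and does not use bsdiff's algorithm or patch format: it is a deliberately naive, position-by-position stand-in that I have made up for illustration, and the function names make_patch and apply_patch, the patch layout, and the sample bytes are all assumptions. What it does show is the principle: store only the differences, then prove the rebuilt file is a byte-for-byte match with a checksum.

```python
import hashlib


def make_patch(old: bytes, new: bytes) -> dict:
    """Record only what differs between old and new (naive, offset by offset)."""
    shared = min(len(old), len(new))
    changes = [(i, new[i]) for i in range(shared) if old[i] != new[i]]
    return {
        "length": len(new),    # final size of the corrected file
        "changes": changes,    # (offset, replacement byte value) pairs
        "tail": new[shared:],  # bytes that lie beyond the end of the old file
        "sha256": hashlib.sha256(new).hexdigest(),  # checksum of the expected result
    }


def apply_patch(old: bytes, patch: dict) -> bytes:
    """Rebuild the corrected file from the old file plus the patch, then verify it."""
    out = bytearray(old[:patch["length"]])  # truncate if the new file is shorter
    for offset, value in patch["changes"]:
        out[offset] = value
    out.extend(patch["tail"])
    result = bytes(out)
    if hashlib.sha256(result).hexdigest() != patch["sha256"]:
        raise ValueError("patched output does not match the recorded checksum")
    return result


# Hypothetical sample data standing in for the two files.
broken_file_a = b"RIFF\x00\x00\x00\x00WAVEfmt "
corrected_file_b = b"RIFF\x24\x08\x00\x00WAVEfmt data"

patch = make_patch(broken_file_a, corrected_file_b)
rebuilt = apply_patch(broken_file_a, patch)
print(rebuilt == corrected_file_b)  # True: a byte-for-byte match
```

A naive scheme like this falls apart as soon as bytes are inserted or shifted, which is exactly where a purpose-built tool earns its keep. The part worth keeping is the checksum stored alongside the patch: it is what turns "we fixed the file" into something a user can verify for themselves.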
#ac3 #archives #audio #audiovisual #audit #authenticity #av #bash #bsdiff #checksums #code4lib #corruption #corruptionIndex #digipres #digitalArchiving #digitalForensics #digitalLiteracy #digitalPreservation #digitalStorage #diplomatics #fileFormats #glitch #glitchAudio #glitchart #integrity #preservationAnalysis #preservationMetadata #provenance #sensitivityIndex #storage
File format building blocks: primitives in digital preservation
by @beet_keeper
A primitive in software development can be described as:
a fundamental data type or code that can be used to build more complex software programs or interfaces.
– via https://www.capterra.com/glossary/primitive/ (also Wiki: language primitives)
Like bricks and mortar in the building industry, or oil and acrylic for a painter, a primitive helps a software developer to create increasingly complex software, from shell scripts to entire digital preservation systems.
Primitives also help us to create file formats. As we’ve seen with the Eyeglass example I have presented previously, a file format is, at its most fundamental level, a representation of a data structure as a binary stream: code can write it out of the data structure onto disk, and likewise read it from disk back into a data structure.
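As a rough illustration of that round trip, here is a small Python sketch built on the standard library's struct module. The layout is hypothetical and invented purely for this example (it is not the Eyeglass format): a 4-byte magic number, a 16-bit version, and a 32-bit payload length, all big-endian. The names HEADER, write_record, and read_record are assumptions too.

```python
import struct

# Hypothetical header layout, invented for illustration only:
# 4-byte magic, unsigned 16-bit version, unsigned 32-bit payload length (big-endian).
HEADER = struct.Struct(">4sHI")
MAGIC = b"DEMO"


def write_record(version: int, payload: bytes) -> bytes:
    """Serialise the in-memory values to a binary stream."""
    return HEADER.pack(MAGIC, version, len(payload)) + payload


def read_record(stream: bytes) -> tuple[int, bytes]:
    """Read the binary stream back into in-memory values."""
    magic, version, length = HEADER.unpack_from(stream, 0)
    if magic != MAGIC:
        raise ValueError("not a DEMO record")
    start = HEADER.size
    return version, stream[start:start + length]


stream = write_record(1, b"hello")
print(read_record(stream) == (1, b"hello"))  # True: the round trip is lossless
```

Integers of a fixed width, byte order, and a magic number are the primitives here; the "format" is simply the agreement about how they are arranged.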
As file format developers, we have at our disposal all of the primitives that the software developer has and, like them, we also have “file formats” (as we tend to understand them in digital preservation terms) that serve as our primitives as well.
#archives #digipres #digitalPreservation #digitalPreservationEssentialism #diplomatics #eyeglass #eygl #fileFormats #informationRecordsManagement #irm #json #jsonid #openData #openSource #rdm #researchData #researchDataManagement #xml
The sensitivity index: Corrupting Y2K
by @beet_keeper
In December I asked “What will you bitflip today?” Not long after, Johan’s (@bitsgalore) Digital Dark Age Crew released its long-lost hidden single Y2K. Well, I couldn’t resist corrupting it.
Fixity is an interesting property enabled by digital technologies. Checksums allow us to demonstrate mathematically that a file has not been changed. An often cited definition of fixity is:
Fixity, in the preservation sense, means the assurance that a digital file has remained unchanged, i.e. fixed — Bailey (2014)
Fixity is very much linked to the concept of integrity, which UNESCO defines as:
The state of being whole, uncorrupted and free of unauthorized and undocumented changes.
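The claim that checksums let us demonstrate a file has not changed can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the full post: the helper names checksum and flip_bit, the choice of SHA-256, and the sample bytes are all my assumptions here. It records a checksum, flips a single bit, and checks fixity again.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, byte_offset: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted at the given position."""
    out = bytearray(data)
    out[byte_offset] ^= 1 << bit
    return bytes(out)


# Hypothetical stand-in for the contents of a file.
original = b"Y2K placeholder audio bytes"
recorded = checksum(original)  # the value we would store at ingest

corrupted = flip_bit(original, byte_offset=3, bit=0)

print(checksum(original) == recorded)   # True: fixity holds
print(checksum(corrupted) == recorded)  # False: one flipped bit is enough
```

Note what the checksum does and does not tell us: it says that something changed, not where, how much, or whether the result is still usable.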
Integrity is massively important at this time in history. It gives us the guarantees we need that digital objects we work with aren’t harboring their own sinister secrets in the form of malware and other potentially damaging payloads.
These values are contingent on bit-level preservation, and the field of digital preservation largely assumes it: that we will be able to look after our content without losing information. As feasible as this may be these days, what happens if we do lose some information? Where does authenticity come into play?
Through corrupting Y2K, I took time to reflect on integrity versus authenticity, and created some interesting glitched outputs along the way. I also uncovered what may be the first audio that reveals what the Millennium Bug itself may have sounded like! Keen to hear it? Read on to find out more.
#ac3 #archives #audio #audiovisual #authenticity #av #bash #checksums #code4lib #corruption #corruptionIndex #digipres #digitalArchiving #digitalLiteracy #digitalPreservation #diplomatics #fileFormats #flac #glitch #glitchart #glitchaudio #integrity #mp3 #sensitivityIndex #wav