Hacking the DROID Signature File for Characterization

by @beet_keeper

Identification of a format can be approached from many angles. Often a magic number will be used at the beginning of a file. This may be strengthened by the addition of similarly consistent bytes at the end of a file, or indeed at any part of the bitstream in between. Using the sample file format we created last week, the magic number to identify it is specified…
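To make the idea concrete, here is a minimal sketch of matching a magic number at the beginning of a file, optionally strengthened by bytes at the end or anywhere in between. The `Signature` class, the `matches` function and the byte values `DEMO` and `END!` are all invented for illustration; they are not the sample format's real magic number, and this is not how DROID itself is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature: byte sequences expected at fixed places."""
    name: str
    bof: bytes                        # must appear at offset 0 (beginning of file)
    eof: Optional[bytes] = None       # must appear at the very end, if given
    anywhere: Optional[bytes] = None  # must appear somewhere in the bitstream, if given


def matches(data: bytes, sig: Signature) -> bool:
    """Return True when every byte sequence in the signature is found."""
    # Require enough bytes that the BOF and EOF sequences cannot overlap.
    if len(data) < len(sig.bof) + len(sig.eof or b""):
        return False
    if not data.startswith(sig.bof):
        return False
    if sig.eof is not None and not data.endswith(sig.eof):
        return False
    if sig.anywhere is not None and sig.anywhere not in data:
        return False
    return True


# Invented magic bytes, for illustration only.
DEMO_SIGNATURE = Signature(name="demo format", bof=b"DEMO", eof=b"END!")

if __name__ == "__main__":
    sample = b"DEMO" + b"\x00" * 16 + b"END!"
    print(matches(sample, DEMO_SIGNATURE))
```

Adding the end-of-file sequence narrows the match: a file that merely happens to start with the same four bytes no longer identifies as the format.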


#Characterization #DigitalPreservation #DROID #FileFormats #LinkedData #MagicNumbers #PRONOM #SPARQL
No, an Irish teacher was not arrested for refusing to use the pronoun "iel" in class - Les Surligneurs


Now is the time for a shout-out to @exponentialdecay for his file format signature development utility (https://ffdev.info/) and the SPARQL endpoint to #PRONOM data (btw, I can't find the URL any more...!).

And also to @richardlehane for Siegfried + Roy, always incredibly useful tools.

#digipres

Signature development utility 2.0


@Thorsted @leninoc @britpunk80

is there a chance that someone would show up at the #PRONOM drop-in session this week??

I have a few questions on LaTeX files, PDF/A and Virtual Instruments files 😀 !

A whole room of people creating #pronom signatures, amazing! At the "Searching for a signature" workshop by @Thorsted and @britpunk80 #ipres2025

Hey #PRONOM folks! I was wondering whether some signatures of source code files (e.g., https://www.nationalarchives.gov.uk/PRONOM/fmt/938 ) could be improved by adding a pattern based on the shebang (https://en.wikipedia.org/wiki/Shebang_(Unix))? Would there be side effects?
PRONOM | Search by format

PRONOM is an online technical registry providing impartial and definitive information about file formats, software products and other technical components required to support long-term access of electronic records.
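To illustrate the shebang question raised above, here is a rough sketch of what a shebang-based check could look at. The function name `shebang_interpreter` and the regular expression are my own assumptions, not part of any PRONOM signature, and the sketch does not handle variants such as `env -S`.

```python
import re
from typing import Optional

# "#!" at offset 0, optional blanks, an interpreter path, and an optional first argument.
SHEBANG = re.compile(rb"^#![ \t]*(\S+)(?:[ \t]+(\S+))?")


def shebang_interpreter(data: bytes) -> Optional[str]:
    """Return the interpreter named on a shebang line, or None if there is none."""
    first_line = data.split(b"\n", 1)[0]
    found = SHEBANG.match(first_line)
    if found is None:
        return None
    # Keep only the program name, e.g. "/usr/bin/env" becomes "env".
    program = found.group(1).decode("ascii", "replace").rsplit("/", 1)[-1]
    # With "/usr/bin/env python3" the real interpreter is the first argument.
    if program == "env" and found.group(2):
        return found.group(2).decode("ascii", "replace")
    return program


if __name__ == "__main__":
    print(shebang_interpreter(b"#!/usr/bin/env python3\nprint('hello')\n"))
    print(shebang_interpreter(b"print('hello')\n"))
```

One likely side effect is visible in the second call: a shebang is optional, so a pattern built on it can only add evidence when the line is present and says nothing about source files that omit it.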

A couple of exciting updates in the #fileformat space this week. First, #PRONOM released a new signature file, v121, with 29 new PUIDs, 29 new signatures and 18 updates. Including an APK signature! @BertrandCaron #digipres https://www.nationalarchives.gov.uk/aboutapps/pronom/release-notes.xml 1/2

Shattering the eyeglass: Using Kaitai Structs to dissect the eyeglass’ contents


by @beet_keeper

In my post from 2012, Genesis of a File Format, I created a new file format – the Eyeglass file format. The format provides a mechanism to persist information about a patient’s eye health following a checkup at an optician’s. Today, in 2023, we can use the format to learn how Kaitai Structs can help us understand file formats.

Given the disclaimer that I am not actually an optician and that the format is purely illustrative, let’s look at the eyeglass again below.


#code #coding #digipres #digitalLiteracy #digitalPreservation #fileFormat #fileFormatAnalysis #fileFormats #kaitai #pronom #yyyy

What information is in a file format identification report?


by @beet_keeper

In early 2022, I was finally able to get around to writing a paper that I had been thinking about for the better part of a decade. The paper, “Fractal in Detail: What Information Is in a File Format Identification Report?” was published in the Code4Lib journal Issue 53.

The paper takes a deep dive into the fractal contents of file format identification reports exported from tools like Siegfried and DROID.

Let’s take a brief look at the article and its contents below.


#code4lib #code4libJournal #digipres #digitalPreservation #droid #fileFormatAnalysis #fileFormatIdentification #fileFormats #filedriller #formatIdentification #freud #linting #metadata #preservationMetadata #pronom #puid #puids #siegfried #staticAnalysis #technicalMetadata

Architecture of The-FR.org


by @beet_keeper

Last week I blogged about the publication of a new linked data format registry based on the work I did previously at The National Archives, UK.

Where the work goes, we will have to see. Open sourcing it was an important goal of the short sprint, partly because I hope it demonstrates an architecture that can be adopted for a similar registry, and partly because it may provide a code base that can be adapted for similar linked open data projects. This blog post provides an overview of that architecture…


#Data #formatRegistry #LinkedOpenData #PRONOM #PRONOMLite #SoftwareArchitecture #SPARQL #theFrOrg