One for the #reproducibility nerds

Is there an accepted standard (or just some good examples) for how to include provenance metadata within computationally produced images?

My specific use case is graphs generated by plotly (Python) and saved as PNG, where I'd like to record things like date, software version, and data version in the image metadata.

There are exif elements for date and "software" but is there a better solution than just shoving info in the description and/or title field? Something that can survive passage through a presentation an added bonus!

(And yes, I do want this in the image metadata, not as a separate file, I'll use the same info to write out a separate manifest)

@cameronneylon Interesting question!

I know about https://commons.wikimedia.org/wiki/Commons:Attribution_Generator for (verbatim) reuse of #CreativeCommons images, and I see that e.g. the #MediaWiki extension https://www.mediawiki.org/wiki/Extension:Chart produces SVG files without any such metadata inside (presumably because they're not supposed to be downloaded?).

@cameronneylon I don't think there is an "accepted standard". Probably the most generic you could use is Dublin Core.

There is an XMP dc namespace https://www.exiftool.org/TagNames/XMP.html#dc which could be used to add metadata to the image directly.
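
A minimal sketch of what a Dublin Core XMP packet for a generated plot could look like, using only the Python standard library. The mapping is an assumption for illustration, not an agreed convention: `dc:date` for the generation date, `dc:source` for the data version, and `xmp:CreatorTool` for the software. `build_xmp` is a hypothetical helper name. In a PNG, a packet like this would normally be stored in an iTXt chunk with the keyword `XML:com.adobe.xmp`.

```python
from datetime import datetime, timezone
from typing import Optional
from xml.sax.saxutils import escape

# Standard XMP packet wrapper around an RDF description.
# The choice of properties below is an assumption for illustration.
XMP_TEMPLATE = """<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xmp="http://ns.adobe.com/xap/1.0/">
   <dc:date><rdf:Seq><rdf:li>{date}</rdf:li></rdf:Seq></dc:date>
   <dc:source>{source}</dc:source>
   <xmp:CreatorTool>{tool}</xmp:CreatorTool>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""


def build_xmp(software: str, data_version: str,
              when: Optional[datetime] = None) -> str:
    """Return an XMP packet string describing how an image was produced."""
    when = when or datetime.now(timezone.utc)
    return XMP_TEMPLATE.format(
        date=escape(when.isoformat(timespec="seconds")),
        source=escape(data_version),  # escape() guards against <, > and &
        tool=escape(software),
    )


if __name__ == "__main__":
    # Placeholder values; real ones would come from the plotting script.
    print(build_xmp("plotly (version here)", "dataset v1"))
```

The resulting string can then be handed to whatever tool writes the image metadata, for example exiftool or Pillow's `PngInfo.add_itxt`.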

> Something that can survive passage through a presentation an added bonus!

I wouldn't trust any processing steps to preserve image metadata though. I think most software just strips an image clean before using it.

@quachpas dc certainly makes sense. And if I wanted to do it really properly, once I end up in XML there are things like PROV as well.

Some brief experimentation showed that XMP metadata seems to be preserved for (otherwise untouched) images that have been inserted into Google Slides and OO Impress presentations and then exported back out (Google doesn't make it easy but I found a way)

Obviously not to be relied on but a potentially useful backup in some cases.
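
A round-trip check like the one described could be scripted roughly as follows. This is a sketch under stated assumptions: it only looks at PNG files, it relies on XMP being stored in an iTXt chunk with the keyword `XML:com.adobe.xmp`, and it does not verify chunk CRCs. The function names are made up for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNK_TYPES = {b"tEXt", b"zTXt", b"iTXt"}


def text_chunk_keywords(png_bytes: bytes) -> list:
    """Return the keywords of all textual chunks in a PNG byte string."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    keywords = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        # Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if ctype in TEXT_CHUNK_TYPES:
            # Textual chunks begin with a null-terminated Latin-1 keyword.
            keywords.append(data.split(b"\x00", 1)[0].decode("latin-1"))
        if ctype == b"IEND":
            break
        pos += 12 + length  # skip length, type, data and CRC (not checked)
    return keywords


def has_xmp(png_bytes: bytes) -> bool:
    """True if the PNG carries an XMP packet in an iTXt chunk."""
    return "XML:com.adobe.xmp" in text_chunk_keywords(png_bytes)


if __name__ == "__main__":
    import sys

    # Usage: python check_png.py exported_from_slides.png
    with open(sys.argv[1], "rb") as fh:
        payload = fh.read()
    print(text_chunk_keywords(payload), "XMP present:", has_xmp(payload))
```

Running it on the image before and after it has passed through a presentation shows which text chunks, if any, made it through.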

@cameronneylon OME-TIFF/OME-XML might have something useful to you.

https://ome-model.readthedocs.io/en/stable/index.html#ome-tiff

@cameronneylon You know about EXIF, so you know about embedding metadata in the image file itself. EXIF defines zillions of possible fields, not just free-text fields like description.

What's not working? No EXIF field whose title/purpose seems right for the provenance you're trying to store? PNG processing software won't handle the metadata right? PNGs need to be compatible with older systems that don't support EXIF stored in the files?

What is going wrong?

@paco Nothing going wrong (well not yet anyway). Was wondering whether someone had already done something or if there was any standard approach, specifically for scientific images where people tend to have strong opinions about how things should be done.

Should be able to use pillow or pngmeta to add the metadata. Just a question of whether I can add value by choosing a specific format or location
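
A minimal sketch of the Pillow route, assuming the PNG bytes are already in hand (with plotly they could come from `fig.to_image(format="png")`, which needs an image export backend installed). `Software` and `Creation Time` are keywords predefined by the PNG specification; `DataVersion` is a made-up keyword for illustration, as are the helper names `stamp_png` and `read_text_chunks`.

```python
import io
from datetime import datetime, timezone

from PIL import Image
from PIL.PngImagePlugin import PngInfo


def stamp_png(png_bytes: bytes, fields: dict) -> bytes:
    """Return a re-saved copy of the PNG with each field as a text chunk."""
    info = PngInfo()
    for key, value in fields.items():
        info.add_text(key, str(value))
    out = io.BytesIO()
    with Image.open(io.BytesIO(png_bytes)) as im:
        # Pillow re-encodes the image on save; PNG is lossless, so the
        # pixels stay the same but the file bytes will differ.
        im.save(out, format="PNG", pnginfo=info)
    return out.getvalue()


def read_text_chunks(png_bytes: bytes) -> dict:
    """Read the text chunks back as a plain dict."""
    with Image.open(io.BytesIO(png_bytes)) as im:
        return dict(im.text)


if __name__ == "__main__":
    # Stand-in for a real figure: a small blank image.
    buf = io.BytesIO()
    Image.new("RGB", (8, 8), "white").save(buf, format="PNG")

    stamped = stamp_png(buf.getvalue(), {
        "Software": "plotly (version here)",  # placeholder value
        "Creation Time": datetime.now(timezone.utc).isoformat(),
        "DataVersion": "dataset v1",  # hypothetical keyword and value
    })
    print(read_text_chunks(stamped))
```

Plain text chunks like these are easy to write and read, but they carry no schema, so the choice between them and a structured XMP packet is mostly about how much other tools need to understand the fields.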

@cameronneylon Cool. I won't offer an opinion, then, because I don't really have useful experience to guide it. But I will probably lurk in the replies because it is a fascinating question.

@cameronneylon Maybe
https://github.com/adobe/XMP-Toolkit-SDK/blob/main/docs/XMPSpecificationPart2.pdf#G4.1133230 suggests stEvt:action of created, stEvt:when of the date of creation, and stEvt:softwareAgent with the software itself which I suppose would include its version.

There are also (from https://help.accusoft.com/ImageGear/v26.3/iptc-metadata-structure.html ) IPTC fields

62 "DigitalCreationDate" STRING 8
63 "DigitalCreationTime" STRING 11
65 "OriginatingProgram" STRING 32
70 "ProgramVersion" STRING 10

Otherwise, maybe https://schema.org/isBasedOn with a URL to the software and data basis for creation?

@datum @cameronneylon XMP is where my mind went as well.

@cameronneylon Astronomy uses fits files for storing data, and usually has a lot of info in the header about the processing history.

https://en.wikipedia.org/wiki/FITS
