Now the lightning talk on datasheets & data-envelopes presented by @sclaeyssens and Antoine Isaac at #FF2025
The slides of the presentation are available at
https://zenodo.org/records/17725565
#datasheets #data-envelopes #ML #collectionsasdata
Write it Down! Fostering Responsible Reuse of Cultural Heritage Data with Interoperable Dataset Descriptions
Abstract
Cultural heritage institutions have seen a surge in the creation of datasets ready for computational use, while researchers increasingly experiment with datasets through computational processing and AI-assisted methods. For both groups, issues of transparency have sparked interest in developing documentation practices that cut across the artificial intelligence/machine learning (AI/ML) and digital cultural heritage (DCH) sectors, aiming to provide better information on, e.g., the purpose, composition, reusability, collection processes and provenance of datasets, or the societal biases reflected in them. The publication of Datasheets for Datasets (Gebru et al., 2021) and the Collections as Data movement (Padilla et al., 2023) have sparked the definition of guidelines for dataset creators and publishers who want to follow the FAIR and CARE principles and make it easier for others to reuse their data in a responsible, well-informed manner. Gathering CH professionals, technical experts and humanities scholars from the Europeana Research and EuropeanaTech communities, the Datasheets for Digital Cultural Heritage working group has adapted existing ML documentation approaches to the DCH case. As a first outcome, a template (Alkemade et al., 2023) has sought to address the complexities of DCH datasets, which are shaped by layered curatorial decisions and often subject to evolving and non-linear trajectories. In the spirit of the common European data space for cultural heritage (2025), which is being deployed under the stewardship of the Europeana Initiative, the working group has since supported professionals interested in applying the template in their institutional context (see for example Lehmann & Schneider, 2024) and fostered exchanges with other initiatives that are emerging at the European level and exploring suitable ways to describe datasets.
One key initiative in this regard is the proposal for Data-Envelopes for Cultural Heritage (Luthra & Eskevich, 2024), which focuses specifically on providing machine-readable descriptions of datasets, especially considering the W3C Data Catalog Vocabulary (DCAT), which is used in many data portals. The goal of this collaboration is both to validate and further refine the existing templates following a community-led approach, and to investigate how to ensure (human-machine) interoperability in the data space, which makes use of DCAT and aims to establish a diverse data offer (including datasets suitable for AI applications, as illustrated by the AI4Culture platform (2025)).

Our contribution will report on the following ongoing work:
- Alignment with DCAT: DCH datasheet fields are being mapped to DCAT to enable machine-readability.
- Alignment between DCH datasheets and data-envelopes, establishing conceptual and structural compatibility and supporting future integration with other legal, technical and ethical frameworks.
- Gathering a set of exemplary dataset descriptions.
- Creation of (prototype) tooling to support and simplify the creation, reuse and integration of descriptions into existing workflows.

We also plan to discuss new items that will begin before the conference:
- Identifying possible connections with data research plans and data management plans. This may extend to interoperability with the emerging European Cultural Heritage Cloud (ECHOES, 2025).
- Establishing a modular structure for descriptions, aiming at operationalising the templates by defining building blocks, including a ‘core’ common to most DCH collections and a series of ‘profiles’ tailored to research data management and AI/ML workflows (e.g., the AI Model Research Documentation Sheet (AIRDocS) (Oberbichler, 2025)).
- Providing guidance on using these modules and possibly developing custom ones.

While some components remain under active development (e.g., the prototype, the profiles and the guidelines for their development), we present this work in progress to foster dialogue and invite broader engagement from the Fantastic Futures community.

References
AI4Culture project. (2025). AI4Culture, Empowering Cultural Heritage through Artificial Intelligence. https://ai4culture.eu
Alkemade, H., Claeyssens, S., Colavizza, G., Freire, N., Irollo, A., Lehmann, J., Neudecker, C., Osti, G., & van Strien, D. (2023, September 25). Datasheets for Digital Cultural Heritage Datasets - Template v.1. Zenodo. https://zenodo.org/records/8375034
Alkemade, H., Claeyssens, S., Colavizza, G., Freire, N., Lehmann, J., Neudecker, C., Osti, G., & van Strien, D. (2023). Datasheets for Digital Cultural Heritage Datasets. Journal of Open Humanities Data, 9, 17. https://doi.org/10.5334/johd.124
Common European data space for cultural heritage. (2025). Welcome to the Common European data space for cultural heritage. https://www.dataspace-culturalheritage.eu/en
ECHOES project. (2025). ECCCH, The Cultural Heritage Cloud. https://www.echoes-eccch.eu/
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for Datasets. Communications of the ACM, 64(12), 86–92. https://doi.org/10.1145/3458723
Lehmann, J., & Schneider, S. (2024). Metadata of the "Alter Realkatalog" (ARK) of Berlin State Library (SBB). https://doi.org/10.5281/zenodo.13284442
Luthra, M., & Eskevich, M. (2024). Data-Envelopes for Cultural Heritage: Going beyond Datasheets. In I. Siegert & K. Choukri (Eds.), Proceedings of the Workshop on Legal and Ethical Issues in Human Language Technologies @ LREC-COLING 2024 (pp. 52–65). ELRA and ICCL. https://aclanthology.org/2024.legal-1.9
Oberbichler, S. (2025). AI Model Research Documentation Sheet (AIRDocS). https://doi.org/10.5281/zenodo.15046713
Padilla, T., Scates Kettler, H., Varner, S., & Shorish, Y. (2023). Vancouver Statement on Collections as Data. https://zenodo.org/records/8342171
Pushkarna, M., Zaldivar, A., & Kjartansson, O. (2022). Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI. 2022 ACM Conference on Fairness, Accountability, and Transparency, 1776–1826. https://doi.org/10.1145/3531146.3533231
World Wide Web Consortium. (2024). Data Catalog Vocabulary (DCAT) - Version 3. https://www.w3.org/TR/vocab-dcat-3/
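
To make the "alignment with DCAT" item from the abstract more concrete, here is a minimal sketch of what mapping datasheet fields to DCAT could look like. It is an illustration under stated assumptions, not the working group's actual alignment: the datasheet field names (`title`, `purpose`, `provenance`, `keywords`, `biases`), the `FIELD_TO_DCAT` table and the `datasheet_to_dcat` helper are all hypothetical. Only the DCAT and Dublin Core Terms property names and namespaces are taken from the published vocabularies.

```python
import json

# Hypothetical datasheet field names mapped to real DCAT / DC Terms properties.
# The choice of which field maps to which property is an assumption made
# for illustration only.
FIELD_TO_DCAT = {
    "title": "dct:title",
    "purpose": "dct:description",
    "provenance": "dct:provenance",
    "keywords": "dcat:keyword",
}

# Namespace prefixes as published by W3C (DCAT) and DCMI (DC Terms).
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def datasheet_to_dcat(datasheet: dict) -> tuple[dict, dict]:
    """Return (JSON-LD style dcat:Dataset record, fields with no mapping)."""
    record = {"@context": CONTEXT, "@type": "dcat:Dataset"}
    unmapped = {}
    for field, value in datasheet.items():
        prop = FIELD_TO_DCAT.get(field)
        if prop is None:
            # Keep fields without a DCAT counterpart so nothing is silently lost.
            unmapped[field] = value
        else:
            # Simplification: values are emitted as plain literals, although
            # DCAT expects a resource for some properties (e.g. dct:provenance).
            record[prop] = value
    return record, unmapped


# Invented example values, not taken from any real datasheet.
example = {
    "title": "Example newspaper collection",
    "purpose": "Digitised newspapers prepared for text mining.",
    "keywords": ["newspapers", "OCR"],
    "biases": "Coverage is skewed towards urban titles.",
}
record, unmapped = datasheet_to_dcat(example)
print(json.dumps(record, indent=2))
print("No DCAT mapping for:", sorted(unmapped))
```

Returning the unmapped fields separately shows where a datasheet says more than plain DCAT can express (here the hypothetical `biases` field), which is the kind of gap an alignment effort would have to decide how to handle.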