"Measuring the Impact of Digital Collections: Digital Content Reuse Assessment Framework Toolkit"
https://doi.org/10.5860/ital.v44i4.17404
#ResearchData #DigitalCollections #libraries
Measuring the Impact of Digital Collections: Digital Content Reuse Assessment Framework Toolkit | Information Technology and Libraries

Southern Methodist University: From rails to revolutions: New windows into the past in digital collections. “What do a sugar railway in Cuba, U.S. soldiers hunting Pancho Villa, and a priest blessing a taxi in Mexico have in common? They’re all part of a fascinating array of newly digitized materials now available in SMU’s Digital Collections.”

https://rbfirehose.com/2025/11/08/from-rails-to-revolutions-new-windows-into-the-past-in-digital-collections-southern-methodist-university/

From rails to revolutions: New windows into the past in digital collections (Southern Methodist University) | ResearchBuzz: Firehose

#Tainacan: With nearly 2,000 active installations and over 50,000 downloads, the project has moved beyond "Beta." After testing by thousands of users, Tainacan version 1.0.0 has now launched, marking the completion of the initial roadmap. The primary goal of providing a #freesoftware solution for managing #digitalcollections and cultural objects in Brazilian institutions has been achieved, paving the way for broader applications.
#Museums of the Fediverse, give it a try!
https://tainacan.org/en/tainacan-1-0-0-an-open-source-flexible-and-powerful-software-for-creating-digital-archives-in-wordpress/
Tainacan 1.0.0 – An Open Source, Flexible and Powerful software for creating Digital Archives in WordPress – Tainacan

🎉Thrilled to share the publication of our #OpenAccess book 'Opening up our Heritage: Opportunities in Digitising and Promoting Cultural and Research #Collections', a collection of 19 chapters written by librarians and researchers on the #digitisation and promotion of cultural and scientific #heritage.

👉 HTML: https://e-publish.uliege.be/opening-up-our-heritage
👉 PDF and ePub: https://e-publish.uliege.be/opening-up-our-heritage/front-matter/free-download-buy/

#DigitalArchives #DigitalCollections #Preservation #Libraries #Metadata #Discoverability #OpenScience #Pressbooks

Report: "Funding for the #California Digital Newspaper Collection Was Restored—but UC Riverside Laid Off All of the Employees Responsible for the Project Anyway" (via Coachella Valley Independent) https://cvindependent.com/2025/07/losing-our-history-funding-for-the-california-digital-newspaper-collection-was-restored-but-uc-riverside-laid-off-all-of-the-employees-responsible-for-the-project-anyway/ #newspapers #digitalcollections #history #CDNC
Losing Our History? Funding for the California Digital Newspaper Collection Was Restored—but UC Riverside Laid Off All of the Employees Responsible for the Project Anyway

The CDNC includes content from hundreds of newspapers that have been published throughout the state, going back as far as 1846. As of this writing, there are 23,449,221 pages in the CDNC archive, but the staff that managed the project was terminated.

Coachella Valley Independent

Institutional Books: A 242B token dataset from Harvard Library's collections

https://arxiv.org/abs/2506.08300

#HackerNews #InstitutionalBooks #HarvardLibrary #TokenDataset #OpenData #DigitalCollections

Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability

Large language models (LLMs) use data to learn about the world in order to produce meaningful correlations and predictions. As such, the nature, scale, quality, and diversity of the datasets used to train these models, or to support their work at inference time, have a direct impact on their quality. The rapid development and adoption of LLMs of varying quality has brought into focus the scarcity of publicly available, high-quality training data and revealed an urgent need to ground the stewardship of these datasets in sustainable practices with clear provenance chains. To that end, this technical report introduces Institutional Books 1.0, a large collection of public domain books originally digitized through Harvard Library's participation in the Google Books project, beginning in 2006. Working with Harvard Library, we extracted, analyzed, and processed these volumes into an extensively-documented dataset of historic texts. This analysis covers the entirety of Harvard Library's collection scanned as part of that project, originally spanning 1,075,899 volumes written in over 250 different languages for a total of approximately 250 billion tokens. As part of this initial release, the OCR-extracted text (original and post-processed) as well as the metadata (bibliographic, source, and generated) of the 983,004 volumes, or 242B tokens, identified as being in the public domain have been made available. This report describes this project's goals and methods as well as the results of the analyses we performed, all in service of making this historical collection more accessible and easier for humans and machines alike to filter, read and use.

arXiv.org
Article: “An Interface to View Collections of Visual Art” presents LadeCA.View—a visual tool to explore, describe, and analyze large image collections in the digital humanities.
https://link.springer.com/article/10.1007/s42803-022-00061-8
#DigitalHumanities #VisualCulture #DigitalArtHistory #InterfaceDesign #DigitalCollections #LadeCA #MuseumTech
An interface to view collections of visual art - International Journal of Digital Humanities

Art experts prefer being able to look at the individual images they are working on in the course of their research. However, if one were to look at digitally accessible images in the field of visual art, one would be dealing with billions of images; no one can handle visually examining such huge numbers of images one at a time. Therefore, art experts need special tools to examine and describe artworks in the context of other artworks. We used our experience from previous projects and interviews with members of the target group (art historians, curators, art dealers, and artists) to identify the central issues these experts encounter when working with large image collections and to determine the functionality and properties a system must offer to support their work. The results led to the customized interface LadeCA.View, which is now used in several projects. LadeCA.View enables experts to describe an exhibition or a collection of visual art in such a way that a user can obtain an overview of the intention, content, and structures of the exhibition or collection within a short period of time without looking at each image individually. LadeCA.View can also be used as an interface to probe more deeply into a collection or exhibition. In this paper we show the functions and visualizations of the interface and explain the design decisions. Furthermore, we outline LadeCA.View’s scope of applicability using three case studies.

SpringerLink

Another great opportunity to join us at the University of Glasgow, as we invest and expand our digital capacity in #digitallibraries, #digitalarchives, #DigitalHumanities and #digitalcollections!

This particular post is Digital Research Collections Co-ordinator, overseeing the management and development of curated digital collections. Potential for some interesting #data wrangling too! Details below.

Digital Research Collections Co-ordinator https://www.jobs.gla.ac.uk/job/digital-research-collections-co-ordinator #Glasgow

Digital Research Collections Co-ordinator · University of Glasgow

Job Purpose: To manage and maintain efficient, high-quality workflows, processes and procedures for the delivery of the Digital Research Collections service. To...