Library Catalogues as Data: Research, Practice and Usage edited by Paul Gooding, Melissa Terras, Sarah Ames (Facet Publishing, 2025) is now in press and coming soon! I contributed to the chapter "Effects of Open Science and the Digital Transformation on the Bibliographical Data Landscape" together with other members of @bibliodatawg. More info:
https://www.facetpublishing.co.uk/page/detail/library-catalogues-as-data/?k=9781783306589
Library Catalogues as Data

This book brings together leading practitioners and academic voices to discuss a range of topics surrounding library information and data.

Facet Publishing
@kiru oh, this looks interesting. If you're interested in case studies actually using catalogue data for analysis, I may #selfplug my recent paper on early modern dissertations in French libraries: https://doi.org/10.5334/johd.307 @bibliodatawg
Early Modern Dissertations in French Libraries: The EMDFL Dataset | Journal of Open Humanities Data

Journal of Open Humanities Data
@stefan_hessbrueggen @bibliodatawg Thanks a lot! Congratulations, it is very interesting, I will take a deeper look. I have two comments at first sight: "Table 2. Mapping BNF MARC fields to ‘categories’, the human-readable meaning for a given MARC field" -- these fields are not MARC but UNIMARC, and similarly "Table 3: Mapping SUDOC MARC fields to ‘categories’, the human-readable meaning for a given MARC field..." -- these are not MARC either, but PICA fields.
@kiru oh. Demonstrating my lack of knowledge about metadata formats. Not an excuse, but I *think* the documentation of SRU interfaces did not clarify the exact standard (or I didn't notice it). @bibliodatawg
@stefan_hessbrueggen @bibliodatawg Don't worry, it is not that important. A question: did you resolve place name strings with inflections, and multiple place names, e.g. "Augustæ Vindelicorum Et Wirceburgi" or "Posonij"?
places · main · Stefan Hessbrueggen / Early Modern Dissertations in French Libraries · GitLab

GitLab.com
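(As an aside for readers wondering how such inflected Latin imprint strings can be handled: one common approach is to normalise spelling variants, split on conjunctions, and look each part up in a gazetteer. A minimal sketch in Python; the lookup entries below are illustrative and not taken from the EMDFL dataset, which a real project would back with a resource such as the CERL Thesaurus.)

```python
import re

# Illustrative lookup of inflected Latin imprint forms to modern place names.
# A real project would use a proper gazetteer rather than a hand-made dict.
LATIN_PLACES = {
    "augustae vindelicorum": "Augsburg",
    "wirceburgi": "Würzburg",
    "posonii": "Bratislava (Pressburg)",
}

def resolve_places(imprint: str) -> list[str]:
    """Split a Latin imprint on 'et' and look up each normalised part."""
    # Normalise the æ ligature and the i/j spelling variation common in early print.
    text = imprint.lower().replace("æ", "ae").replace("j", "i")
    parts = re.split(r"\s+et\s+", text)
    return [LATIN_PLACES.get(p.strip(), p.strip()) for p in parts]

print(resolve_places("Augustæ Vindelicorum Et Wirceburgi"))  # ['Augsburg', 'Würzburg']
print(resolve_places("Posonij"))  # ['Bratislava (Pressburg)']
```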
@kiru I think that WorldCat Identities, based on data mining the database, is (was? I'm getting a 404) a great use of the wealth of data in catalogs. This kind of thinking requires us to stop seeing a catalog as an inventory of separate records and to see it instead as a rich knowledge environment. Doing so would also help us understand what additional data we need, and what isn't serving us well.
@kcoyle Yes, that is true. But I would extend this concept to other kinds of identifiers; e.g. I just learned a few days ago how intensively KBR (formerly the Royal Library of Belgium) injects ISNI identifiers into their catalogue. The questions are: 1) what about coverage? 2) can external researchers use WorldCat or ISNI or other library-related APIs (e.g. for concepts: DDC, UDC etc.)? I am happy to see that in recent MARC standard updates there is more room for identifiers and provenance information.
@kiru It's hard for me to see ISNI or VIAF as data that catalogers need to add to MARC records. I would like to see some attempts to assign those algorithmically. Edward Betts did some interesting coding for the Open Library by combining names and book titles to identify an author.
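(A toy illustration of the name-plus-title matching idea mentioned above; this is not Edward Betts' actual code, and the candidate records and identifiers are invented for the example.)

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; real systems use better name matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical authority candidates: (name, known titles, identifier).
CANDIDATES = [
    ("Verne, Jules", ["Vingt mille lieues sous les mers"], "viaf/76323989"),
    ("Verne, Jules (economist)", ["Essays on trade"], "viaf/00000000"),
]

def match_author(name: str, title: str, threshold: float = 0.6):
    """Pick the candidate whose name AND at least one title both resemble the input.

    Requiring both signals is what disambiguates authors with similar names."""
    best, best_score = None, 0.0
    for cand_name, titles, ident in CANDIDATES:
        name_score = similarity(name, cand_name)
        title_score = max(similarity(title, t) for t in titles)
        score = (name_score + title_score) / 2
        if name_score > threshold and title_score > threshold and score > best_score:
            best, best_score = ident, score
    return best
```

The point of the sketch is the combination: a name match alone would not separate two authors called "Verne, Jules", but the title evidence does.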
@kiru As for DDC/UDC/LCC, it should be possible to search hierarchically, seeing what the library has at each node. We can't expect users to magically zero in on a precise category; classification should be all about exploring from general to specific.
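(The general-to-specific browsing idea can be sketched in a few lines; the class numbers and holdings below are invented for illustration.)

```python
from collections import Counter

# Hypothetical holdings with Dewey-style class numbers.
HOLDINGS = ["511.3", "512.7", "512.73", "821.1", "821.9"]

def counts_at_depth(holdings, depth):
    """Count items under each classification node truncated to `depth` digits
    (ignoring the decimal point), i.e. aggregate general-to-specific prefixes."""
    counts = Counter()
    for number in holdings:
        digits = number.replace(".", "")
        counts[digits[:depth]] += 1
    return dict(counts)

print(counts_at_depth(HOLDINGS, 1))  # {'5': 3, '8': 2}
print(counts_at_depth(HOLDINGS, 3))  # {'511': 1, '512': 2, '821': 2}
```

An interface built on this would show counts at each node and let the user drill down one level at a time instead of guessing a precise class number up front.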
@kcoyle Again: I agree. We should have smarter interfaces for numerical and highly formalized classification schemes, but in order to do that they would need to be open access datasets. I could imagine different nice user interfaces building on their hierarchical structure in the background, while on the front end the user would communicate in human terms.
@kiru Ah, yes, open access. Library of Congress is but Dewey is not. It really SHOULD be.
@kcoyle Not manually. I agree with you that at scale it can only be done programmatically, with a human-in-the-loop approach to supervise the results. See e.g. @SvenLieber's work in the BELTRANS project, e.g. https://zenodo.org/records/7372986. I am also doing similar things to extract place names, translators etc. from non-authorised MARC fields. Do you have a reference for Edward's work?
A LITL more quality: Improving the correctness and completeness of library catalogs with a Librarian-In-The-Loop Linked Data Workflow

A presentation given at the Semantic Web in Libraries (SWIB) conference on November 28, 2022. The presentation focuses on what data quality even is, which quality procedures we have at the Royal Library of Belgium, and how using your data will likely reveal more (subtle) quality issues. I briefly present the BELTRANS research project and how it motivated our Librarian-In-The-Loop workflow, in which a librarian is presented with CSV files of possibly wrong data. The incorrectness is not apparent in a record, but is revealed when displaying the record next to another record linked via schema:sameAs. The librarian can indicate what is wrong and what is correct via different CSV files, which are then used to correct the data in a semi-automatic fashion. We present the use case, the workflow and lessons learned!

Zenodo
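(The librarian-in-the-loop step described in the abstract above can be sketched roughly as follows; the record pairs, identifiers and column names here are invented for illustration, not taken from the BELTRANS project.)

```python
import csv
import io

# Hypothetical pairs of records linked via schema:sameAs, exported for review.
PAIRS = [
    {"local_id": "KBR123", "local_name": "Dupont, Jean",
     "ext_id": "ISNI0001", "ext_name": "Dupont, Jean"},
    {"local_id": "KBR456", "local_name": "Janssens, Marie",
     "ext_id": "ISNI0002", "ext_name": "Jansen, Maria"},
]

def review_csv(pairs):
    """Write mismatching linked records side by side, with an empty column
    for the librarian to flag which side (if any) is wrong."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["local_id", "local_name", "ext_id", "ext_name", "correct?"]
    )
    writer.writeheader()
    for pair in pairs:
        # Only disagreements need human attention; identical pairs are skipped.
        if pair["local_name"] != pair["ext_name"]:
            writer.writerow({**pair, "correct?": ""})
    return buf.getvalue()

print(review_csv(PAIRS))
```

The filled-in CSV would then drive the semi-automatic correction step mentioned in the abstract.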
@kiru @SvenLieber he didn’t write about it, but he’s on here so you might be able to speak to him