Hey #library folks 👋,
do you want to cluster your book editions with the well-known Work-set algorithm from #OCLC, but can't find a suitable reusable tool?
I recently faced this issue while working on the #BELTRANS project at KBR (Royal Library of Belgium). All I found were many research papers describing the clustering and a few implementations that required me to install 2010-style Java software stacks.
So I decided to write a small, easily reusable #Python script that follows the ideas of the Work-set algorithm: clustering based on descriptive keys. Nothing more, nothing less.
Check out my blog post for more information and have a look at the script.
➡️ blog post: https://doi.org/10.59350/4hd4r-1tk44
➡️ script: https://doi.org/10.5281/zenodo.10011416
Clustering Book editions | Sven Lieber
What do the books “The Invention of Nature” and “De uitvinder van de natuur” have in common? Well, they are both versions of the same work, “The Invention of Nature” by Andrea Wulf, whether in a different format or a different language. In this blog post I briefly introduce the advantages of keeping work-level records in library catalogs. I also introduce a fast Python implementation (DOI: 10.5281/zenodo.10011416), which we used in the BELTRANS project to identify the works in a corpus of book translations. #FRBRization
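To make "clustering based on descriptive keys" a bit more concrete, here is a minimal sketch of the general idea. It is not the linked script. It assumes that two editions belong to the same cluster when they share at least one descriptive key, and that this relation is applied transitively. The function name `cluster_by_keys`, the edition IDs, and the author/title keys are all hypothetical and only serve as illustration.

```python
from collections import defaultdict


def cluster_by_keys(record_keys):
    """Group record IDs that share at least one descriptive key.

    Illustrative sketch only (hypothetical helper, not the linked script).
    record_keys maps a record ID to a set of descriptive keys.
    Sharing is treated as transitive: if A shares a key with B and
    B shares a key with C, all three end up in one cluster.
    """
    # Union-find: every record starts as its own cluster root.
    parent = {rec: rec for rec in record_keys}

    def find(rec):
        # Follow parent links to the root, shortening the path on the way.
        while parent[rec] != rec:
            parent[rec] = parent[parent[rec]]
            rec = parent[rec]
        return rec

    first_seen = {}  # descriptive key -> first record that carried it
    for rec, keys in record_keys.items():
        for key in keys:
            if key in first_seen:
                # Same key seen before: merge the two clusters.
                parent[find(rec)] = find(first_seen[key])
            else:
                first_seen[key] = rec

    clusters = defaultdict(set)
    for rec in record_keys:
        clusters[find(rec)].add(rec)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical editions with made-up, already normalized author/title keys.
editions = {
    "ed1": {"wulf/invention of nature"},
    "ed2": {"wulf/invention of nature", "wulf/uitvinder van de natuur"},
    "ed3": {"wulf/uitvinder van de natuur"},
    "ed4": {"other author/other title"},
}
clusters = cluster_by_keys(editions)
print(clusters)
```

In this toy input, `ed1` and `ed3` share no key directly, but both share one with `ed2`, so all three land in the same cluster while `ed4` stays on its own. How the keys are built (for example, from normalized author and title strings) is a separate step that this sketch leaves out.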