In Richard Wallis's #LD4_2023 talk on a linked data discovery & management system for the Singapore national library & archives, I'm particularly impressed by the entity reconciliation process he's discussing. With messy data, you often have many different text strings in different records describing the same entity, so it's vital that those be normalized. He describes some interesting work designing similarity assessments between different strings in the bibliographic & authority data. #LD4
One thing that makes it work in his system is that librarians who notice problems with the data reconciliation can fix the data easily enough that special-case processing is not required. (The design that allows this is impressive in itself; I've worked with other systems where correcting the data and making the correction stick is hard enough that people who notice a problem often don't fix it.) #LD4_2023 #LD4