Mastodawn

Jörg Preisendörfer Jun 24, 2024

A package implementing #AutoCal was published on #PyPI:

The source sits on #GitHub in a repository administered by the main author of the paper (who is not in the #Fediverse so far):

🔗 https://github.com/ae3000/matchain

Also, there's no implementation in #CommonLisp so far. 😇

4/4

🌺

🏷️ #InstanceMatching #RecordLinkage #OntologyMatching #ArtificialIntelligence #MatChain #WorldAvatar #DigitalTwin #WebSem #LinkedData #KnowledgeGraph #MachineLearning #DeepLearning #Python #Lisp

matchain

Record linkage - simple, flexible, efficient.

PyPI


Jörg Preisendörfer Jun 24, 2024

From the abstract:

›We also select an unsupervised state-of-the-art matcher from the field of #DeepLearning for a thorough comparison.

Our results show that neither #AutoCal nor the state-of-the-art matcher is superior regarding matching quality while AutoCal has only moderate hardware requirements and runs 2.7 to 60 times faster.‹

3/4

🌺

🏷️ #InstanceMatching #RecordLinkage #OntologyMatching #ArtificialIntelligence #MatChain #PyPI #WorldAvatar #DigitalTwin #WebSem #LinkedData #KnowledgeGraph


Jörg Preisendörfer Jun 24, 2024

From the abstract:

›We introduce #AutoCal, a new #InstanceMatcher which does not require #LabelledData and runs out of the box for a wide range of domains without tuning method-specific parameters.

AutoCal achieves results competitive to recently proposed unsupervised matchers from the field of #MachineLearning.‹

2/4

🌺

🏷️ #InstanceMatching #RecordLinkage #OntologyMatching #ArtificialIntelligence #MatChain #PyPI #WorldAvatar #DigitalTwin #Python #WebSem #LinkedData #KnowledgeGraph

Jörg Preisendörfer Jun 24, 2024

May I kindly draw your attention to this scientific paper in the #JournalOfWebSemantics, since my fate was to read many versions of it and to comment extensively:

›A simple and efficient approach to #unsupervised #InstanceMatching and its application to #LinkedData of #PowerPlants‹

→ https://doi.org/10.1016/j.websem.2024.100815

1/4

🌺

🏷️ #MachineLearning #InstanceMatching #RecordLinkage #OntologyMatching #ArtificialIntelligence #AutoCal #MatChain #PyPI #WorldAvatar #DigitalTwin #Python #WebSem #KnowledgeGraph

Maciej Beręsewicz May 10, 2024

Are you dealing with #RecordLinkage or #deduplication? Check out the #rstats {blocking} package, which can significantly improve your pipeline and reduce FDR due to blocking. It uses several nice #ANN algorithms via the excellent {rnndescent}, {RcppHNSW} and {RcppAnnoy} packages. It is nicely integrated with the {reclin2} package and works well with {fastLink}. Feel free to try it in your applications and send us feedback! https://github.com/ncn-foreigners/blocking

GitHub - ncn-foreigners/blocking: An R package for blocking records for record linkage / data deduplication based on approximate nearest neighbours algorithms.


GitHub
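
To illustrate the idea behind the post above, here is a minimal Python sketch of blocking by nearest neighbours: each record in one file is paired only with its most similar records in the other file, so the expensive comparison step runs on far fewer candidate pairs. This is not the {blocking} package's API. It swaps the approximate nearest neighbour search for an exact brute-force search over character bigrams with Jaccard similarity, and the function names and sample records are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Return the set of lowercase character n-grams, ignoring whitespace."""
    cleaned = "".join(text.lower().split())
    if len(cleaned) < n:
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest_neighbour_blocks(
    left: List[str], right: List[str], k: int = 1
) -> Dict[int, List[Tuple[int, float]]]:
    """Map each left index to its k most similar right indices with scores.

    Assumption: exact brute-force search stands in for an ANN index here.
    """
    right_grams = [char_ngrams(r) for r in right]
    blocks: Dict[int, List[Tuple[int, float]]] = {}
    for i, record in enumerate(left):
        grams = char_ngrams(record)
        scored = [(j, jaccard(grams, g)) for j, g in enumerate(right_grams)]
        # Highest similarity first; ties broken by the lower right index.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        blocks[i] = scored[:k]
    return blocks


# Hypothetical sample records with spelling variations.
left = ["Jon Smith", "Maria Gonzales"]
right = ["Maria Gonzalez", "John Smith", "Peter Müller"]
blocks = nearest_neighbour_blocks(left, right, k=1)
```

With `k=1`, only two candidate pairs go on to detailed comparison instead of all six. On larger files the brute-force loop is the part an ANN index would replace.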

CPIPR Feb 16, 2024

Commonly used methods for linking CPS ASEC files do not address how to link the ASEC oversample records across years, leading to smaller linked sample sizes. A new paper demonstrates how to recover the linkable oversample cases in the 2005-2020 ASEC, resulting in about 150,000 more linked records (30% increase in the overall linked sample size).
https://pubmed.ncbi.nlm.nih.gov/38264507/
#Data #Statistics #Methods #DataScience #RecordLinkage

Research Note on Linking CPS ASEC Files - PubMed

Measuring change over time in areas such as family structure, employment, income, and poverty is of great interest to social scientists. The panel component of the Current Population Survey (CPS) affords the opportunity to observe short-term change in these areas. The Annual Social and Economic supp …

PubMed

Dr. Michael Hägele Oct 13, 2023

Improving #RecordLinkage for health research (#Gesundheitsforschung) in Germany: it is currently difficult to link health data (#Gesundheitsdaten) from different sources, which hinders healthcare. A #WhitePaper identifies weaknesses and points to solutions. https://e-health-com.de/details-news/white-paper-verbesserung-des-record-linkage-fuer-die-gesundheitsforschung-in-deutschland/

White Paper: Improving Record Linkage for Health Research in Germany

In Germany it is currently difficult to link health data from different sources. This significantly hinders healthcare, especially in comparison with neighbouring European countries. A new white paper identifies weaknesses and points to solutions.

Claudia Solis-Lemus Feb 11, 2023

New talk alert! #ElZoominario: short scientific talks by #LatinxInSTEM

Watch Brenda Betancourt 🇨🇴 talk about #RecordLinkage 💽 and traditions in Tolima, #Colombia

https://youtu.be/8aRcH_LYr7E

#scicomm #statistics #DataScience #databases

El Zoominario: Introduction to record linkage and its applications -- Brenda Betancourt

YouTube

ZfdG Jan 26, 2023

In the first ZfdG article of the year, Jan Michael Goldberg & Marcel Mernitz present an automated approach to #RecordLinkage in prosopographical datasets, using historical sources from Leipzig as an example: https://zfdg.de/2023_001
The program code is written in Python and freely available: https://git.hab.de/forschungsdaten/zeitschrift-fuer-digitale-geisteswissenschaften/goldberg-record

@DHd #Genealogy #OpenAccess #DigitalHumanities

Automated Record Linkage in Prosopographical Datasets, Using Historical Sources from Leipzig as an Example | ZfdG - Zeitschrift für digitale Geisteswissenschaften

This article presents an automated approach to record linkage in prosopographical datasets. The program code is written in Python and freely available.

Zane Selvans Dec 1, 2022

I love when people use cool tools I've never heard of in our take-home interview question, like Splink! #RecordLinkage #DataScience #OpenSource

https://github.com/moj-analytical-services/splink

GitHub - moj-analytical-services/splink: Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends


GitHub