today's venue de-duplication news includes / 1,500 freshly deprecated duplicate records, 15,000 new Overture place concordances and 800 new All The Places venue concordances in the whosonfirst-data-venue-us-ma repository / and working implementations of the location and vector database interfaces using DuckDB, which will be merged into the main branch shortly / background – https://whosonfirst.org/blog/2024/08/16/dedupe/
De-duplicating Who's On First venues with vector embeddings
Using four different Who’s On First venue repositories for testing, I have been able to deprecate about 45,000 duplicate records and then derive over 100,000 concordances with Overture Data place records, 8,000 concordances with All The Places venues and another 500 concordances with ILMS museum records. There are almost certainly still bugs, or at least “gotchas”, but importantly the work so far passes the “better than yesterday” test.