Who's On First

version 0.0.4 of the https://github.com/whosonfirst/go-dedupe/tree/main/embeddings#llamafileembedder package which we are using to de-duplicate venues has been released / adding support for generating text embeddings using mozilla's https://github.com/Mozilla-Ocho/llamafile application / background – https://whosonfirst.org/blog/2024/08/16/dedupe/
go-dedupe/embeddings at main · whosonfirst/go-dedupe

Go package for resolving duplicate "place" (or venue) locations. - whosonfirst/go-dedupe

GitHub
version 0.0.2 of the https://github.com/whosonfirst/go-dedupe package which we are using to de-duplicate venues has been released / adding support for storing and querying locations using duckdb and the vss extension / background – https://whosonfirst.org/blog/2024/08/16/dedupe/
GitHub - whosonfirst/go-dedupe: Go package for resolving duplicate "place" (or venue) locations.

Go package for resolving duplicate "place" (or venue) locations. - whosonfirst/go-dedupe

GitHub
@whosonfirst @thisisaaronland Great work and write-up. We were kicking around the potential of embeddings to do similar work at Overture. Will be sharing this there...
today's venue de-duplication news includes / 1,500 freshly deprecated duplicate records, 15,000 new overture place concordances and 800 new all the places venue concordances in the whosonfirst-data-venue-us-ma repository / and working implementations of the location and vector database interfaces using duckdb which will be merged into the main branch shortly / background – https://whosonfirst.org/blog/2024/08/16/dedupe/
De-duplicating Who's On First venues with vector embeddings

Using four different Who’s On First venue repositories for testing, I have been able to first deprecate about 45,000 duplicate records and then, second, derive over 100,000 concordances with Overture Data place records, 8,000 concordances with All The Places venues and another 500 concordances with ILMS museum records. There are almost certainly still bugs, or at least “gotchas”, but importantly the work so far passes the “better than yesterday” test.

Who's On First
there is a kitchen-sink "wof" command line tool that I've been working on / which deserves its own long and twisty blog post soon / but for now I'll just say that / it has learned how to produce geoparquet databases from one or more who's on first repositories – https://github.com/whosonfirst/wof-cli?tab=readme-ov-file#example-geoparquet
GitHub - whosonfirst/wof-cli: Command-line tool for common Who's On First operations.

Command-line tool for common Who's On First operations. - whosonfirst/wof-cli

GitHub
"Using four different Who’s On First venue repositories for testing, I have been able to first deprecate about 45,000 duplicate records and then, second, derive over 100,000 concordances with Overture Data place records, 8,000 concordances with All The Places venues and another 500 concordances with ILMS museum records. There are almost certainly still bugs, or at least “gotchas”, but importantly the work so far passes the “better than yesterday” test." – https://whosonfirst.org/blog/2024/08/16/dedupe/
"Who’s On First shapefile downloads in QGIS and on HDX" by @kelsoscorner – https://whosonfirst.org/blog/2024/07/18/more-shapefiles/
Who’s On First shapefile downloads in QGIS and on HDX

Shapefiles are the resurgent vinyl music format for digital mapping

Who's On First
there is now a handy "wof-cli" tool / for performing common operations on one or more who's on first documents / from the command line – https://github.com/whosonfirst/wof-cli
have you ever wanted to produce a geoparquet database file / from all repositories containing who's on first style documents in a github organization / now you can – https://github.com/whosonfirst/go-whosonfirst-geoparquet
GitHub - whosonfirst/go-whosonfirst-geoparquet: Go package to produce planetlabs/gpq -compatible input to generate GeoParquet files.

GitHub
recent changes to the go-whosonfirst-spatial-pmtiles package result in container files that are 30% smaller / recent changes to tippecanoe mean that the container can create a protomaps tiles database for reverse-geocoding with global coverage / that is 50% smaller (4GB instead of 8GB) – https://github.com/whosonfirst/go-whosonfirst-spatial-pmtiles
GitHub - whosonfirst/go-whosonfirst-spatial-pmtiles: Go package to implement the whosonfirst/go-whosonfirst-spatial interfaces using a Protomaps .pmtiles database.

GitHub