https://overturemaps.org/ released an astonishing GIS dataset yesterday that includes 60m "place of interest" listings (businesses, attractions etc) under a VERY permissive license

It's 8GB of data and the quality from an initial spot-check seems to be very high. I wrote about how I've been exploring it so far here:

https://til.simonwillison.net/overture-maps/overture-maps-parquet

I used DuckDB to extract data from the released parquet files, then loaded that into SQLite so I could use it with @datasette

Here's a demo I built with just the places data for the city of Half Moon Bay - 931 listings in total: https://hmb-overture-demo.vercel.app/hmb/places

@simon @datasette great write-up, thanks.

What's your impression as a resident of Half Moon Bay?

Is the data correct? How fresh is it? How does it compare with OSM?

@simon @datasette umm, is there actually a restaurant / cinema on the water in the middle of the bay?
@simon @datasette in fairness I guess they do give it a low confidence score, so you could exclude all that fall below a certain threshold
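A minimal sketch of that threshold idea, once the places are in SQLite. The table layout, the `confidence` column name, the 0.5 cutoff and the sample rows are all assumptions made up for illustration, not values taken from the release:

```python
import sqlite3

# Hypothetical sample rows (name, latitude, longitude, confidence).
# None of these are real listings from the dataset.
rows = [
    ("Harbor Cafe", 37.5021, -122.4750, 0.92),
    ("Offshore Cinema", 37.4950, -122.5010, 0.21),
    ("Main Street Books", 37.4636, -122.4286, 0.78),
]


def filter_by_confidence(rows, threshold):
    """Load rows into an in-memory SQLite table and keep those at or above threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (name TEXT, latitude REAL, longitude REAL, confidence REAL)"
    )
    conn.executemany("INSERT INTO places VALUES (?, ?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT name FROM places WHERE confidence >= ? ORDER BY name",
        (threshold,),
    )
    names = [r[0] for r in cur]
    conn.close()
    return names


# 0.5 is an arbitrary cutoff chosen for this example.
kept = filter_by_confidence(rows, 0.5)
print(kept)
```

A cutoff like this would only hide low-scoring points; it would not fix a wrong category on a listing that scores well.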
@freyfogle @datasette it looks like that's the Mavericks surf competition, so yeah that's the right spot
@simon @datasette sure, but why is it labeled as a restaurant?
@freyfogle @datasette yeah those categories are very clearly off!
@freyfogle @datasette I haven't compared to OSM yet - my initial spot check for my favourite places all looked correct to me
@simon Thanks for sharing out the Parquet querying parts of this!
@simon I mean all of it is great, but getting data out of Parquet is more difficult than it should be, so I appreciate that part especially :)
@simon kinda sucks that you need to download the whole thing. Maybe using the Athena or Azure routes would allow faster selects than DuckDB?

@seav You don't have to download the whole thing for a bunch of operations - but the "find places within this bounding box" thing does seem to be too much for the remote HTTP mechanism to handle quickly

A problem I have is that I don't have good instincts yet for figuring out if a query is likely to work well over remote Parquet or not

@simon @seav This is awesome! There are some tricks we can use to structure the parquet files that allow more efficient bounding box queries using remote predicate pushdown. It’s all pretty new for spatial parquet data. We’ll probably look at this for future releases along with easier country/region partitioning.
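One way to picture why file structure matters for remote bounding box queries: Parquet keeps min/max statistics per row group, and a reader can skip any row group whose range cannot match the filter. The toy model below is a sketch of that idea only. The `RowGroupStats` class, the four-group layout and the rough Half Moon Bay bounding box are all hypothetical, and this is not a description of what Overture plans to ship:

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for the min/max statistics of one Parquet row group."""
    min_lon: float
    max_lon: float
    min_lat: float
    max_lat: float


def overlaps(stats, bbox):
    """True if the row group's coordinate range could contain points inside bbox."""
    west, south, east, north = bbox
    return not (
        stats.max_lon < west
        or stats.min_lon > east
        or stats.max_lat < south
        or stats.min_lat > north
    )


def groups_to_fetch(all_stats, bbox):
    """Indexes of the row groups a reader would still have to download."""
    return [i for i, s in enumerate(all_stats) if overlaps(s, bbox)]


# Unsorted data: every row group contains points from all over the world,
# so the statistics rule nothing out.
unsorted_groups = [RowGroupStats(-180, 180, -90, 90) for _ in range(4)]

# Spatially sorted data: each row group covers one longitude strip.
sorted_groups = [
    RowGroupStats(-180, -90, -90, 90),
    RowGroupStats(-90, 0, -90, 90),
    RowGroupStats(0, 90, -90, 90),
    RowGroupStats(90, 180, -90, 90),
]

# Rough, illustrative bounding box (west, south, east, north).
hmb_bbox = (-122.53, 37.43, -122.40, 37.53)

unsorted_hits = groups_to_fetch(unsorted_groups, hmb_bbox)
sorted_hits = groups_to_fetch(sorted_groups, hmb_bbox)
print(unsorted_hits)  # every group must be read
print(sorted_hits)    # only the strip containing the box
```

In this model the unsorted layout forces a read of all four groups while the sorted layout needs one, which is a rough intuition for when a query over remote Parquet is likely to be fast.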
@jwass2000 @seav That would be fantastic - I'm very new to Parquet/DuckDB myself so any extra documentation from Overture illustrating the kinds of queries you can run against it without downloading GBs of data would be really useful
@simon Thank you for this! I also downloaded the entire dataset and was able to extract data for the Philippines (bounding box) using the approach you shared. 👍
@benhur07b my question is, why are there dots all over the seas?

@seav That I do not know. 😅

Your mileage may really vary with this data. I haven't dived deep into it, but it does look like the quality isn't that good (yet).