does anyone have a good example / bit of code i can look over for using spark + preferably python to iterate over a large number of HTTP calls?
@cm_harlow are you doing some sort of massive crawl?
@ekansa ... yeah :-/ i'm trying to get a bunch of MARC records outta LoC via 4 diff api routes
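a minimal sketch of the kind of thing being asked about above, assuming a hypothetical endpoint and record-id list (the real LoC routes differ per service); the helpers are plain Python, and only `fetch_all_with_spark` needs pyspark:

```python
import urllib.request

def authority_urls(base, record_ids):
    # hypothetical URL pattern; the real API routes would differ
    return [f"{base}/{rid}.marcxml" for rid in record_ids]

def fetch(url):
    # one HTTP GET; returns (url, body) or (url, error string) so a
    # single failed call doesn't kill the whole Spark job
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return (url, resp.read().decode("utf-8"))
    except Exception as exc:
        return (url, f"ERROR: {exc}")

def fetch_all_with_spark(urls, partitions=32):
    # requires pyspark; imported here so the helpers above work without it
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("marc-fetch").getOrCreate()
    rdd = spark.sparkContext.parallelize(urls, numSlices=partitions)
    # each partition's executor issues its slice of HTTP calls in parallel
    return rdd.map(fetch).collect()
```

tuning `partitions` controls how many calls run concurrently across the cluster.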

@cm_harlow And their terms of service are OK with that?

Sorry, I've never done anything with Spark. Sounds like you're doing a huge job though. I wonder if the LoC could just give you a big data dump?

@ekansa terms of service for 1 of the services has a query rate limit; the others are open.
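for the rate-limited service, one approach is to throttle calls inside each worker; a small sketch (the interval value is a placeholder, and `clock`/`sleep` are injectable so the logic is testable without real waiting):

```python
import time

def throttled(iterable, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than one per min_interval seconds.

    Wrap a partition's URL iterator in this before fetching so each
    worker stays under the service's query limit.
    """
    last = None
    for item in iterable:
        now = clock()
        if last is not None and now - last < min_interval:
            sleep(min_interval - (now - last))
        last = clock()
        yield item
```

inside Spark this would go in a `mapPartitions` function, so the delay applies per executor rather than per driver.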

their data dumps are not the representation i need (a lossy one at that), and they're nearly a year out of date.

and they're not willing to generate new data dumps for me at this moment; they point me to the various request options instead.

(but i could pay a vendor for what i need)

@cm_harlow Wow! Sheesh.

So are you doing some sort of analysis on all these MARC records? Or is this to build up your own data for retrieval services? Or something else?

Sorry, I can't help but I'm super intrigued by the scale of your project!

@ekansa heh, nw. i'm looking to retrieve a full dump of the Authorities to then serve up + manage in a few different ways - as a Git repo, as a ResourceSync source, as an IPFS repo.
@ekansa + perform a more granular conversion to RDF, enhance with reconciliation, + serve/publish in same mechanisms.
@cm_harlow wow! That sounds really awesome. For what resources? All the LoC or some?
@ekansa all the LoC Name + Subject Authorities - at least at first.
@cm_harlow That sounds super cool. Would be really interesting to link up with some other linked datasets, esp. gazetteers.
@ekansa for sure. geo data is one of the things that suffers the worst from the current LoC data dumps in RDF bc the conversion is lossy