does anyone have a good example / bit of code i can look over for using Spark + (preferably) Python to iterate over a large number of HTTP calls?
@cm_harlow are you doing some sort of massive crawl?
@ekansa ... yeah :-/ i'm trying to get a bunch of MARC records outta LoC via 4 diff api routes
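(No Spark example surfaced in the thread, so here is a minimal sketch of the same fan-out problem using the standard library's `concurrent.futures.ThreadPoolExecutor`, which is often enough for I/O-bound API crawls. `fetch_record` and the URL list are placeholders, not the actual LoC endpoints.)

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_record(url):
    # Placeholder: a real crawl would do something like requests.get(url).text.
    # Simulated here so the sketch runs without network access.
    return f"record for {url}"

# Hypothetical URL list standing in for the real API routes.
urls = [f"https://example.org/api/record/{i}" for i in range(100)]

results = {}
# Threads suit this workload because it is I/O-bound: each worker
# spends most of its time waiting on the network, not the CPU.
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = {pool.submit(fetch_record, u): u for u in urls}
    for fut in as_completed(futures):
        results[futures[fut]] = fut.result()

print(len(results))  # -> 100, one result per URL
```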

@cm_harlow And their terms of service are OK with that?

Sorry, I've never done anything with Spark. Sounds like you're doing a huge job though. I wonder if the LoC could just give you a big data dump?


@ekansa the terms of service for 1 of the services set a query time limit; the others are open.

their data dumps aren't the representation i need (a lossy one at that), and they're nearly a year out of date.

and they're not willing to generate new data dumps for me at the moment; they point me to the various request options instead.

(but i could pay a vendor for what i need)
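(Since one of the services imposes a query limit, a client-side throttle keeps the crawl within terms of service. This is a generic sketch, not anything from the thread; the 0.05s interval is an arbitrary placeholder.)

```python
import time

class Throttle:
    """Client-side rate limiter: allow at most one call per `interval` seconds."""
    def __init__(self, interval):
        self.interval = interval
        self._last = None

    def wait(self):
        # Sleep just long enough that calls are spaced `interval` apart.
        now = time.monotonic()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

# Hypothetical usage: space requests ~0.05s apart for the rate-limited API.
throttle = Throttle(0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()
    # ... issue the rate-limited API call here ...
elapsed = time.monotonic() - start
```

The first call goes through immediately; each later call waits out the remainder of the interval, so three calls take at least two intervals end to end.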

@cm_harlow Wow! Sheesh.

So are you doing some sort of analysis on all these MARC records? Or is this to build up your own data for retrieval services? Or something else?

Sorry, I can't help but I'm super intrigued by the scale of your project!

@ekansa heh, nw. i'm looking to retrieve a full dump of the Authorities to then serve up + manage in a few different ways - as a Git repo, as a ResourceSync source, as an IPFS repo.
@cm_harlow @ekansa This sounds awesome and potentially super useful. Are you doing this for private/internal use or public?
@spellproof @ekansa it'll be public + open
@ekansa @spellproof i got through maybe 40% of the NAF, pushing it to GitHub, but it's just taking forever with my dinky python + requests-only scripts
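(For a long crawl that's already partly done, like the 40% of the NAF above, a checkpoint file lets restarts skip finished records instead of re-fetching them. This is a generic sketch with placeholder names, not the actual scripts from the thread.)

```python
import os
import tempfile

# Hypothetical checkpoint file; a real script would use a fixed path
# so the record of completed IDs survives between runs.
DONE_FILE = os.path.join(tempfile.mkdtemp(), "done_ids.txt")

def load_done():
    """Read the set of record IDs already fetched on previous runs."""
    if not os.path.exists(DONE_FILE):
        return set()
    with open(DONE_FILE) as f:
        return {line.strip() for line in f}

def crawl(record_ids, fetch):
    done = load_done()
    with open(DONE_FILE, "a") as log:
        for rid in record_ids:
            if rid in done:
                continue           # already fetched on a previous run
            fetch(rid)             # placeholder for the real HTTP call
            log.write(rid + "\n")  # checkpoint after each success
            log.flush()

# Simulated runs: the "fetch" just records what it was asked for.
fetched = []
crawl(["a", "b", "c"], fetched.append)
crawl(["a", "b", "c", "d"], fetched.append)  # resumes: only "d" is new
```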
@cm_harlow @ekansa I can imagine! Anything better is way over my head, but good luck!