@cm_harlow And their terms of service are OK with that?
Sorry, I've never done anything with Spark. Sounds like you're doing a huge job though. I wonder if the LoC could just give you a big data dump?
@ekansa terms of service for 1 of the services have a query time limit, others are open.
their data dumps are not the representation i need, a lossy representation at that, and nearly a year out of date.
and they're not willing to generate new data dumps for me at the moment; they just send me to the various request options.
(but i could pay a vendor for what i need)
@cm_harlow Wow! Sheesh.
So are you doing some sort of analysis on all these MARC records? Or is this to build up your own data for retrieval services? Or something else?
Sorry I can't help, but I'm super intrigued by the scale of your project!
@ekansa for sure.
i'm basically doing whatever I can to get these datasets better published + shared. I want to explore what forking large authority datasets could look like, but can't wait on LoC to move towards something other than Voyager / Z39.50 / SRU / MarkLogic for their data services.