JSON-LD can be nice to work with as a JSON object, for example:

https://kolektiva.social/@anarchivist/111905336837934109.json

But it can also be very difficult to work with, for example:

https://id.loc.gov/authorities/subjects/sh85079255.jsonld

Since you don't really know what you're going to get, you need to use heavyweight RDF processing tools just to work with some JSON data. I think that's why people don't like JSON-LD.
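To illustrate the problem (with made-up data, not the actual loc.gov output): the same statement can arrive in very different JSON shapes, both valid JSON-LD, so plain-JSON code written against one shape breaks on the other.

```python
import json

# Hypothetical record in compact form: easy to read as plain JSON.
compact = json.loads("""{
  "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
  "@id": "http://example.org/sh85079255",
  "skos:prefLabel": "Lions"
}""")

# The same data in expanded form: every key is a full IRI and every
# value is wrapped in a list of objects.
expanded = json.loads("""[{
  "@id": "http://example.org/sh85079255",
  "http://www.w3.org/2004/02/skos/core#prefLabel": [{"@value": "Lions"}]
}]""")

# Code written for one shape does not work on the other:
print(compact["skos:prefLabel"])  # "Lions"
print(expanded[0]["http://www.w3.org/2004/02/skos/core#prefLabel"][0]["@value"])
```

Both documents mean the same thing to an RDF processor, but to a plain JSON consumer they are unrelated structures.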

Polyglots don’t lead to interoperability

tl;dr: please don’t specify them.

@edsu This just depends on the tools loc.gov used to create the JSON-LD, right? It could just be a JSON file, pretty readable, with a context inline or known to exist. I'm not a big fan of JSON-LD, but this is not the reason. It is a compromise solution. As ugly and useful as XML.

@hochstenbach yes, they could have chosen to publish it differently, but they didn't, and it's still valid JSON-LD. Uncertainty about how it's going to be structured raises the bar for everyone who wants to use it.

But I guess different communities on the web could have norms of usage, that take some of the guess work out of parsing.

@edsu Indeed, it would have taken LOC 10 minutes of work to produce an output that is much easier to consume using a JSON-LD frame at their side. E.g. https://gist.github.com/phochste/39562b6cf51585d983208eaab61af22f
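For anyone unfamiliar: a JSON-LD frame is itself just a JSON document describing the shape you want back. A minimal sketch, not the actual gist (the `mads` and `skos` prefixes match the LOC vocabularies; everything else is illustrative):

```json
{
  "@context": {
    "mads": "http://www.loc.gov/mads/rdf/v1#",
    "skos": "http://www.w3.org/2004/02/skos/core#"
  },
  "@type": "skos:Concept"
}
```

Running the published graph through a framing processor with a frame like this pulls the matching concept to the top level with predictable, compacted keys.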
@hochstenbach Yeah, I guess that's what I'm trying to say, that using JSON-LD requires a stack of software to process it (in addition to the usual JSON support).
@edsu @hochstenbach JSON-LD is misunderstood. Data providers should start with a clear JSON format to be usable without any knowledge or interest in RDF. Then add JSON-LD context on top, to make it RDF as well. In practice, it's often done the other way round, without any benefit compared to other RDF serializations.
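A minimal sketch of that order of operations in Python (the record and context here are hypothetical, just to show the idea of "clear JSON first, context on top"):

```python
# Step 1: design a clear, readable JSON record first.
record = {
    "id": "https://example.org/authority/sh85079255",
    "prefLabel": "Lions",
    "broader": ["https://example.org/authority/sh85002415"],
}

# Step 2: layer a context on top so the same document is also RDF.
# Plain-JSON consumers can ignore it entirely.
record["@context"] = {
    "id": "@id",
    "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
    "broader": {
        "@id": "http://www.w3.org/2004/02/skos/core#broader",
        "@type": "@id",
    },
}

# Consumers who have never heard of RDF still get ordinary JSON access:
print(record["prefLabel"])  # "Lions"
```

The design choice is that the JSON shape is decided for human and JSON-tool consumption, and the context adapts RDF to that shape rather than the other way around.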
@nichtich @edsu @hochstenbach Yes! That is an essential part of Linked Open Usable Data and we try to do this with #lobid . Creating nice nested JSON to be easily consumed and – indexed in elasticsearch – queryable (rather intuitively if you make yourself a bit familiar with the JSON's fields and structure) via HTTP in complex ways on every level (though not as complex as SPARQL).
@edsu @hochstenbach I made this exact comment to them when I was doing a project on LoC data.

@thatandromeda @edsu @hochstenbach trying to figure out how much to say about what an absurd pain we found it to parse the LCNAF as JSON-LD as serialized by LC 😭

For actual processing, we output working files which had the data we needed and I think our script was still only at like -- 80% of getting things there, but we could throw out name-title authorities

@thatandromeda @hochstenbach what did they say?
@edsu @hochstenbach i do not recall (it was in a context where they were gathering a lot of feedback and synthesizing it later)

@thatandromeda @hochstenbach I got worked up enough to write a blog post, lol: https://inkdroid.org/2024/02/14/publishing-jsonld/

Hopefully it was ok to quote your post @acka47 ?

@edsu aha, THIS I will cite!
@edsu *emails graduate assistant with subject line: VINDICATION (JSON-LD)*

@edsu it took us a _month_ to work through

and she was like "I thought I was skilled enough to process JSON with Python" and I had to reassure her that this was like -- something nobody really used -- for REASONS.

@edsu @thatandromeda @acka47 that is the way! I really hope loc will pick this up and output some pretty JSON that is also LD.
@edsu @hochstenbach @acka47 “probably easier to use one of the XML representations instead” _ouch_
@edsu No problem, thanks! @thatandromeda @hochstenbach
@edsu At #elag2019, we already did hands-on bootcamp on creating #LOUD from the #Bibframe works dataset, indexing it and using it e.g. with #Openrefine. Slides: https://hbz.github.io/elag2019-bootcamp/ repo: https://github.com/hbz/elag2019-bootcamp I haven't heard of any new Bibframe work bulk download since, though. @thatandromeda @hochstenbach

@edsu @thatandromeda @hochstenbach @acka47 "But really, in my opinion, it just means publishing things at URLs you intend to manage so people can link to them over time" Oh, yes! Kudos to Edsu, as always
@edsu @thatandromeda @hochstenbach @acka47 There are specific reasons the JSON-LD comes out like that, mostly due to the underlying system: MarkLogic does not have JSON-LD serialization natively. Kevin Ford wrote the conversion program over 10 years ago (https://github.com/kefo/rdfxq/tree/master/modules), so it was pretty new at the time it was created. But I agree, it would be great to update it to be more user friendly.

@matt @thatandromeda @hochstenbach @acka47 thanks for this Matt! I would have thought marklogic included pretty solid functionality for converting xml to usable json by now? https://docs.marklogic.com/guide/app-dev/json#id_55967

Is there an application layer between the marklogic db and the web? Or is id.loc.gov actually implemented completely in xquery, inside of marklogic?!

@edsu @thatandromeda @hochstenbach @acka47
To go from an XML doc to a JSON representation it probably can, but to go from doc + semantic triple store to a valid JSON-LD serialization there is no native way of doing it that I'm aware of.

Yep, marklogic is a doc db/triple store and application layer built in. It’s all xquery code running everything.

@matt @thatandromeda @hochstenbach @acka47 I guess what I'm suggesting is to go from XML to some kind of sane JSON, and then layer in whatever @context is needed for it to make sense as JSON-LD? The (probably offbase) assumption being that you are storing XML docs in MarkLogic?
@edsu
All the docs are in the DB, yes. I think the easiest solution is to modify the current existing conversion to produce "nicer" JSON-LD, which I think would be great, and I can definitely mention it to the team.
@matt ok great! Assuming that there are XML docs in the database, it seems like you could use existing MarkLogic support for generating JSON, and then add in whatever @context you need into that to make it JSON-LD?
@edsu yeah possibly, will need to look at the outputs and the current process.

@matt @edsu

LUX (https://lux.collections.yale.edu/) uses JSON-LD in MarkLogic with an automated extraction of the triples into the ML container, but that extraction doesn't happen natively in ML. That said, it's an easily countable number of lines of python to do it before loading, and could have been a trigger on document load within ML.

@hochstenbach here's what I ended up using:

https://gist.github.com/edsu/7d2d8fb2049075e76bfdf106b8805220

I kind of wanted to get some JSON-LD that only included the SKOS vocab but I wasn't quite sure how to do that with the frame...

Get some usable JSON for a given LC name or subject authority string: e.g. `./lcauthority.py "Southampton (England)"`

@edsu I don't think with a frame one can filter on a whole vocabulary. But one can state: "I only want to see these and these JSON properties and skip the rest." The JSON could be a bit easier also with `"mads:elementList": { "@container": "@list" }` in the frame. If LOC doesn't want to do the framing on their side (for them it would be much easier, with less URL parsing), your code snippet provides a lot of inspiration for others on how to process this data more easily.
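The `@container` tip above would slot into the frame's context; a sketch (only the `mads:elementList` entry comes from this thread, the surrounding keys are illustrative):

```json
{
  "@context": {
    "mads": "http://www.loc.gov/mads/rdf/v1#",
    "mads:elementList": {"@container": "@list"}
  }
}
```

With that in place, the element list compacts to a plain JSON array rather than a nested `@list` object.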

@hochstenbach ok, I'm glad I didn't miss something obvious! I might package this up as a little library/cli just as an example of how framing can make the data easier to work with.

I also noticed that sometimes strings are expressed as language-literals and other times they're not, which complicates using the data a bit. I suppose using an rdf library would make that go away. I opened an issue to make sure I wasn't missing a way of making that easier with a frame:

https://github.com/digitalbazaar/pyld/issues/192
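One way to paper over the inconsistency in plain Python, without pulling in an RDF library (a sketch; the helper name and the shapes it handles are my own, not from pyld):

```python
def literal(value):
    """Return the bare string(s) behind a JSON-LD value, whatever its shape."""
    if isinstance(value, dict):
        # Language-tagged or typed literal: {"@value": "...", "@language": "en"}
        return value.get("@value")
    if isinstance(value, list):
        return [literal(v) for v in value]
    return value  # already a plain string

print(literal("Lions"))                                 # Lions
print(literal({"@value": "Lions", "@language": "en"}))  # Lions
print(literal([{"@value": "Lions"}, "Tigers"]))         # ['Lions', 'Tigers']
```

It works, but it is exactly the kind of defensive shim a consumer shouldn't need if the publisher produced one consistent shape.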

@edsu I'm afraid this default language needs to be solved at the RDF level (data cleaning, or the source providing an easier mapping) and not with JSON-LD framing. See also: https://github.com/w3c/json-ld-framing/issues/156
@hochstenbach ah, thanks for letting me know I wasn't missing something obvious.

@edsu @hochstenbach Garbage data from LC at fault here, not JSON-LD or PyLD. Not surprisingly.

IIIF has a good pattern for language maps: https://iiif.io/api/presentation/3.0/#language-of-property-values that makes usable JSON... assuming that the incoming data is also reasonable
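The IIIF pattern is worth seeing concretely: every text value is a map from BCP 47 language codes to lists of strings, so consumers always get one predictable shape. A sketch with made-up values (the helper function is mine, not part of the spec):

```python
# A IIIF-style language map for a label.
label = {"en": ["Whistler's Mother"], "fr": ["La Mère de Whistler"]}

def first_value(language_map, lang="en"):
    """Pick a display string from a language map, falling back to any language."""
    values = language_map.get(lang) or next(iter(language_map.values()))
    return values[0]

print(first_value(label))        # Whistler's Mother
print(first_value(label, "fr"))  # La Mère de Whistler
```

Because the shape never varies, consumers don't need the defensive type checks that mixed plain/tagged literals force on them.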

@azaroth42 @hochstenbach lol -- that's kind of the problem. Isn't it valid JSON-LD?

I definitely didn't mean to blame PyLD. I only wanted to say that when people are using JSON on the web they shouldn't be required to pay the RDF tax (using an RDF processing library). It should be usable as JSON, like what you have done in the IIIF community.

@edsu @hochstenbach

It is valid JSON-LD, much like SOAP and RDF/XML are valid but terrible XML. LC needs to understand that the audience for data is data/software engineers, and to engage with our community to produce content that is both useful and usable. Unfortunately, as you doubtless experienced in your time there, it falls on deaf ears :(

@azaroth42 @hochstenbach I wouldn't say people were uninterested in improving things. I remember there were significant barriers to getting things done when they spanned different parts of the organization. I don't think people willfully make things hard to use at any rate. But, that was a long time ago in a galaxy far far away. Ok maybe not that far, just 7 miles ;-)
@edsu Excellent sample post

@edsu I (of course) fundamentally disagree. The structure of your argument is: Toothbrushes can be nice, but someone once used one to kill someone. That's why people don't [implied shouldn't] like toothbrushes.

Just because the data from one organization technically meets requirements doesn't make JSON-LD in any way wrong. IIIF, Annotations, schema.org and many more have proven the value for more than a decade now.