RE: https://genart.social/@davidcarew/116178444886600923

I absolutely agree with @davidcarew about the usefulness of #sed and #awk.

I am starting to dabble with #xsltproc for some #XML-aware bulk modification.

I used #XSLT to convert an XML file into #wget commands to download additional XML files I needed for a project. (First file listed the base name of additional files, each in an element / node.)
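For anyone who wants to try the same idea without XSLT, here is a rough Python sketch of the "index file to wget commands" step. The element name `file`, the sample index document, and the base URL are all made up for illustration; the original post does not give the real ones.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical index document: each <file> element holds one base name.
INDEX_XML = """<files>
  <file>alpha</file>
  <file>beta</file>
</files>"""

# Hypothetical base URL (an assumption, not from the original post).
BASE_URL = "https://example.org/data"


def wget_commands(index_xml, base_url=BASE_URL):
    """Return one wget command line per <file> element in the index."""
    root = ET.fromstring(index_xml)
    commands = []
    for element in root.iter("file"):
        base_name = (element.text or "").strip()
        if not base_name:
            continue  # skip empty elements rather than emit a broken URL
        url = f"{base_url}/{base_name}.xml"
        # shlex.quote keeps the command safe if a name contains shell metacharacters.
        commands.append(f"wget {shlex.quote(url)}")
    return commands


if __name__ == "__main__":
    for command in wget_commands(INDEX_XML):
        print(command)
```

The printed lines can be reviewed first and then piped to a shell, much like the output of an XSLT text transform.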

@drscriptt

XML (and by association XSLT) took a lot of flak for its verbosity and stringent definition (compared to HTML and other SGML dialects which had a much looser, more forgiving definition), but a piece of me still loves it. I have an old project that keeps its data as XML markup and uses XSLT to churn it into (X)HTML output for display. It's lovely 😁

@gumnos I wonder just how much of that reputation XML deserved.

As in IMHO XML doesn’t REQUIRE that files pass DTD validation.

Though lazy XML will likely eventually bite you.

I’m liking what I’m doing with it.

@drscriptt

It does require being well-formed though (tags must be closed, attribute values quoted, special characters escaped, etc.), but DTD validation is indeed optional-but-recommended (and skipping it will, as you note, likely bite you in the bum eventually). But I like the assurance of consistency that validation gives; it makes the data more reliable to work with.
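To make the well-formed vs. valid distinction concrete, here is a small Python sketch. It only checks well-formedness: the standard library's `xml.etree.ElementTree` parser rejects syntax errors but does not validate against a DTD. The helper name `is_well_formed` is just for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML.

    This does NOT check the document against a DTD; it only catches
    syntax-level problems such as unclosed tags or unquoted attributes.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        "<books><book><title>bla</title></book></books>",  # fine
        "<books><book></books>",                            # <book> never closed
        "<book id=1/>",                                     # attribute value not quoted
        "<title>foo & bar</title>",                         # bare ampersand not escaped
    ]
    for sample in samples:
        print(is_well_formed(sample), sample)
```

Any conforming XML parser has to refuse the last three samples, whether or not a DTD is involved.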

@gumnos agreed.

I actually created a DTD for one of my small projects. It was a learning experience. As in a good positive experience.

I’m on the hunt for something that will let me INSERT, UPDATE, and DELETE nodes from an XML tree used as a small database of sorts. I already have SELECT functionality. If you’ll allow the SQL analogy.

As in I’d like to add a book;

<book><title>foo</title></book>

… to a books tree;

<books>
<book><title>bla</title></book>
</books>

I’d prefer to not read the entire tree into memory, modify, and write back out. I can, just feels suboptimal. Even for my small dataset.

@drscriptt

[brain rummages around in cold storage]

IIRC, you're looking for "SAX" parsers which incrementally process documents, firing events for each node (element, attribute, text, whitespace, processing-instruction, etc) and allowing you to act on those events. As opposed to "DOM" parsers which build up the entire tree in memory.

In both cases, if you're modifying the tree, you generally have to *read* the entire document (and then write the entire modified document back out), but where DOM parsers hold the whole document in memory, SAX parsers don't have to hold the whole document in memory at the same time, much like awk(1).

Unfortunately, I'm unaware of any awk-like language for manipulating XML using SAX-like streams, so usually it involves creating custom code in your favorite language using its SAX bindings to do your INSERT/UPDATE/DELETE actions.
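As a rough idea of what that custom code might look like, here is a sketch of the INSERT case using Python's standard `xml.sax` bindings. It copies every event straight through to a writer and emits one extra `<book>` just before `</books>` closes. The names `AppendBook` and `append_book` are invented for this example, and it assumes the simple `books`/`book`/`title` layout from earlier in the thread with no namespaces.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class AppendBook(XMLFilterBase):
    """Pass all events through; add one <book> right before </books>."""

    def __init__(self, parent, title):
        super().__init__(parent)
        self._title = title

    def endElement(self, name):
        if name == "books":
            no_attrs = AttributesImpl({})
            # Emit the new node as a sequence of ordinary SAX events.
            super().startElement("book", no_attrs)
            super().startElement("title", no_attrs)
            super().characters(self._title)
            super().endElement("title")
            super().endElement("book")
        super().endElement(name)


def append_book(source, title):
    """Stream `source` (file name or file object) and return the modified XML."""
    out = io.StringIO()  # for the demo only; a real run would write to a file
    reader = AppendBook(xml.sax.make_parser(), title)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(source)
    return out.getvalue()


if __name__ == "__main__":
    original = b"<books>\n<book><title>bla</title></book>\n</books>"
    print(append_book(io.BytesIO(original), "foo"))
```

The whole document is still read and rewritten, but only the current event is held at any moment. UPDATE and DELETE would follow the same pattern: track where you are in the tree and either change or swallow the events for the node in question.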