LoCCS classification processing via sed is down to about a minute.

That's still really rough, but yeah, it's a lot faster if you don't recurse back and forth through the input files.

#LoCCS #FreeLCC #LCC #Libraries #sed

Library of Congress Classification WTF:

What is with the "gappy" K-Class subclasses and ranges? From the FreeLCC "OUT" (outline) files, I get stuff like:

KDZ-KH1 127 Civil law
KDZ-KH1 132 Persons
KDZ-KH1 143 Domestic relations. Family law

Note the space after "KDZ-KH1".

(~1,700 similar entries)

Other classes _don't_ have a space between elements; see, e.g., these from J:

JC11-605 Political theory. The state. Theories of the state
JC47 Oriental state
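One way to make the two shapes line up is to split every outline line into a code field and a caption field, keeping the "gappy" K-class prefix and its number together. This is a minimal sketch, not checked against the full FreeLCC files. It assumes each line is a class code, one space, then the caption. It also assumes the gappy form is always a letters-dash-letters-digits prefix like "KDZ-KH1" followed by a number or number range. The `split_outline` name and the `|` field separator are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: rewrite FreeLCC outline lines as "code|caption".
# Assumes (not confirmed) each line is <class code><space><caption>, where the
# class code is either
#   - a plain class/range such as "JC47" or "JC11-605", or
#   - a "gappy" K-class prefix such as "KDZ-KH1", a space, then a number or
#     number range such as "127".
# Lines matching neither shape are passed through unchanged.
split_outline() {
  sed -E \
    -e 's/^([A-Z]+-[A-Z]+[0-9]+) ([0-9][0-9.]*(-[0-9][0-9.]*)?) /\1 \2|/' \
    -e 's/^([A-Z]+[0-9][0-9.]*(-[0-9][0-9.]*)?) /\1|/'
}

printf '%s\n' \
  'KDZ-KH1 127 Civil law' \
  'JC11-605 Political theory. The state. Theories of the state' \
  'JC47 Oriental state' | split_outline
# prints:
# KDZ-KH1 127|Civil law
# JC11-605|Political theory. The state. Theories of the state
# JC47|Oriental state
```

Because unmatched lines come through without a `|`, something like `grep -v '|'` on the output lists whatever neither expression caught.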

#LCC #LoCCS #FreeLCC #Librarians

And for those following my ongoing saga with the #FreeLCC #LCC / #LoCCS classification files ...

... I've now got the full classification _outlines_ parsing sensibly. That's about 10,246 entries. I've sorted out the superfluous page-leading context, joined lines that needed joining, and eliminated "See also" / "For ... see" notes (not especially meaningful; they can be added back later). That nets out to the equivalent of about 184 typed pages of classifications, or about half that typeset.
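The note-stripping and line-joining steps can be sketched as a two-stage sed pipeline. This is a minimal sketch, and its assumptions may not match the real OUT files. First, it assumes each "See also" / "For ... see" note sits on a single line of its own. Second, it assumes any line that doesn't begin with a class code is a wrapped continuation of the previous caption. A class code is taken to mean one to three capitals followed by a digit or a hyphen. The `clean_outline` name is hypothetical. The page-leading context step is left out because the post doesn't show its format.

```shell
#!/bin/sh
# Hypothetical cleanup pass for FreeLCC outline text. Assumptions (not confirmed):
#   1. "See also" / "For ... see" notes each sit on one line of their own.
#   2. A line that doesn't start with a class code (1-3 capitals, then a digit
#      or "-") is a wrapped continuation of the previous caption.
# Stage 1 deletes note lines. Stage 2 joins continuation lines:
#   :a      loop label
#   $!N     append the next input line (unless this is the last one)
#   /.../!  if the appended line is NOT a new class-code entry...
#   s       ...replace the newline (and any leading indent) with one space
#   ta      ...and loop back to look at the following line
#   P; D    otherwise print the finished entry and carry the new line over
clean_outline() {
  sed -E '/^[[:space:]]*(See also|For .* see([[:space:]]|$))/d' | sed -E \
    -e ':a' \
    -e '$!N' \
    -e '/\n[A-Z]{1,3}[-0-9]/!{' \
    -e 's/\n[[:space:]]*/ /' \
    -e 'ta' \
    -e '}' \
    -e 'P' \
    -e 'D'
}

# Example with made-up input:
printf '%s\n' \
  'JC11-605 Political theory. The state.' \
  '  Theories of the state' \
  'See also JA' \
  'JC47 Oriental state' | clean_outline
# prints:
# JC11-605 Political theory. The state. Theories of the state
# JC47 Oriental state
```

The join uses the usual `N`/`P`/`D` loop. The pattern space holds only the current entry plus one lookahead line, so each file is read once, front to back.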