Processing the LoCCS classifications via sed is down to about a minute.
That's still really rough, but yeah, it's a lot faster if you don't recurse back and forth through the input files.
Library of Congress Classification WTF:
What is with the "gappy" K-Class subclasses and ranges? From the FreeLCC "OUT" (outline) files, I get stuff like:
KDZ-KH1 127 Civil law
KDZ-KH1 132 Persons
KDZ-KH1 143 Domestic relations. Family law
Note the space after "KDZ-KH1".
(~1700 similar)
Other classes _don't_ have a space between elements; see, e.g., these from J:
JC11-605 Political theory. The state. Theories of the state
JC47 Oriental state
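For anyone curious how widespread this is, a quick grep sketch finds the "gappy" lines. This is an assumption-laden illustration: `kclass.txt` is a hypothetical sample, not the real OUT files, and the pattern assumes the stray space only appears after a letter range like "KDZ-KH1".

```shell
# Hypothetical sample of outline lines (assumption: real files differ).
cat > kclass.txt <<'EOF'
KDZ-KH1 127 Civil law
JC11-605 Political theory. The state. Theories of the state
JC47 Oriental state
EOF

# Match a letter range ("KDZ-KH1") followed by a space and a bare
# number -- the K-class pattern with the stray space -- and count hits.
count=$(grep -cE '^[A-Z]+[0-9]*-[A-Z]+[0-9]* [0-9]+ ' kclass.txt)
echo "$count"
```

Only the K-class line matches; "JC11-605" has digits, not letters, after the hyphen, so it falls through.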
And for those following my ongoing saga with the #FreeLCC #LCC / #LoCCS classification files ...
... I've now got the full classification _outlines_ sensibly parsing. That's about 10,246 entries. I've sorted out the superfluous page-leading context, joined lines needing joining, and eliminated "See also" / "For ... see" notes (not especially meaningful, can be added back later). That nets the equivalent of about 184 pages of classifications, typed, about half that typeset.
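The note-stripping and line-joining steps can be sketched roughly as below. A minimal sketch only, assuming GNU sed and a hypothetical `outline.txt`; the real FreeLCC OUT files and my actual sed scripts are messier.

```shell
# Hypothetical sample input (assumption: real OUT files differ).
cat > outline.txt <<'EOF'
JC11-605 Political theory. The state. Theories of the state
See also JF51
JC47 Oriental
 state
EOF

# 1. Drop "See also" / "For ... see" notes.
# 2. Join continuation lines (those starting with whitespace)
#    onto the entry they belong to (GNU sed N/P/D idiom).
sed -E '/^See also/d; /^For .* see/d' outline.txt |
  sed -E -e ':a' -e 'N' -e 's/\n[[:space:]]+/ /' -e 'ta' -e 'P' -e 'D' \
  > joined.txt
cat joined.txt
```

Note the N/P/D loop relies on GNU sed's behaviour of printing the pattern space when N hits end-of-input; BSD sed would drop the final line.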
OK, I want a handy way to collapse a set of _similar_ strings to the common part plus a list of variants. So:
DB History of Austria
DB History of Liechtenstein
DB History of Hungary
DB History of Czechoslovakia
... becomes:
DB: History of Austria Liechtenstein Hungary Czechoslovakia
Where "History of" is the common part, and the other bits are appended in turn.
(Commas or periods could be inserted.)
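I don't know of an off-the-shelf tool for this, but here's a sketch in awk. It assumes the lines share a leading word-prefix, and it places the colon after the whole common part ("DB History of:") rather than after the class code as in the example above.

```shell
# Collapse lines sharing a leading phrase into one line:
#   "<common prefix>: <variant> <variant> ..."
collapsed=$(printf '%s\n' \
  'DB History of Austria' \
  'DB History of Liechtenstein' \
  'DB History of Hungary' \
  'DB History of Czechoslovakia' |
awk '
  { lines[NR] = $0 }
  END {
    p = lines[1]
    # Shrink p until it is a prefix of every line.
    for (i = 2; i <= NR; i++)
      while (length(p) && index(lines[i], p) != 1)
        p = substr(p, 1, length(p) - 1)
    # Back off to a word boundary, then drop the trailing space.
    while (length(p) && substr(p, length(p), 1) != " ")
      p = substr(p, 1, length(p) - 1)
    sub(/ $/, "", p)
    out = p ":"
    for (i = 1; i <= NR; i++)
      out = out " " substr(lines[i], length(p) + 2)
    print out
  }')
echo "$collapsed"
```

Commas or periods between variants would be a one-character change in the concatenation step.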
@clew I've made some use of mind-mapping tools, and found most of them _very_ tedious. One commercial tool, "The Brain", has been used notably by #JerryMichalski; see:
I'm inclined to think that using tags or other classifications (e.g., #LoCCS) on corpora to generate mind maps may ultimately be more useful.
Another perspective is that we're grasping for new models of epistemic records. The codex has served us well, but is showing its age and limits.
@RefurioAnachro Contrast that with the inverse:
* Encumbered (e.g., Dewey Decimal: a proprietary system)
* Greenfield -- a new system would have to be created, designed, and, worst of all, sold to adopters
* Unstandardised
* Non-comprehensive -- it would have to be built out, and its design failures addressed
* Non-hierarchical -- flat spaces suck
* No tools
* Inflexible
* Non-extensible
* Poor/no processes
* Single-site
* No practitioners