At the end of the 19th century, #PaulOtlet doubted the future of the book, and even of the library itself. And yet he has earned a post on the #TIBBlog today, 10 December 2024, the 80th anniversary of his death.
For Otlet is one of the great pioneering thinkers to whom we owe our present-day understanding of knowledge and information management (#Informationsmanagement).
More on this from our #TIB colleague #PetraMensing: https://blog.tib.eu/2024/12/10/paul-otlet-der-weltbibliothekar
Paul Otlet – der Weltbibliothekar (Paul Otlet, the World Librarian) - TIB-Blog

To mark the 80th anniversary of his death, Dr. Petra Mensing looks back on the life of the world librarian Paul Otlet, a pioneering thinker for today's knowledge and information management. He recorded the world's knowledge on 15 million index cards and was already answering about 1,500 queries in 1912, which may make his Mundaneum the world's first analogue search engine.

TIB-Blog
Book Club on Cataloging the World and Index, A History of the

Our next iteration over the coming month or so will focus on two relatively recent books in the area of intellectual history and knowledge management

BoffoSocko.com

@carapace I've put a fair bit of thought into how a document-oriented filesystem (in the #PaulOtlet sense of "document") might function. To the extent I've thought this out, it's somewhat modelled on how a physical library is organised: there is the actual storage ("stacks"), and there's the interface to that storage ("catalogue").

A document is any contained information. It might be a text, image, sound, video, multimedia, or data record, or combination of these.

The stacks contain works. The catalogue provides ways of accessing those works, and any given work might appear in or be accessed through the catalogue in any number of different ways.
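
A minimal sketch of that stacks / catalogue split, as a toy in-memory model rather than a real filesystem. The class name `Library`, the methods `shelve` and `lookup`, and the field names are all hypothetical, made up for illustration:

```python
from collections import defaultdict


class Library:
    """Toy model: the stacks hold works, the catalogue holds access paths to them."""

    def __init__(self):
        self.stacks = {}                    # work_id -> content (bytes)
        self.catalogue = defaultdict(set)   # (field, value) -> {work_id, ...}

    def shelve(self, work_id, content, **entries):
        """Store a work once, then file it under any number of catalogue entries."""
        self.stacks[work_id] = content
        for field, value in entries.items():
            # Accept either a single value or a list of values per field.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                self.catalogue[(field, str(v).lower())].add(work_id)

    def lookup(self, field, value):
        """Return the ids of all works filed under one catalogue entry."""
        return sorted(self.catalogue.get((field, str(value).lower()), set()))


lib = Library()
lib.shelve("w1", b"first work", author="Author One", subject=["bibliography", "indexing"])
lib.shelve("w2", b"second work", author="Author Two", subject=["indexing"])
```

Here `lib.lookup("subject", "indexing")` reaches both works while `lib.lookup("author", "Author One")` reaches only the first: one stored copy, several ways in.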

A huge challenge for any such metadata-based system is that metadata itself requires design and creation, and this remains hugely cumbersome for data presently. There's some useful metadata associated with filesystems, though much of that is at a systems rather than document level, and some metadata (say, file creation / modify / access timestamps) bears little if any relation to the underlying document. Tracking document-related metadata would be a huge step forward.

Relying on extant and often external metadata would also be useful. Library of Congress, OCLC, IMDB, CDDB, DOI, ISBN, and related records would be quite useful for classifying existing works. Some set of useful standards for other common records (personal computer files, system logs, receipts, memos, correspondence, online interactions) might also be useful. The more that metadata creation can be both automated and made useful (and no, "New Document" is not a useful title) the better.

#Filesystems #webfs #docfs

3/

Whitespace in filenames is a major category error IMO.

OTOH, filenames themselves (and filesystems as presently incarnated) are also grossly insufficient for many needs. It's interesting to note, for example, that on Android (and possibly iOS), databases (usually sqlite) have emerged as the de-facto default persistent data storage mechanism, even for content which would normally be held on a filesystem.

I've long been looking at questions such as what a document-oriented filesystem (#docFS) or the World Wide Web made accessible as a filesystem (#webFS) might look like.

For documents, I've generally arrived at a naming standard which uses underbars (_) to separate elements, hyphens (-) for standard whitespace, and double dashes (--) to indicate punctuated / multiple elements (e.g., multiple authors, or a subtitle following a colon or dash). Permitted characters are otherwise 7-bit ASCII alphanumerics ([A-Za-z0-9]), with dot for the file extension only, and possibly parentheses.

So:

Author-One--Author-Two_Title--Subtitle_YYYY.filetype

That might have a publisher or journal title added (an additional underbar-delimited element after the title(s)). Additional contributors (e.g., editors, translators) might be mentioned. And it's possible some identifier (ISBN, OCLC, DOI, LoC call number) might be added, though those are supplemental.
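
A rough sketch of the scheme in Python, covering only the core author / title / subtitle / year elements. The helper names `slug` and `doc_filename` are hypothetical, and the handling of accents and apostrophes is my own assumption, not part of the standard as stated:

```python
import re
import unicodedata


def slug(text):
    """Reduce one element to ASCII alphanumerics joined by single hyphens."""
    # Assumption: fold diacritics (é -> e), then discard anything still non-ASCII.
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Assumption: apostrophes vanish rather than split a word.
    folded = folded.replace("'", "")
    # Every other run of non-alphanumerics becomes a single hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", folded).strip("-")


def doc_filename(authors, title, year, filetype, subtitle=None):
    """Build Author-One--Author-Two_Title--Subtitle_YYYY.filetype."""
    author_part = "--".join(slug(a) for a in authors)   # "--" joins multiple authors
    title_part = slug(title)
    if subtitle:
        title_part += "--" + slug(subtitle)             # "--" stands in for ": " or " - "
    return f"{author_part}_{title_part}_{year}.{filetype}"
```

For instance, `doc_filename(["Author One", "Author Two"], "Title", 2024, "epub", subtitle="Subtitle")` gives `Author-One--Author-Two_Title--Subtitle_2024.epub`.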

The idea isn't to fully and completely or precisely represent all aspects of a document or work, but to usefully do so. So yes, that means that foreign character sets aren't represented, that full author lists aren't included (for scientific papers these can number in the tens to hundreds), etc. But enough to find the work reasonably within a corpus through a directory listing.

Yeah, I'm familiar with Calibre, Zotero etc., and should really get more familiar with them. But they're clunky enough and not sufficiently universally available (e.g., on Android, where most of my documents live these days, via an e-book reader) that I'm not optimistic they're really a solution.

(Hoisted from a limited share.)

#DocumentManagement #Whitespace #OnTheNamingOfCats #OnTheNamingOfFiles #Whatever #SameThing #RockyHorror #MacavitysNotHere #Bombalurina #Effanineffable #OldPossum #TSEliot #DOS #PaulOtlet #Mundaneum

@julian Yes, the "father of information science".
I like the quote from The Atlantic, »…a global network of “electric telescopes”«

But, frankly, it would be an exaggeration to say I am familiar with #PaulOtlet

/cf. slide 20: https://mprove.de/script/15/beyondhyperlocal/index.html

Beyond HyperLocal Journalism, World Publishing Expo 2015 @mprove

mprove.de

@researchbuzz The proximity element is limited as I am, of course, on Altair IV, some 20 of your light years away.

That said, one of my obsessions (though not necessarily a major element of my Mastodon tooting) is information, knowledge, and document management.

The tags #kfc, #webfs, and #docfs will lead to a few of my information-management / search toots / threads.

And if you've got opinions, feelings, and/or deep intel on #PaulOtlet and his #Mundaneum I'm all ears.

@woozle

Gotthard Deutsch (1859–1921) produced a card index of 70,000 ‘facts’ of Jewish history.

Does Deutsch’s index constitute a great unwritten work of history, as some have claimed, or are the cards ultimately useless ‘chips from his workshop’?

BoffoSocko.com

@jonny My principles here are:

  • The filename should be descriptive and not simply unique.
  • It should be human-meaningful in some manner if at all possible.
  • It should scope to the collection size / namespace.

Estimates I'm aware of are that there are on the order of 100--200m books ever published, growing at ~1m per year, and a generally comparable set of scientific articles. News organisations such as Reuters, AP, and AFP produce about 1k--5k items daily, and I suspect many of those are photos or videos. Major newspapers tend to produce about 100--500 stories daily (weekday vs. weekend). You can work out ballpark maths from that.
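
One way to run those ballpark maths, using only the figures quoted above. The function names are hypothetical, and the 62-symbol alphabet ([A-Za-z0-9]) is an assumption chosen purely to show how short a name could be if it only had to be unique:

```python
DAYS_PER_YEAR = 365

# (low, high) figures as quoted in the estimates above.
books_total = (100_000_000, 200_000_000)
wire_items_per_day = (1_000, 5_000)
newspaper_stories_per_day = (100, 500)


def per_year(daily_range):
    """Scale a (low, high) daily count to a yearly one."""
    return tuple(n * DAYS_PER_YEAR for n in daily_range)


def min_id_length(n_items, alphabet_size=62):
    """Fewest characters that can give each of n_items a distinct name."""
    length, capacity = 1, alphabet_size
    while capacity < n_items:
        length += 1
        capacity *= alphabet_size
    return length


wire_items_per_year = per_year(wire_items_per_day)          # (365000, 1825000)
newspaper_per_year = per_year(newspaper_stories_per_day)    # (36500, 182500)
book_id_length = min_id_length(books_total[1])              # 5
```

So five characters would suffice to tell every book ever published apart. Uniqueness is cheap at these scales, which is why the name's characters are better spent on being descriptive.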

For correspondence, the originator and recipient ("From:" and "To:") are both significant. Those might be referenced. Publishing, to a general audience, is in a sense correspondence where "From:" == Author and "To:" == World.

The filename need not be precise, exact, or an accurate presentation of contents, but USEFUL. That is, within a corpus, can I find a specific work or works of interest. In this sense, the titling scheme is an example of the principle I've developed that search is identity, in the sense that a search might produce 0, 1, or n>1 results. 0 is null, 1 is identity, and > 1 is a result set.
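
A small sketch of "search is identity" over a directory listing, using shell-style glob patterns from Python's standard `fnmatch` module. The listing and the function names `search` and `classify` are made up for illustration:

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing following the naming scheme.
listing = [
    "Author-One_First-Title_1999.pdf",
    "Author-One_Second-Title_2004.epub",
    "Author-Two_Other-Title_2004.pdf",
]


def search(names, pattern):
    """Return every filename that matches a glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


def classify(results):
    """0 results is null, 1 is identity, more than 1 is a result set."""
    if not results:
        return "null"
    if len(results) == 1:
        return "identity"
    return "result set"
```

With this listing, `Author-Three_*` classifies as null, `*_2004.pdf` as identity, and `Author-One_*` as a result set.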

There are other naming and cataloguing schemes. A complete system would have correspondences between these and the conventional / human-readable titles, e.g., ISBN, LOCCS, OCLC, DOI, etc.

And yes there are other cataloguing systems such as SuDoc (used by the US government) which are useful in their own contexts.

Author, date, content, audience, and publisher are generally useful search-space reducing concepts of fairly generally applicable context. E.g., if I were including, say, store receipts or purchase orders, the vendor, customer, date, location, and a summary of contents (say, largest item) might serve as a description. Computer logs tend to be time and process/service oriented, perhaps also mentioning user or network address, etc.

Related hashtags and discussion:

#docfs #webfs #KFC #PaulOtlet #Maundenaum

@vertigo If you're familiar with #PaulOtlet, "document" is pretty much any fixed record: texts, images, audio, video, multimedia, data, software.

For publishing --- looking at texts, I'm thinking along the lines of a Kolmogorov complexity or minimum requisite complexity for a given work --- how much specification is required to create a sufficiently complete representation. I'm leaning heavily to Markdown and LaTeX as primary formats. (Possibly other lightweight markup langs, e.g., asciidoc or reStructuredText).

Notion of having a source from which multiple endpoints might be produced: straight text, HTML, ePub, PDF, etc.
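
One possible sketch of that single-source idea, assuming pandoc as the converter (my assumption; no tool is named above) and that it is installed on the system. PDF output additionally needs a PDF engine such as a LaTeX distribution. The names `TARGETS`, `pandoc_commands`, and `build_all` are hypothetical:

```python
import subprocess
from pathlib import Path

# Output extension -> extra pandoc options for that endpoint.
TARGETS = {
    "txt": ["-t", "plain"],   # straight text
    "html": ["-s"],           # standalone HTML page
    "epub": [],               # format inferred from the output extension
    "pdf": [],                # requires a PDF engine to be installed
}


def pandoc_commands(source, targets=TARGETS):
    """List one pandoc command line per endpoint for a single source file."""
    src = Path(source)
    return [
        ["pandoc", str(src), *extra, "-o", str(src.with_suffix("." + ext))]
        for ext, extra in targets.items()
    ]


def build_all(source):
    """Run every conversion, stopping at the first failure."""
    for cmd in pandoc_commands(source):
        subprocess.run(cmd, check=True)
```

Calling `build_all("book.md")` would then produce `book.txt`, `book.html`, `book.epub`, and `book.pdf` from the one Markdown source.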

@Valenoern This is the essential idea behind "docfs", which would be a document-oriented filesystem. Its networked sibling being "webfs".

"Document" here is in the sense of #PaulOtlet, of any durable record. That might be a text, image, sound, video, multimedia content, data, software, or an amalgamation or melange.

One of my key ideas is that the metadata for these documents would be part of the filesystem, extending the notion of what constitutes file-centric data. I'd like to see some form of bibliographic data presented, where available for public and published media (books, articles, audio recordings, films).

Search is another element, and one idea for the filesystem would be as a virtual filesystem in which attributes could be supplied until a single item matching those criteria was found. "Identity is search".
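
A toy illustration of supplying attributes until a single item remains, over in-memory records rather than an actual virtual filesystem. The record fields and the function name `resolve` are hypothetical:

```python
# Hypothetical metadata records, one per document.
records = [
    {"id": 1, "author": "Author One", "year": 1999, "format": "pdf"},
    {"id": 2, "author": "Author One", "year": 2004, "format": "epub"},
    {"id": 3, "author": "Author Two", "year": 2004, "format": "pdf"},
]


def resolve(records, attributes):
    """Apply (key, value) attributes in order until at most one record remains.

    Returns (record, attributes_used); record is None when nothing matches
    or when the supplied attributes still leave more than one candidate.
    """
    remaining = list(records)
    used = {}
    for key, value in attributes:
        used[key] = value
        remaining = [r for r in remaining if r.get(key) == value]
        if len(remaining) <= 1:
            break   # identity (or null) reached; further attributes are unnecessary
    match = remaining[0] if len(remaining) == 1 else None
    return match, used
```

Given the attributes author "Author One", year 2004, and format "epub" in that order, the search stops after the year: two attributes already identify record 2, and the format is never consulted.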

For projects, some concept of structured workflows, with groups, tasks, milestones, and contributing data. For a sufficiently structured organisation, security and access controls.

I'd like the whole concept to be as commercialisation-hostile as possible, with both copyrights and payments entirely out of scope.

#docfs #webfs #kfc #maundenaum #DublinCore #metadata #bibliography #Plan9OS #Schopenhauer