#AHAlodeck R&D news:

- Test dataset: 386.5 GB
- 120,000 filesystem objects (files and folders)
- 7,616,839 key/value pairs gathered/extracted as #xattrs.

Result: A 55 MB .tar.bz2 #holotar.

Fully searchable and filterable by any of the ~7.6 million xattrs.

Extraction step required only once (6h).
Afterwards:
Indexing done in 60 SECONDS! 🤩 🥇

Same data (key/value pairs) viewed in a different application:

`apt install eiciel`

That's a #DataCentric paradigm in action! You can literally throw any mixed (annotated) data at it.

It really puts fun back into using data daily.

#AHAlodeck

Handed in my presentation proposal for #nttw10 #notimetowait

Presenting "#DataGardens" 🌻️🌻️🌻️
Powered by #AHAlodeck to host any arbitrary data, offload key/value pairs to the filesystem - and transform data into annotated, related object graphs.

Pure "#ScienceFaction"!
#WeAllHaveBigDataNow

in #ZFS on #Linux, #Proxmox, #Debian, #Ubuntu #FOSS #FinF #ObjectStorage

Downloading a #wikidata dump for testing #ahalodeck and #xattrs on #zfs long-term storage... 😎

Found this while doing so:

https://www.wikidata.org/wiki/Wikidata:Lists/lexemes/en

It's a statistical breakdown of the #English #language. #Interesting.

#AHAlodeck R&D news:

Created a holotar copy of a real-world mixed audio collection (recording production and digitized tracks archive): ~120,000 filesystem objects (files/folders)

(donated by recording engineers)

filesystem metadata-only = 175 MB (.tar)
+exiftool metadata as xattrs = 252 MB (.tar)

...compressed (.tar.bz2): **6.3 MB** (🤯 😎 🤓 💾 ❗)

**This is AMAZING!**

I can browse, order, catalog and manifest any object in this collection using standard #GNU #Linux tools.
@beet_keeper

#AHAlodeck: My best friend wrote a minimal full-text indexer for #xattrs in the shell.

It took ~1.5 seconds (!) to index ~14,000 files with "de-embedded" music tags.

🥳 🎉
I love xattrs.
Imagine doing this with embedded metadata tags? 😉
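
For a rough idea of how little such an index needs, here is a minimal sketch. It is not the indexer mentioned above, just an illustration under stated assumptions. It assumes the tags live in `user.*` xattrs and that the `getfattr` tool from the `attr` package is installed. The function name `xattr_dump_to_index` and the file `index.tsv` are made up for this example.

```shell
#!/bin/sh
# Turn `getfattr -d` dump text (read from stdin) into a flat index:
# one line per attribute, formatted as key<TAB>value<TAB>file.
# Hypothetical helper for illustration only.
xattr_dump_to_index() {
    awk '
        # "# file: <path>" starts the record for one file.
        /^# file: / { file = substr($0, 9); next }
        # Attribute lines look like: user.artist="Some Name"
        /^[^#][^=]*=/ {
            eq  = index($0, "=")
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^"|"$/, "", val)   # strip the surrounding quotes
            printf "%s\t%s\t%s\n", key, val, file
        }
    '
}

# Hypothetical usage (paths are placeholders):
#   getfattr -R -d --absolute-names /path/to/collection | xattr_dump_to_index > index.tsv
#   grep -i 'coltrane' index.tsv
```

Values that are not plain text are printed by `getfattr` in an encoded form (prefixed with `0x` or `0s`) and are kept as-is here.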

https://git-annex.branchable.com/git-annex-metadata/

git-annex seems to have built-in key/value #metadata annotation capabilities... 😻 #AHAlodeck ahoi!

So **each** key/value pair of #metadata can be about 64 kB.

This is sufficient to hold most annotations stored as #EmbeddedMetadata currently present in most file formats.
(e.g. #mp3, #mp4, #iptc, #tiff, #jpg, etc.)

But from then on, the file format becomes irrelevant for this task!

#ahalodeck #digipres #audiovisual #cataloging

Very helpful insights on how many actual bytes can be stored in #xattrs on `btrfs`:

https://github.com/kdave/btrfs-progs/issues/917

Current limit = roughly the nodesize the filesystem was formatted with:

nodesize - sizeof(struct btrfs_header) - sizeof(struct btrfs_item) - sizeof(struct btrfs_dir_item)

Current default = 16K - 101 - 25 - 30, which is 16228 bytes

Maximum (for now) is ~64kB, if formatted like this:

`$ mkfs.btrfs -n 64K /dev/sdX`
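
The arithmetic above is easy to re-check in the shell. This is a small sketch using the three struct sizes quoted above (101, 25 and 30 bytes). The function name `btrfs_xattr_limit` is made up for this example, and the 64K result assumes the same overheads apply at that nodesize.

```shell
#!/bin/sh
# Overheads in bytes, as quoted above.
HEADER=101     # sizeof(struct btrfs_header)
ITEM=25        # sizeof(struct btrfs_item)
DIR_ITEM=30    # sizeof(struct btrfs_dir_item)

# Print the approximate xattr size limit for a given nodesize in bytes.
btrfs_xattr_limit() {
    echo $(( $1 - HEADER - ITEM - DIR_ITEM ))
}

btrfs_xattr_limit $((16 * 1024))   # default nodesize: prints 16228
btrfs_xattr_limit $((64 * 1024))   # 64K nodesize: prints 65380
```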

#ahalodeck #metadata #digipres
