I'd like to store one billion variable-length binary objects and retrieve them by SHA256(obj) key. The median object size is 2 kilobytes. Low read/write volumes.

What I've tried so far: NFS with the first few octets of the hash as nested directory names. It works, but it's a bit slow. I also tried ZeroFS (also too slow).
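For concreteness, the prefix-sharded layout can be sketched like this (the `shard_path` helper and the depth of 2 are my own illustration, not necessarily the exact scheme in use):

```python
import hashlib

def shard_path(obj: bytes, depth: int = 2) -> str:
    """Map an object to a nested path, using the first `depth` bytes
    of its SHA-256 digest (as hex pairs) for the directory names."""
    h = hashlib.sha256(obj).hexdigest()
    parts = [h[i * 2:(i * 2) + 2] for i in range(depth)]
    return "/".join(parts + [h])

print(shard_path(b"hello"))  # e.g. "2c/f2/2cf24dba..."
```

With depth 2 that gives 65,536 leaf directories, i.e. roughly 15k objects per directory at a billion objects, which is where per-directory filesystem performance starts to matter.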

Under consideration: DuckDB, RocksDB, BerkeleyDB, SQLite3, LMDB, something bespoke
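Of those, SQLite is probably the quickest to prototype; a minimal content-addressed sketch (table and helper names invented here, `:memory:` standing in for a real database file) might look like:

```python
import hashlib
import sqlite3

# Hypothetical single-table blob store keyed by raw SHA-256 digest.
# WITHOUT ROWID makes the key the clustered index.
db = sqlite3.connect(":memory:")  # use a file path in practice
db.execute(
    "CREATE TABLE IF NOT EXISTS blobs "
    "(key BLOB PRIMARY KEY, val BLOB NOT NULL) WITHOUT ROWID"
)

def put(obj: bytes) -> bytes:
    key = hashlib.sha256(obj).digest()
    # Content-addressed: same object always maps to the same key,
    # so duplicates can simply be ignored.
    db.execute("INSERT OR IGNORE INTO blobs VALUES (?, ?)", (key, obj))
    return key

def get(key: bytes):
    row = db.execute("SELECT val FROM blobs WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

k = put(b"some object")
assert get(k) == b"some object"
```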

Recommendations? Things/papers I should be reading?

@job If only there were a key-value-based storage system, like Berkeley DB and derived systems.
@czauner I mention Berkeley DB in my post, yes?

@job
Yes, you did. Sorry. But one question: why SHA256? You're aiming for speed, not cryptographic security, if I understand you correctly.

Why not go the computationally cheaper route of SHA1 or even MD5? Or CRC64 (though then you definitely need a collision handler)?
That also leads to shorter filenames.

@czauner Everything in this particular ecosystem is addressed by SHA256(), so I figured the computational effort is offset by the ease of constructing lookup keys.

@job
Well, I assume you'll run into latency problems here; you'll need to shave off every microsecond. NFS doesn't help either: even a 1 ms RTT brutally limits a single client to about 1k ops/sec. You can go and parallelize the hell out of your implementation.
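The parallelization point can be sketched like so, assuming the prefix-sharded file layout; `path_for`, the shard depth, and the worker count are all illustrative:

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the NFS mount root.
root = tempfile.mkdtemp()

def path_for(key_hex: str) -> str:
    # One level of two-hex-char sharding, purely illustrative.
    return os.path.join(root, key_hex[:2], key_hex)

def fetch(key_hex: str) -> bytes:
    with open(path_for(key_hex), "rb") as f:
        return f.read()

def fetch_many(keys, workers: int = 64):
    # 64 in-flight requests turn a 1 ms RTT into ~64k lookups/sec
    # in theory -- provided the server side keeps up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, keys))
```

The point is that the 1k ops/sec ceiling applies per serial request stream, not per client, so throughput scales with the number of outstanding requests until the server or network saturates.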
If your NFS server has an ext filesystem underneath, enable `dir_index` there (ext filesystems are painfully slow when a directory holds a ton of files). Or go the XFS route (though I'm no expert in Linux filesystems).
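On the ext4 side that would look roughly like this, with `/dev/sdX` as a placeholder device (`dir_index` is on by default on modern ext4, so checking first is worthwhile):

```shell
# Check whether hashed directory indexes are already enabled.
tune2fs -l /dev/sdX | grep dir_index

# If not: enable the feature, then rebuild the indexes for
# existing directories (filesystem must be unmounted for e2fsck).
tune2fs -O dir_index /dev/sdX
e2fsck -fD /dev/sdX
```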

Or, from a ZFS perspective (which is also available for Linux):
tune the hell out of the metadata cache, including a decent L2ARC on an NVMe. You might also consider playing with the recordsize, but given that you're primarily after read performance, that might not be an issue. Ah yes, and generally turn atime off.
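Roughly, assuming a hypothetical pool/dataset `tank/objects` and an NVMe at `/dev/nvme0n1` (the recordsize value is a guess to try, not a recommendation):

```shell
# Add an NVMe device as L2ARC cache for the pool.
zpool add tank cache /dev/nvme0n1

# Keep the L2ARC for metadata only, so a billion dnodes stay hot.
zfs set secondarycache=metadata tank/objects

# Skip access-time updates on every read.
zfs set atime=off tank/objects

# Optionally match recordsize to the ~2 KB median object size.
zfs set recordsize=8K tank/objects
```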