I'd like to store one billion variable-length binary objects & get them by SHA256(obj) key. The median object size is 2 kilobytes. Low read/write volumes.

What I've tried so far: NFS with the hash's first few octets as nested directory names; it works, but it's a bit slow. I also tried ZeroFS, which was also too slow.

Under consideration: DuckDB, RocksDB, BerkeleyDB, SQLite3, lmdb, something bespoke

Recommendations? Things/papers I should be reading?

@job What do you mean by "nested"? Just one level should always suffice & perform a lot better.
@dalias /mnt/nfs/XX/YY/ZZ/XXYYZZRESTOFHASH where XX, YY, and ZZ are the first few bytes of the hash
@job Yeah just do XXX/REST
@dalias why?
@job Only 2 directory lookups instead of 4+.
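For concreteness, here's a sketch of the two layouts being discussed, with paths built from the hex digest (the /mnt/nfs prefix and the exact shard widths are just illustrative):

```python
import hashlib


def shard_paths(data: bytes) -> tuple[str, str]:
    """Build both candidate on-disk paths for a blob keyed by SHA256(obj).

    Mount point and shard widths are illustrative, not prescriptive.
    """
    h = hashlib.sha256(data).hexdigest()
    # Three nested one-byte (two hex digit) levels: XX/YY/ZZ/FULLHASH
    nested = f"/mnt/nfs/{h[0:2]}/{h[2:4]}/{h[4:6]}/{h}"
    # One three-hex-digit level: XXX/FULLHASH
    flat = f"/mnt/nfs/{h[0:3]}/{h}"
    return nested, flat
```

Rough arithmetic at 10^9 objects: XXX/REST gives 16^3 = 4096 directories of roughly 244k entries each, while XX/YY/ZZ/REST gives 256^3 ≈ 16.8M leaf directories of roughly 60 entries each. That's the trade-off between lookup depth and per-directory entry count.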
@dalias @job This is going to hurt on directory lookups (directory implementations are optimized for ca. 10³ entries). On the other hand, NFS client path traversal is lockstep, so you're going to pay N×RTT where N is the number of directories in the path. (Plus another RTT for the actual read RPC.) Large files with an index that can be held open across multiple requests will perform much better. In theory a custom NFS client that didn't go through namei() would avoid the pipeline stalls.
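Back-of-envelope for that lockstep cost, assuming a cold client cache (one LOOKUP RPC per path component below the mount, plus one READ; the RTT figure is an assumption):

```python
def fetch_latency_ms(lookups: int, rtt_ms: float = 1.0) -> float:
    """Estimated latency for one uncached read over NFS:
    one round trip per directory LOOKUP, plus one for the READ RPC."""
    return (lookups + 1) * rtt_ms


# XX/YY/ZZ/HASH: 4 lookups + 1 read = 5 round trips
# XXX/HASH:      2 lookups + 1 read = 3 round trips
```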
@dalias @job I advise my users to store this kind of data in a ZIP file; they can hold it open during training and random seeks in an open file are much less costly than opening random tiny files to read a single block. There's plenty of memory to hold the whole central ZIP directory. The users ignore my advice.