I'd like to store one billion variable-length binary objects and fetch them by SHA256(obj) key. The median object size is 2 kilobytes. Read/write volumes are low.

What I've tried so far: NFS with the hash's first few octets as nested directory names; it works, but it's a bit slow. I also tried ZeroFS, which was also too slow.
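For concreteness, the nested-prefix layout I mean looks roughly like this (paths and fanout are just illustrative):

```python
import hashlib
from pathlib import Path

def object_path(root: Path, data: bytes, fanout: int = 2) -> Path:
    """Map an object to root/<aa>/<bb>/<full hex digest>, git-style.

    With `fanout` = 2 hex chars per level and two levels, there are
    256 * 256 = 65536 leaf directories, so a billion objects still
    means ~15K files per directory -- large, but manageable.
    """
    digest = hashlib.sha256(data).hexdigest()
    return root / digest[:fanout] / digest[fanout:2 * fanout] / digest

# e.g. /srv/objects/2c/f2/2cf24dba...9824 for the bytes b"hello"
p = object_path(Path("/srv/objects"), b"hello")
```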

Under considerations: DuckDB, RocksDB, BerkeleyDB, SQLite3, lmdb, something bespoke

Recommendations? Things/papers I should be reading?

@job simply a large directory of files named by their SHA256? XFS performs pretty well under these constraints, and while the per-file metadata certainly is annoying overhead, it sounds like a robust, easy-to-use and relatively fast option. (data structure used in that case: cleverly organized B+-trees, with storage-respecting scheduling of modifications)
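A sketch of that flat-directory store, assuming everything lives on one filesystem (the temp-file-plus-rename trick means readers never see a half-written object, since rename is atomic within a filesystem):

```python
import hashlib
import os
import tempfile
from pathlib import Path

def write_object(root: Path, data: bytes) -> str:
    """Store `data` under its hex SHA-256 digest in `root`.

    Writes to a temp file in the same directory, fsyncs, then renames
    into place, so concurrent readers see either nothing or the whole
    object -- never a partial write.
    """
    name = hashlib.sha256(data).hexdigest()
    final = root / name
    if final.exists():  # content-addressed: already stored, identical
        return name
    fd, tmp = tempfile.mkstemp(dir=root)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp, final)
    except BaseException:
        os.unlink(tmp)
        raise
    return name
```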

If it's not an option, any key-value store would do, as mentioned by the others. I'm not sure why you mention NFS – sounds like you want…

@job remote access? Multiple separate remote concurrent readers with a single writer and low coherency guarantees? Or do you need things to be fully transactional, i.e., no reader could ever see a half-written object? Do you need a guarantee that the moment a write finishes, everyone else sees the new data, or might a bit of propagation delay be OK?
And while I'm incessantly asking: what *are* acceptable read latencies for you?
Do you need high availability, and how regularly do you take backups?
@funkylab multiple remote readers, no transactions, everything addressed by content hash, should propagate in seconds. Ideally able to read at least 10K objects / second.

@job yeah, a postgres table with a blob column; haven't done that myself, but you should be able to let the db calculate the hash on insertion instead of having to supply it; the column definition would be something like

hashcolumn TEXT GENERATED ALWAYS AS (encode(sha256(yourblobcolumn), 'hex')) STORED

Or similar. Considering one billion entries, you'll definitely want to add an index on that column.
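Put together, a sketch of the whole table might look like this (untested against a live server, names are illustrative; sha256() on bytea needs PG 11+, generated columns need PG 12+):

```sql
CREATE TABLE objects (
    body BYTEA NOT NULL,
    hash TEXT GENERATED ALWAYS AS (encode(sha256(body), 'hex')) STORED
);

-- With a billion rows, lookups by hash need an index; UNIQUE also
-- deduplicates identical objects for free.
CREATE UNIQUE INDEX objects_hash_idx ON objects (hash);
```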