Show HN: Turbolite – a SQLite VFS serving sub-250ms cold JOIN queries from S3

I built a SQLite VFS in Rust that serves cold queries directly from S3 with sub-second performance, and often much faster.

It’s called turbolite. It is experimental, buggy, and may corrupt data. I would not trust it with anything important yet.

I wanted to explore whether object storage has gotten fast enough to support embedded databases over cloud storage. Filesystems reward tiny random reads and in-place mutation. S3 rewards fewer requests, bigger transfers, immutable objects, and aggressively parallel operations where bandwidth is often the real constraint. This was explicitly inspired by turbopuffer’s ground-up S3-native design. https://turbopuffer.com/blog/turbopuffer

The use case I had in mind is lots of mostly-cold SQLite databases (database-per-tenant, database-per-session, or database-per-user architectures) where keeping a separate attached volume for inactive databases feels wasteful. turbolite assumes a single write source and is aimed much more at “many databases with bursty cold reads” than “one hot database.”

Instead of doing naive page-at-a-time reads from a raw SQLite file, turbolite introspects SQLite B-trees, stores related pages together in compressed page groups, and keeps a manifest that is the source of truth for where every page lives. On a cache miss, it uses seekable zstd frames and S3 range GETs, so fetching one needed page does not require downloading an entire object.
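To make the manifest idea concrete, here is a minimal Rust sketch of the lookup path: map a page number to a (object key, frame offset, frame length) entry and turn that into a single HTTP Range header for a range GET. All names and the exact manifest layout are illustrative assumptions, not turbolite's actual structures.

```rust
use std::collections::HashMap;

/// Hypothetical manifest entry: where one SQLite page's compressed
/// zstd frame lives inside a page-group object on S3.
struct PageLocation {
    object_key: String, // S3 object holding the page group
    frame_offset: u64,  // byte offset of the seekable-zstd frame
    frame_len: u64,     // compressed length of that frame
}

struct Manifest {
    pages: HashMap<u32, PageLocation>, // page number -> location
}

impl Manifest {
    /// Translate a page read into (object key, HTTP Range header) for a
    /// single range GET, so one page never pulls the whole object.
    fn range_for_page(&self, pgno: u32) -> Option<(String, String)> {
        self.pages.get(&pgno).map(|loc| {
            let end = loc.frame_offset + loc.frame_len - 1; // Range end is inclusive
            (
                loc.object_key.clone(),
                format!("bytes={}-{}", loc.frame_offset, end),
            )
        })
    }
}

fn main() {
    let mut pages = HashMap::new();
    pages.insert(
        7,
        PageLocation {
            object_key: "db/groups/table-users-000.zst".into(),
            frame_offset: 65536,
            frame_len: 16384,
        },
    );
    let manifest = Manifest { pages };
    let (key, range) = manifest.range_for_page(7).unwrap();
    println!("{} {}", key, range); // bytes=65536-81919
}
```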

At query time, turbolite can also pass hints from the query plan down to the VFS to front-run downloads for indexes and large scans, fetching pages in the order they will be accessed.

You can tune how aggressively turbolite prefetches. For point queries and small joins, it can stay conservative and avoid prefetching whole tables. For scans, it can get much more aggressive.
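One way to picture the plan-driven prefetch with a tunable budget is a function over the page-access order predicted from the query plan: fetch up to `budget` pages ahead of the cursor, with a tiny budget for point queries and small joins and a large one for scans. This is a hypothetical sketch, not turbolite's API.

```rust
/// Given the predicted page-access order from the query plan and the
/// cursor's current position in it, return the pages to fetch ahead.
/// A conservative mode uses a small budget; scan mode uses a large one.
fn prefetch_order(plan_pages: &[u32], cursor_pos: usize, budget: usize) -> Vec<u32> {
    plan_pages
        .iter()
        .skip(cursor_pos + 1) // everything after the page being read now
        .take(budget)         // capped by how aggressive we want to be
        .copied()
        .collect()
}

fn main() {
    // Point query: budget 1 -> only the next expected page.
    println!("{:?}", prefetch_order(&[3, 9, 14, 15, 16], 0, 1)); // [9]
    // Scan: big budget -> everything remaining in access order.
    println!("{:?}", prefetch_order(&[3, 9, 14, 15, 16], 1, 16)); // [14, 15, 16]
}
```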

It also groups pages by page type in S3. Interior B-tree pages are bundled separately and loaded eagerly. Index pages prefetch aggressively. Data pages are stored by table. The goal is to make cold point queries and joins decent, while making scans less awful than naive remote paging would.
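Grouping by page type relies on the page-header flag byte documented in the SQLite file format: 2 = interior index, 5 = interior table, 10 = leaf index, 13 = leaf table (page 1's b-tree header starts after the 100-byte database header). A stdlib-only Rust sketch of that classification, with names of my own invention:

```rust
/// SQLite b-tree page types, read from the flag byte of the page header
/// (offset 100 on page 1, offset 0 elsewhere). Flag values are from the
/// SQLite file-format documentation.
#[derive(Debug, PartialEq)]
enum PageType {
    InteriorIndex, // 0x02: bundled together, loaded eagerly
    InteriorTable, // 0x05: bundled together, loaded eagerly
    LeafIndex,     // 0x0a: prefetched aggressively
    LeafTable,     // 0x0d: grouped by table
    Other,         // freelist, overflow, pointer-map, ...
}

fn classify(page: &[u8], pgno: u32) -> PageType {
    // Page 1 begins with the 100-byte database header.
    let hdr = if pgno == 1 { 100 } else { 0 };
    match page.get(hdr).copied() {
        Some(2) => PageType::InteriorIndex,
        Some(5) => PageType::InteriorTable,
        Some(10) => PageType::LeafIndex,
        Some(13) => PageType::LeafTable,
        _ => PageType::Other,
    }
}

fn main() {
    println!("{:?}", classify(&[13, 0, 0, 0], 2)); // LeafTable
}
```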

On a 1M-row / 1.5GB benchmark on EC2 + S3 Express, I’m seeing sub-100ms cold point lookups, sub-200ms cold 5-join profile queries, and sub-600ms full scans, all from an empty cache. It’s somewhat slower on standard S3 and Tigris.

Current limitations are pretty straightforward: it’s single-writer only, and it is still very much a systems experiment rather than production infrastructure.

I’d love feedback from people who’ve worked on SQLite-over-network, storage engines, VFSes, or object-storage-backed databases. I’m especially interested in whether the B-tree-aware grouping / manifest / seekable-range-GET direction feels like the right one to keep pushing.

https://github.com/russellromney/turbolite


You might be interested in taking a look at Graft (https://graft.rs/). I have been iterating in this space for the last year, and have learned a lot about it. Graft has a slightly different set of goals, one of which is to keep writes fast and small and optimize for partial replication. That said, Graft shares several design decisions, including the use of framed ZStd compression to store pages.

I do like the B-tree aware grouping idea. This seems like a useful optimization for larger scan-style workloads. It helps eliminate the need to vacuum as much.

Have you considered doing other kinds of optimizations? Empty pages, free pages, etc.


Very cool, thanks. I hadn’t seen Graft before, but it sounds adjacent in a lot of interesting ways. I looked at the repo and will see what I can apply.

I've tried out all sorts of optimizations - for free pages, I've considered leaving empty space in each S3 object and serving those as free pages to get efficient writes without shuffling pages too much. My current bias has been to over-store a little if it keeps the read path simpler, since the main goal so far has been making cold reads plausible rather than maximizing space efficiency. Especially because free pages compress well.

I have two related roadmap items: hole-punching and LSM-like writing. For local non-HDD storage, we can evict empty pages automatically by releasing empty page space back to the OS. For writes, an LSM approach is appealing because it groups related things together, which is what we need, but it would mean doing a lot of rewriting on checkpoint. So both of these feel a little premature to optimize for vs other things.