Range-Based Set Reconciliation via Range-Summarizable Order-Statistics Stores

Range-Based Set Reconciliation (RBSR) synchronizes ordered sets by recursively comparing summaries of contiguous ranges and refining only the mismatching parts. While its communication complexity is well understood, its local computational cost depends fundamentally on the storage backend.
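A minimal sketch of the RBSR recursion described above, assuming a simple XOR-of-hashes range fingerprint (all names here are illustrative, not the paper's API; a real protocol exchanges fingerprints over the network rather than reading both sets locally):

```python
import hashlib

def fingerprint(keys):
    """XOR of per-key hashes: an order-insensitive summary of a range."""
    acc = 0
    for k in keys:
        acc ^= int.from_bytes(hashlib.sha256(k.to_bytes(8, "big")).digest()[:8], "big")
    return acc

def reconcile(local, remote, lo, hi, diff):
    """Compare fingerprints of the key range [lo, hi); recurse into halves
    that mismatch, collecting keys held by exactly one side."""
    a = {k for k in local if lo <= k < hi}
    b = {k for k in remote if lo <= k < hi}
    if fingerprint(a) == fingerprint(b):
        return                    # summaries match: skip the whole range
    if hi - lo == 1:
        diff |= a ^ b             # single-key range: exchange directly
        return
    mid = (lo + hi) // 2
    reconcile(local, remote, lo, mid, diff)
    reconcile(local, remote, mid, hi, diff)

alice = {1, 3, 5, 7, 900}
bob   = {1, 3, 5, 8, 900}
missing = set()
reconcile(alice, bob, 0, 1024, missing)
# missing == {7, 8}
```

Note that each `fingerprint` call here scans the whole range, which is exactly the local-cost problem the paper targets: a range-summarizable store can answer such summary queries in logarithmic time instead of a linear scan.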

We introduce AELMDB, an extension of #LMDB that realizes this design.

https://arxiv.org/html/2603.19820v1

Range-Based Set Reconciliation via Range-Summarizable Order-Statistics Stores

"I was considering using #LMDB (which uses mmap) in a NodeJS server but page faults would block the event loop stopping it from processing requests. Benchmarks can be misleading."

https://xcancel.com/Amr__Elmohamady/status/2014672445244895449#m

Pretty sure that page-fault handling is still at least an order of magnitude faster than anything a Node.js server can process. People always worrying about the wrong things in their code...

What’s Wrong with Synthetic Data for Scene Text Recognition (STR)? A Strong Synthetic Engine with Diverse Simulations and Self-Evolution

STR aims to precisely extract text from natural images that feature intricate backgrounds and a variety of imaging conditions. This task has become increasingly vital in the era of LLMs, as it contributes substantial training data.

We store all images in JPEG format and write them into the #lmdb files that are commonly used in STR.

https://arxiv.org/html/2602.06450v2

What’s Wrong with Synthetic Data for Scene Text Recognition? A Strong Synthetic Engine with Diverse Simulations and Self-Evolution

LwMQ Storage API

"The LwMQ Storage component is based, in part, on a modified version of LMDB, customized for use as part is the LwMQ Storage API.

Notable modifications include support for NTFS sparse files and alternate data streams."

https://www.lwmq.net/docs/api/storage.html

We rejected these patches because they're anti-features. Storing the #LMDB "lockfile" in an alternate stream of the DB file means you can't just delete stale lockfiles any more.

LwMQ Storage API — LwMQ 1.0 documentation

"lmdb/lmdb-win32-arm64
6 days ago — Platform specific binary for lmdb on win32 OS with arm64 architecture. Latest version: 3.5.2, last published: 4 days ago."

Yeah, no. #LMDB is an embedded DB library, intentionally kept under 64KB, so that it can be built statically and *embedded* into each app that uses it. It makes a difference whether you build it for 32bit or 64bit apps, and 32bit or 64bit DBs. You can't just build it once and call it "the platform/system LMDB". Stop doing this.

Why I Used SQLite for a 250GB Production Database

"Comparative Analysis: SQLite vs. Key-Value Stores

LMDB (Lightning Memory-Mapped Database) uses a B+ Tree and mmap — architecturally identical to SQLite in mmap mode."

https://anyimossi.dev/journal/sqlite-250gb-why-sqlite/

That's not a coincidence. SQLite's support for mmap was based on lessons learned from #LMDB

https://marc.info/?l=sqlite-dev&m=144565668324618&w=2
https://marc.info/?l=sqlite-dev&m=144565668524665&w=2

https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg73551.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg76252.html

Why I Used SQLite for a 250GB Production Database | Anyim Ossi — AI Software Engineer

An OSINT lookup tool on 250GB of scraped public data. SQLite, not Postgres — because eliminating IPC overhead mattered more than query planner sophistication. Query times fell from 4 hours to 40ms across six optimization layers.

LwMQ IPC Messaging System Performance

Long story short: the message rate for tiny messages (<= 80 bytes) is above 10 million/sec, while larger messages (4KB in this test) are not as fast at ~2M msg/sec, ...

the whole platform, including a super fast in-memory cache, a new persistent KV store based on #LMDB, and the utility libraries for file cleaning, hashing, and more, weighs about 3.5MB total, or about the size of a "Hello World" in Rust 😉

https://www.linkedin.com/posts/axelrietschin_finally-some-end-to-end-numbers-with-actual-activity-7430553500062896128-z5ab

LwMQ IPC Messaging System Performance | Axel Rietschin posted on the topic | LinkedIn

Finally, some end-to-end numbers with actual messages across processes on the same box. Long story short: the message rate for tiny messages (<= 80 bytes) is above 10 million/sec, while larger messages (4KB in this test) are not as fast at ~2M msg/sec, but the data throughput climbs above 70Gbps, all on a 2024 gaming laptop.

The client (sender) seems faster than the server (receiver) because what is measured really is the message creation + queuing. Since LwMQ is fully asynchronous, the client is done queuing before all messages are actually sent over the underlying transport. I have yet to implement a lingering mechanism to allow for draining the send queue(s) before closing. The numbers don't necessarily reflect the last word on the subject.

Note that the address is formulated using a familiar Uri format. However, no network component is involved in this IPC communication, and the hostname and port are merely illustrative. Any string can be used. Only point-to-point (1:1) channels are supported at this time, so a "server" must open a separate channel with each "client" it talks to.

Small but important point: it does not matter which side starts first. A "client" can start, open a channel, and begin queuing messages long before anyone is listening, or vice versa. LwMQ connects in the background and delivers messages when a link is established, without losses. This relaxes the dreaded start-order dependencies found in pretty much every other IPC mechanism under the sun.

Messages are structured entities with one or more data frames and optional timestamps, and the times include the creation and disposal of all messages. The callers can create multiple send queues per channel. There are various types of queues: single and multi-producer, bounded or not (where a queue blocks when reaching a given capacity), or discarding (where a queue discards either the oldest or the newest message when full), and last but not least, a tagged queue where messages can be assigned an optional tag (type) and only the last message of a given type can exist in the queue at any given time. This enables advanced scenarios.

Oh, and priorities: queues have five standard priorities, plus idle and time-critical, which govern how the channel thread services them. This is the most sophisticated and flexible queuing system I'm aware of, and it should cover many edge scenarios from heartbeat to batching graphics commands to tracelogging.

In the works: RDMA, TCP/RIO, and HvSocket transports. Stay tuned. www.lwmq.net - Your next favorite IPC messaging system covering everything from AI training workloads to financial data to run-of-the-mill IPC messages, as fast as hardware allows.

Fun fact: the messaging DLL currently weighs 323KB, and the whole platform, including a super fast in-memory cache, a new persistent KV store based on LMDB, and the utility libraries for file cleaning, hashing, and more, weighs about 3.5MB total, or about the size of a "Hello World" in Rust 😉
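The "tagged queue" described in the post can be sketched in a few lines (hypothetical names and shape; LwMQ's actual API is not shown here): enqueuing a message whose tag is already pending replaces the superseded message, so a consumer only ever sees the latest value per type — the heartbeat case the post mentions.

```python
from collections import OrderedDict

class TaggedQueue:
    """At most one pending message per tag: enqueuing a tag that is
    already pending replaces the old message with the new one."""
    def __init__(self):
        self._pending = OrderedDict()   # tag -> message, FIFO by arrival

    def put(self, tag, message):
        # Drop any superseded message with the same tag, then enqueue
        # at the tail so the newest update is serviced in arrival order.
        self._pending.pop(tag, None)
        self._pending[tag] = message

    def get(self):
        # Dequeue the oldest pending (tag, message) pair.
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)

q = TaggedQueue()
q.put("heartbeat", 1)
q.put("frame", "cmd-a")
q.put("heartbeat", 2)            # supersedes heartbeat 1
assert len(q) == 2
assert q.get() == ("frame", "cmd-a")
assert q.get() == ("heartbeat", 2)
```

Whether LwMQ keeps the replaced message's original queue position or moves it to the tail is not stated in the post; the tail-reinsertion choice above is just one plausible policy.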

LinkedIn

SCALE: Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction

Virtual cell models aim to enable in silico experimentation by predicting how cells respond to genetic, chemical, or cytokine perturbations from single-cell measurements. ... In this work we present a specialized large-scale foundation model SCALE ...

To reduce random I/O overhead from large AnnData objects, we preprocess the dataset into #LMDB

https://arxiv.org/abs/2603.17380

SCALE: Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction

Virtual cell models aim to enable in silico experimentation by predicting how cells respond to genetic, chemical, or cytokine perturbations from single-cell measurements. In practice, however, large-scale perturbation prediction remains constrained by three coupled bottlenecks: inefficient training and inference pipelines, unstable modeling in high-dimensional sparse expression space, and evaluation protocols that overemphasize reconstruction-like accuracy while underestimating biological fidelity. In this work we present a specialized large-scale foundation model SCALE for virtual cell perturbation prediction that addresses the above limitations jointly. First, we build a BioNeMo-based training and inference framework that substantially improves data throughput, distributed scalability, and deployment efficiency, yielding 12.51× speedup on pretrain and 1.29× on inference over the prior SOTA pipeline under matched system settings. Second, we formulate perturbation prediction as conditional transport and implement it with a set-aware flow architecture that couples LLaMA-based cellular encoding with endpoint-oriented supervision. This design yields more stable training and stronger recovery of perturbation effects. Third, we evaluate the model on Tahoe-100M using a rigorous cell-level protocol centered on biologically meaningful metrics rather than reconstruction alone. On this benchmark, our model improves PDCorr by 12.02% and DE Overlap by 10.66% over STATE. Together, these results suggest that advancing virtual cells requires not only better generative objectives, but also the co-design of scalable infrastructure, stable transport modeling, and biologically faithful evaluation.

arXiv.org

Evernode with lmdb

Securely sign and store information in a lightweight lmdb database.

Ever-lmdb-sdk is a powerful library that allows secure data storage and retrieval in Evernode smart contracts using #LMDB. In this guide, we will walk you through the steps required to set up Ever-lmdb-sdk in your project.

https://ever-lmdb-client.vercel.app/

Ever-Lmdb-Sdk - Evernode Smart Contract DB.

Securely store and retrieve data in Evernode smart contracts using lmdb