Mastodawn

Oliver Kennedy Nov 1, 2025

I don't think I've ever been so happy to get basic join support working. Expressivity usually comes at the expense of performance, and we are aiming for a TON of expressivity in #Draupnir.

It's been a heck of a time getting the "good" cases to optimize down to what a standard database would do (e.g., simple Hash Joins), but the end is in sight... The group theory bits are (mostly) implemented, and now we just need to deal with simple operational issues like the scheduler freaking out when it tries to poke an unordered CSV file for a cursor that supports indexed seeks.

The effort is worth it though... with this latest change, we should be able to support the ring-style factorization optimizations you get in #DBToaster, while simultaneously supporting non-ring aggregates like Min and Max like #Souffle, as well as #Rel's map relations.

The project name seems more apropos than we realized when we first named it... The key insight has been the realization that we need to give up the ring (and settle for the monoid/group).

Show thread

Oliver Kennedy Jul 25, 2025

After my brain's had a chance to chew on this problem, the A(B(db)) = B(A(db)) property turns out to be *really* strong.

We're looking at transactions in the context of IVM, where this sort of thing makes sense (maintaining a view had better be independent of insertion order). However IVM schemes like #DBToaster achieve this by having triggers that read from at least one relation that every other trigger writes to. No proof yet, but I'm pretty sure that such pairwise dependency cycles are a necessary condition for A(B(db)) = B(A(db)).

All that to say that I don't think we've found some form of OCC that can magically employ commutativity to avoid inter-transaction data dependencies.

However, guaranteeing that all conflicts will be pairwise means that we *can* avoid (without loss of precision) the need to compute a transitive closure on the committed transaction graph to find conflicts. That seems neat.

Oliver Kennedy Feb 3, 2025

I gave in... and rewrote #DBToaster in rust, and it's running its first SQL queries in interpreted mode! Still need to test query differentiation, and I probably won't go full recursive IVM at first, but most of the building blocks exist.

This version ended up with a query language that looks a lot more like relational algebra than the original AGCA (which is more dataloggy), but the fundamental idea remains: Each tuple is decomposed into an 'identity' component (the key), and an 'annotation' component, which is associated with a Ring- or Group-like combinator.

As a simple example, the following two relations are identical (assuming attribute 0 is a key, and the value is annotated with the <Z, +, 0> group, or an analogous ring):
{ <1> -> 1, <1> -> 1 } (two copies of the tuple <1, 1>)
and
{ <1> -> 2 } (one copy of the tuple <1, 2>)

This change turns out to be huge win, by allowing group-by aggregates to be computed lazily. The relation encodes how multiple tuples need to be coalesced to get the final value. This, in turn, makes push-based query evaluation (and in turn semijoin/magic sets-style optimizations) easier and faster, since aggregation is no longer necessarily a pipeline blocker. Obviously, there are also benefits for IVM.

On a related note, after working for a long time with the JVM, rust-generated binaries feel like they run too fast. I did a whole lot of dumb things in the interpreter... but it still feels like the console is the bottleneck.

Oliver Kennedy Oct 2, 2024

#DBToaster's Aggregate Calculus has a feature that I never paid much attention to before: It has well-defined evaluation logic for not only classical bottom-up ('compute a relation from the inputs') query evaluation, but also a more provenance-y top-down ('compute the value of a specific tuple') semantics. The latter is faster, as it avoids collection operations... but requires first computing the query's support. However, in some cases, the support is bounded and known at compile time...

Oliver Kennedy Sep 26, 2024

It's not all that evident until you start looking at #DBToaster's AGCA, but the #ProvenanceSemiring join operator behaves a lot like unix pipes (i.e., function composition), but with explicit associativity/commutativity. g | f == f | g for all f, g.

This is simultaneously super powerful (e.g. you can express selection by intersecting with the infinite relation of satisfying tuples), and super frustrating when trying to model a transformation that can't be modeled in the semiring.