6k-line Souffle program status: Appears to parse successfully.
Bonus: #Draupnir now has mildly cleaner parse error reporting.
#UBODIn will be at #NEDB_Day_2026 on Jan 16 with two posters!
"Flow-centric Query Evaluation Pipelines" (Victoria, Andrew, Krishna) presents our preliminary efforts to make a scheduler-friendly query evaluation pipeline for our #Draupnir datalog engine. The key insight behind our work is decoupling state from operators. By making operators (mostly) stateless, we can inline better, and we can expose IO to the scheduler more efficiently.
"Benchmarking Tabular Representation Models on Longitudinal Data" (Pratik) presents our work on data integration for longitudinal studies. Longitudinal studies generate a slew of datasets that are almost alike, but not quite. Coupled with the fact that attributes are identified by prose questions rather than simple identifiers, they aren't a great fit for existing data integration/unionability tools. We'll specifically be presenting a benchmark, painstakingly adapted from the American National Election Survey, which shows that we need new data integration tools.
... and here I thought our out-of-core datalog compiler might be perceived as archaic...
Recursion in #Draupnir is getting closer, making it very nearly a proper #Datalog compiler. What would normally be a simple task is becoming considerably harder due to the need to support general monoid bases for the relations (which we want for cleaner aggregates than Souffle), as well as the need to handle batch scheduling to support disk.
The main challenge so far has been coming up with an execution plan that safely batches each iteration, while playing nicely with our push+pull scheduler, and simultaneously making sure that it maintains the correct arity of each tuple. Not hard... but very finicky.
We've come up with a pretty clean set of extensions to our logical pipeline DAG that seem like they elegantly capture recursion, and compiling a simple (count the paths) query to the logical stage appears to be producing a sensible graph. This has revealed some bugs in the pipeline optimizer, and we still need to add support into the interpreter... but it's progressing.
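For a feel of the shape of the problem, here's a rough sketch of semi-naive recursion for the "count the paths" query, where tuples carry values from a monoid (path counts under addition) and each iteration's delta is a batch folded into the total. This is illustrative only, assumes an acyclic graph so the counts converge, and has nothing to do with Draupnir's actual plan representation:

```python
# Hypothetical sketch: semi-naive recursion over a relation whose tuples
# carry values from a commutative monoid (here: path counts under +),
# processed in per-iteration batches.
from collections import defaultdict

def count_paths(edges):
    """path(x, y) = number of distinct paths from x to y in a DAG."""
    # total: tuple -> monoid value; delta: the batch from the last iteration
    total = defaultdict(int)
    delta = {(x, y): 1 for (x, y) in edges}
    for t, v in delta.items():
        total[t] += v
    while delta:
        new_delta = defaultdict(int)
        # Semi-naive: only join the newest batch against the base edges.
        for (x, y), n in delta.items():
            for (a, b) in edges:
                if a == y:
                    new_delta[(x, b)] += n  # extend each of the n paths by one edge
        delta = dict(new_delta)
        for t, v in delta.items():
            total[t] += v  # fold the batch into the total via the monoid op
    return dict(total)

# Two distinct paths from 1 to 4: 1-2-4 and 1-3-4.
counts = count_paths([(1, 2), (1, 3), (2, 4), (3, 4)])
print(counts[(1, 4)])  # 2
```

The finicky parts the posts mention live exactly in that inner loop: deciding how big each batch is, when to spill it to disk, and keeping the tuple arity consistent as the join widens and re-projects.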
I don't think I've ever been so happy to get basic join support working. Expressivity usually comes at the expense of performance, and we are aiming for a TON of expressivity in #Draupnir.
It's been a heck of a time getting the "good" cases to optimize down to what a standard database would do (e.g., simple Hash Joins), but the end is in sight... The group theory bits are (mostly) implemented, and now we just need to deal with simple operational issues like the scheduler freaking out when it tries to poke an unordered CSV file for a cursor that supports indexed seeks.
The effort is worth it though... with this latest change, we should be able to support the ring-style factorization optimizations you get in #DBToaster, while simultaneously supporting non-ring aggregates like Min and Max (as in #Souffle), as well as #Rel's map relations.
The project name seems more apropos than we realized when we first named it... The key insight has been the realization that we need to give up the ring (and settle for the monoid/group).
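To make the ring-vs-monoid distinction concrete, here's a small illustrative sketch (invented names, not Draupnir code): a relation keyed by group-by columns whose payload is combined with an arbitrary monoid operation. Sum forms a group (it has inverses, so retracting a tuple is cheap), while Min and Max are only monoids, which is precisely what rules out the ring-style tricks.

```python
# Hypothetical sketch of "settling for the monoid": the aggregate is just
# an associative combine function with an identity element.
import operator

class MonoidRelation:
    def __init__(self, combine, identity):
        self.combine, self.identity = combine, identity
        self.data = {}  # group-by key -> accumulated monoid value

    def insert(self, key, value):
        self.data[key] = self.combine(self.data.get(key, self.identity), value)

temps = MonoidRelation(min, float("inf"))   # Min is a monoid: no inverses
for city, t in [("oslo", -3), ("oslo", -7), ("lima", 19)]:
    temps.insert(city, t)
print(temps.data)  # {'oslo': -7, 'lima': 19}

sums = MonoidRelation(operator.add, 0)      # + is a group: deletion = insert(-v)
for city, t in [("oslo", -3), ("oslo", -7)]:
    sums.insert(city, t)
print(sums.data["oslo"])  # -10
```

Under + you can undo an insert by inserting the negation; under min there is no value you can insert to "forget" the -7, so incremental maintenance has to work harder.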
Draupnir call for participation: reviewing the last development cycle and affirming the project direction
If you value the work that we do on the Draupnir project and think that it is important, then we need to hear from you.
We are trying something new: a short update and reflection on the project, plus an interactive vote where you can affirm the contributors and the project direction.
I don't know if our database needed a bytecode compiler...
... or if the bytecode runtime and associated scheduler needed support for green threads...
... but I guess it has both now.
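The bytecode-plus-green-threads combination can be sketched in miniature with generators: a tiny stack-machine interpreter whose YIELD opcode hands control back to a round-robin scheduler. The opcodes, names, and scheduler below are all invented for illustration; they bear no relation to the actual runtime.

```python
# Toy bytecode interpreter with cooperative (green) threads.
# Each program is a generator; YIELD is the cooperative scheduling point.
def run(code, consts):
    """Interpret one bytecode program, yielding at every YIELD opcode."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]; pc += 1
        if op == "PUSH":
            stack.append(consts[code[pc]]); pc += 1
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "YIELD":
            yield  # suspend; the scheduler decides who runs next
        elif op == "RET":
            return stack.pop()  # delivered via StopIteration.value

def scheduler(programs):
    """Round-robin over green threads until every program returns."""
    threads = [run(code, consts) for code, consts in programs]
    results = [None] * len(threads)
    live = set(range(len(threads)))
    while live:
        for i in list(live):
            try:
                next(threads[i])
            except StopIteration as done:
                results[i] = done.value
                live.discard(i)
    return results

progs = [
    (["PUSH", 0, "YIELD", "PUSH", 1, "ADD", "RET"], [1, 2]),
    (["PUSH", 0, "PUSH", 1, "ADD", "RET"], [10, 20]),
]
print(scheduler(progs))  # [3, 30]
```

The payoff of putting the yield points in the bytecode itself is that the scheduler sees every suspension explicitly, so blocking IO can be parked without OS threads.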