hm. i realize i have everything to begin the rewrite of Tukan.

#devlog #nudl #tukan

table views for debug inspection are now entirely auto-generated. when no type schema for a column is available, formatting just guesses based on what the data looks like.

#devlog #tukan

i still need to add variadic arguments to the interpreter engine. it's a pretty good fit for datalog since we're only one implicit relation away from iterating through variadic columns, e.g.

element(x) :- some_table(args...), each(x, args...).

#devlog #tukan

am now in the long process of translating frontend functions from using LMDB with specialized struct types to using the interpreter engine instead.

had to write a binding layer that takes care of conversions to/from string arrays, but now i'm in flow, and translating it all is straightforward.

struct definitions had to be changed to macros (example below) so we can generate all the code we need from them.

#devlog #tukan

and we have support for LMDB (decl_dbrel), as well as a helper for simple functional relations (frel).

pics below: code and runtime output.

i now might have enough to move this to the actual app.

#devlog #tukan

to my great surprise, the implementation worked without a problem. after some heavy template adaptations, i can now use LMDB tables like any other datalog-style table.

#devlog #tukan

alright. i have a (completely untested) implementation. now we can boogie.

https://paste.sr.ht/~duangle/f58113062f197b9863f251a5a0f5855478f506b9

#devlog #tukan

curious btw how this simple change in spec is already no longer representable with C structs. a C struct can do header + unsized array for you, but a header and two consecutive unsized arrays would need entirely new semantics, a specific encoding, and so on. the language designer throws up their hands and says: "you do it. i gave you enough."

fine.

#devlog #tukan

the only problem is that the overhead of the header can be significant. i can start out like this but will probably need to pack later. i'm thinking of using a single varint format for all header integers: we find the smallest fitting encoding, and then encode num_columns this way.

by decoding num_columns, you then know the stride of offset[], and then you don't have to varint-encode the offsets anymore, just narrow them to that stride.

so the smallest rep for 1 column is 2 bytes.

#devlog #tukan

now dealing with the interesting problem of mapping a vector&lt;string&gt; to LMDB rows, which means we need a relatively transparent form of serialization. if the format is chosen well enough, then seeking can be done directly on the structure. writing is more expensive.

i'm thinking:

uint num_columns;
uint offset[num_columns];
char data[];

offset[x] gives you the offset into data for column (x+1); column 0 always starts at offset 0. the last offset is in fact the total number of data bytes.

#devlog #tukan