Mastodawn

With the addition of the segmented powers-of-two data structures and the math types I wrote, I'm close to having enough basics for the compiler. There's a separate "block list" (a linked list of fixed-size memory blocks) data structure that reserves from a pool of blocks, used for smaller dynamic lists that are mostly appended to. I think I'd be happier if I could remove it entirely and use only segmented-array-based types, but currently I have no pooling for them. Maybe I could give each segment size its own pool, hmm.
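For anyone unfamiliar with the idea, here's a rough sketch of one common way to lay out a segmented powers-of-two array: segment k holds twice as many elements as segment k-1, so elements never move once pushed. This is not the actual code from the compiler; `SegArray`, `seg_locate`, `seg_push`, `seg_at` and the base size of 8 are all made-up names and values for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parameters: first segment holds 2^3 = 8 ints. */
enum { SEG_BASE_LOG2 = 3, SEG_MAX = 24 };

typedef struct {
    int *segs[SEG_MAX]; /* segment k holds (8 << k) ints, allocated lazily */
    size_t count;       /* number of elements pushed so far */
} SegArray;

static unsigned floor_log2(size_t x) {
    unsigned r = 0;
    while (x >>= 1) r++;
    return r;
}

/* Map a flat index to (segment, offset). Adding the base size first makes
 * the top set bit of j select the segment and the remaining bits the offset. */
static void seg_locate(size_t i, unsigned *seg, size_t *off) {
    size_t j = i + ((size_t)1 << SEG_BASE_LOG2);
    unsigned hi = floor_log2(j);
    *seg = hi - SEG_BASE_LOG2;
    *off = j - ((size_t)1 << hi);
}

/* Append a value; existing elements are never moved, so pointers stay valid. */
static int *seg_push(SegArray *a, int value) {
    unsigned s;
    size_t off;
    seg_locate(a->count, &s, &off);
    if (s >= (unsigned)SEG_MAX) return NULL;
    if (!a->segs[s]) {
        a->segs[s] = malloc(sizeof(int) << (SEG_BASE_LOG2 + s));
        if (!a->segs[s]) return NULL;
    }
    a->segs[s][off] = value;
    a->count++;
    return &a->segs[s][off];
}

/* Random access by flat index; returns NULL when out of range. */
static int *seg_at(SegArray *a, size_t i) {
    unsigned s;
    size_t off;
    if (i >= a->count) return NULL;
    seg_locate(i, &s, &off);
    return &a->segs[s][off];
}
```

A `SegArray` has to start zeroed (`SegArray a = {0};`). This sketch uses `malloc` per segment, which is exactly the part a per-size pool would replace.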

egg boy color Jun 12

Basically, keep a freelist for each power-of-two segment size and look through those when allocating segments. If none remain, check whether any larger-than-requested segments can be split before asking the arena for more memory.

I guess if I do that latter bit, I need metadata to know if the segment is still part of a larger contiguous segment or not. Otherwise, a split and later free could gradually leak larger segments. Maybe I could coalesce it all at some point, or not care for now.
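A minimal sketch of that allocation order (exact-size freelist, then split a larger free segment, then bump the arena). It deliberately takes the "not care for now" route: there is no coalescing and no split metadata, so a segment that gets split stays split. `SegPool`, `pool_init`, `pool_alloc`, `pool_free` and the size-class constants are invented for this example, not taken from the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes: class c is (64 << c) bytes, 64 B up to 8 KiB. */
enum { POOL_MIN_LOG2 = 6, POOL_CLASSES = 8 };

typedef struct FreeSeg { struct FreeSeg *next; } FreeSeg;

typedef struct {
    unsigned char *arena;             /* backing memory, bump-allocated */
    size_t arena_size, arena_used;
    FreeSeg *free_list[POOL_CLASSES]; /* one freelist per size class */
} SegPool;

static size_t class_size(unsigned c) {
    return (size_t)1 << (POOL_MIN_LOG2 + c);
}

/* mem must be suitably aligned for a pointer. */
static void pool_init(SegPool *p, void *mem, size_t size) {
    p->arena = mem;
    p->arena_size = size;
    p->arena_used = 0;
    for (unsigned c = 0; c < POOL_CLASSES; c++) p->free_list[c] = NULL;
}

/* Return a segment of class c to its freelist (no coalescing). */
static void pool_free(SegPool *p, void *seg, unsigned c) {
    FreeSeg *f = seg;
    f->next = p->free_list[c];
    p->free_list[c] = f;
}

static void *pool_alloc(SegPool *p, unsigned c) {
    if (c >= POOL_CLASSES) return NULL;

    /* 1. Exact-size freelist. */
    if (p->free_list[c]) {
        FreeSeg *f = p->free_list[c];
        p->free_list[c] = f->next;
        return f;
    }

    /* 2. Split the smallest larger free segment: keep the lower half at
     *    each step and push the upper half onto the next class down. */
    for (unsigned d = c + 1; d < POOL_CLASSES; d++) {
        if (!p->free_list[d]) continue;
        FreeSeg *f = p->free_list[d];
        p->free_list[d] = f->next;
        unsigned char *base = (unsigned char *)f;
        while (d > c) {
            d--;
            pool_free(p, base + class_size(d), d);
        }
        return base;
    }

    /* 3. Fall back to the arena. */
    size_t n = class_size(c);
    if (p->arena_size - p->arena_used < n) return NULL;
    void *seg = p->arena + p->arena_used;
    p->arena_used += n;
    return seg;
}
```

Coalescing later would need some per-segment record of which larger segment a piece was split from, which is the metadata question above.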

egg boy color Jun 15

Added a basic pool of segments; segment lists/maps are now pooled.

Looking at string escape expansion for string tokens (normalized to their unescaped form, then interned), and string escaping for sanitization when reporting strings that contain unprintable characters, control codes, whitespace, quotes, or UTF-8 (printing extended chars/UTF-8 when supported could be cool, but I want a way to fall back to the printable ASCII range). Also implemented a first pass at UTF-8 encoding/decoding of single codepoints, plus codepoint length.
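For reference, single-codepoint UTF-8 handling can be sketched roughly like this. The function names (`utf8_len`, `utf8_encode`, `utf8_decode`) and the choice to return 0 on any invalid input are assumptions for the example, not the real API; the byte layout itself follows the standard UTF-8 encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encoded length in bytes, or 0 for surrogates and values above U+10FFFF. */
static int utf8_len(uint32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* Encode one codepoint into out; returns bytes written, 0 if cp is invalid. */
static int utf8_encode(uint32_t cp, unsigned char out[4]) {
    int n = utf8_len(cp);
    switch (n) {
    case 1:
        out[0] = (unsigned char)cp;
        break;
    case 2:
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 4:
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        break;
    }
    return n;
}

/* Decode one codepoint from s (avail bytes). Returns bytes consumed, or 0
 * on truncated, malformed, overlong, surrogate, or out-of-range input. */
static int utf8_decode(const unsigned char *s, size_t avail, uint32_t *cp) {
    int n;
    uint32_t v;
    if (avail == 0) return 0;
    if (s[0] < 0x80) { n = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; v = s[0] & 0x07; }
    else return 0;
    if ((size_t)n > avail) return 0;
    for (int i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (utf8_len(v) != n) return 0; /* overlong, surrogate, or too large */
    *cp = v;
    return n;
}
```

Checking `utf8_len(v) != n` after decoding is a cheap way to reject overlong forms: a value is only accepted if it was encoded with the shortest possible length.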

egg boy color

To avoid doing 2x the work handling escape sequences, I ended up making strings use a lexer-managed temp buffer instead of having separate classification and expansion stages, since each pass would need to duplicate the same character-based branching, and strings need to be interned before the parser consumes them anyway. Meanwhile, integer parsing in the various bases is done after tokenization: other code also parses ints, so it makes sense to lift it out of the lexer.
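Roughly what the single-pass version looks like, as a sketch: the lexer scans the literal once and writes the unescaped bytes straight into a temp buffer, which would then be handed to the interner. `lex_string`, its signature, and the small escape set (`\n`, `\t`, `\0`, `\\`, `\"`) are assumptions for illustration; the real lexer's escapes and error handling are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scan a double-quoted literal starting at src[0] == '"' (src is
 * NUL-terminated). Unescaped bytes go to tmp (capacity cap) and their count
 * to *out_len. Returns the number of source bytes consumed, including both
 * quotes, or 0 on error (unterminated, unknown escape, tmp too small). */
static size_t lex_string(const char *src, char *tmp, size_t cap,
                         size_t *out_len) {
    size_t i = 1, n = 0;
    if (src[0] != '"') return 0;
    for (;;) {
        char c = src[i++];
        if (c == '"') break;                  /* closing quote */
        if (c == '\0' || c == '\n') return 0; /* unterminated literal */
        if (c == '\\') {
            /* Classify and expand the escape in the same branch. */
            switch (src[i++]) {
            case 'n':  c = '\n'; break;
            case 't':  c = '\t'; break;
            case '0':  c = '\0'; break;
            case '\\': c = '\\'; break;
            case '"':  c = '"';  break;
            default:   return 0;              /* unknown escape (or NUL) */
            }
        }
        if (n == cap) return 0;               /* temp buffer full */
        tmp[n++] = c;
    }
    *out_len = n;
    return i;
}
```

Because an escape like `\0` can put a NUL byte in the result, the length is returned separately instead of relying on a terminator.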