Mastodawn

Eniko (moved ➡ gamedev.place) Nov 27, 2022

goddamn writing a recursive descent parser is a lot harder when you wanna do good syntax error recovery

stop genocide punch nazis Nov 27, 2022

@eniko I realized it way too late, but in case you haven't thought about it yet: designing the syntax so that there are unambiguous toplevel bits helps a lot. "This thing has to start a new definition, so no matter the mess before, I know again where I am."

Eniko (moved ➡ gamedev.place) Nov 27, 2022

@nikodemus you mean like "func void foo(){}" instead of just "void foo(){}"?

Jochem Kuijpers Nov 27, 2022

@eniko @nikodemus yep, those are referred to as synchronization points; when you encounter them, you just pop all the state from the stack until you're back at parsing top level functions, and report an error.

The semicolon is a very common one, too. You don't tend to see it mid-expression, only at the end of a statement, so if you encounter one while still expecting more expression tokens, you get to close the expression and report an error.
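
A minimal sketch of the synchronization idea described above, not taken from any real parser. It assumes tokens are plain strings and that `func` is the keyword starting every top-level definition (as in the "func void foo(){}" example); `synchronize` and `TOPLEVEL_KEYWORD` are hypothetical names.

```python
# Hypothetical token-level sketch: tokens are plain strings, and "func" is
# assumed to be the keyword that starts every top-level definition.
TOPLEVEL_KEYWORD = "func"


def synchronize(tokens, pos):
    """Skip tokens from `pos` until a synchronization point.

    Returns the index where parsing should resume: just past a ';'
    (statement boundary) or exactly at the next top-level keyword.
    If neither is found, returns len(tokens).
    """
    while pos < len(tokens):
        if tokens[pos] == ";":
            return pos + 1  # consume the ';' and resume after it
        if tokens[pos] == TOPLEVEL_KEYWORD:
            return pos  # leave the keyword for the top-level parser
        pos += 1
    return pos


# The parser hit an error at index 4 (the second "+").
tokens = ["x", "=", "1", "+", "+", ";", "y", "=", "2", ";", "func", "foo"]
resume = synchronize(tokens, 4)
```

After reporting the error, the caller would continue parsing from `resume`, here the start of the next statement.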

Esteban Küber Nov 27, 2022

@jchmoe @eniko @nikodemus depending on the language you might have to account for opening and closing braces. For example, in Rust closures can have semicolons inside of a valid closure expression, so what I'd do there is swallow everything inside any new scopes encountered until I find a ; or }
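
A rough sketch of that brace-aware skipping, assuming string tokens and tracking only `{` and `}`. This is an illustration, not how rustc does it; `skip_to_statement_end` is a hypothetical helper.

```python
def skip_to_statement_end(tokens, pos):
    """Skip from `pos` to the end of the current statement.

    Anything inside a newly opened `{ ... }` scope is swallowed whole,
    so semicolons inside a closure body do not end the recovery early.
    Stops just past a ';' at depth 0, or exactly at a '}' at depth 0
    (which belongs to the enclosing scope and is left unconsumed).
    """
    depth = 0
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "{":
            depth += 1
        elif tok == "}":
            if depth == 0:
                return pos  # closer of the enclosing block
            depth -= 1
        elif tok == ";" and depth == 0:
            return pos + 1
        pos += 1
    return pos


# `let f = |x| { a; b } + + ; next` with an error somewhere in the statement.
tokens = ["let", "f", "=", "|", "x", "|", "{", "a", ";", "b", "}",
          "+", "+", ";", "next"]
resume = skip_to_statement_end(tokens, 0)
```

The `;` inside the closure body is ignored because it sits at depth 1; recovery only stops at the statement's own terminator.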

stop genocide punch nazis Nov 27, 2022

@ekuber @jchmoe @eniko Very true. The reason I think identifiable toplevel syntax is important is that otherwise unbalanced delimiters tend to blow up much worse in languages with big source files.

Esteban Küber Nov 27, 2022

@nikodemus @jchmoe @eniko my experience in rustc tells me that unbalanced delimiters need to be handled in the parser and not the lexer, which is not what we do 🫤

stop genocide punch nazis Nov 27, 2022

@ekuber @jchmoe @eniko Wait, what? rustc _lexer_ deals with unbalanced delims? Do you happen to know why it came to be that way?

Esteban Küber Nov 27, 2022

@nikodemus the proc macros deal in TokenTrees, which need balanced delims, instead of TokenStreams, which are ignorant of delims. But by the time the parser kicks in it turns a TokenTree into a TokenStream. There hasn't been a huge need to change it, other than improving delim error recovery.
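
To illustrate why a tree-shaped token representation forces delimiter balancing to happen early, here is a toy sketch that groups a flat token list into nested lists and records unbalanced delimiters along the way. It is only loosely inspired by the idea and does not reflect rustc's actual types or code; `build_trees` is a hypothetical function.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def build_trees(tokens):
    """Group flat string tokens into nested lists, one list per delimited group.

    Returns (trees, errors). A stray closer is reported and dropped;
    an opener that is never closed is reported at the end.
    """
    root = []
    stack = [(None, root)]  # (opening delimiter, children of that group)
    errors = []
    for i, tok in enumerate(tokens):
        if tok in PAIRS:
            group = [tok]
            stack[-1][1].append(group)
            stack.append((tok, group))
        elif tok in CLOSERS:
            opener = stack[-1][0]
            if opener is not None and PAIRS[opener] == tok:
                stack[-1][1].append(tok)
                stack.pop()
            else:
                errors.append(f"unexpected {tok!r} at token {i}")
        else:
            stack[-1][1].append(tok)
    for opener, _ in stack[1:]:
        errors.append(f"unclosed {opener!r}")
    return root, errors


trees, errors = build_trees(["f", "(", "a", "[", "b", "]", ")"])
bad_trees, bad_errors = build_trees(["{", "a", ")"])
```

Because the grouping step has no grammar context, all it can say is "unexpected" or "unclosed", which is one way to see why better recovery would want this handled in the parser.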

stop genocide punch nazis

@ekuber That makes sense, thanks!