Mastodawn

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_semanticTokens

we have successful JSON parsing, object query, object construction and serialization.

now to provide an actual service. the most important thing is colorization, so that's first.

it's beginning to work!

numbers and strings are not being highlighted correctly yet, don't know why.

also, apparently tokens are not allowed to span multiple lines¹, so i need to fix up this part as well.
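a minimal sketch of that fix-up in Python. it assumes the lexer hands out each token as a flat (offset, length) into the document text, and it counts columns in Python string indices (UTF-16 conversion is a separate step). `split_multiline` and `encode` are hypothetical helpers, not sxpp code. `encode` produces the spec's flat array of five integers per token (deltaLine, deltaStartChar, length, tokenType, tokenModifiers), where deltaStartChar is relative to the previous token only when both tokens sit on the same line.

```python
from typing import List, Tuple

# hypothetical absolute token: (line, start_col, length, token_type_index)
Token = Tuple[int, int, int, int]

def split_multiline(text: str, offset: int, length: int, ttype: int) -> List[Token]:
    """Split a token given as flat (offset, length) into one token per line.
    Columns are Python str indices; newline characters are not part of any piece."""
    out: List[Token] = []
    line = text.count("\n", 0, offset)            # lines before the token start
    line_start = text.rfind("\n", 0, offset) + 1  # offset of that line's first char
    pos, end = offset, offset + length
    while pos < end:
        nl = text.find("\n", pos, end)
        seg_end = end if nl == -1 else nl
        if seg_end > pos:  # skip empty pieces, e.g. a blank line inside the token
            out.append((line, pos - line_start, seg_end - pos, ttype))
        if nl == -1:
            break
        pos = nl + 1
        line += 1
        line_start = pos
    return out

def encode(tokens: List[Token]) -> List[int]:
    """Encode absolute tokens as LSP's relative integer array (no modifiers)."""
    data: List[int] = []
    prev_line, prev_col = 0, 0
    for line, col, length, ttype in sorted(tokens):
        delta_line = line - prev_line
        # start column is relative only when we stay on the same line
        delta_col = col - prev_col if delta_line == 0 else col
        data += [delta_line, delta_col, length, ttype, 0]
        prev_line, prev_col = line, col
    return data
```

for example, a string literal `"ab\ncd"` starting at offset 4 of `x = "ab\ncd"` comes out as two tokens, `(0, 4, 3)` and `(1, 0, 3)`, which encode to `[0, 4, 3, t, 0, 1, 0, 3, t, 0]` for token type `t`.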

more tomorrow.

—
¹ clients can advertise support for multiline tokens (`multilineTokenSupport`), but the server needs to cover the fallback anyway, so why maintain two paths?

how does LSP encode text file positions: UTF-8? (what sxpp uses) UTF-32? (also sensible i guess)

neither. (well, since 3.17 yes, but only if the client offers it)

it's UTF-16. UTF-16 support is mandatory. why? because it's a damn microsoft protocol that's why¹. 😫

fortunately character offsets are line-relative so fixing that up is not too expensive.

—
¹ yes it's also because Javascript.
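because positions are (line, character) pairs, a server that stores UTF-8 byte columns only has to re-count within a single line. a minimal sketch, assuming each line is available as its UTF-8 bytes and that byte columns always fall on a character boundary; both function names are hypothetical. code points above U+FFFF take two UTF-16 code units (a surrogate pair), everything else takes one.

```python
def utf8_col_to_utf16(line: bytes, byte_col: int) -> int:
    """Byte offset within one UTF-8 line -> UTF-16 code unit offset.
    Assumes byte_col is on a character boundary (decode raises otherwise)."""
    prefix = line[:byte_col].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 code unit

def utf16_col_to_utf8(line: bytes, utf16_col: int) -> int:
    """UTF-16 code unit offset -> byte offset within one UTF-8 line.
    A column that lands inside a surrogate pair rounds forward to the next character."""
    units = 0
    byte_pos = 0
    for ch in line.decode("utf-8"):
        if units >= utf16_col:
            break
        units += 2 if ord(ch) > 0xFFFF else 1  # astral chars need a surrogate pair
        byte_pos += len(ch.encode("utf-8"))
    return byte_pos
```

e.g. in `a😫b` the `b` sits at byte 5 but at UTF-16 column 3, since the emoji is 4 bytes in UTF-8 and 2 code units in UTF-16.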

the way sxpp's streaming lexer (tokenizer) works, UTF-16 and UTF-32 input streams are already supported.

since all characters that matter to the syntax are well below 0x7f, and the lexer doesn't output strings, only token types and locations, you can just feed it chars clamped to 0xff, and then offsets and positions are implicitly correct.
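a toy illustration of the clamping trick. the lexer here is hypothetical and far simpler than sxpp's; the point is that it only ever compares against ASCII, so clamping every code unit to 0xff changes no token boundary, and the offsets it reports come out in whatever unit the input stream used.

```python
from typing import Iterable, List, Tuple

WS = (0x20, 0x09, 0x0A, 0x0D)
QUOTE = 0x22

def clamp(units: Iterable[int]) -> List[int]:
    # anything above 0xff becomes 0xff; syntax chars are ASCII, so meaning is preserved
    return [min(u, 0xFF) for u in units]

def utf16_units(s: str) -> List[int]:
    # split a str into UTF-16 code units (surrogate pairs stay as two units)
    b = s.encode("utf-16-le")
    return [b[k] | (b[k + 1] << 8) for k in range(0, len(b), 2)]

def lex(units: List[int]) -> List[Tuple[str, int, int]]:
    """Toy lexer returning (kind, start, length) in input code units."""
    toks = []
    i, n = 0, len(units)
    while i < n:
        c = units[i]
        if c in WS:
            i += 1
        elif c == QUOTE:
            j = i + 1
            while j < n and units[j] != QUOTE:
                j += 1
            j = min(j + 1, n)  # include the closing quote if there is one
            toks.append(("string", i, j - i))
            i = j
        elif 0x30 <= c <= 0x39:
            j = i
            while j < n and 0x30 <= units[j] <= 0x39:
                j += 1
            toks.append(("number", i, j - i))
            i = j
        else:  # everything else, including clamped 0xff, is a symbol char
            j = i
            while j < n and units[j] not in WS and units[j] != QUOTE:
                j += 1
            toks.append(("symbol", i, j - i))
            i = j
    return toks
```

fed `ü😫 "x" 42` as UTF-16 code units, the string token starts at offset 4 because the emoji occupies two units; fed the same text as UTF-32 code points (`ord(c)` per char), it starts at 3. no decoding step, no separate offset table.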

hm. Kate colorizes macros and comments, but not strings, numbers and keywords. idk what's wrong.

when i flip the token order, same result. when i make all tokens comments or macros, then they get colorized as such. it's weird.

got vscode to interface with my LSP server, but it sends no semantic-token-related RPC calls at all. no syntax highlighting.

terrible to debug - no error messages anywhere, everything fails silently.

https://bugs.kde.org/show_bug.cgi?id=519957

i went on IRC and asked around on the kate channel, and we browsed a bit of source code together, and --

so it turns out Kate's LSP client simply doesn't highlight strings, numbers and operators.

i filed a ticket with the project, let's see how it goes.

519957 – LSP Client Does Not Highlight Numbers, Strings, Operators

@lritter arguably, for constructs produced by the tokenizer or the parser, but not the semantic analyzer, that's not the LSP's job.

LSP is supposed to give _semantic_ highlighting (so e.g. for C++, for an identifier, clangd will assign a semantic category like "data member", "parameter", "local variable", etc.).

for _syntax_ highlighting, we only need the abstract syntax tree, and for that, modern editors depend on tree-sitter.

@lritter and for tree-sitter, you basically just write a grammar file, and then tree-sitter will generate a parser from that.

@JamesWidman
it absolutely is the LSP's job - or else "string", "number" and "operator" wouldn't be officially recognized tokens.

but i understand the history of it. by now it's just history though. because this is the fucking pitch they give:

@JamesWidman and if that is *really* supposed to be true, then i'm not going to write N recipes for parsing. it would not even be adequate. SX allows for recursive tokenization of string blocks.

@lritter this is a little tangential, but: lots of interesting language-agnostic editor features have been enabled for the case where the editor has a syntax tree provided by the treesitter API; e.g.:

- text selection based on the range of a tree node;
- the ability to move the cursor to certain kinds of nodes (e.g. "jump to next/previous parameter");
- search-and-replace based on tree structure patterns;

etc.

it might be nice if the editor could get that parse tree from the LSP server though.

@lritter (and in the case of SX, it sounds like the only reasonable way to get that tree would be to get it from the LSP server.)

Philip Trettner 1d ago

@lritter THAT i strongly feel. i old-school wrote text log files with all the interactions and tried to find the issues.

You did advertise that you support semantic tokens in your initialize msg, right? (see img)

in vscode you need to have "editor.semanticHighlighting.enabled": true (but that's the default)

depending on whether you advertised range or full, you should see either textDocument/semanticTokens/range or textDocument/semanticTokens/full messages for your open document
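for reference, a sketch of the relevant part of an `initialize` result, written as a Python dict that gets serialized to JSON. field names follow the 3.17 spec (`positionEncoding`, `textDocumentSync`, `semanticTokensProvider`, `legend`, `full`, `range`); the particular token type list is an assumption. the integer `tokenType` values the server later sends are indices into `legend.tokenTypes`, so the two have to agree.

```python
import json

# assumed legend; the names are standard LSP token types, the order is our choice
TOKEN_TYPES = ["comment", "macro", "string", "number", "operator", "keyword"]

def token_type_index(name: str) -> int:
    # the tokenType integer in semantic token data is an index into the legend
    return TOKEN_TYPES.index(name)

def initialize_result() -> dict:
    return {
        "capabilities": {
            "positionEncoding": "utf-16",  # the mandatory default
            "textDocumentSync": 1,  # TextDocumentSyncKind.Full
            "semanticTokensProvider": {
                "legend": {"tokenTypes": TOKEN_TYPES, "tokenModifiers": []},
                "full": True,    # client will send textDocument/semanticTokens/full
                "range": False,  # no textDocument/semanticTokens/range
            },
        }
    }

if __name__ == "__main__":
    print(json.dumps(initialize_result(), indent=2))
```

leaving out `legend` or `full`/`range` is exactly the kind of thing that makes a client silently never ask for tokens.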

Philip Trettner 1d ago

@lritter I remember hours wasted because things were not properly advertised in initialize...

@artificialmind yep that's exactly how it looks and yet - nada. has to be some fuckup in vscode-languageclient, but i don't know how to debug this.

Anders Stenberg 1d ago

@lritter Do you have to make a custom extension for VS Code as well or is it possible to connect to any LSP in a generic way? From what I can tell it feels like each language's LSP has their own extensions.

@SonnyBonds yeah you not only have to make a custom extension, you also have to install an LSP client library for it within the extension. each extension duplicates that code. fantastic engineering! 😬

Anders Stenberg 1d ago

@lritter Another reason Kate is looking somewhat interesting.

Torbjörn Norinder 1d ago

@lritter you could try Emacs and eglot if Emacs Lisp does not scare you. The debugger is pretty nice and since it’s a Lisp you can just override arbitrary internal functions to add logging if you prefer doing it that way

@tnorinder i prefer something more modern thank you %)

Torbjörn Norinder 1d ago

@lritter fair enough. The cost of making Emacs reasonable for 2026 is indeed quite daunting.

https://emacs-lsp.github.io/lsp-mode/page/troubleshooting/

Chris Green 1d ago

@lritter Try testing it with emacs? Its LSP packages have some diagnostic abilities. And since the callers of the lsp code are just add-on lisp packages with source, you can single step, look at variables in the lsp-calling code, interactively call functions used for lsp and see the results, etc.


@lritter and this, kids, is a great example of a repeating pattern in all kinds of domains:

LSP the idea: great
LSP the reality: questionable

We can now continue our discussion of whether mediocre realizations of good ideas are a net win or a net loss for the community. This has a lot of nuance and I expect 3 scrolls until next Tuesday, dismissed!

@artificialmind one could probably make a better protocol & provide an LSP adapter.

@lritter LSP in particular is "not that bad" in my books because the last time I wrote support for it, it was trivial to abstract away all the weirdness. I'm sure that is by accident, because I've worked with plenty of "mid" stuff before that tried to infect your codebase as thoroughly as it could. LSP is surprisingly local in that sense.

@artificialmind you need a JSON and a JSON-RPC layer. that's a bit much already. transmitted documents are escaped and unescaped for no reason.

@lritter yeah from a dependency point of view you're right. But for the code organization it's not really viral. My compiler code produced typed LSP event output that I could unit-test normally. Then one submodule of my code did all the JSON+RPC+protocol weirdness and translated the typed events for the wire.
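the wire side of such a submodule is small. LSP's base protocol frames each JSON-RPC message with a `Content-Length` header (counted in bytes of the UTF-8 body), a blank `\r\n` line, then the JSON. a minimal sketch, assuming a blocking binary stream such as stdin/stdout; `frame`, `read_message` and `notification` are illustrative names, and the reader simply skips other headers such as `Content-Type`.

```python
import io
import json
from typing import BinaryIO, Optional

def notification(method: str, params: dict) -> dict:
    # a JSON-RPC 2.0 notification (no "id", so no response expected)
    return {"jsonrpc": "2.0", "method": method, "params": params}

def frame(message: dict) -> bytes:
    body = json.dumps(message, ensure_ascii=False, separators=(",", ":")).encode("utf-8")
    # Content-Length counts bytes, not characters
    return b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n" + body

def read_message(stream: BinaryIO) -> Optional[dict]:
    length = None
    while True:
        line = stream.readline()
        if not line:
            return None  # EOF
        line = line.rstrip(b"\r\n")
        if not line:
            break  # blank line ends the header section
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(stream.read(length).decode("utf-8"))

if __name__ == "__main__":
    buf = io.BytesIO(frame(notification("initialized", {})))
    print(read_message(buf))
```

everything above this layer can deal in plain typed values, which matches the "confined to a corner" experience.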

@artificialmind it's viral for JSON which is garbage

@artificialmind what kind of code organisations are viral?

@lritter I meant that sometimes it's not easy to build abstractions that keep the questionable design decisions at bay.

Like, if you tried to abstract OpenGL (before direct state access) in a way that doesn't leak global state dependency, you basically only had bad choices: slower perf, not playing ball with 3rd party code, etc.

That means OpenGL's "bad decisions" are your problem as well, in a viral way.

But LSP's "bad decisions" were confined to a corner in my code.

@artificialmind i see.

well it could always be worse: https://lv2plug.in/ (the linux VST analog from hell)

@lritter
> LV2 has a simple core interface

it can't be bad if it's simple right!! because simple == good!

@artificialmind they're not mentioning how your plugins need to be shipped with offline manifests written in https://en.wikipedia.org/wiki/Turtle_(syntax)


Fabian Giesen 1d ago

@lritter not for the reason you think, by the way.

It's UTF-16 not because Windows brain worms but because Javascript brain worms. (VSCode is JS, or Typescript to be exact)

@rygorous yes. both can be true.

they went from NIH to MEH

@lritter I've been eyeing Kate a bit lately. It feels like I could maybe like it. Haven't been able to make debugging (with lldb) work at all though.

@SonnyBonds i use cmdline gdb for that anyway. old habits.

@lritter I like having some editor integration for breakpoints and stepping and stuff. VS Code (which I currently use) has an okay debugger integration, but something's a bit off with it and I'd love for something better to come along.

@SonnyBonds i step like once a year. it's all my own code so there's less confusion.

@lritter I step into things and inspect state quite often, multiple times a day at least.

I guess if there was a good graphical external debugger that I could get used to that would work as well.