A thing that has always frustrated me about GitHub/Bitbucket, as a language designer, is that you can't teach the forge to syntax highlight files in your own custom formats.

Now the existence of Codeberg/git.gay means I could potentially create a PR to Forgejo to add this feature, and it would get added to the forges I actually use. Perhaps at some point I will do this.

Anybody know off the top of their heads what syntax-highlighter format Codeberg/Forgejo even uses?

Oh. it's… oh.

It's… custom Go code… on a per-language basis. They use something called Chroma, and the way Chroma works is that it has a custom lexer written in Go for each language they want to support. Um. Hm.

This is actually the one single approach they could have attempted which prevents custom pluggable highlighters on a per-repo basis.

https://hey.hagelb.org/@technomancy/statuses/01KNQJ9H3R64BEHE1QWNBXZKVW

technomancy (@technomancy@hey.hagelb.org)

@mcc last I checked it was https://github.com/alecthomas/chroma ;  I remember sending a patch to support Fennel and it was handled pretty promptly

@mcc you would really think that "one widely-supported declarative non-executable grammar format for syntax highlighting" would be a solved problem by now, but it kinda feels like tree-sitter is sucking up all the oxygen in that space; don't love how that's going

@technomancy Do you have negative opinions about tree-sitter, and if so, why?

@mcc the main thing is that grammars are opaque blobs of executable code, which sucks! technically you can often compile them from a declarative data source but afaict you can't do this without npm

a well-designed format would make the unit of distribution a purely declarative data format but instead we ended up with this situation where you could install a grammar that segfaults your editor or steals your SSH keys; gross!
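To make the "purely declarative" idea concrete, here is a minimal sketch in Go of a highlighter whose grammar is nothing but data. Everything in it is an assumption made up for illustration: the JSON rule format, and the names `Rule`, `Grammar`, `LoadGrammar`, and `Highlight`. It is not Chroma's or tree-sitter's format. The point is only that the forge ships one fixed engine, and a repo could supply a rule list that gets interpreted, never executed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"unicode/utf8"
)

// Rule is one entry in a hypothetical data-only grammar file:
// a token name plus a regular expression. No code, only data.
type Rule struct {
	Token   string `json:"token"`
	Pattern string `json:"pattern"`
}

type compiledRule struct {
	token string
	re    *regexp.Regexp
}

// Grammar is an ordered rule list; earlier rules win.
type Grammar []compiledRule

// Span is one classified piece of the input.
type Span struct {
	Token string
	Text  string
}

// LoadGrammar parses the JSON rule list and compiles each pattern,
// anchored so it can only match at the current position.
func LoadGrammar(data []byte) (Grammar, error) {
	var rules []Rule
	if err := json.Unmarshal(data, &rules); err != nil {
		return nil, err
	}
	g := make(Grammar, 0, len(rules))
	for _, r := range rules {
		re, err := regexp.Compile(`\A(?:` + r.Pattern + `)`)
		if err != nil {
			return nil, fmt.Errorf("rule %q: %w", r.Token, err)
		}
		g = append(g, compiledRule{token: r.Token, re: re})
	}
	return g, nil
}

// Highlight walks the input, trying each rule at the current position.
// Unmatched input is emitted as "text", merged into the previous text span.
func Highlight(g Grammar, src string) []Span {
	var spans []Span
	for pos := 0; pos < len(src); {
		token, n := "text", 0
		for _, r := range g {
			if m := r.re.FindString(src[pos:]); m != "" {
				token, n = r.token, len(m)
				break
			}
		}
		if n == 0 {
			// No rule matched: consume one rune so the loop always advances.
			_, n = utf8.DecodeRuneInString(src[pos:])
		}
		if last := len(spans) - 1; token == "text" && last >= 0 && spans[last].Token == "text" {
			spans[last].Text += src[pos : pos+n]
		} else {
			spans = append(spans, Span{Token: token, Text: src[pos : pos+n]})
		}
		pos += n
	}
	return spans
}

// exampleGrammar is a made-up rule list for a tiny Lisp-like language.
const exampleGrammar = `[
  {"token": "comment", "pattern": ";[^\n]*"},
  {"token": "number",  "pattern": "[0-9]+"},
  {"token": "paren",   "pattern": "[()]"}
]`

func main() {
	g, err := LoadGrammar([]byte(exampleGrammar))
	if err != nil {
		panic(err)
	}
	for _, s := range Highlight(g, "(+ 1 22) ; add") {
		fmt.Printf("%-8s %q\n", s.Token, s.Text)
	}
}
```

Go's `regexp` package guarantees matching time linear in the size of the input, so a hostile pattern can't trigger catastrophic backtracking. This naive loop can still do quadratic work overall (each position may rescan the rest of the file), so a real forge would presumably want size limits on top.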

@technomancy in tree-sitter, they are?

yikes

@mcc some editors have configuration flows where it's just like "ok imma just curl a tree-sitter grammar .so file from gods know where and load it directly into the process; hope everything's fine and nothing bad happens!"

it's like the flow you'd come up with if you were a supply-chain attacker trying to maximize attack surface

Introducing arborium, a tree-sitter distribution

About two weeks ago I entered a discussion with the docs.rs team about, basically, why we have to look at this: When we could be looking at this: And of course, as always, there are reasons why thi...

fasterthanli.me

@c0dec0dec0de @technomancy *rubbing eyes* which problem does this fix, exactly? is the idea that instead of pulling in .SO's you pull in .WASMs and that's safer?

Unfortunately I don't think that's adequate for Codeberg's purposes, as they need to worry not only about VM breaks but also about DoS attacks. Nothing prevents Wasm from simply being very slow.
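For a sense of what guarding against "simply being very slow" might involve, here is a rough Go sketch of wrapping an untrusted highlighter in a size cap and a time budget, falling back to plain text. The names `HighlightFunc`, `Limits`, `DefaultLimits`, and `RenderWithBudget` are hypothetical, and the limit values are arbitrary. Nothing here is how Forgejo or any Wasm runtime actually does it.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// HighlightFunc stands in for any untrusted highlighter (hypothetical type).
// It is expected to stop early when ctx is cancelled.
type HighlightFunc func(ctx context.Context, src string) (string, error)

// Limits bounds how much work one file may cost (values are arbitrary).
type Limits struct {
	MaxBytes int
	Budget   time.Duration
}

var DefaultLimits = Limits{MaxBytes: 1 << 20, Budget: 50 * time.Millisecond}

// RenderWithBudget returns highlighted output, or the plain source with
// ok=false if the input is too large, the highlighter fails, or time runs out.
func RenderWithBudget(h HighlightFunc, src string, lim Limits) (out string, ok bool) {
	if len(src) > lim.MaxBytes {
		return src, false
	}
	ctx, cancel := context.WithTimeout(context.Background(), lim.Budget)
	defer cancel()

	type result struct {
		out string
		err error
	}
	ch := make(chan result, 1) // buffered so the goroutine never blocks on send
	go func() {
		o, err := h(ctx, src)
		ch <- result{o, err}
	}()

	select {
	case r := <-ch:
		if r.err != nil {
			return src, false
		}
		return r.out, true
	case <-ctx.Done():
		// Caveat: Go cannot kill a goroutine from outside. The timeout only
		// stops us waiting; a highlighter that ignores ctx keeps burning CPU.
		return src, false
	}
}

// fastHighlighter is a toy highlighter that finishes immediately.
func fastHighlighter(ctx context.Context, src string) (string, error) {
	return "[hl]" + src, nil
}

// stuckHighlighter simulates one that never finishes on its own but does
// cooperate with cancellation.
func stuckHighlighter(ctx context.Context, src string) (string, error) {
	<-ctx.Done()
	return "", ctx.Err()
}

func main() {
	out, ok := RenderWithBudget(fastHighlighter, "x = 1", DefaultLimits)
	fmt.Println(out, ok)
	out, ok = RenderWithBudget(stuckHighlighter, "x = 1", DefaultLimits)
	fmt.Println(out, ok)
}
```

The caveat in the comment is the hard part: the deadline only helps if the thing being run can actually be interrupted, so the sandbox itself would need some way to meter or preempt execution.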

@mcc @c0dec0dec0de That particular project also introduces the other problem of LLM usage (disclaimer in README, Amos being very pro-slop)

@jamesnvc @mcc yeah, that’s still regrettable.