i'm at a loss for words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original

the "ideal" (their choice of words) case is 64.2%

edit: this got popular without me really intending to, so here's why i'm reading research: i want a semantic style transfer tool that can automatically format a patch "the same as the rest of the file / rest of codebase is formatted" without the rigidity involved in black or rustfmt that i find so hostile to my workflow that i refuse to use them. obviously, i want a tool that generates semantically equivalent code 100.0% of the time (ignoring source locations or reading from __file__)

@whitequark And this is how research money is lit on fire, I guess. Why else conduct research into ML for a task that has had obvious, deterministic, efficient and well-tested solutions for decades?
@lu_leipzig I actually really don't like formatters like black or rustfmt which is why I'm collaborating on research into doing it with ML, but there are ways to do it that never produce a different AST
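
(To illustrate the "never produce a different AST" guarantee: a minimal sketch, assuming Python source and only the standard library `ast` module. `same_ast` is a hypothetical helper name, not something from the paper or from any formatter. The idea is that whatever proposes the new formatting, ML or otherwise, its output is only accepted if it parses to the same tree as the input.)

```python
import ast


def same_ast(original: str, reformatted: str) -> bool:
    """Return True if both sources parse to the same Python AST.

    ast.dump() leaves out line/column attributes by default, so pure
    layout changes (whitespace, line breaks, redundant parentheses)
    do not affect the comparison.
    """
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))


# A candidate reformatting is accepted only if the tree is unchanged.
before = "result = compute(a,\n                 b)\n"
after = "result = compute(a, b)\n"
accepted = same_ast(before, after)
```

This only covers what the parser sees: comments are not part of the AST, and code that inspects its own source text or line numbers can still observe the change, which is the caveat already mentioned above.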
@whitequark oh, interesting, what do you not like about them? I could imagine an ML model would do a decent job deciding between n equivalent deterministically produced ASTs that vary e.g. w.r.t. indentation on multi-line definitions/calls.
@lu_leipzig I view code as art and so any tool that puts determinism strictly above aesthetics is a net negative to my craft

@whitequark @lu_leipzig Ideally, I think a formatter that learns how I formatted the rest of the buffer would be the goal.

Most of the time I like the deterministic formatting. However, I find deterministic formatting fails me around function headers and long function calls / long boolean statements.

I want it to do the deterministic formatting once, and then if I undo immediately, don't do it again to that area... and preferably learn what I was trying to do.

@theeclecticdyslexic @lu_leipzig my goal is to be able to run a command on a patch that formats the added lines "more or less like the rest of the file"

@whitequark @lu_leipzig that's a pretty reasonable concept I think.

I like the idea at least.

One thing I will say of deterministic formatters is they have changed my habits over time in order to get it to format the way I want. You can take that as both good and bad, but I think most (maybe 60%) of the things they have forced on me have been good.

Edit: I also get stun locked trying to decide how to format 15 lines of code far less often.

@theeclecticdyslexic @lu_leipzig yeah if a formatter requires me to do things I don't want I simply quit using the formatter (and sometimes the codebase)
@whitequark @theeclecticdyslexic @lu_leipzig You are absolutely right. So for JS/TS we're using eslint only. It is much less strict about things but gets the job done. Line length is one of my pet peeves. I simply cannot work with, and don't want, a strict length because sometimes a line is longer than the rest. For reasons. I don't use formatters either for that reason. Works well for me.

@whitequark

Even if the AST is the same, might a sufficiently bad format mislead humans reading the resulting code?

I'm reminded of the Obfuscated C Contest…

@lu_leipzig

@lu_leipzig @whitequark i would honestly be more interested in a deterministic but very configurable formatter, plus an ml model that writes a config for you from sample code, so you just do minor adjustments to it. generally all code styles fit in just a few hundred switches
@SRAZKVT @lu_leipzig this would be ~easy to do but convincing people to implement and maintain "a few hundred switches" has been incredibly difficult; my motivation is exactly that rustfmt maintainers have been consistently unwilling to entertain that
@SRAZKVT @lu_leipzig if every language i cared about (at this point: mainly rust, python, and c++) had highly configurable formatters i would not care to spend as much effort as i'm planning to on ml research
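
(A rough sketch of the "write a config from sample code" idea, for Python only. This is plain counting with the standard library `tokenize` module, not an ML model, and the three switches below are made-up names that do not correspond to any real formatter's config. `infer_style` is likewise a hypothetical helper.)

```python
import io
import tokenize
from collections import Counter


def infer_style(sample: str) -> dict:
    """Guess a few style switches from a sample of Python code."""
    quotes = Counter()
    indent_widths = []
    for tok in tokenize.generate_tokens(io.StringIO(sample).readline):
        if tok.type == tokenize.STRING:
            # Skip prefixes such as r"..." or b'...' to reach the quote char.
            body = tok.string.lstrip("rRbBuU")
            quotes[body[0]] += 1
        elif tok.type == tokenize.INDENT:
            # INDENT carries the line's full leading whitespace, so the
            # smallest one seen is taken as the width of a single level.
            indent_widths.append(len(tok.string.expandtabs(8)))
    return {
        "quote": quotes.most_common(1)[0][0] if quotes else None,
        "indent": min(indent_widths) if indent_widths else None,
        "max_line_length": max((len(l) for l in sample.splitlines()), default=0),
    }


sample = "def f(x):\n  if x:\n    return 'a'\n  return 'b' + \"c\"\n"
style = infer_style(sample)
```

A real tool would need far more switches than this and some way to handle files that mix styles; the point is only that the output is a config for a deterministic formatter, so whatever does the inferring never touches the code itself.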

@whitequark @lu_leipzig most tooling devs today seem to believe in a one size fits all with no configurability, kind of sad

also i think the problem of "but if every codebase isn't formatted exactly the same" is way overblown. once you start reading the code it really doesn't take long to adapt to a new style, barely a few minutes in my experience

@SRAZKVT @lu_leipzig there is a more real problem of "some people bounce off contributing if you ask them to fix style"