i'm at a loss for words after reading a paper about reformatting code using an ML model. it reports a measured statistic, A_c, which says how often the reformatted code behaves the same as the original

the "ideal" (their choice of words) case is 64.2%

edit: this got popular without me really intending it to, so here's why i'm reading research: i want a semantic style transfer tool that can automatically format a patch "the same as the rest of the file / rest of codebase is formatted" without the rigidity of black or rustfmt, which i find so hostile to my workflow that i refuse to use them. obviously, i want a tool that generates semantically equivalent code 100.0% of the time (ignoring source locations or reading from __file__)
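for what it's worth, the "semantically equivalent 100.0% of the time" requirement can be checked mechanically for Python. a minimal sketch, assuming that "equivalent" means "parses to the same AST once source positions are dropped"; the function name `same_ast` is made up for illustration and is not from the paper or any tool mentioned here:

```python
import ast


def same_ast(original: str, reformatted: str) -> bool:
    """Return True if both sources parse to the same syntax tree.

    ast.dump() leaves out lineno/col_offset by default, so this ignores
    source locations, matching the caveat in the post above.
    """
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))


# whitespace and line breaks inside the expression do not change the tree
print(same_ast("x = (1 +\n     2)\n", "x = 1 + 2\n"))
# changing an operator does
print(same_ast("x = 1 + 2\n", "x = 1 - 2\n"))
```

this is a conservative check: equal trees mean the formatter did not change behavior (modulo source locations), and a formatter wrapped in it can simply refuse to emit output when the trees differ.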

@whitequark compare and contrast the Extreme Programming philosophy, in which a code change doesn't count as "refactoring" unless all observable behavior is identical
@ireneista i like how it starts with this (left) and ends with "here is a variable we think would be good here. Do you like this" (right)
@ireneista starting with "gotofail bad" and ending with making the problem significantly worse, apparently without ever reflecting on this
@whitequark because "the thing we're promoting is incredibly dangerous, and not in fun ways" is not really the thing anyone wants to be cited for
@ireneista @whitequark Now, show me the numbers on the effort to make a rule-based style file compared to this. Because I'm sure that A_c is 100.0 in that case.
@GeoffWozniak @ireneista so the problem i'm solving is that while for C++ you have tools like clang-format, which are nice and flexible, for Rust you have rustfmt, which is rigid and makes your code look like ass. I do not like my code looking like ass, but I am also receptive to the idea that introducing as many knobs as clang-format has into rustfmt would make it unmaintainable

@whitequark @ireneista I have not had to deal with rustfmt yet. For clang-format, I work in existing projects and use (very) mildly tweaked variants of the base style for the project.

At the risk of instigating the canonical bikeshed discussion, I am a conformist formatter and have not concerned myself with modifying style all that much. But I agree that clang-format has some bizarre knobs to tweak.

@GeoffWozniak @ireneista I view code as art, so I find strongly canonicalizing formatters like black to be actively destructive. right now I use Ruff with a 300-line configuration for some of the Python code, and I think there's gotta be a better way to approach this that isn't destructive
@whitequark @GeoffWozniak that's our view as well
@ireneista @GeoffWozniak based on a discussion with someone who has worked on this problem before, we want to try building a diffusion model that captures the whitespace between code tokens and is then able to inject it into a given parsetree. that appears to be a fairly efficient and unproblematic way to do this
@ireneista @GeoffWozniak and everything that is best done on a parsetree (import ordering for example) will be done in the parsetree because it ain't broken
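the "only the whitespace between tokens changes" idea gives a cheap invariant to test against. a rough sketch for Python source, assuming the standard `tokenize` module is an acceptable stand-in for the real lexer; `significant_tokens` and `only_whitespace_changed` are hypothetical helper names, and this is a guard you could put around any whitespace-injecting model, not the model itself:

```python
import io
import tokenize

# token types whose text is layout only; their position in the stream still matters
_LAYOUT = {tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE, tokenize.ENDMARKER}


def significant_tokens(source: str):
    """Token stream with layout-only text blanked and non-logical newlines dropped."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NL:
            continue  # line break inside brackets or on a blank line
        text = "" if tok.type in _LAYOUT else tok.string
        out.append((tok.type, text))
    return out


def only_whitespace_changed(original: str, reformatted: str) -> bool:
    """True if the two sources differ only in whitespace between tokens."""
    return significant_tokens(original) == significant_tokens(reformatted)


print(only_whitespace_changed("f(a,b)\n", "f(\n    a,\n    b\n)\n"))
print(only_whitespace_changed("if x:\n    y\nz\n", "if x:\n    y\n    z\n"))
```

INDENT and DEDENT tokens are kept (with their text blanked) because indentation is semantic in Python: re-indenting a block consistently passes, while moving a statement into or out of a block does not.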
@whitequark @ireneista This sounds a lot like XSLT (or XSLT-adjacent).