i'm at a loss for words after reading a paper about reformatting code using an ML model. the paper measures a statistical quantity A_c, which says how often the reformatted code behaves the same as the original

the "ideal" (their choice of words) case is 64.2%

edit: this got popular without me really intending it to, so here's why i'm reading research: i want a semantic style transfer tool that can automatically format a patch "the same as the rest of the file / rest of codebase is formatted" without the rigidity of black or rustfmt, which i find so hostile to my workflow that i refuse to use them. obviously, i want a tool that generates semantically equivalent code 100.0% of the time (ignoring source locations or reading from __file__)
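
For Python, one conservative way to check the "semantically equivalent, ignoring source locations" requirement is to compare parse trees. The following is a minimal sketch, not something taken from the paper or from any tool mentioned in the thread; the function name `same_ast` is hypothetical. It relies on `ast.dump` leaving out line and column attributes by default.

```python
import ast


def same_ast(original: str, reformatted: str) -> bool:
    """Return True if both sources parse to the same syntax tree.

    ast.dump() omits lineno/col_offset unless include_attributes=True,
    so pure layout changes (whitespace, line breaks, redundant
    parentheses) compare equal.
    """
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))


# A layout-only change keeps the tree identical.
layout_only = same_ast("x = [1,\n     2]\n", "x = [1, 2]\n")

# Changing an operator changes the tree.
operator_changed = same_ast("x = 1 + 2\n", "x = 1 - 2\n")
```

Equal trees mean the formatter did not touch anything the parser can see. The check is stricter than behavioral equivalence, since two different trees can still behave the same.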

Irenes (many) 12h ago

@whitequark compare and contrast the Extreme Programming philosophy, in which a code change doesn't count as "refactoring" unless all observable behavior is identical

✧✦Catherine✦✧ 12h ago

@ireneista i like how it starts with this (left) and ends with "here is a variable we think would be good here. Do you like this" (right)

✧✦Catherine✦✧ 12h ago

@ireneista starting with "gotofail bad" and ending with making the problem significantly worse, apparently without ever reflecting on this

Irenes (many) 12h ago

@whitequark because "the thing we're promoting is incredibly dangerous, and not in fun ways" is not really the thing anyone wants to be cited for

Geoff Wozniak 10h ago

@ireneista @whitequark Now, show me the numbers on the effort to make a rule-based style file compared to this. Because I'm sure that A_c is 100.0 in that case.

✧✦Catherine✦✧ 10h ago

@GeoffWozniak @ireneista so the problem i'm solving is that while for C++ you have tools like clang-format, which are nice and flexible, for Rust you have rustfmt, which is rigid and makes your code look like ass. I do not like my code looking like ass, but I am also receptive to the idea that introducing as many knobs into rustfmt as clang-format has would make it unmaintainable

Geoff Wozniak 10h ago

@whitequark @ireneista I have not had to deal with rustfmt yet. For clang-format, I work in existing projects and use (very) mildly tweaked variants of the base style for the project.

At the risk of instigating the canonical bikeshed discussion, I am a conformist formatter and have not concerned myself with modifying style all that much. But I agree that clang-format has some bizarre knobs to tweak.

✧✦Catherine✦✧ 10h ago

@GeoffWozniak @ireneista I view code as art so I find strongly canonicalizing formatters like black to be actively destructive. right now I use Ruff with a 300-line configuration for some of the Python code and I think there's gotta be a better way to approach this that isn't destructive

Irenes (many) 10h ago

@whitequark @GeoffWozniak that's our view as well

✧✦Catherine✦✧ 10h ago

@ireneista @GeoffWozniak based on a discussion with someone who has worked on this problem before, we want to try building a diffusion model that captures the whitespace between code tokens and is then able to inject it into a given parsetree, which appears to be a fairly efficient and unproblematic way to do this
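
The "whitespace between code tokens" idea can be illustrated without any model at all. The sketch below is only an assumption about what the plumbing might look like for Python source, using the standard `tokenize` module. The names `split_layout` and `inject_layout` are hypothetical, and the learned part (predicting the gaps) is not shown. The layout is copied from a second formatting of the same code instead.

```python
import io
import tokenize

# Token types treated as layout rather than code (an assumption of this sketch).
LAYOUT = {tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE,
          tokenize.NL, tokenize.ENDMARKER}


def split_layout(source):
    """Return (tokens, gaps): code token strings and the text between them."""
    # Absolute offset of each line start, to turn (row, col) into an index.
    offsets = [0]
    for line in io.StringIO(source).readlines():
        offsets.append(offsets[-1] + len(line))
    tokens, gaps, pos = [], [], 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in LAYOUT:
            continue
        start = offsets[tok.start[0] - 1] + tok.start[1]
        end = offsets[tok.end[0] - 1] + tok.end[1]
        gaps.append(source[pos:start])   # layout preceding this token
        tokens.append(source[start:end])
        pos = end
    gaps.append(source[pos:])            # trailing layout after the last token
    return tokens, gaps


def inject_layout(tokens, gaps):
    """Interleave gaps and tokens back into source text."""
    assert len(gaps) == len(tokens) + 1
    return "".join(g + t for g, t in zip(gaps, tokens)) + gaps[-1]


compact = "def f(a,b):\n    return [a,b]\n"
spaced = "def f(a, b):\n    return [ a, b ]\n"

tokens, _ = split_layout(compact)        # code comes from one version
_, gaps = split_layout(spaced)           # layout comes from the other
restyled = inject_layout(tokens, gaps)   # same tokens, transplanted layout
```

Because the token sequence is never modified, only the text between tokens can change. In Python, newlines and indentation carry meaning, so a real tool would still have to constrain which gaps are allowed.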

✧✦Catherine✦✧ 10h ago

@ireneista @GeoffWozniak and everything that is best done on a parsetree (import ordering for example) will be done in the parsetree because it ain't broken

Geoff Wozniak

@whitequark @ireneista This sounds a lot like XSLT (or XSLT-adjacent).