i'm at a loss for words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original

the "ideal" (their choice of words) case is 64.2%

edit: this got popular without me really intending to, so here's why i'm reading research: i want a semantic style transfer tool that can automatically format a patch "the same as the rest of the file / rest of codebase is formatted" without the rigidity involved in black or rustfmt that i find so hostile to my workflow that i refuse to use them. obviously, i want a tool that generates semantically equivalent code 100.0% of the time (ignoring source locations or reading from __file__)

this isn't satire, this is real research published by IEEE/ACM

@whitequark So let me get this straight, IEEE thinks you should count it as a win if rewriting your code by vibing it has less than 15% better odds than a literal coinflip of reproducibility?

edited for clarity and to fix a typo

@whitequark @danlyke so … by "reformatted" I assume you mean aesthetically tidied up, with no change in functionality required?

If I got that right: wtf?

@deborahh @danlyke this is what a reasonable person would understand to be "code style", yes
@whitequark @deborahh @danlyke ie, the sort of thing a linter does?
@whitequark is this satire? this sounds crazy
@whitequark i was more so asking if the paper was satire, but i guess looking at it answers my questions as well
@whitequark i can imagine a few cases where reformatting code could change behavior (mostly related to language constructs that capture source locations) so I think I would be willing to accept as low as 99.99%
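A minimal sketch of the source-location caveat, in Python. The two snippets differ only in blank lines and parse to the same AST, yet they produce different values because the code reads its own line number. The names `compact`, `spaced`, and `run` are made up for illustration, and `sys._getframe` is a CPython detail.

```python
import ast

# Two hypothetical snippets that differ only in blank lines.
compact = "import sys\nline = sys._getframe().f_lineno\n"
spaced = "import sys\n\n\nline = sys._getframe().f_lineno\n"

def run(src):
    """Execute a snippet and return the value it bound to `line`."""
    ns = {}
    exec(compile(src, "<example>", "exec"), ns)
    return ns["line"]

# Same AST once positions are ignored (ast.dump omits them by default)...
print(ast.dump(ast.parse(compact)) == ast.dump(ast.parse(spaced)))  # True
# ...but different observable behavior, because the code inspects its own layout.
print(run(compact), run(spaced))  # 2 4
```

This is the narrow class of code a formatter cannot keep identical; anything that does not look at its own position is unaffected by blank lines.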
@porglezomp you'll love Fig. 6
@porglezomp there's explanatory text that says the issue with the identifier "found" is that it's rarely used
@whitequark I really love “not changing endl to \n” listed as a style issue when that changes buffer flushing behavior.
@porglezomp did you notice that one of the If tokens is capitalized in the output
@whitequark @porglezomp also cout << "\n"; is not equivalent to << endl; endl flushes the stream. "\n" does not. They’d need to write << "\n" << flush; to get the same behavior. Which is annoying to write, which is precisely why endl exists!

@jonathankoren @whitequark @porglezomp I just thought I’d take a moment to say that I thought that i was barely coping with things today, but thanks to figure 6 I clearly am not.

I’m not even going to ask about a use case. Taa for now I’m off to disintegrate.

@whitequark @porglezomp I'm spitting out my drink at j++ → j--. Holy shit.
@xgranade
I think the right is the output from running the model on the right code (center being the "desired output"). So it's not changing the semantics of the loop, just not changing the loop order to match their desired outcome.

Given that loop order can have behavioral impact (and I would never trust an LLM to be able to tell if it did), that seems like the correct behavior to me though
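A small Python sketch of how loop order alone can change a result, using floating-point accumulation. The function `total` and the input values are made up for illustration; they are not taken from the paper's Fig. 6.

```python
def total(xs, reverse=False):
    """Sum xs with an explicit index loop, forwards or backwards."""
    n = len(xs)
    indices = range(n - 1, -1, -1) if reverse else range(n)
    acc = 0.0
    for j in indices:
        acc += xs[j]
    return acc

# Floating-point addition is not associative, so visiting order matters.
xs = [1.0, 1e16, -1e16]
print(total(xs))                # 0.0  (the 1.0 is absorbed by 1e16 first)
print(total(xs, reverse=True))  # 1.0  (the large values cancel first)
```

Side effects in the loop body, early exits, and aliasing are other ways the same reversal could be observable.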
@whitequark @porglezomp
@xgranade @whitequark @porglezomp
I think reversing the `j` for loop is actually wanted by them? It's labelled "ground truth", and it is a potentially valid optimisation
@whitequark @porglezomp This looks like it could join the current crop of "DLSS5 off/DLSS5 on" memes.
@whitequark Huh. Were they actually trying to make it work, or trying to show that it's a bad idea to try to use ML for that task?

@whitequark “surely it just means differing output structure to accommodate the formatting, right?”

No, it just produces code that won’t compile. In a refactoring tool.

(Haha, didn’t see the post a minute before mine with the exact same snip)

@whitequark you know where there's a ready source of additional words? you surely will not regret sourcing additional words.
@whitequark not a paper *deliberately* about genetic algorithms, then?
@whitequark "Code style generally does not interfere with the code semantics and executability"; but we present novel methods for it to do so!
@whitequark compare and contrast the Extreme Programming philosophy, in which a code change doesn't count as "refactoring" unless all observable behavior is identical

@ireneista TIL that my philosophy is the same as the Extreme Programming philosophy

@whitequark

@krans @whitequark it was a nice name for a movement, it did a good job of conveying that the goal was radical change

at the time, from what we can tell, none of the people saw it as a labor movement specifically, which is too bad... that might have prevented it from being watered down by successive cycles of consulting and renaming

@ireneista i like how it starts with this (left) and ends with "here is a variable we think would be good here. Do you like this" (right)
@ireneista starting with "gotofail bad" and ending with making the problem significantly worse, apparently without ever reflecting on this
@whitequark because "the thing we're promoting is incredibly dangerous, and not in fun ways" is not really the thing anyone wants to be cited for
@ireneista @whitequark Now, show me the numbers on the effort to make a rule-based style file compared to this. Because I'm sure that A_c is 100.0 in that case.
@GeoffWozniak @ireneista so the problem i'm solving is that while for C++, you have tools like clang-format which are nice and flexible, for Rust you have rustfmt which is rigid and makes your code look like ass. I do not like my code looking like ass but I am also receptive to the idea that introducing as many knobs as clang-format has into rustfmt would make it unmaintainable

@whitequark @ireneista I have not had to deal with rustfmt yet. For clang-format, I work in existing projects and use (very) mildly tweaked variants of the base style for the project.

At the risk of instigating the canonical bikeshed discussion, I am a conformist formatter and have not concerned myself with modifying style all that much. But I agree that clang-format has some bizarre knobs to tweak.

@GeoffWozniak @ireneista I view code as art so I find strongly canonicalizing formatters like black to be actively destructive. right now I use Ruff with a 300-line configuration for some of the Python code and I think there's gotta be a better way to approach this that isn't destructive
@whitequark @GeoffWozniak that's our view as well
@ireneista @GeoffWozniak based on a discussion with someone who has worked on this problem before we want to try building a diffusion model that captures the whitespace between code tokens and is then able to inject it into a given parsetree, which appears to be a fairly efficient and unproblematic way to do this
@ireneista @GeoffWozniak and everything that is best done on a parsetree (import ordering for example) will be done in the parsetree because it ain't broken
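One way to check the "only the whitespace between tokens changes" property is to compare token streams. This is a rough sketch using Python's standard `tokenize` module, not the tool discussed in this thread; `significant_tokens` and `same_tokens` are hypothetical helper names.

```python
import io
import tokenize

def significant_tokens(source):
    """Token (type, text) pairs with layout-only detail stripped out."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NL:
            continue  # blank lines and line breaks inside brackets
        if tok.type in (tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE):
            out.append((tok.type, ""))  # keep block structure, drop exact whitespace
        else:
            out.append((tok.type, tok.string))  # names, operators, literals, comments
    return out

def same_tokens(a, b):
    return significant_tokens(a) == significant_tokens(b)

one_line = "f(a, b)\n"
exploded = "f(\n    a,\n    b\n)\n"
trailing_comma = "f(a, b,)\n"

print(same_tokens(one_line, exploded))        # True: only whitespace moved
print(same_tokens(one_line, trailing_comma))  # False: a token was added
```

A check like this is stricter than AST equality, since it also rejects token-level changes (such as an added trailing comma) that leave the AST unchanged.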
@whitequark @GeoffWozniak yeah this is a recurring research topic for us, we've talked with several of our friends about it over the years. just making a parser/generator that properly round-trip whitespace and comments is already a ton of work, alas...
@ireneista @GeoffWozniak there's tree-sitter nowadays which I believe should do that (and I think it should be failure-tolerant considering its fairly wide use in editors: nvim, zed, etc)
@ireneista @GeoffWozniak my literal first Python project was making a Python parser that fully captures source spans (which wasn't upstream at the time--in 2014 or so), so i'm quite familiar with the topic by now :p
@whitequark @ireneista This sounds a lot like XSLT (or XSLT-adjacent).

@whitequark @ireneista I very much respect that.

I view code like writing and I will tweak structure and form for far too long sometimes. Layout ends up getting less of my attention.

@GeoffWozniak @ireneista I see layout as part of the form, I guess? I write source code files in much the same way as one would write chapters in a book: somewhat self-contained, and intended to make sense when read top-to-bottom linearly and with roughly one full-displayful of context. so if rustfmt decides to blow up a function call into 20 lines out of nowhere it very much messes with that, for example

@whitequark @ireneista Well, I do have limits.

In my case I spend my time in Binutils and GCC. Do I love the GNU style? No. But does consistency help? Yes. So I demur. But I will restructure things so the single line curly braces don't take over.

@GeoffWozniak @ireneista the awful code style is probably #2 in the list of top 5 reasons I contribute to LLVM instead of GNU tools. I should use it as a testcase for the tool I'm working on, actually
@GeoffWozniak @ireneista awful memories of chasing down a bug in or1k binutils where .got section got somehow slightly unaligned from _GLOBAL_OFFSET_TABLE_. I never figured it out; I have since quit the company and I will mercifully never have to think about or1k again

@whitequark @ireneista I was in this wonderousness today, used in one of those functions that is a few hundred lines long with nested case statements and no attempt at functional abstraction.

So perhaps I have lost any hope of making art.

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elf-bfd.h;h=3d2fad49aa4c4e53dbb90467e8a9d5130a9d3dcb;hb=HEAD#l3426

@GeoffWozniak @ireneista yeah I have regretfully seen libbfd
@whitequark @ireneista Sorry, I probably should have put a CW on that.

@whitequark @ireneista I've grown used to it. That may say something bad about me, but it keeps me employed.

However, I never use it as a style in anything else.

@GeoffWozniak @ireneista yeah I mean I've submitted binutils patches while I was employed there, and for all the dislike I have for that code style it was so far down the list of bad things about that job that it didn't even register
@whitequark And this is how research money is lit on fire, I guess. Why else conduct research into ML for a task that has had obvious, deterministic, efficient and well-tested solutions for decades?
@lu_leipzig I actually really don't like formatters like black or rustfmt which is why I'm collaborating on research into doing it with ML, but there are ways to do it that never produce a different AST
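As a rough illustration of the "never produce a different AST" guarantee, here is a Python sketch of the kind of guard a formatter could run before accepting its own output. It assumes Python source and the standard `ast` module; `same_ast` and the sample strings are hypothetical.

```python
import ast

def same_ast(original: str, reformatted: str) -> bool:
    """True if both sources parse to the same AST, ignoring positions."""
    # ast.dump leaves out lineno/col_offset by default, so pure layout
    # changes compare equal while any structural change does not.
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))

before = "for j in range(n):\n    total += xs[j]\n"
layout_only = "for j in range( n ):\n\n    total  +=  xs[j]\n"
changed = "for j in reversed(range(n)):\n    total += xs[j]\n"

print(same_ast(before, layout_only))  # True: accept the rewrite
print(same_ast(before, changed))      # False: reject it
```

With a guard like this, the model can propose any layout it likes, and anything that alters the tree is rejected rather than counted toward an accuracy figure.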