I've seen people claiming - with a straight face - that mechanical refactoring is a good use-case for LLM-based tools. Well, sed was developed in 1974 and - according to Wikipedia - first shipped in UNIX version 7 in 1979. On modern machines it can process files at speeds of several GB/s and will not randomly introduce errors while processing them. It doesn't cost billions, a subscription or internet access. It's there on your machine, fully documented. What are we even talking about?
I think there's an important clarification to be made about LLM usage in coding tasks: do you trust the training data? Not your inputs, those are irrelevant, I mean the junk that the major vendors have dredged from the internet. Because I'm 100% positive that any self-respecting state-sponsored actor is poisoning training data as we speak by... simply publishing stuff on the internet.
And it's crucial to remember what happened during the xz compromise: a chain of seemingly innocuous commits where malicious behavior was hidden, then triggered by changing a single character in a generated file. A SINGLE CHARACTER. If you truly believe you can catch that by manually reviewing thousands upon thousands of machine-generated commits obtained via black-box training data I'm sorry, but you're being extremely naive.
@gabrielesvelto it is also worth remembering that the xz incident happened WITHOUT LLMs involved, so your comparison is not a very good one
@a how so? Now you don't need a person to run that particular exploit for years, you can just poison an LLM so that whenever someone generates a sufficiently large sequence of commits the exploit can be injected in them directly. No user intervention and it can be done at scale. And it can be done in closed-source codebases too, it's just a matter of someone using a bot on them.
@gabrielesvelto you didn't need an LLM for xz, that is how
@gabrielesvelto @a You are correct, LLMs have made this exploit many times easier to execute.
@a @gabrielesvelto no, it's actually an extremely well-made point. if we were (almost) unable to detect something like that in a FOSS project (not in the code, in a debug object mind you) then where do we get off introducing a black box that increases complexity a thousand times and claiming we can still quality-control the final product? not to mention it took someone years to gain influence within the project vs a model that just scrapes public code indiscriminately
@silhouette @gabrielesvelto who said this already hadn't happened before the advent of LLMs? you detected ONE, you don't know how many you haven't
@a @gabrielesvelto I don't follow, are you agreeing with me or... what?
@silhouette @gabrielesvelto I'm not, I'm saying that xz is a bad example for several reasons, including the fact that (and this was my last point) it is one known case among an unknown number of total cases
@a @gabrielesvelto I still don't follow your line of argument here. You are saying that there are currently an unknown number of potential vulnerabilities in human-generated FOSS code,  so we should... hook it up to the complexity generator?
@silhouette @gabrielesvelto The argument sounds more like "I know a guy who almost died from a peanut allergy, so we should ban peanut production". Yes, it is possible. It was also possible in the past. My point is that the use of LLMs doesn't change the landscape much in that regard.
@gabrielesvelto @silhouette of course, you can do whatever you want, I just think if you are going to criticize the use of LLMs there are better arguments that are less convoluted. 🤷‍♂️
@a @gabrielesvelto @silhouette "people die from peanut allergy so maybe it isn't such a great idea to introduce machines that have a 0.1% probability of introducing a peanut in every single item in the supermarket" is a pretty good point
@silhouette @a @gabrielesvelto most people (by volume AND mass) using LLMs are doing so because they do not have the skills necessary to produce the code in question (they "have the skill to read it" but if you've ever tried reimplementing a compsci research paper without just copying their code as-is you know instinctively that's not the same thing), which means that they are unlikely to tell well-crafted malicious code from legitimate code, knowing that both achieve their results
and this is assuming they even review it at all, rather than simply delegating that to an agent that only checks whether it matches the acceptance criteria (just like a real product manager!), which obviously fails immediately
@gabrielesvelto isn't that incident an example of Metamorphic Malware?
I hacked ChatGPT and Google's AI - and it only took 20 minutes

I found a way to make AI tell you lies – and I'm not the only one.

BBC
@gabrielesvelto after using a few of the LLMs to generate #powerShell code, i don't trust any of them.

Even many coding tutorials contain security flaws. Those all go into the plagiarism machine.

@gabrielesvelto LLMs generate the average internet response to a query, and that includes coding queries.

And paraphrasing Carlin: realize how bad average code is, and realize that half the code is worse than that 😅

@gabrielesvelto to be fair I'd much more trust Claude to write the sed regexes than myself... but it's inexcusable to brute force that kind of string replacement work directly with an LLM!
Regex Generator - Creating regex is easy again!

A tool to generate simple regular expressions from sample text. Enable less experienced developers to create regex smoothly.

@RandamuMaki @gabrielesvelto oh, that expression builder in the second step is really nice! wish it would then do match testing on more lines in further steps like how regex101 does
@gabrielesvelto Or when sed fails you can often write a quick script in Python (or your language of choice).
For real tho I would love to have a dependable refactoring tool that understands syntax, probably something based on Tree Sitter, but I haven't been able to get any working.
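The "quick script" route mentioned above can look like this minimal sketch; the two-line logging pattern it rewrites is invented for illustration (line-oriented sed struggles with matches that span lines):

```python
import re

# Collapse a call that was split across two lines into one line, e.g.
#   log.debug(
#       "message" )
# becomes log.debug("message"). The pattern name is made up.
LOG_CALL = re.compile(r"log\.debug\(\s*\n\s*(\"[^\"]*\")\s*\)")

def collapse_debug_calls(text: str) -> str:
    # \1 re-inserts the captured string literal unchanged
    return LOG_CALL.sub(r"log.debug(\1)", text)
```

A throwaway script like this stays reviewable: the whole transformation is visible in a dozen lines.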
@csepp several fancy IDEs already have extremely sophisticated refactoring tools that understand the language syntax, e.g.: https://www.jetbrains.com/help/idea/refactoring-source-code.html
@gabrielesvelto Yup, those are also pretty great.
Personally, I needed to refactor some C++ code that didn't fit any simple regex, so I ended up writing a Lua script to do it and did the rest of it by hand.
The only way I could find to reliably automate it would have been to write a custom clang-tidy pass, which didn't seem worth the effort.
I still wouldn't use an LLM for it, but I do wish there was an easier way to load the code model in a scripting language. To automate the refactor I did I would have needed to track arguments that are passed through variables or that come from function parameters, access non-C++ files (move strings to YAML), rewrite various forms of string concatenation to format strings, etc.
@csepp @gabrielesvelto Doesn't look like lua really has a good binding to libclang but if you used Python you could use the same libraries that clang-format/tidy do. They're using the actual llvm parser and give you an API to manipulate the AST.
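The libclang Python bindings need a libclang install, so as a stand-in here is the same AST-rewriting idea shown with Python's stdlib ast module: rename a function and its call sites while leaving string literals untouched, a guarantee a purely textual replace cannot make. The function names are invented:

```python
import ast

# Rename a function definition and every reference to it by rewriting the
# syntax tree instead of the raw text. String literals and other constants
# are Constant nodes, so they are never touched.
class RenameFunction(ast.NodeTransformer):
    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_FunctionDef(self, node):
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # recurse into the body
        return node

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

def rename(source: str, old: str, new: str) -> str:
    tree = RenameFunction(old, new).visit(ast.parse(source))
    return ast.unparse(tree)  # Python 3.9+
```

The clang.cindex equivalent follows the same visit-and-rewrite shape, just over the C++ AST.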
@crazyeddie @gabrielesvelto I'll look into this, I couldn't find many up to date refactoring examples, but looking at the docs it should be possible to get something going. I think I've come across it when I was researching tools for my refactor but the lack of examples turned me off, since I had no idea how much work I'd have to put into it.
@gabrielesvelto @csepp I bet if you look at the C++ part of the tools there's not many refactors they can do :p
@csepp @gabrielesvelto tbf, in all likelihood it wouldn't be `sed` that fails. it would be the inputs to `sed` that failed - garbage in, garbage out.
@gabrielesvelto This parallels the discourse that says "LLMs are useful to automatically send template emails dozens of times a day". My Brother In Ohm, email templates have existed for decades at a fraction of the cost of a single token. You are just Dunning-Krügering yourself into thinking the problem has only been solved today because you never paid any attention to it until the day you went searching for a use case for a toy you wanted an excuse to buy.

@gabrielesvelto I've been presented a case where changes were quite trivial across many repos, but making those changes still required taking context into account. LLM was helpful.
But...
that same presentation showed logs of the tool admitting to doing a force push when it had been specifically instructed from the start not to do force pushes.

Feels like we need sandboxed dev environments where these tools could not do dangerous things, as they themselves are bad at this.

@aurisc4 I've done a mix of grep and sed a lot of times to take context into account. If more sophisticated refactoring is needed, there are tools that understand the syntax of practically any language in existence and can be used for direct manipulation of the ASTs. Every problem where the input is machine-readable can be solved in a faster, cheaper and more reliable way by tools that process the data directly rather than passing it through a (very large) neural network.
@gabrielesvelto fun fact: the speed of sed comes from the fact that it leverages FSM-based matching under the hood. So, I say: FSM FTW!
@gabrielesvelto I don't think the comparison is entirely fair tho. Both sed and syntax-tree-based editing are really powerful (and I use both when it makes sense), but if you need to do a one-off migration you might spend hours trying to figure out how to make them work right, while an llm will usually do a good enough job at the first try, where you only need to review the changes and fix a few mistakes, without having to actively spend time on it.
@fourlastor what about the time spent setting up the LLM, sandboxing it and then reviewing all the changes? What about the risk of the code containing prompt-injections that might be designed to introduce vulnerabilities or simply take over your machine or credentials for a state-sponsored attacker to use? What about the reliance on a single closed-source paid-for commercial tool? Those are a lot of disadvantages to make up for.

@gabrielesvelto answering in order:

>what about the time spent setting up the LLM, sandboxing it and then reviewing all the changes?

For what I'm working on this is usually between 30 and 40 minutes, start to end (minus the time the LLM takes to do its own work in its own git subtree, while I do other stuff). For context, claude doesn't commit, I review the changes locally (git is blacklisted). In my case this has been pretty stable over 100-150 tasks where I did the same kind of migration

@gabrielesvelto

>prompt-injections

The project is closed source, and we don't have places where we randomly include text files. If someone IN THE COMPANY manages to introduce malicious code, imho they'd just infect gradle instead of hoping that someone running an LLM triggers something (besides, devs have access only to what they need). State-sponsored hackers specifically are really not on my list of things I can defend against, be it via LLMs or any other introduced attack

@gabrielesvelto

>What about the reliance on a single closed-source paid-for commercial tool?

On this I 100% agree, you shouldn't RELY on it. I am confident that I can make the same changes myself (in some cases I did because it was clearly less time consuming than making an LLM do that), if tomorrow these tools disappear I am sure I will be comfortable working without them (as I do for example for my OSS/hobby work, where I can't really justify paying for the subscription)

@gabrielesvelto a counter example: one migration I needed to make was from java Serializable to Parcelable. That was a GREAT candidate to be worked on by modifying the syntax tree. I created a small throwaway plugin in intellij which did the work, removed the extension, added the annotation, and ran on thousands of files in a few seconds. Imho trying to find the most appropriate tool for the task at hand is important, and having an all-or-nothing mentality (on either side) isn't constructive
@fourlastor you don't need to do anything special to be a target of state-sponsored actors if you rely on an LLM for your coding tasks. State-sponsored actors have almost certainly poisoned the training data of the major commercial LLMs; you don't need to add anything yourself. Remember, these things are trained on anything that's dredged from the internet. *Anything*. Do you really trust what happens within the model? Remember the xz compromise? It can now be done automatically *at scale*.
@gabrielesvelto and ok, but what is the *actual* scenario you're imagining? because my coding tasks go as such when I use LLMs:
1. I have 10-15 classes that need to change the way we do X from Y to Z
2. I prompt the LLM, telling it "change A,B,C so that they use Z instead of Y"
3. I review the code, fixing mistakes as I see them
1/x because post length limits
@gabrielesvelto
The code change is frankly pretty simple, we're talking of stuff on the level of "migrate Book so instead of using function calls, uses annotations for ABC, update the call sites", we're not talking about "change this complex piece of code so that it does complex ABC in another complex XYZ way". The realm of errors is "I know that Foo doesn't work well by itself and needs extra care"
@gabrielesvelto anything that goes over the bar of "this is stupid but boring" goes into the "I'll do it by hand because if anything I need to learn how it works before touching it"

@fourlastor @gabrielesvelto It's not a use sed or use LLM scenario here.

Sed isn't a refactoring tool. There are plenty of actual refactoring tools that don't use LLMs. I was using them before LLMs were invented and no, fucking sed isn't the same thing. I'm rather hoping that wasn't actually a serious comparison :p

Mechanical refactors are deterministic algorithms. If the conversation is about sticking AI in that it's probably nonsense and you can leave without fearing you'll miss anything

@gabrielesvelto the developers of TypeScript have decided not to implement refactoring tools because the refactoring can be done by LLMs...
@gabrielesvelto i spent the last week using sed to produce an entire module system for a prototype. lovely piece of software that expands the meaning of structured data. not at all perfect but if we're comparing it to statistical approaches it at least has the benefit of determinism
@gabrielesvelto For fun I tried writing Rust code with claude code. The code took an age to compile when it worked (or do we call it build?). The project took months, so the code got large and was slow to build. Claude was able to refactor it (after it worked) to build 10 times faster. That is not mechanical in the sense you mention... but it was really challenging. Mechanical refactors it does 100 times better still, of course, because it can use sed too, but it can also check the new syntax and test-build each change.
@adingbatponder why did the project take so long to build?
@gabrielesvelto Well that is what rust seems to be like. I used a lot of packages incl. browser and screen grabbing tools which took ages to build. Like 20 mins. (It was inside a nixos flake though.)
@adingbatponder yes, but why? Which packages were taking so long? Firefox has almost 4 million lines of Rust and it takes only a few minutes to build them.
@gabrielesvelto No clue. At the time it was chrome that pushed it into silly territory. But this was inside a flake. All I know was when it was refactored it was able to use 32 processors instead of only 2.
@gabrielesvelto not really, it is not on my computer.
@gabrielesvelto "Yeah but Sed is old and shitty and you gotta get with the times" -some techbro somewhere
@gabrielesvelto NGL when I read "mechanical refactoring", I first imagined a bunch of robot arms on an Aperture-esque assembly line rearranging letters on printing press-style blocks
@gabrielesvelto "people are using this inadequate and problematic tool for a job, so let me suggest they use this different completely inadequate tool instead."
Speaking from unfortunate, painful experience: using grep and sed at scale for mechanical refactoring very much does randomly introduce mistakes into a codebase. I beg developers to use *at least* syntax-aware tools for mechanical refactoring jobs
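A minimal illustration of that failure mode, on an invented snippet: even a word-boundary regex replace rewrites matches inside string literals and comments.

```python
import re

# Invented snippet: rename get_status -> fetch_status with a textual replace.
code = 'status = get_status()\nprint("get_status failed")  # calls get_status\n'
renamed = re.sub(r"\bget_status\b", "fetch_status", code)
# The call site is updated, but so are the log message and the comment:
# exactly the silent damage a syntax-aware tool would avoid.
```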
@gabrielesvelto Just the other day I saw a goddamn professor claiming that we need to teach chatbots to reason in order for them to do math. As if we haven't had calculators that actually work every time for like 450 years. It's insane.