I've seen people claiming - with a straight face - that mechanical refactoring is a good use-case for LLM-based tools. Well, sed was developed in 1974 and - according to Wikipedia - first shipped in UNIX version 7 in 1979. On modern machines it can process files at speeds of several GB/s and will not randomly introduce errors while processing them. It doesn't cost billions, a subscription or internet access. It's there on your machine, fully documented. What are we even talking about?
I think there's an important clarification to be made about LLM usage in coding tasks: do you trust the training data? Not your inputs, those are irrelevant, I mean the junk that the major vendors have dredged from the internet. Because I'm 100% positive that any self-respecting state-sponsored actor is poisoning training data as we speak by... simply publishing stuff on the internet.
And it's crucial to remember what happened during the xz compromise: a chain of seemingly innocuous commits where malicious behavior was hidden, then triggered by changing a single character in a generated file. A SINGLE CHARACTER. If you truly believe you can catch that by manually reviewing thousands upon thousands of machine-generated commits obtained via black-box training data I'm sorry, but you're being extremely naive.
@gabrielesvelto it is also worth remembering that the xz incident happened WITHOUT LLMs involved, so you comparison is not a very good one
@a how so? Now you don't need a person to run that particular exploit for years, you can just poison an LLM so that whenever someone generates a sufficiently large sequence of commits the exploit can be injected in them directly. No user intervention and it can be done at scale. And it can be done in closed-source codebases too, it's just a matter of someone using a bot on them.
@gabrielesvelto you didn't need an LLM for xz, that is how
@gabrielesvelto @a You are correct, LLMs have made this exploit many times easier to execute.