I've seen people claiming - with a straight face - that mechanical refactoring is a good use case for LLM-based tools. Well, sed was developed in 1974 and - according to Wikipedia - first shipped in UNIX version 7 in 1979. On modern machines it can process files at speeds of several GB/s and will not randomly introduce errors while processing them. It doesn't cost billions, a subscription, or internet access. It's there on your machine, fully documented. What are we even talking about?
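To make the point concrete, here's a minimal sketch of the kind of mechanical refactoring sed handles deterministically - renaming an identifier across a file. `old_name`, `new_name`, and `demo.c` are placeholders for illustration, not from any real project:

```shell
# Create a toy source file (placeholder content for illustration).
printf 'old_name(1);\nold_name(2);\n' > demo.c

# Mechanically rename the identifier in place. Same input, same output,
# every time - no model, no network, no hallucinated edits.
sed -i 's/old_name/new_name/g' demo.c

cat demo.c
```

The same one-liner scales to an entire tree via `find ... -exec sed -i ...`, and the substitution is exact: either the pattern matches or it doesn't.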
I think there's an important clarification to be made about LLM usage in coding tasks: do you trust the training data? Not your inputs - those are irrelevant - I mean the junk that the major vendors have dredged from the internet. Because I'm 100% positive that any self-respecting state-sponsored actor is poisoning training data as we speak by... simply publishing stuff on the internet.
And it's crucial to remember what happened during the xz compromise: a chain of seemingly innocuous commits in which malicious behavior was hidden, then triggered by changing a single character in a generated file. A SINGLE CHARACTER. If you truly believe you can catch that by manually reviewing thousands upon thousands of machine-generated commits obtained via black-box training data, I'm sorry, but you're being extremely naive.
@gabrielesvelto it is also worth remembering that the xz incident happened WITHOUT LLMs involved, so your comparison is not a very good one
@a @gabrielesvelto no it's actually an extremely well-made point. if we were (almost) unable to detect something like that in a FOSS project (not in the code, in a debug object mind you) then where do we get off introducing the black box which increases complexity a thousand times and claim we can still quality-control the final product. not to mention it took someone years to gain influence within the project vs a model that just scrapes public code indiscriminately
@silhouette @a @gabrielesvelto most people (by volume AND mass) using LLMs are doing so because they lack the skills to produce the code in question themselves. they may "have the skill to read it", but if you've ever tried reimplementing a compsci research paper without just copying its code as-is, you know instinctively that's not the same thing. which means they are unlikely to tell well-crafted malicious code from legitimate code, given that both achieve the stated results
and this assumes they even review it at all, rather than simply delegating that to an agent that only checks whether it matches the acceptance criteria (just like a real product manager!), which obviously fails immediately