I've seen people claiming - with a straight face - that mechanical refactoring is a good use-case for LLM-based tools. Well, sed was developed in 1974 and - according to Wikipedia - first shipped in UNIX version 7 in 1979. On modern machines it can process files at speeds of several GB/s and will not randomly introduce errors while processing them. It doesn't cost billions, a subscription or internet access. It's there on your machine, fully documented. What are we even talking about?
I think there's an important clarification to be made about LLM usage in coding tasks: do you trust the training data? Not your inputs, those are irrelevant, I mean the junk that the major vendors have dredged from the internet. Because I'm 100% positive that any self-respecting state-sponsored actor is poisoning training data as we speak by... simply publishing stuff on the internet.
And it's crucial to remember what happened during the xz compromise: a chain of seemingly innocuous commits where malicious behavior was hidden, then triggered by changing a single character in a generated file. A SINGLE CHARACTER. If you truly believe you can catch that by manually reviewing thousands upon thousands of machine-generated commits obtained via black-box training data I'm sorry, but you're being extremely naive.
@gabrielesvelto it is also worth remembering that the xz incident happened WITHOUT LLMs involved, so you comparison is not a very good one
@a @gabrielesvelto no it's actually an extremely well-made point. if we were (almost) unable to detect something like that in a FOSS project (not in the code, in a debug object mind you) then where do we get off introducing the black box which increases complexity a thousand times and claim we can still quality-control the final product. not to mention it took someone years to gain influence within the project vs a model that just scrapes public code indiscriminately