I've seen people claiming - with a straight face - that mechanical refactoring is a good use-case for LLM-based tools. Well, sed was developed in 1974 and - according to Wikipedia - first shipped in UNIX version 7 in 1979. On modern machines it can process files at speeds of several GB/s and will not randomly introduce errors while processing them. It doesn't cost billions, a subscription or internet access. It's there on your machine, fully documented. What are we even talking about?
I think there's an important clarification to be made about LLM usage in coding tasks: do you trust the training data? Not your inputs (those are irrelevant), I mean the junk that the major vendors have dredged from the internet. Because I'm 100% positive that any self-respecting state-sponsored actor is poisoning training data as we speak by... simply publishing stuff on the internet.
And it's crucial to remember what happened during the xz compromise: a chain of seemingly innocuous commits where malicious behavior was hidden, then triggered by changing a single character in a generated file. A SINGLE CHARACTER. If you truly believe you can catch that by manually reviewing thousands upon thousands of machine-generated commits obtained via black-box training data, I'm sorry, but you're being extremely naive.
@gabrielesvelto it is also worth remembering that the xz incident happened WITHOUT LLMs involved, so your comparison is not a very good one
@a @gabrielesvelto no, it's actually an extremely well-made point. If we were (almost) unable to detect something like that in a FOSS project (not in the code, in a debug object, mind you), then where do we get off introducing a black box that increases complexity a thousandfold and claiming we can still quality-control the final product? Not to mention it took someone years to gain influence within the project, versus a model that just scrapes public code indiscriminately.
@silhouette @gabrielesvelto who says this hadn't already happened before the advent of LLMs? You detected ONE; you don't know how many you haven't.
@a @gabrielesvelto I don't follow, are you agreeing with me or... what?
@silhouette @gabrielesvelto I'm not. I'm saying that xz is a bad example for several reasons, including the fact that (and this was my last point) it is one known case among an unknown number of total cases
@a @gabrielesvelto I still don't follow your line of argument here. You are saying that there are currently an unknown number of potential vulnerabilities in human-generated FOSS code, so we should... hook it up to the complexity generator?
@silhouette @gabrielesvelto The argument sounds more like "I know a guy who almost died of a peanut allergy, so we should ban peanut production". Yes, it is possible. It was also possible in the past. My point is that the use of LLMs doesn't change the landscape much in that regard.