Mastodawn

I don't like that people use security as an angle when criticizing the use of AI in KeePassXC. If a project accepts public contributions, this means there will be malicious actors trying to smuggle in code which weakens security. The project must therefore have a solid review process in place to ensure this doesn't happen.

If you see AI as this huge security threat, then you don't trust this review process. But then you shouldn't have trusted the software at any time before to begin with.

Anthony

@[email protected] @[email protected] I have been critical of the KeePassXC team's decision to accept LLM-assisted submissions for a while now, and after long consideration I opted to move away from using it, security being one of my primary concerns. My view is that you're setting up a bit of a strawman in this post, and so I thought I'd elaborate more on my rationale in case it helps anyone who's weighing this decision too. The tl;dr is that code review should be your last line of defense, not your primary one, and that LLM use threatens to erode existing lines of defense while introducing new categories of risk. This is the opposite of what you should be doing when developing security-oriented software.

Here's a lot more words:

To my way of thinking the question isn't whether any given pull request is problematic. One of the KeePassXC maintainers who I've interacted with seemed to suggest this as well, that human beings sometimes submit poor quality pull requests too so what's the difference, especially if the review process catches them? The important question, and the difference, lies with the culture of the development team and process. Experience shows that security-oriented software in particular benefits greatly from a team of people dedicated to both transparency and the relentless pursuit of excellent implementations. My belief is that the use of LLMs in coding threatens both of those aspects, degrading them over time. Transparency, because no one can know exactly what an LLM is going to produce and why, and an LLM cannot tell you anything about its output; excellent implementations because (a) come on, have you ever looked at LLM output, especially for larger chunks of code; and (b) the only way we've ever found to produce excellent implementations of anything is by developing a well-functioning team of people and setting them loose on it.

Peter Naur famously argued that programming is theory building, and theories draw their power from their existence in the heads of the people who construct them. I am convinced of his argument by the simple fact that over the course of my career I've worked with large codebases written by other people, and have experienced firsthand that the only way to really understand the code is by talking to other people who understand the code. No one can look at a large codebase and understand how it works, not even with the best documentation in the world--not in a reasonable amount of time, anyway. Anyone who believes otherwise hasn't picked up a non-trivial APL program and tried to figure out what it does. Anyone who believes otherwise is mistaken about the practice of software development and engineering, and probably also believes in the myth of the 10x engineer or that women can't code as well as men, too.

LLMs are not people. They do not understand code. They cannot describe their thought process to you. They cannot point you to the most important functions, procedures, methods, or objects. They cannot give you hints about pitfalls you might fall into while working with their code. Any understanding like this that arises about LLM-generated code arises because human beings developed that understanding of the code and then communicated it.

LLMs are trained on masses of mediocre code. Their output has been found to include significantly more bugs and security issues than the average human-written code. Their use has been observed to result simultaneously in reduced productivity and a belief that productivity was increased, suggesting they might induce other blind spots in one's self-awareness too. Their use has been observed to result in de-skilling: people become less able to do things they used to be able to do without leaning on the tool. Given all that, I do not believe for a moment that an LLM can produce an excellent implementation, nor foster a culture in which excellent implementations arise; and I believe that any excellent implementation produced by a person using an LLM is a result of the person compensating for the weaknesses and traps of LLM use, all while it potentially degrades their future ability to produce excellent implementations and fools them into believing they wrote better code faster when they did not.

A good review process does not compensate for any of the issues I raise here. More importantly, actual security is about layers of protection. The code review is one of the last layers of protection. There should be many, many others, which to me includes a culture that does not succumb to the temptation to put a stochastic black box deskilling machine into the software development process. You wouldn't build a fortress with an open road leading into the center just because you had guards you could post on the road (it lets us get in and out faster, that portcullis is so slow!). You'd have the guards, and you'd have several layers of thick walls, and you'd have a moat, and you'd have archers, and... You certainly wouldn't voluntarily pull a giant wooden horse that could contain anything into your fortress!

I suspect that a project adopting more and more LLM-assisted submissions will not obviously suffer in the near term, but over the medium to long term is likely to develop issues, originating in one or more of my above observations, that eventually lead to problems in the software. As I said to someone about KeePassXC, I am not inclined to hitch my wagon to that train. Not when it comes to a piece of software like a password manager.

And that's not even opening up the moral and ethical issues of LLMs, which are substantial. Not to mention the dangers of becoming dependent on a technology and tools that might go away or become significantly more expensive when the asset bubble currently necessary for their continued existence finally deflates.

Other people might come to a different place, but for me this is more than enough reason to switch password managers.

Volpeon

Nov 9

@abucci Thanks for your reply! You're making good points which I overall agree with. I've had rather subpar experiences with LLM-generated code at work myself, so it's not like I don't see the downsides and how it leads to the erosion of skill. It's true that this also has implications for security.

However, from what I've seen, I think the way GitHub integrates Copilot into the process makes it less likely to cause the same degradation as an AI assistant directly integrated into an editor. As I said elsewhere, GitHub presents Copilot as a PR author and your usage of it is akin to iterating a PR with a human author until it meets the project's standards.
If regular PRs don't pose a risk to one's skills, then I don't see why this would. It incentivizes the thinking that the AI must be held to the same standards as any other PR author, that it isn't inherently above them. I think this is a good way to handle it.
I'm happy to be corrected if my understanding of Copilot or the way the devs use it is wrong. You're clearly more involved in this topic than I am.

Apart from that, I do wonder how realistic it is to expect projects to reject LLM contributions forever. No matter what you and I want, the global trend moves towards increasing adoption of AI and this means external contributions will become more and more "tainted", with and without their knowledge. Given this outlook, I think it's better to be open for AI contributions. This allows the developers to become familiar with the strengths and weaknesses of AI, and it creates an environment where contributors are willing to disclose their use of it so that reviews can be conducted with appropriate care. An environment where AI is banned will only lead to people trying to deceive the developers and causing unnecessary trouble.

@ngaylinn

Anthony Nov 10

@[email protected]
> how realistic
I don't mean to come off as snarky or dismissive at all, but this phrase is so often used to bludgeon good ideas to death. Everyone agrees we should do X, someone pipes up "yes but is that realistic?" and poof, X is off the table. Somewhere in one of those CIA instruction manuals for how to disrupt organizations is the advice to do exactly this (don't have the patience to dig it out).

It's unrealistic to expect people to not murder one another, yet we insist that they do not do that.

Incidentally, have you seen this? https://blobfox.coffee/@Ember/115522745321119751

For some reason that URL is not loading for me right now, but it's a person pointing to a recent KeePassXC pull request that was not reviewed by another person, something the KeePassXC blog post and maintainers insist shouldn't ever happen. The PR is "reviewed" by "Copilot". This is why you ban AI: the slide from "experimenting with AI" to "flat out lying about it" is fast, in my experience. It is absolutely not better to tolerate this kind of thing in general; it is intolerable in security and security-adjacent software.

@[email protected]

Ember :catplant: (@[email protected])

@[email protected]
> If the changes were proposed by a maintainer, another maintainer will do the review. This policy is strictly followed, even for small changes.

wow, so they're straight up lying in a very easily proven way ([this PR](https://github.com/keepassxreboot/keepassxc/pull/12588)) (or do they think it doesn't count because "the LLM proposed it, not the prompter of the LLM"??)

also that pr for "we only use slop for bug fixes or UI" *password history* being correctly imported is a pretty important thing i would think

Nate Gaylinn Nov 9

@abucci @volpeon I agree with everything you say. I'm not well versed in this case, and my opinions are more nuanced than the original post, so I'll clarify what I meant by boosting it.

LLMs produce low quality code that nobody has read or understood. This is a serious problem, and security is just one of many risks. Better to say "this is sensitive code, I don't trust an LLM to touch it" than to claim AI-generated code is insecure.

The risk of bad / insecure code is not new. It's a key challenge for open source. There are many development practices like code reviews to manage this risk. LLMs make the problem worse, but it's not new. Either we trust the practices, or we must update them, but we should think in terms of how we handle the bad code as it comes.

I'm against AI in software, but banning doesn't prevent it, especially in open source. People will use it. So, the question is, how do we keep software safe and reliable in the face of this? Saying "AI is bad because it's insecure" is too simplistic.

Anthony Nov 10

@[email protected] @[email protected]
> I'm against AI in software, but banning doesn't prevent it, especially in open source. People will use it. So, the question is, how do we keep software safe and reliable in the face of this?
I differ with you here. Banning will obviously not stop people from trying to submit pull requests with AI-generated code in them. That is a strawman I have not suggested, and I don't understand why people keep bringing it up as if it's relevant. What banning will do is set a clear tone for a project, and unequivocally identify (some) people who do not respect the project's values (if "no AI" is one). It's exactly like putting things like "no bigotry" in a code of conduct. You don't say "no bigotry" because you think doing so will make people stop saying bigoted things (at least I hope folks aren't that naive).

I said in some other thread that (most) software developers take finding bugs seriously, and work hard to eliminate them. This is a cultural practice. I'd say most bugs, at least in mature-ish projects, are not software-breaking, but are subtle behavioral issues. Many of those could be left in without breaking the core functionality of the software. But developers fix them anyway. Why? Because fixing bugs is part of the culture of software development. Not using AI/LLMs could also be made part of the culture, and treated with equal seriousness. But values like that do not emerge spontaneously; people have to advocate for them and practice them. Banning AI use is one way to advocate for this value.

Nate Gaylinn Nov 10

@abucci @volpeon Agreed that "no AI" is a totally valid thing to put in your policy, with many benefits even if it doesn't prevent LLM contributions.

However, this re-framing is relevant. I don't mean to make a straw-person of your argument and then shoot it down. I mean to say: we should attend to engineering practices that mitigate the inevitable harm of thoughtless, mass-produced PRs, because we'll need them regardless of policy.

What I liked about the original post was shifting attention from "LLMs insecure" to "how were we mitigating this problem before using software development practices, and how can that inform what we do next?"

Absolutely no arguments against your emphasis on healthy development culture. I think that's the same point, actually! And it's valid to say that a no-AI policy is part of having a healthy dev team, in this moment.

Anthony Nov 10

@[email protected] You emphasize a good point. At the level of managing a large inflow of pull requests I can understand the appeal of automating some of the work. For some projects it's definitely a problem to solve.
@[email protected]
