@slomo @pwithnall @ebassi @hbons I would like to add a few things to what has been said.
1) There are some strong assumptions being made about how good LLMs are at replicating the functionality of a project's codebase. Having used LLMs a lot, my model of how they work is that they operate on code as if it's an n-dimensional parameter space, and they tweak parameters until the feedback loop is satisfied (e.g. the tests pass)[α]. The resulting code is weird and unmaintainable[β].
2) ... unless the user prompting the LLM is an expert in the domain or a maintainer of the project, at which point it's really the user projecting their intent by being heavily involved in the creation of the new codebase. This is what happened with chardet.
3) In the case of chardet, I find it indistinguishable from the maintainer of a codebase rewriting it from scratch and changing the license. It reminded me of how Wim wrote large parts of GStreamer (LGPL), then went and wrote PipeWire, which is very similar to GStreamer, could potentially replace it, and is MIT-licensed.
4) This goes back to the best explanation I have for LLMs: they are amplifiers for the author's intent *and* their capabilities.
5) IMO instead of worrying about what random people can do with LLMs (they cannot achieve much), we should reflect on why *authors* seem to care so little about copyleft nowadays. Open-source "won" but copyleft is in decline and nearing irrelevance. And that has nothing to do with LLMs.
(5/6)
α. with the caveat that the parameterisation likely follows known-good patterns (paths on a parameter curve) learned from the training data.
β. you only start seeing this once the codebase grows large enough, say >2000 LoC, and it's easy for inexperienced devs to miss. LLMs are not a turn-key solution, and cannot be used to build one.