As far as I understand, #gnu #gpl #agpl and other common #copyLeft licenses don’t prevent OpenAI and friends from using licensed content for training. The reasoning is that the model itself isn’t a derivative work: the training data is used transformatively, in an abstract format.
This feels unethical. An LLM cannot provide any value without its training data. It is especially bad because OpenAI, at least, specifically claims to source its data ethically.