@emilymbender
The framing is that there is a Commons of human knowledge accessible to virtually everyone (given how internet protocols currently operate). This Commons rests on a sort of unwritten social contract: participants feel comfortable publishing permissively and bearing the compute and bandwidth costs themselves, so that everyone can access the Commons freely.
To 'cooperate' is to abide by this unwritten social contract: respecting /robots.txt directives, respecting Creative Commons licence terms.
To 'defect' is to exploit this permissiveness by massively scraping the Commons, ignoring the social contract and offloading the costs onto the providers within the Commons.
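For what 'respecting /robots.txt' looks like in practice, here's a minimal sketch using Python's stdlib `urllib.robotparser` (the policy file and the crawler name `MyCrawler` are made up for illustration):

```python
from urllib import robotparser

# A hypothetical robots.txt a Commons provider might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A cooperating crawler checks permission before every fetch...
print(rp.can_fetch("MyCrawler", "https://example.org/private/data"))  # False
print(rp.can_fetch("MyCrawler", "https://example.org/public/page"))   # True

# ...and honours the requested delay between requests.
print(rp.crawl_delay("MyCrawler"))  # 10
```

A defector simply skips these checks: the file is advisory, not enforceable, which is exactly why this is a social contract rather than a technical barrier.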
The current SOTA architectures are transformer-based, which require massive amounts of data to train effectively. By cooperating and not training a 'free as in freedom' LLM, we 1) lose the benefits of the Commons (as it either gets sloppified or people take more information private because of the increased compute costs), and 2) don't get to build an artifact from the knowledge of the Commons that can be contributed back to the Commons (an open LLM).
If the DeepSeek moment managed to wipe ~$600bn off Nvidia's market cap, commoditising LLM training would be the death knell of the AI slop hype race: who would pay thousands in tokens to OAI and Anthropic when they could use a GPL-licensed LLM (or whatever permissive license we come up with)?