@luis_in_brief @glyph It's not 'just' open-source code, though, right?
The models "work" because content has been plundered from anywhere and everywhere.
On the software side, I can see discussions trying to dissect the meaning of "open" and whether LLMs are undermining or propagating the very concept... I think there may be good points on either side of that argument (maybe)...
But without all the other stuff that has clearly been plundered from authors, writers, reporters, screenwriters, and so on... these LLM bots would be unable to simulate communication with us in any coherent way.
I've seen arguments stating that putting something on the open web means that it is there for the taking... for example, by virtue of blogging, I am consenting to the scrapers and would-be models of tomorrow pillaging my words as they please...
I find this terribly misaligned. It's like saying that by virtue of going outside, I give permission for anyone to take pictures of me and profit off of them in any way that they please.
To me, non-consensual scraping is a blatant and vulgar disregard for me as a person.