Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?
@yabellini SciHub makes papers public that are behind paywalls. I agree that they shouldn't be behind paywalls, but that's completely different from what OpenAI does.
I think they mostly used sources that are public anyway, like Wikipedia. They also didn't publish them, but trained an AI on them that creates new texts. So in a way they made a remix. Remixes are handled differently in copyright law.
"The corpus [GPT-2] was trained on, […] 40 [GB] of text from URLs shared in Reddit" https://en.wikipedia.org/wiki/OpenAI

New York Times Sues OpenAI and Microsoft Over Use of Copyrighted Work

Millions of articles from The New York Times were used to train chatbots that now compete with it, the lawsuit said.

The New York Times
@yabellini I cannot read the article as it's behind a paywall, and the other document is 69 pages long. I will not read that. If you want to make a point with it, say what you want to say. Depending on what you say, I will decide whether I want to check it against the provided sources or not.

@duco @yabellini We can bypass paywalls by prepending "archive .is/" to the URL.

https://archive.is/YOFMJ