Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?
@yabellini SciHub makes papers public that are behind paywalls. I agree that they shouldn't be behind paywalls, but that's completely different from OpenAI.
I think they mostly used sources that are public anyway, like Wikipedia. They also didn't republish those sources; they trained an AI on them, and it creates new texts. So in a way they did a remix. Remixes are handled differently in copyright law.
"The corpus [GPT-2] was trained on, […] 40 [GB] of text from URLs shared in Reddit" https://en.wikipedia.org/wiki/OpenAI
