| Website | https://whois.kellykaoud.is |
| Github | https://github.com/kaoudis |
| Meatspace | San Francisco ππ |
| Pronoun | She |
Why are AI people so monumentally *bad* at copyright?
I'm looking for ethical/copyright-safe training data sets. Common Corpus sells itself as that... but then I go read the paper and they include CC BY-SA scientific papers and GPL stuff from GitHub, and then in models trained on that dataset they proudly state:
> Only trained on open data under a permissible [sic?] license [...] By design, all Pleias model are unable to output copyrighted content.
Um, no?? CC BY-SA is not public domain; it's a copyright license. You can't train on CC BY-SA content and then claim your model is any more copyright-safe than whatever Google and Meta are releasing. It just means you're specifically violating the copyright of people who release their content under open licenses.