It hit me this morning that what I often find frustrating in discussions around "intellectual property", piracy, large datasets for training things like CLIP, &c. is that IP is a really, really poor substitute for actually useful conversations about consent and respect

Like Elsevier asking me to "pwease no steal uwu" about journal articles is very different from, like, an individual selling self-published books on the side saying "hey I need this money to pay rent, so please purchase it legit"

An artist saying "hey I don't like for-profit companies building generators from my work that I posted to DeviantArt" is very different from Disney cracking down on people making shit with characters they "own".

Someone saying "hey this is really personal work, I don't really want it passed around and edited without my consent" is not the same as pebbleyeet getting mad at anti-fash edits of his comics.

IP is bullshit but that doesn't mean we have to take unnuanced all-or-nothing approaches to things.

That would be like saying if you want to support squatters taking over an airbnb then you can't have a lock on your bathroom door: it's conflating such wildly different things that it's a little silly.

@left_adjoint thanks, I've been craving a nuanced view. maybe creative commons needs a "do not use for AI training data" clause?

I found this article quite interesting, mainly because it goes deep into the training data that is used (which is full of porn): https://reticular.hypotheses.org/5216

@jollysea @left_adjoint our view on "exclude from AI training" is that, legally and ethically, it's completely backwards

the legal element is pretty straightforward: if you have licensed your images to allow essentially any use, then use by AIs is fair game. if you require attribution, however, AIs have no way of accommodating that (today), so the onus is *on AI developers* to respect that requirement. and if your images are licensed to prohibit commercial use, then any inclusion in an AI data set by a corporation *is against that license*. again, these are all pretty cut-and-dried

ethically, taking a bunch of people's images (or anything else), churning them through a program, then handing out the results without any consideration for the people who did all that initial labor is just really crappy. doing it *for profit* is even crappier. there are plenty of potential uses for that work that might not raise serious ethical concerns, but since the intention is to teach a computer to reproduce other people's work, the elimination of those people from their own field is a pretty overt implication

a central problem in AI has been a lack of collaboration between all stakeholders--not just those developing AIs, but also the people who create the work that ends up in data sets. it is valuable that people are collecting and cataloging this data to begin with, to be sure, but the people whose work is being collected should have a right to influence how that collection is done, too. as far as we are aware, they are usually excluded from such pursuits, under the very techbro assumptions that "information wants to be free" and "if it's on the internet, it's forever" and so forth

we'd also say it's a very Western colonialist mindset to see massive troves of other people's efforts and think "i can harvest and exploit this for my own ends however i please." legally, sure, maybe you can. ethically? if you aren't thinking about (or more importantly, consulting with) the people who actually created everything you are looking at harvesting, you're doing something deeply antisocial

@bigmarinara thanks, those are some very good points. when I published the toot, I thought "wait, there's a non-commercial license, that should apply"

but then again, they wouldn't care, because they don't care at all and just take everything they can find.