Pretty fucking bold of these scientists to scrape one of my copyrighted photographs off the internet and then re-release it uncredited under a Creative Commons license because they used it for training data for an algorithm.

https://www.researchgate.net/figure/Sample-images-of-each-class-of-the-self-created-dataset-for-early-pest-detection_fig1_366224366

Using copyrighted images to train algorithms is a kind of grey area, and I can see some decent arguments in favor of either side.

But you can't act as someone else's agent and distribute their work without permission.

Even if you limit your algorithm to only CC or other open licenses, the terms of most of those licenses still stipulate that the rights-holder be credited in a particular way when the image is displayed. That wasn't done here.
"Hey Everyone! I made an App to recognize Lady Gaga songs. Here are the songs we used to train the app, which I am releasing under a Creative Commons license, please credit me when you share them"

@alexwild I'm a big fan of the capabilities that AI tools are bringing and think that in most cases (excluding overfitting) they're a classic example of the purpose of fair use exemptions.

But you're right, THIS sort of behavior is not at all "that". It's just nicking an uncredited photo for your paper and presenting it as yours.

@nafnlaus @alexwild I am doubtful of this. The AI is constantly re-using the data set. It is not a case of the prior work being used once. The prior work is implicitly in use continuously with every use of the AI engine. That work is now always available and always informing the AI output.
@alexwild you can't complain because you're stalling EPIC PROGRESS here, that will REVOLUTIONIZE everything it touches like goddamn Midas
@alexwild I don't think it's a grey area at all. If you don't have permission to use the image for any reason, don't use it.
@SecondNatureMB @alexwild they would probably claim it was being used for educational purposes. And they probably have more lawyers.
@alexwild I actually think that images shouldn't be copyrighted on the Internet, much like software patents are a huge scam
@nero @alexwild Using it to train an algorithm without reproducing it is kind of like looking at a photo to check an identification. You use the info, so credits would be in place but it's not copyright infringement. But what is done in this paper is obviously not right.

@nero I actually think all camera equipment should be free.

But it isn't.

@nero @alexwild that would be ok, if everyone had guaranteed income. Copyright law doesn’t commodify expression, although corps pretend it does. It exists to allow creators to make a living - in order to make sure creators can create. This benefits society as a whole.
But since we don’t have universal income, or a real system of automatic compensation, your view is akin to asking everyone to sometimes work for free without warning.
@Dennas @nero I think we'd all do well with an expansion and clarification of Fair Use, adapted for the social media era.
@alexwild given the way that image AI can be used to generate new works you can make an argument that no AI uses -especially data training- should be allowed without explicit rights being agreed.
@alexwild @questauthority Arguably, this falls under fair use. Under the standard 4-part fair use doctrine:
1. The use is transformative and for educational/research purposes.
2. The nature of your work is more factual than fictional, as it documents actual living organisms in situ.
3. The authors display only a thumbnail of your work, likely only a small percentage of the pixels from your photography.
4. The use is unlikely to harm your ability to profit from your work.

@bhawthorne @questauthority Lol, no.

MDPI is a for-profit publisher; this is not classroom use, and it is not educational *about my image*, which is not even credited.

Your argument about "actual organisms in situ" is just dumb, really, since you have no idea. It was a 2 hour studio session I had to arrange, including sets and lighting.

@bhawthorne @questauthority Points 3-4 likely have some merit, although it's worth noting that they did display the entire work, not a crop, even though the resolution is much reduced.

@alexwild @bhawthorne @questauthority that's the main problem in these discussions. The 4 points are just interpretations of how it is handled in the US. But even these rules would be handled differently in Europe.

And that's the main thing with fair use. It's always arguable. And in the case of AI it probably needs a court decision.

@alexwild @questauthority That’s a really good point. I’m not aware of any case law yet on whether down-sampling qualifies for that prong of the fair-use test.
@alexwild @questauthority Well, given that the photos that they included were tiny thumbnails and I had no idea which was yours, perhaps I can be excused for not recognizing that one of them was a studio shot?
Regardless of whether mdpi is making money, the use is transformative and it certainly seems like fair use to me.

@bhawthorne @alexwild

Unlikely, IMO. I'm very familiar with the factors. The major distinction from the other thumbnail cases is that those indexed and linked back to the original sources. Absent that, several of the factors are likely to fall out differently.

@questauthority @alexwild Thanks, Mike. That’s helpful information.

@alexwild Might be me but it's not a gray area at all. You use a product to create yours, you pay.

You do groceries to create a dish, you pay for the groceries. Same thing.

@alexwild I’d fall on the side that it’s not a grey area when it comes to scraping copyrighted images. If anything the AI companies should be working with stock image companies and negotiate terms through them.

@alexwild: If they released it as a part of the input dataset, the algorithm grey area doesn't really apply, but copyright law also has exceptions for scientific and research uses, and this particular way of use might actually fall into the grey area surrounding those.

At the very least, I'd think you should be eligible for a proper credit, though. Perhaps write to the authors and describe the situation?

@alexwild Bold? You mean dishonest?

@alexwild

Contact the journal that published the accompanying paper. Most have ethics teams that look into this.

@komputernik @alexwild clearly do report to the journal. If they don't respect your license make it very clear. It's not only the ethics board that needs to be contacted but also the legal department. Also this is published by a journal of MDPI, a publisher with poor reputation:
https://en.m.wikipedia.org/wiki/MDPI

@alexwild maybe be happy you contributed to a better model.
@nero @alexwild Should the farmer be happy to have contributed wheat to the baker? And not be paid at all? I'm afraid you haven't considered at all the enormous labor, patience, expertise and costs that go into acquiring photographic reference images of any kind, much less of living organisms.
@nero @alexwild Yeah, that's not how it works. If those 'scientists' thought the image was useful, they should have ASKED.

@Gremriel @nero @alexwild The problem with asking on a project like that is that you need like, thousands upon thousands of pictures in order to constitute even a "small" dataset. They don't even have time to curate these things (a lot of porn ends up in them too), because it's not feasible for a small research team to go through each one and check.

When they want to monetize these things though, they really ought to spend the time and money to ethically source: they keep skipping that step.

@alexwild I would be furious. I'm furious on your behalf.
@alexwild will you be able to do something about it?
@alexwild Not cool! Hopefully they take it down & compensate you in some way.
@darnell I'd mostly like the journal to pay better attention to IP issues surrounding training data, because we'll be seeing a lot of this sort of thing.
@alexwild I see. The least they could do is credit you though. But image/video scraping seems to be an everlasting issue online.
@alexwild Try writing to the Journal. They can take action. If the lead author was a student, I would turn a blind eye and only make sure they knew the mistake. But in this case, the first author is an assistant professor. He should know better.
@alexwild so sorry. Blatant copyright infringement. If it were me, I would contact them very officially, and say I’m willing to allow further continued use with a contract, creator credit, removal of CC, and small compensation, or take it down immediately. If not, get a lawyer.

@alexwild

as is often the case, where the data came from isn't properly documented

@alexwild "available via license: Creative Commons Attribution 4.0 International
Content may be subject to copyright."

Do they have any concept of how a) licenses and b) copyright work?

@alexwild woooooooow. that’s, uh, how did they think this was okay???
@alexwild This is super slimy. I'm sorry Alex.

@alexwild

Yikes - yes, this is very much infringement.

@questauthority Yeah. There are so many interesting, unresolved IP gray areas surrounding the use of images for scientific research and as input for AI, but this one clearly isn't even close to that area.

@alexwild Weird way to spell "legally perilous AF"

They added a nice "Content may be subject to copyright", but didn't think about what those words mean?

I assume demand letter incoming, along with an offer to settle for statutory damages?

@alexwild IANAL but I think as long as they only re-distribute it as a thumbnail (technically a citation) in the paper it's legally covered by the "research and scholarship" exceptions to copyright.

However, when it comes to the use as "training data" in commercial settings, I think we're in completely uncharted waters.

@cheetah_spottycat A citation? To what? The only way a viewer is going to discover the source image is to reverse-image search the thumbnail and hope for the best.
@alexwild It's defensible under fair use for educational purposes. They'd have the most trouble with the "amount" branch of the test. https://www.lib.uchicago.edu/copyrightinfo/fairuse.html

@borogove Fair Use exceptions are unlikely to cover the assumption of rights required to assign a Creative Commons license to other people's works, though. It's the assignment of the license that's the issue, not the use of the images for training algorithms, or the reporting of the methods.

@alexwild

Probably thought no one would notice.

@Connect I suspect it's more likely they just didn't think about it. I'm more irked at the journal.
@alexwild Did anyone ever believe that RG was a friendly organization observing anybody's copyrights?
@wolfgangcramer Research Gate is what it is. The publisher, MDPI, should know better.
@alexwild Send them an invoice. They will quickly change their tune.
@alexwild this is a reasonable occasion for exploding DMCAs all around
@davidgerard Whatever it takes for editors to make sure authors are clearing these things correctly.