Pretty fucking bold of these scientists to scrape one of my copyrighted photographs off the internet and then re-release it uncredited under a Creative Commons license because they used it as training data for an algorithm.

https://www.researchgate.net/figure/Sample-images-of-each-class-of-the-self-created-dataset-for-early-pest-detection_fig1_366224366

Using copyrighted images to train algorithms is a kind of grey area, and I can see some decent arguments on either side.

But you can't act as someone else's agent and distribute their work without permission.

Even if you limit your algorithm to only CC or other open licenses, the terms of most of those licenses still stipulate that the rights-holder be credited in a particular way when the image is displayed. That wasn't done here.
"Hey Everyone! I made an App to recognize Lady Gaga songs. Here are the songs we used to train the app, which I am releasing under a Creative Commons license. Please credit me when you share them."

@alexwild I'm a big fan of the capabilities that AI tools are bringing and think that in most cases (excluding overfitting) they're a classic example of the purpose of fair use exemptions.

But you're right, THIS sort of behavior is not at all "that". It's just nicking an uncredited photo for your paper and presenting it as yours.

@nafnlaus @alexwild I am doubtful of this. The AI is constantly re-using the data set. It is not a case of the prior work being used once. The prior work is implicitly in use continuously with every use of the AI engine. That work is now always available and always informing the AI output.
@alexwild you can't complain because you're stalling EPIC PROGRESS here, that will REVOLUTIONIZE everything it touches like goddamn Midas