1/ As far as I can tell, there are no #AI tools to help determine whether an arbitrary #book is in the #PublicDomain for an arbitrary country.

I respect the best of the non-AI tools and services already doing parts of this complex job, such as the #HathiTrust Rights Determination, #Stanford Copyright Renewal Database, and the #PublicDomainReview Guide to Finding Public Domain Works Online.

But there seems to be a niche for testing to see whether AI tools might do this job better and faster.

#Copyright #ScholComm

🧵

2/ I have two interests here.

1. When a #book is in the #PublicDomain, we can and should make it #OpenAccess. Too often we're held back by uncertainty.

2. I'm collecting an offline list of easy and medium-difficulty #scholcomm jobs that #AI tools could do about as well as humans, or better, even if the results are sometimes flawed.

Many jobs like this are already discussed in the literature, such as reformatting citations to fit the style of a given journal, identifying suitable peer reviewers for a given paper, generating alt text for images, detecting self-citation in publications -- and so on, to keep a long list short.

Determining the #copyright status of a given book is an idea I haven't seen others mention, and I want to put it out for discussion.

@petersuber Could you be specific about which AI tools you're discussing?

Claude by Anthropic, the AI that the US used to decide to bomb schoolgirls in Iran?

ChatGPT / OpenAI, the company that volunteered to help more with war crimes when Anthropic temporarily balked at being so blatantly involved in atrocities?

Meta, the company responsible for Facebook and complicit in multiple genocides because of negligent moderation?

Microsoft, which provided AI to Israel for the genocide in Gaza?

@skyfaller
If you're saying that the major AI tools are used for harmful purposes, I agree. (I often post about the harm and efforts to prevent it.) But if you're saying that it follows that we should never use those tools for innocent purposes, or that all AI tools have the same track record for harm, I have to disagree.

@petersuber I'm just trying to pin down whether you are proposing giving money to an oligarch involved in war crimes, and if so, which one.

And when it comes to LLMs and generative AI, I don't think there are any innocent uses; the harms are too great.

If you were talking about some other technology marketed as AI, I apologize for jumping to conclusions. AI companies intentionally blur the lines between the latest hype and more reliable / less destructive technologies.

@petersuber

This seems like a problem that basic programming accessing known authoritative sources could easily solve.

I also assume this is a case where it’s very important that any result saying a publication’s copyright has expired is correct. This seems like a particularly bad match for gen-AI.

@stepheneb
On your first par: This is roughly the approach taken by the HathiTrust. As I said, I admire it. But it's limited to the works in its own collection and to the US public domain.

On your second par: Yes, it's fairly important to get this right. I should have said that AI tools can be useful even if they only do the first 80% of a job (say), and leave the rest for humans, or if their results are subject to human review. We already use AI tools this way to create metadata for new publications. We could take the same approach with a new AI tool for determining copyright status.
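To make the "first pass plus human review" idea concrete, here is a minimal rule-based sketch in Python. Everything in it is an assumption for illustration: the `BookRecord` fields, the `triage` function, and the single rule it applies (a fixed 95-year term counted from the publication year, roughly the US rule for older published works). It ignores renewals, notice requirements, unpublished works, and every other country's law, so anything it can't settle is routed to a human rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: one fixed term counted from the year of publication,
# roughly the US rule for older published works. Other countries
# (and many US edge cases) need different rules.
US_TERM_YEARS = 95


@dataclass
class BookRecord:
    """Hypothetical minimal metadata for one book."""
    title: str
    publication_year: Optional[int]  # None when the year is unknown


def triage(record: BookRecord, current_year: int,
           term_years: int = US_TERM_YEARS) -> str:
    """Return a first-pass label; anything uncertain goes to a human."""
    if record.publication_year is None:
        return "needs human review"
    # Protection is assumed to run through the end of the calendar year
    # publication_year + term_years, so the work is free the year after.
    if record.publication_year + term_years < current_year:
        return "likely public domain"
    return "needs human review"


if __name__ == "__main__":
    for book in (BookRecord("Example A", 1920),
                 BookRecord("Example B", 1950),
                 BookRecord("Example C", None)):
        print(book.title, "->", triage(book, current_year=2025))
```

The point of the design is that the tool only ever says "likely public domain" or "needs human review". It never declares a work to be in copyright or out of it with finality, which keeps a person in the loop for the hard cases.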

@petersuber @stepheneb +1 on Stephen’s comment. A Web crawler can easily be created and will be much more precise than giving all the books to the LLM tech companies to ingest (possibly against the authors’ wishes), as well as cheaper, faster, and more ecologically responsible.

P.S.: likewise for the citation reformatting case, but for that there’s already a good free and open source solution: use BibTeX. (Or EndNote or the like for Office 365.)

@petersuber I think, of all the tasks I might not want to trust AI to do, determining copyright status would be pretty high on the list. 😂