We are seeing increased commodification of abstracts - perhaps not surprising given their importance in GenAI.

In 2022, SpringerNature had abstracts from 'their' non-open access articles removed from @OpenAlex.

Now Elsevier appears to have done the same.

https://github.com/ourresearch/openalex-guts/commit/b85b3bc77cf9c0f3bd162426a2ba0dacdc951065

Needless to say, neither provides abstracts for these articles to @crossref either: https://i4oa.org/#:~:text=The%20following%20figure.

Is this how open we want research abstracts to be?

#openmetadata #openabstracts
#barcelonadeclaration

do not store closed elsevier abstract · ourresearch/openalex-guts@b85b3bc

@MsPhelps @OpenAlex @crossref

So the question (or, one of the questions) is: can an abstract be copyrighted, like the article, or can it not, like metadata? Maybe the third-party creation and dissemination of AI-generated abstracts cannot be prevented by legal constraints?

@anwagnerdreas @MsPhelps @OpenAlex @crossref

perhaps relatedly, they are also starting to paywall parts of the reference list too(!) https://mastodon.social/@rmounce/113202581844074647

@rmounce @anwagnerdreas @OpenAlex @crossref

1) Yes, abstracts straddle the boundary between text and metadata. Crossref considers them copyrightable, and thus exempts them from the non-licensable status of their other metadata

("Crossref generally provides metadata without restriction; however, some abstracts contained in the metadata may be subject to copyright by publishers or authors" -
https://www.crossref.org/documentation/retrieve-metadata/rest-api/#:~:text=restriction%3B%20however%2C%20some-,abstracts,-contained%20in%20the)

@rmounce @anwagnerdreas @OpenAlex @crossref

2/ OpenAlex, however, distributes the inverted abstract index under CC0 as part of their data. (which can be challenged, and now apparently has been, twice)
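For context on why the inverted index gets treated like the abstract itself: as far as I understand it, the `abstract_inverted_index` field maps each word to the positions where it occurs, so the running text can be rebuilt in a few lines. A minimal sketch, assuming that word-to-positions shape; the function name and the sample data are made up for illustration and not taken from a real record:

```python
def rebuild_abstract(inverted_index):
    """Rebuild running text from a word -> list-of-positions mapping."""
    words_by_position = {}
    for word, positions in inverted_index.items():
        for position in positions:
            words_by_position[position] = word
    # Join the words in position order to recover the original sequence.
    return " ".join(words_by_position[p] for p in sorted(words_by_position))


# Hypothetical sample in the assumed shape (not a real OpenAlex record).
sample = {"Open": [0, 3], "abstracts": [1], "need": [2], "metadata": [4]}
print(rebuild_abstract(sample))
```

So the index is a different encoding of the same text rather than a separate kind of data, which is presumably why it can be challenged in the same way.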

@rmounce @anwagnerdreas @OpenAlex @crossref

3/ Regarding AI-generated abstracts, I always wonder whether there are distinct legal aspects to a) using (non-openly licensed) full text to train a model, b) using (non-openly licensed) full text as prompts and c) whether the generated output constitutes derivative use?

@rmounce @anwagnerdreas @OpenAlex @crossref

4/ And that thing with references is Ridiculous (esp. given that E now do provide them to Crossref!)

/end

@MsPhelps @rmounce @OpenAlex @crossref wrt training, that's an obvious problem.

I wonder, though, how the creation of an abstract differs from the extraction of factual information in terms of #IntellectualProperty.
Why would the product of one process (eventually) be a derivative work and the other would not? And is it the creative/copyrightable character of the product that determines whether the process itself is legitimate use, or is it something else? Could the license, for instance, discriminate between these processes and allow one but disallow the other? These are sincere questions...

#FediLaw

@anwagnerdreas @rmounce @OpenAlex @crossref

In my understanding, in copyright terms 'derivative' is about whether the product is itself copyrightable - which is where creating new text would differ from extracting factual information.

1/3

@anwagnerdreas @rmounce @OpenAlex @crossref

Also, I keep coming back to thinking about the difference between a) the process, b) the outcome and c) what is done with the outcome (e.g. whether it's made public, or sold) - e.g. in some cases (not genAI per se or only), the process itself might be legal, but sharing the outcomes might violate the license of the original sources.

(as an aside, I've long wondered about that regarding TDM permissions in general)

2/

@anwagnerdreas @rmounce @OpenAlex @crossref

And regarding the last point: Creative Commons licenses, at least, make no difference between types of usage as long as they are allowed under the license, but there is a current discussion about signalling that through preference signals: https://creativecommons.org/2023/08/31/exploring-preference-signals-for-ai-training/ (h/t @jeroenbosman)

@MsPhelps @anwagnerdreas @rmounce @OpenAlex @crossref in my understanding of copyright, abstracts are outcomes of creative work and thus fall under copyright. Under copyright limitations or fair use you can cite a few lines or paraphrase, but copying and distributing full abstracts would likely constitute a violation, unless of course the copyright holder has given permission (e.g. via an open license). 1/2

@MsPhelps @anwagnerdreas @rmounce @OpenAlex @crossref Of course you can make a summary of any work, which then in itself becomes copyrightable. "The article xx by authors xx researched topic xx answering question xx using xx data and xx method and xx analysis to come to xx results with xx conclusions and xx considerations." That would be your interpretation of what the article is about, but it has to be mostly in your own words/phrasing. I think you can do that at scale and can share it openly.

@MsPhelps @anwagnerdreas @rmounce @OpenAlex @crossref Of course to do that you would need to have legal access to full texts. Finally, it probably would not matter whether you made these summaries manually or using AI, although in the latter case a judge might question whether the result is copyrightable. But maybe the code/prompt is. An automated process would constitute TDM that rightholders can opt out of (using machine readable statements) if it is for non-scientific, commercial purposes.