Mastodawn

Bianca Kramer Nov 21, 2024

We are seeing increased commodification of abstracts - perhaps not surprising given their importance in GenAI.

In 2022, SpringerNature had abstracts from
'their' non-open access articles removed from @OpenAlex

Now Elsevier appears to have done the same.

https://github.com/ourresearch/openalex-guts/commit/b85b3bc77cf9c0f3bd162426a2ba0dacdc951065

Needless to say, neither provide abstracts for these articles to @crossref either: https://i4oa.org/#:~:text=The%20following%20figure.

Is this how open we want research abstracts to be?

#openmetadata #openabstracts
#barcelonadeclaration

do not store closed elsevier abstract · ourresearch/openalex-guts@b85b3bc

Simon Lucy Nov 21, 2024

@MsPhelps @OpenAlex @crossref

It's a bit of a miserable kick in the teeth how the front page of ScienceDirect has been emasculated, after all the work that was done to convert it from a lumpen mass into a structured, scalable and robust vehicle for the crown jewels.

I think the public API still delivers abstracts, and the article page has been restructured with more excerpts in closed articles.

I don't think CrossRef is getting the financial support that it was.

Bianca Kramer Nov 21, 2024

@simon_lucy @OpenAlex @crossref

For Crossref, I don't think it's an issue of financial capacity (at least not directly). It's a decision on the publisher side whether or not to deposit abstracts. For some publishers there are technical barriers, but for others it's a strategic choice.

Simon Lucy Nov 21, 2024

@MsPhelps @OpenAlex @crossref

I get that. But I notice that CrossRef is still having issues delivering services.

The point about Abstracts being harvested for model training would be one thing, but ScienceDirect still publishes Abstracts, which are indexed by search engines. It certainly used to be the case that recognised search engines, like Google, could index full articles. I'd be very surprised if they couldn't now, because of the impact on page impressions.

Simon Lucy

@MsPhelps @OpenAlex @crossref

Now that I've thought about this a bit more and stirred my memory (though it's still vague): within Elsevier, CrossRef was (generally) thought of as being about linking and metadata (latterly events), not content.

Though if Product Management had thought publishing Abstracts to CrossRef would increase COUNTER metrics, they'd doubtless have made it happen.

Jeroen Bosman Nov 23, 2024

@simon_lucy @MsPhelps @OpenAlex @crossref Not so sure about that. Both Elsevier and the SN/Holtzbrinck family have profit-generating products built on abstracts (Scopus and Dimensions respectively) and probably don't like all abstracts becoming openly available at scale and machine-readable in @OpenAlex, let alone with an open CC0 license. They can't (?) control other publishers sharing abstracts, but they can retain the ones they claim rights over.

Simon Lucy Nov 23, 2024

@jeroenbosman @MsPhelps @OpenAlex @crossref

Abstracts aren't the driver for Scopus; that's the citation graph. The Abstracts database is one way into it, and it's useful for decorating results, but it's the graph that matters.

As far as I'm aware, Abstracts are still returned from the APIs, including the Analytics API, but I could be corrected on that :-).

Bianca Kramer Nov 23, 2024

@simon_lucy @jeroenbosman @OpenAlex @crossref

Imo Scopus is about more than the citation graph, and abstracts play an important role in discovery and profiling (also indirectly via Pure) - and E is expanding on that value with eg Scopus AI.

Abstracts (and other metadata) returned via the API are under direct control of E via the terms and conditions, so to me that's not a contradiction.

Bianca Kramer Nov 23, 2024

@simon_lucy @jeroenbosman @OpenAlex @crossref

PS Not denying the importance of the citation graph, by the way, and I'm pretty sure Elsevier would have preferred to keep citations out of the public domain as well, but that other forces (Crossref policy and DORA) forced their hand in the end :)

Jeroen Bosman Nov 23, 2024

@MsPhelps @simon_lucy @OpenAlex @crossref Currently looking into Scopus and WoS use cases at our institutions. It would surprise me if discovery using topical search terms (by students and researchers, incl. for syst reviews) were not the most *frequent* use type of these systems. Of course, citation-based discovery, metrics and citation analysis are arguably also important, and perhaps the prime reason why institutions hold on to their licenses for these systems, despite their limitations.

Simon Lucy Nov 23, 2024

@MsPhelps @jeroenbosman @OpenAlex @crossref

I imagine they still spend a lot of effort picking up other publishers' back catalogues, as the aim is as complete a corpus as possible.

I can see that 'AI' is now more important for Scopus than it was when I was around.

I don't know if they've added preprint and dataset metadata, which were a couple of the things I was involved with, along with the attempts to improve author disambiguation.