All the talk about Google losing ground & cutting jobs made me nervous about what I'd do if Google Scholar got the chop...

So I dug into the options – and came away relieved. Even a little optimistic!

New PLOS post: Could there be some viable challengers to Google Scholar on the horizon?

https://absolutelymaybe.plos.org/2023/02/16/could-there-be-some-viable-challengers-to-google-scholar-on-the-horizon/

#OpenScience #AcademicMastodon #AcademicChatter

@hildabast Thank you! I'm not sure I understand why #InternetArchiveScholar was summarily dismissed as being too small. Nowadays I use it as my primary academic search engine, for two reasons: 1) it tends to have more #OpenAccess / full text links than anything else; 2) advanced search actually works and results are generally more consistent and reliable.

If I fail to find anything, I proceed to check #GoogleScholar, where typically I then have to browse dozens of pages of unrelated junk.

@nemobis 25m compared to over 200m for others (and presumably well over 400m for Google Scholar) is very small. The chance of missing the best/most important works on a topic is extremely high. And if you're not seeing the body of work on a question, you can't know if you're seeing an outlier study, for example.

@hildabast What service has 200 million full text works?

As for metadata, there are at least 70 million records on IA Scholar ( https://scholar.archive.org/search?q=year%3A%3C%3D2023 ); the total is actually 130 million:
https://fatcat.wiki/

@nemobis A service with 200m articles would have way more than 25m in full text - especially if you use eg Unpaywall or GS in your browser to scoop up those in ResearchGate etc.

I used the number they say they have, and I think it's right. The results you linked to seemed to be counting individual scanned pages as records, not works/articles.

This doesn't cover enough of the literature - but it'd be a good place to hunt for full text of something you're desperate to find.

@hildabast Their number is "131,904,171 papers".
https://fatcat.wiki/stats

The search doesn't include all individual pages; those would be about 500 million from microfilms IIRC. IA extracts metadata from microfilms to include even papers without DOIs.

IA Scholar is a source used by GS, not the opposite. We don't know how many full texts GS has. Unpaywall has about 40M URLs but many of these are not archived.

25M is almost three times PMC, it's huge by any definition. Especially for full text.

@nemobis I went by what they said here, which may well be out of date, but it does look under construction to me: https://scholar.archive.org/

Even if it is that many articles, it's a question of what it is - for me, academic search has to cover the research literature across the board, including the most recent.

I disagree with your take on the other sources: e.g. Unpaywall has over 46m free ones. And for the scholarly research area, WWS and BASE seem to me to have better coverage than IA Scholar.

@hildabast It depends on what one is using the search for.

BASE doesn't offer full text search. I use BASE when I know exactly what I'm looking for (e.g. a specific author name or other piece of metadata, or a specific term I know will be in the abstract).

I wouldn't use IA scholar as a starting point for a systematic review. For full-text search, however, I found it works much better than GS, where basic boolean searches are impossible.

@hildabast Yes, the front page mentions 25M for full text. It doesn't say anything about the number of metadata records overall. So the 25M should be compared to other services' full-text counts, as was done here:
https://en.wikipedia.org/wiki/List_of_academic_databases_and_search_engines#Full-text_aggregators

IA scholar is still WIP, yes, but it's by far the biggest full text indexer at the moment. (We have no idea about GS, but it isn't really a full text search anyway given it doesn't follow the user's instructions for search.)

@nemobis GS does search full text and must be the biggest for that. It does follow the user's instructions - there just aren't many options for the instructions so you have to sift through a lot of irrelevant results.

I've seen formal assessments of the strengths & weaknesses of GS & others - but not of IA. Will keep my eye out for that!

@nemobis PubMed doesn't search full text either (except for PMC). It's not the most critical thing to me. GS searches full text but agree on the limits.

To me, searching the whole literature is far more critical than anything else. Even when I'm not doing a systematic review I want to see the literature. I only ever want full texts after I've got the lay of the land.

For me, GS's ability to search the articles that cite an article that's exactly what you want makes up for a lot.

@hildabast Maybe the difference is in how consistent the titles and abstracts are in the literature you usually look for vs. what I usually look for.

Citation search is a blessing, I agree! I actually mostly use lens.org for that these days.

@nemobis @hildabast

While it is true that we do not index full documents, we do index abstracts. In our experience, abstracts, titles and other metadata are a big enough haystack to search if you vaguely know what you are looking for. Searching the full text can be distracting.

@base @hildabast That's true and it's a huge advantage compared to Google Scholar actually. A search by proper name like #Wikipedia works much better on https://www.base-search.net/Search/Results?lookfor=Wikipedia+doctype%3A121+rights%3ACC-%2A+year%3A%5B2020+TO+%2A%5D&l=en&oaboost=1&ling=0&newsearch=1&refid=dcadven&name= than on https://scholar.google.it/scholar?as_ylo=2020&q=%22wikipedia%22&hl=it&as_sdt=0,5 . However, I don't always know what to look for in the abstract.