Oh, Sci-Hub now has an AI interface that queries and references its huge repository of scientific articles and provides access to the full source papers. I think this is the logical next step in access to large archives.

"Hear the good news: recent advances in artificial intelligence enabled Sci-Hub to launch a robot that gives scientifically-grounded responses to questions. The robot starts with searching for relevant literature in Sci-Hub database, then turns to selecting and reading most recent studies, and composes the answer based on this information. The answer includes all the references, and each referenced article can be read on Sci-Hub with one click.

Unlike question-answering robots built on the early generation of neural networks, the Sci-Hub bot does not hallucinate: it does not make up scientific facts or cite sources that do not exist. To support its statements, Sci-Bot uses articles from the Sci-Hub database. Questions can be asked in any language, and answers can be saved on the server and shared."

https://sci-bot.ru
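
What they describe is, in essence, a retrieval-augmented pipeline: search the corpus, read the top hits, compose an answer, and attach resolvable references. Here's a minimal sketch of that flow; every name in it (search_corpus, summarize, the reference format) is hypothetical and purely illustrative, not Sci-Bot's actual code.

```python
# Illustrative retrieve-then-answer pipeline; NOT Sci-Bot's real implementation.
from dataclasses import dataclass

@dataclass
class Paper:
    doi: str
    title: str
    text: str

def overlap(query: str, text: str) -> int:
    # Toy relevance score: shared-word count. A real system would use
    # embeddings or a full-text index instead.
    return len(set(query.lower().split()) & set(text.lower().split()))

def search_corpus(query: str, corpus: list[Paper], k: int = 5) -> list[Paper]:
    # Stand-in for a real relevance search over the article database.
    return sorted(corpus, key=lambda p: overlap(query, p.text), reverse=True)[:k]

def summarize(query: str, passages: list[str]) -> str:
    # Placeholder for the LLM call; here we just echo the best passage.
    return passages[0][:300] if passages else "No relevant literature found."

def answer(query: str, corpus: list[Paper]) -> str:
    hits = search_corpus(query, corpus)
    # The model only sees the retrieved passages, so every reference it
    # emits can be resolved to a concrete, stored paper.
    summary = summarize(query, [p.text for p in hits])
    refs = "\n".join(f"[{i}] {p.title} (doi:{p.doi})"
                     for i, p in enumerate(hits, start=1))
    return f"{summary}\n\nReferences:\n{refs}"
```

Note what the grounding claim actually covers in a setup like this: the references point at real stored documents, while the summary text itself is still generated.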

@festal I am confused. Are you saying you consider these statements (new neural network = no hallucinations) accurate, or do you think the quotes speak for themselves? If the former: how?!

@malteengeler

I think the lack of hallucination refers to the sources. It doesn't make up references; rather, it gives you full access to the original sources.

The quality of the summaries across papers is likely to vary, but this is like complaining that abstracts don't give you full details. Of course they don't; if you want that, read the paper.

This interface provides a new way of accessing them, the same way an abstract gives you some idea of whether it's worth reading the full paper.

@festal But the summary is created via LLMs the same way Google AI Overviews etc. are, right? I don't see how this justifies the claim of "no hallucinations". It is of course better not to also use the LLM to synthesize the text that humans understand as "sources". But it is still using stochastic word synthesis for the summary, no?

@malteengeler

It's probably still the same underlying LLM logic, but working on a very constrained corpus (compared to Google's summaries) and providing links back to the full sources. In this sense, it's more like a catalogue search that gives you a sense of the content of a group of documents, rather than just a list of individual documents organized by author, title, date, and keyword.
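
To make the "doesn't make up references" point concrete: in a design like this (an assumption about how such a bot would plausibly work, not documented Sci-Bot internals), the generator can only cite numeric markers into the retrieved set, and each marker is resolved against stored records before display, so a citation to a nonexistent paper simply fails to resolve.

```python
import re

def resolve_citations(answer_text: str, retrieved: list[dict]) -> list[dict]:
    # Map bracketed markers like [2] back to the retrieved records.
    # Hypothetical check, assuming the pipeline sketched earlier in the thread.
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer_text)}
    resolved = []
    for i in sorted(cited):
        if not 1 <= i <= len(retrieved):
            raise ValueError(f"citation [{i}] matches no retrieved paper")
        resolved.append(retrieved[i - 1])  # a real, stored document
    return resolved
```

That check only bounds fabricated references; it says nothing about whether the prose summary represents those papers faithfully, which is the part that can still go wrong.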

If you care, you still have to read the paper, but you can get to it differently and perhaps find relevant papers more quickly across a larger search space. And we are all dealing with far larger search spaces than before, even, and particularly, in academia.

@festal I personally don't feel encouraged by the promise of "random statistical nonsense, but based on a narrower set of data". I still consider it a problem that the base model comes with all its built-in homogenisation and applies it to the data set. The summary is created from a limited set of texts, but the model itself was trained on much bigger ones (I just checked but can't find which model they use; I assume they didn't train one themselves on Sci-Hub texts alone).

@malteengeler Fair enough. It depends what you compare it to. I can see cases where I'd prefer this to Sci-Hub's traditional search interface. I know that's a low bar, but it seems the relevant one.