There is an alternate timeline where the semantic web took off, and wide investment in ontological tooling ensured that the information in academic papers, websites, and applications was structured and accessible to future processing.

We instead live in a world where all the useful data is trapped inside proprietary formats and entangled in meaningless prose - a world primed for large language models to come along and hallucinate the data that might be contained therein.

@sarahjamielewis
Not just proprietary formats; a lot of academic papers are also locked up by the publishers

Anyone can run their language model on Wikipedia; only a few researchers ever manage to get a hard disk from Elsevier

@sarahjamielewis @sabik sci-hub perhaps?
@perplexes @sarahjamielewis
Doesn't seem to happen in practice