There is an alternate timeline where the semantic web took off, and wide investment in ontological tooling ensured that the information in academic papers, websites, and applications was structured and accessible to future processing.

We instead live in a world where all the useful data is trapped inside proprietary formats and entangled in meaningless prose - a world primed for large language models to come along and hallucinate the data that might be contained therein.

@sarahjamielewis
Not just proprietary formats; a lot of academic papers are also locked up by the publishers

Anyone can run their language model on Wikipedia; only a few researchers ever manage to get a hard disk from Elsevier

@sarahjamielewis @sabik sci-hub perhaps?
@perplexes @sarahjamielewis
Doesn't seem to happen in practice