Mastodawn

theruran 🌐🖥️🖲️ Feb 24, 2023

Sarah Jamie Lewis

There is an alternate timeline where the semantic web took off and there was wide investment in ontological tooling to ensure that the information in academic papers, websites, and applications was structured and accessible to future processing.

We instead live in a world where all the useful data is trapped inside proprietary formats, and entangled in meaningless prose - a world primed for large language models to come along and hallucinate the data that might be contained therein.

Sarah Jamie Lewis Feb 24, 2023

I wanted to actually implement a transformer before I voiced an opinion on large language models. The whole self-attention structure is really cool and I highly recommend sitting down and playing with the ideas directly.

I just wish we had done a lot of other work to fully take advantage of them.

sabik Feb 24, 2023

@sarahjamielewis
Not just proprietary formats; a lot of academic papers are also locked up by the publishers

Anyone can run their language model on Wikipedia; only a few researchers manage to get a hard disk from Elsevier

Colin Curtin Feb 24, 2023

@sarahjamielewis @sabik sci-hub perhaps?

sabik Feb 24, 2023

@perplexes
@sarahjamielewis
Doesn't seem to happen in practice

Madagascar_Sky Feb 24, 2023

@sarahjamielewis Do you have any reading recommendations for the semantic web stuff? I'm very lost.

Denny Vrandečić Feb 26, 2023

@Madagascar_Sky @sarahjamielewis I found this one painful and I don't agree with all of it, but very well written:

https://twobithistory.org/2018/05/27/semantic-web.html

In my opinion it misses Wikidata, but I'm biased.

Whatever Happened to the Semantic Web?

In 2001, Tim Berners-Lee, inventor of the World Wide Web, published an article in Scientific American.

Debora Weber-Wulff Feb 26, 2023

@vrandecic @Madagascar_Sky @sarahjamielewis

Okay, right, Denny is biased, he helped make #Wikidata the right way :)

I used to teach how worthless the promises of Semantic Web were. But since Wikidata reached a useful size, there are now quite a lot of useful things one can do with the data, so now I teach that there is a glimmer of hope called Wikidata and federated databases....

Shriram Krishnamurthi Feb 26, 2023

@WiseWoman @vrandecic @Madagascar_Sky @sarahjamielewis I once heard Peter Norvig perfectly summarize the problems with the Semantic Web to a True Believer who was trying to proselytize him: "People are lazy, and they lie." (One can add other human traits too: they are mistaken, they disagree, etc.)

Denny Vrandečić Feb 26, 2023

@shriramk @WiseWoman @Madagascar_Sky @sarahjamielewis

Those are very relevant observations, and this leads to the one large question the Semantic Web never answered: what does the incentive infrastructure look like? The few parts of the Semantic Web that provided a decent answer to that question were the ones that were successful: schema.org, Wikidata and the wider GLAM world, usage inside emails, inside organizations as a data integration technology.

Lukas Fuchsgruber Feb 27, 2023

@vrandecic @shriramk @WiseWoman @Madagascar_Sky @sarahjamielewis

Very interesting debate, I guess one upcoming incentive for semantic tech could be archiving and note keeping. For those that encounter so much info on feeds this could provide better private and collaborative means to collect stuff: ordering, selecting, describing, developing. I am thinking of the experience of using pinterest, tumblr, pinboard, obsidian, etc. as kind of cross-platform extensions to feeds and online collections.

Denny Vrandečić Feb 27, 2023

@lukasfx @shriramk @WiseWoman @Madagascar_Sky @sarahjamielewis

Yes, I agree, I think Semantic Web technology is underused in the personal and collaborative note taking space. Hypothes.is is an interesting approach in that direction.

But in this space you have to explain how it differs from bookmarking extensions in your browser, delicious, and the Google Sidewiki, and why it would succeed where these things are not widely used today.

jonny (good kind) May 21, 2023

@vrandecic
@shriramk @WiseWoman @Madagascar_Sky @sarahjamielewis
the hope is we can build it p2p ;)
https://jon-e.net/infrastructure/
I'm v much on Aaron Swartz's page in that the "people lie" argument is a strawman - if you're trying to make the semweb as a space of communication rather than "true" and uniform data, it is no longer the fatal problem it's presented as.

Decentralized Infrastructure for (Neuro)science


Madagascar_Sky Feb 27, 2023

@vrandecic @sarahjamielewis Thank you so much!

James Henstridge Feb 24, 2023

@sarahjamielewis Google has managed to get a bunch of sites to add RDF metadata to their sites in the last few years by calling it JSON-LD.

It seems that telling people it'll increase their search engine performance works pretty well.

Helge Feb 26, 2023

I think you describe the problem with the semantic web pretty well. It is not presented in a way people care about.

One should have presented it to researchers as: if you do it, your h-index will increase.

Please don't throw h-index hate my way for saying that. The reason it will increase is pretty simple: easier to use data implies more citations.

Mycotropic Feb 25, 2023

@sarahjamielewis NIH changed the rules on data sharing very recently so that all newly funded research will share all data in a timely way - including the code that generated the published results. It's not a standardization of language though, or of reporting which would have also helped but it's a start I think.
#Science #Epidemiology #NIH #SciencePolicy

Ricardo Segurado Feb 25, 2023

@sarahjamielewis

Maybe if that had taken off, inventing the appropriate ontological tools would have effectively been == creating AI.

Denny Vrandečić Feb 26, 2023

@red_concrete @sarahjamielewis yes, and it would have been a very different AI

Alvaro Feb 26, 2023

@sarahjamielewis I blame the time wasted in Byzantine discussions on higher-order logic representations instead of investing time in useful tools for common developers. A wasted opportunity.

Matěj Cepl 🇪🇺 🇨🇿 🇺🇦 Feb 26, 2023

@sarahjamielewis Well, if the experience with The Semantic Web taught us anything, then it is that it is impossible to encode human knowledge in structured information. So, it is an alternate timeline in the same sense as Harry Potter being real could be an alternate timeline.

Erwan Feb 26, 2023

@sarahjamielewis I believe the main reason why the semantic web failed isn't proprietary formats, but the fact that people are too lazy to annotate and categorize content. As a result, we get at best tagging (without any kind of ontology) and most of the time just automatic classification from the content.

Airbag Moments | 45X34 | 🇺🇦 Feb 26, 2023

@sarahjamielewis I want to marry this toot

Airbag Moments | 45X34 | 🇺🇦 Feb 26, 2023

@sarahjamielewis @shoq

Matěj Cepl 🇪🇺 🇨🇿 🇺🇦 Feb 26, 2023

@sarahjamielewis It is not only about proprietary data. Still the most compatible format for the most data-modelled possible, #bibliographic databases, is #BibTeX which nobody would ever have suspected was meant seriously as a data format.

J Low Feb 26, 2023

@sarahjamielewis I'm a total noob, but isn't it actually improving?

Vota Shizamura 🌟 Feb 26, 2023

@sarahjamielewis I'm TRYING but I can't do everything alone 😭

WtfPdf Feb 27, 2023

@sarahjamielewis proprietary and even portable document formats... 🤣 😭😭😭😭

msgbi Feb 27, 2023

@sarahjamielewis @charles_ex @aloa5

RusscamPhoto Feb 28, 2023

@sarahjamielewis @Dan_Blick “hallucinate the data”… can’t imagine what we are in for.

Dave Lane 🇳🇿

@sarahjamielewis boom. Well said

Tim Bray Feb 28, 2023

@sarahjamielewis Nice thread you launched there. Back in the day I was heavily involved at W3C and kind of TimBL's loyal opposition, a Semantic-Web skeptic. I still sort of am, but remain open-minded, there's a there there but we haven't found it. In this timeline anyhow.

Having said all that, I object to the phrase “entangled in meaningless prose”. That prose is the real payload, we are language-centric creatures.

@sarahjamielewis and worse: #research that is usually 100% tax-funded is #paywalled by #rentseeking publishers that charge obscene amounts to even read their publications at all.

jonny (good kind) May 21, 2023

@sarahjamielewis
check this out we're on the same page and there are a growing number of ppl working on making this real ❤️❤️
https://jon-e.net/surveillance-graphs/

Surveillance Graphs

Vulgarity and Cloud Orthodoxy in Linked Data Infrastructures - A critical history of the semantic web and linked data, grappling with the next generation of surveillance capitalism, where grand corporate knowledge graphs devour the planet and sell it back to us as glassy-eyed LLM personal assistants. Will we remain stuck in the ideology of the cloud, or can we have better dreams?


miki May 21, 2023

@sarahjamielewis There is an alternate timeline where the semantic web took off and black hat SEOs had tons of fun spamming the crap out of it with fake entries promoting their products, and then people moved on to better platforms.