Last month, I was in Vienna to talk about #Wikipedia and AI (and #Wikidata, #AbstractWikipedia, and the Wikidata Embedding Project). It took me some time, but I have now written down most of my talk there in English. Enjoy the story of the owl 🦉 and the bat 🦇.

https://blog.johl.io/the-owl-and-the-bat/

The Owl and the Bat

Knowledge Production on Wikimedia Projects with Artificial Intelligence (whatever that means)

There’s no doubt that what we refer to as “Artificial Intelligence” changes knowledge production. English Wikipedia has recently adopted a policy that prohibits the use of Large Language Models to generate or rewrite article content. German Wikipedia started a Request for Comment on a comparable policy a little before that. In 2026, we typically don’t see AI as a field of computer science research but in the context of exploitative business practices that, among other effects on the real world, put considerable strain on Wikimedia’s infrastructure and thus on the Knowledge Commons.

My heart is a Turing machine: A blog by Jens Ohlig
@johl @simulo Semantic similarity is poisoned by orthography at the moment, but Wikidata does offer the promise of a true vector embedding some time in the future. "Pixelated" would be a good metaphor for what we see now (https://www.wikidata.org/wiki/Wikidata_talk:Embedding_Project#Evaluating_anglocentrism): this owl might be playing baseball ⚾ with bats 🦇.
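The orthography problem in that reply can be made concrete with a toy sketch. The vectors below are invented for illustration (they are not output from any real model, Wikidata's or otherwise): a surface-form-driven embedding would place the two senses of the token "bat" close together while separating the German "Fledermaus" from the animal it names, even though "Fledermaus" and the animal sense of "bat" mean the same thing.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors, NOT real embeddings: they merely mimic a model
# that keys on spelling rather than meaning.
bat_animal   = [0.9, 0.1, 0.1]
bat_baseball = [0.8, 0.2, 0.1]  # orthographically identical token, different sense
fledermaus   = [0.1, 0.9, 0.2]  # same concept as bat_animal, different string

print(cosine_similarity(bat_animal, bat_baseball))  # high: same spelling
print(cosine_similarity(bat_animal, fledermaus))    # low, despite shared meaning
```

A genuinely semantic, language-independent embedding would invert that ordering, which is roughly the promise of building one on top of Wikidata's concept identifiers rather than on surface strings.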

@johl @simulo But what is the semantic mechanization good for in the first place? I want to see my wikis encouraging more of people + people, and there are so many ways to make this happen: more resources going to the authors so they can afford to do this work, more collaborative software features like simultaneous drafting, more diverse modalities and ontologies.

@johl @simulo To take one specific example, I want to see authors able to collaboratively translate on multiple wikis at once, which is close to the dream of Abstract Wikipedia, and possibly achievable with similar tooling, but perhaps with a subtle shift of focus.

Some arbitrarily chosen but concrete features in this direction: readers see multiple languages at once, with alignment and deduplication [e.g. https://aclanthology.org/2024.emnlp-main.384/], and [see amire80] global templates through Wikifunctions.

Farhan Samir, Chan Young Park, Anjalie Field, Vered Shwartz, Yulia Tsvetkov. “Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia.” Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). ACL Anthology.