Mastodawn

Come do a PhD or postdoc with us! Generously ERC funded salaries, at the intersection of computational social science/humanities/DH, corpus linguistics, ML/AI, sustainability studies. Deadline 15th May (if interested but that's too short, or in the process of graduating, talk to me)
PhD: ideational ruptures euraxess.ec.europa.eu/jobs/336902
PhD: technoscience-related mechanisms euraxess.ec.europa.eu/jobs/336901
Postdoc: measuring long-term acceleration
euraxess.ec.europa.eu/jobs/337755

Andres Karjus Jan 23

How does #genAI affect artistic professions & creative industries? Does domain training also provide an edge in using #AI tools?
Preprint: "Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists"

We ran behavioural experiments with 50(!) working artists, a group of laypeople, and a #GPT4o agent, all using the same #StableDiffusion based tool. Who does better?
>> https://arxiv.org/abs/2501.12374

Expertise elevates AI usage: experimental evidence comparing laypeople and professional artists

Novel capacities of generative AI to analyze and generate cultural artifacts raise inevitable questions about the nature and value of artistic education and human expertise. Has AI already leveled the playing field between professional artists and laypeople, or do trained artistic expressive capacity, curation skills and experience instead enhance the ability to use these new tools? In this pre-registered study, we conduct experimental comparisons between 50 active artists and a demographically matched sample of laypeople. We designed two tasks to approximate artistic practice for testing their capabilities in both faithful and creative image creation: replicating a reference image, and moving as far away as possible from it. We developed a bespoke platform where participants used a modern text-to-image model to complete both tasks. We also collected and compared participants' sentiments towards AI. On average, artists produced more faithful and creative outputs than their lay counterparts, although only by a small margin. While AI may ease content creation, professional expertise is still valuable - even within the confined space of generative AI itself. Finally, we also explored how well an exemplary vision-capable large language model (GPT-4o) would complete the same tasks, if given the role of an image generation agent, and found it performed on par in copying but outperformed even artists in the creative task. The very best results were still produced by humans in both tasks. These outcomes highlight the importance of integrating artistic skills with AI training to prepare artists and other visual professionals for a technologically evolving landscape. We see a potential in collaborative synergy with generative AI, which could reshape creative industries and education in the arts.

arXiv.org

Andres Karjus Oct 3, 2024

Submission open until 15.10 for
#DHNB2025 Conference "Digital Dreams and Practices", this time held in Tartu, Estonia on 5-7 March 2025
#DH #CUDAN #culturalanalytics #digitalhumanities
Invited topics include but are not limited to:
(1) integration of trad humanities w computational techniques
(2) transition of #DH from the academic "ivory tower" to societal practice;
(3) practical applications of #AI
Submission types: long&short papers, abstracts/posters.
More info here: https://dhnb.eu/conferences/dhnb2025

DHNB 2025 – DHNB

Andres Karjus Oct 1, 2024

"Perceived gendered self-representation on Tinder using machine learning" now out in Humanities and Social Sciences Communications in the Sexuality, gender & society collection
https://www.nature.com/articles/s41599-024-03801-z
(where our phd student mined a country's entire tinder userbase to do large-scale visual anthropology🔥

Perceived gendered self-representation on Tinder using machine learning - Humanities and Social Sciences Communications

This paper explores the gendered differences between men and women as perceived through the images on the online dating platform Tinder. While personal images on Instagram, Tumblr, and Facebook have been studied en masse, large-scale studies of the landscape of visual representations on online dating platforms remain rare. We apply a machine learning algorithm to 10,680 profile images collected on Tinder in Estonia to study the perceived gendered differences in self-representation among men and women. Beyond identifying the dominant genres of profile pictures used by men and women, we build a comprehensive map of visual self-representation on the platform. We further expand our findings by analyzing the distribution of the image genres across the profile gallery and identifying the prevalent positions for each genre within the profiles. Lastly, we identify the variability of women’s and men’s images within each genre. Our approach provides a holistic overview of the culture of visual self-representation on the dating app Tinder and invites scholars to expand the research on gendered differences and stereotypes to include cross-platform and cross-cultural analysis.

Nature

Andres Karjus Aug 9, 2024

Interested in #LLMs for scalable zeroshot text&image annotation &analysis? The Baltic #DH summer school published recordings, mine here: https://www.youtube.com/watch?v=Fm7mJgI0MfU
- Intro to LLMs
- quantitizing analytics framework
- assessing error rates
- using OpenAI APIs
- running your open LLM in free Colab

BSSDH 2024 | Rapidly annotating and analyzing textual and visual data with LLMs | Andres Karjus

YouTube

Andres Karjus May 29, 2024

Attending 2 more conferences this week: presenting a poster at #DHNB2024 in person in Iceland (where a volcano just erupted) and a talk at the #xPhi2024 virtually, both on using #LLMs in #DH humanities, zero-shot text classification & Lenin detection & why we're now free to forget about topic modeling.

Andres Karjus May 17, 2024

About to take off to fly to Evolang 2024 in Madison! We're doing a workshop on Saturday on big language data & LLMs - stream link posted soon too, starts at 9AM CDT.
https://www.changeiskey.org/event/2024-evolang-workshop/
#Evolang #EvolangXV #Evolang2024

Large-scale computational approaches to evolution and change prospects and pitfalls | Change Is Key!

A first workshop on Large-scale computational approaches to evolution and change will be held at Evolang XV, Madison US. We aim to bring together language evolution, cutting-edge NLP, and LLM-driven approaches, to critically discuss novel opportunities of large-scale empirical approaches to language evolution and change.

Change Is Key!

Andres Karjus Apr 30, 2024

We collaborated with a private sector media house to explore quantifying media balance&polarization using
#ML #LLMs, on the example of stance towards immigration. Part of this was done pre-chatgpt, so we present results comparing finetuned BERTlikes vs zeroshot GPTs
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0302380

Automated stance detection in complex topics and small languages: The challenging case of immigration in polarizing news media

Automated stance detection and related machine learning methods can provide useful insights for media monitoring and academic research. Many of these approaches require annotated training datasets, which limits their applicability for languages where these may not be readily available. This paper explores the applicability of large language models for automated stance detection in a challenging scenario, involving a morphologically complex, lower-resource language, and a socio-culturally complex topic, immigration. If the approach works in this case, it can be expected to perform as well or better in less demanding scenarios. We annotate a large set of pro- and anti-immigration examples to train and compare the performance of multiple language models. We also probe the usability of GPT-3.5 (that powers ChatGPT) as an instructable zero-shot classifier for the same task. The supervised models achieve acceptable performance, but GPT-3.5 yields similar accuracy. As the latter does not require tuning with annotated data, it constitutes a potentially simpler and cheaper alternative for text classification tasks, including in lower-resource languages. We further use the best-performing supervised model to investigate diachronic trends over seven years in two corpora of Estonian mainstream and right-wing populist news sources, demonstrating the applicability of automated stance detection for news analytics and media monitoring settings even in lower-resource scenarios, and discuss correspondences between stance changes and real-world events.

Andres Karjus Mar 18, 2024

"Evolving linguistic divergence on polarizing social media"
https://www.nature.com/articles/s41599-024-02922-9
Can societies polarize to the point that ppl literally won't understand each other?
Can partisan divisions lead to language diverging?🤔
Me& @nerdpro explore the 🇺🇸 American case, using #bigdata #ML #AI & a corpus of tweets of ppl following onesided media.

Evolving linguistic divergence on polarizing social media - Humanities and Social Sciences Communications

Language change is influenced by many factors, but often starts from synchronic variation, where multiple linguistic patterns or forms coexist, or where different speech communities use language in increasingly different ways. Besides regional or economic reasons, communities may form and segregate based on political alignment. The latter, referred to as political polarization, is of growing societal concern across the world. Here we map and quantify linguistic divergence across the partisan left-right divide in the United States, using social media data. We develop a general methodology to delineate (social) media users by their political preference, based on which (potentially biased) news media accounts they do and do not follow on a given platform. Our data consists of 1.5M short posts by 10k users (about 20M words) from the social media platform Twitter (now “X”). Delineating this sample involved mining the platform for the lists of followers (n = 422M) of 72 large news media accounts. We quantify divergence in topics of conversation and word frequencies, messaging sentiment, and lexical semantics of words and emoji. We find signs of linguistic divergence across all these aspects, especially in topics and themes of conversation, in line with previous research. While US American English remains largely intelligible within its large speech community, our findings point at areas where miscommunication may eventually arise given ongoing polarization and therefore potential linguistic divergence. Our flexible methodology — combining data mining, lexicostatistics, machine learning, large language models and a systematic human annotation approach — is largely language and platform agnostic. In other words, while we focus here on US political divides and US English, the same approach is applicable to other countries, languages, and social media platforms.

Nature

Andres Karjus Mar 11, 2024

Our paper on quantifying film festivals is finally out in PLOS One!
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0297404
This is the largest mapping of the festival circuit to date (2009–2021, 616 festivals, 31989 unique films).
We also develop several novel approaches that are applicable to event science more broadly.

Quantifying the global film festival circuit: Networks, diversity, and public value creation

Film festivals are a key component in the global film industry in terms of trendsetting, publicity, trade, and collaboration. We present an unprecedented analysis of the international film festival circuit, which has so far remained relatively understudied quantitatively, partly due to the limited availability of suitable data sets. We use large-scale data from the Cinando platform of the Cannes Film Market, widely used by industry professionals. We explicitly model festival events as a global network connected by shared films and quantify festivals as aggregates of the metadata of their showcased films. Importantly, we argue against using simple count distributions for discrete labels such as language or production country, as such categories are typically not equidistant. Rather, we propose embedding them in continuous latent vector spaces. We demonstrate how these “festival embeddings” provide insight into changes in programmed content over time, predict festival connections, and can be used to measure diversity in film festival programming across various cultural, social, and geographical variables—which all constitute an aspect of public value creation by film festivals. Our results provide a novel mapping of the film festival circuit between 2009–2021 (616 festivals, 31,989 unique films), highlighting festival types that occupy specific niches, diverse series, and those that evolve over time. We also discuss how these quantitative findings fit into media studies and research on public value creation by cultural industries. With festivals occupying a central position in the film industry, investigations into the data they generate hold opportunities for researchers to better understand industry dynamics and cultural impact, and for organizers, policymakers, and industry actors to make more informed, data-driven decisions. We hope our proposed methodological approach to festival data paves way for more comprehensive film festival studies and large-scale quantitative cultural event analytics in general.

Twitter/x	https://twitter.com/AndresKarjus
Academic website	https://andreskarjus.github.io/
R & AI workshops website	https://datafigure.eu/
Lab	https://cudan.tlu.ee/