Jonathan Stray

@jonathanstray
760 Followers
139 Following
139 Posts
Working on better personalized news and information at
@CHAI_Berkeley. Previously computational journalism @columbiajourn. Editor, @better_conflict. More at jonathanstray.com.
Contrarian take: I don't think generative models commit "plagiarism" any more than an artist who is influenced by other artists does (setting aside e.g. direct quotations). That doesn't mean there shouldn't be some compensation for the use of work as training data -- but no one should expect to make a living off that, either. Spotify has already taught us how cheap original music is when you buy in bulk.

Read this?

https://nymag.com/intelligencer/article/ai-artificial-intelligence-chatbots-emily-m-bender.html

It’s a great piece of writing. I agree with a lot of it. I think where I diverge is that in many situations it doesn't matter whether we think there is a mind behind the bot.

Folks are experimenting with hooking LLMs up to a web browser and ordering taskrabbits, with some success. If Sydney is going to babble on about dark fantasies, the problem is not that we might forget who is human, but that it shouldn’t be allowed to direct resources in the real world.

Here's a question that is bound to generate hot takes, but is also an honest and serious policy concern: would GDPR have prevented this? https://www.washingtonpost.com/dc-md-va/2023/03/09/catholics-gay-priests-grindr-data-bishops/
Catholic group spent millions on app data that tracked gay priests

The group used the data to find clerics who used Grindr and other dating and hookup apps and shared their work with bishops, a Post investigation has found.

The Washington Post
Aside from the science we're doing, one of the great lessons here has been how hard it is to arrange collaboration between external researchers and platform teams. There's a lot of distrust! But I don't see any other way of solving these problems.
3/3

We're asking participating users (consenting, and paid) to answer a weekly survey -- a version of the Online Social Support scale -- for three months, and using their answers to adjust feed ranking.

If this works, it will demonstrate a method that might be used by any platform.

2/3
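For readers wondering what "using their answers to adjust feed ranking" could look like mechanically, here is a minimal sketch. Everything in it is my own invention for illustration -- the Candidate fields, the predicted_support signal, and the alpha blend -- not the experiment's actual implementation, which isn't specified here. The general shape: train a model to predict a survey-derived value signal per item, then blend it with the usual engagement score.

```python
# A minimal sketch of one way survey responses could feed back into ranking.
# All names and the weighting scheme are invented for illustration; this is
# not the experiment's actual design.
from dataclasses import dataclass

@dataclass
class Candidate:
    item_id: str
    engagement_score: float   # the platform's usual predicted-engagement score
    predicted_support: float  # a model's estimate of the item's contribution
                              # to the user's Online Social Support rating

def rank_feed(candidates: list[Candidate], alpha: float = 0.3) -> list[Candidate]:
    """Re-rank by a weighted blend of engagement and the survey-trained
    value signal; alpha (assumed) is the weight on the survey signal."""
    def blended(c: Candidate) -> float:
        return (1 - alpha) * c.engagement_score + alpha * c.predicted_support
    return sorted(candidates, key=blended, reverse=True)

feed = rank_feed([
    Candidate("post_a", engagement_score=0.9, predicted_support=0.1),
    Candidate("post_b", engagement_score=0.6, predicted_support=0.9),
])
print([c.item_id for c in feed])  # post_b wins once the value signal counts
```

The hard parts, of course, are training something like predicted_support from sparse weekly survey data, and deciding how much weight it should get.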

If platforms shouldn't optimize just for engagement, what should they optimize for?
We designed an experiment -- a collaboration between an international group of researchers and Meta -- to test selecting Feed content for long-term user value.

https://www.wired.com/story/platforms-engagement-research-meta/
1/3

A Unique Experiment That Could Make Social Media Better

Academic researchers weren’t getting anywhere by criticizing Big Tech platforms, so we decided to try collaborating instead.

WIRED

The lesson is this: don't believe any platform "audit" that doesn't carefully simulate what real users actually do.

So far, pretty much every study has used either random clicks or bots programmed with a monomaniacal interest in something bad.

So what will work instead? On-platform experiments. We covered this in our big recommender paper last year. http://arxiv.org/abs/2207.10192
2/2

Building Human Values into Recommender Systems: An Interdisciplinary Synthesis

Recommender systems are the algorithms which select, filter, and personalize content across many of the world's largest platforms and apps. As such, their positive and negative effects on individuals and on societies have been extensively theorized and studied. Our overarching question is how to ensure that recommender systems enact the values of the individuals and societies that they serve. Addressing this question in a principled fashion requires technical knowledge of recommender design and operation, and also critically depends on insights from diverse fields including social science, ethics, economics, psychology, policy and law. This paper is a multidisciplinary effort to synthesize theory and practice from different perspectives, with the goal of providing a shared language, articulating current design approaches, and identifying open problems. It is not a comprehensive survey of this large space, but a set of highlights identified by our diverse author cohort. We collect a set of values that seem most relevant to recommender systems operating across different domains, then examine them from the perspectives of current industry practice, measurement, product design, and policy approaches. Important open problems include multi-stakeholder processes for defining values and resolving trade-offs, better values-driven measurements, recommender controls that people use, non-behavioral algorithmic feedback, optimization for long-term outcomes, causal inference of recommender effects, academic-industry research collaborations, and interdisciplinary policy-making.

arXiv.org

New research: most previous work showing that YouTube, TikTok etc. recommend increasingly extreme content is junk 😐

These "audits" simulated users clicking randomly. But that overestimates rabbit hole effects, because most users aren't extreme.

It may still be true that recsys choose content that is bad for users and for society. But the testing methods used so far aren't up to the job.

https://arxiv.org/abs/2302.11225

1/2

The Amplification Paradox in Recommender Systems

Automated audits of recommender systems found that blindly following recommendations leads users to increasingly partisan, conspiratorial, or false content. At the same time, studies using real user traces suggest that recommender systems are not the primary driver of attention toward extreme content; on the contrary, such content is mostly reached through other means, e.g., other websites. In this paper, we explain the following apparent paradox: if the recommendation algorithm favors extreme content, why is it not driving its consumption? With a simple agent-based model where users attribute different utilities to items in the recommender system, we show that the collaborative-filtering nature of recommender systems and the nicheness of extreme content can resolve the apparent paradox: although blindly following recommendations would indeed lead users to niche content, users rarely consume niche content when given the option because it is of low utility to them, which can lead the recommender system to deamplify such content. Our results call for a nuanced interpretation of "algorithmic amplification" and highlight the importance of modeling the utility of content to users when auditing recommender systems.

arXiv.org
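The paper's mechanism is easy to see in a toy simulation. The sketch below is my own construction, not the authors' model: item indices stand in for "extremeness," the recommender nudges toward slightly more-extreme neighbors, and extreme items carry low utility for a typical user. A random-click audit bot rides the nudge straight into the extreme region; a user who weighs the same recommendations by personal utility hovers below it.

```python
# Toy simulation (my own construction, loosely inspired by the paper's
# agent-based model): contrast a random-click audit bot with a user who
# chooses among recommendations in proportion to personal utility.
import random

random.seed(0)

N_ITEMS = 100
EXTREME_CUTOFF = 90  # items with index >= 90 count as niche/extreme

def recommend(pos: int) -> list[int]:
    # Offer nearby items, skewed toward more-extreme ones: this is the
    # "rabbit hole" that blind-following audits measure.
    return [min(N_ITEMS - 1, max(0, pos + d)) for d in (-1, 1, 2, 3)]

def utility(item: int) -> float:
    # For a typical user, niche/extreme content has low utility.
    return 0.1 if item >= EXTREME_CUTOFF else 1.0

def extreme_share(policy: str, users: int = 200, steps: int = 200) -> float:
    hits = 0
    for _ in range(users):
        pos = random.randrange(0, 50)  # start with mainstream content
        for _ in range(steps):
            recs = recommend(pos)
            if policy == "random_click":  # what most audits simulated
                pos = random.choice(recs)
            else:                         # utility-weighted choice
                pos = random.choices(recs, weights=[utility(i) for i in recs])[0]
            hits += pos >= EXTREME_CUTOFF
    return hits / (users * steps)

print("random-click audit bot:", extreme_share("random_click"))
print("utility-weighted user: ", extreme_share("utility"))
```

The specific numbers don't matter; the point is that the "amplification" an audit measures depends heavily on the choice model you give the simulated user.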

Hear me out: I actually think "pay to get yourself verified" is the right answer to the "who gets a checkmark" problem.

The reason Twitter was selective about bluechecks is the per-user cost of verification. So they had to restrict it, and ended up making arbitrary decisions that pissed a lot of folks off. Charging for verification means platforms don't have to be gatekeepers, because the people who want a badge cover the cost of checking who they are.

Of course they actually need to verify, not just hand out badges for cash.

This tweet claims there were two specific bugs that were restricting distribution on Musk's account.
https://twitter.com/elonmusk/status/1624660886572126209

I had previously guessed that the story about him firing an engineer in a fit of petty narcissism was incorrect. It had a "too good to be true" vibe and there was no mention of a recording of the actual events. https://mastodon.social/@jonathanstray/109838184931064293

We are all more likely to believe stories which make people we dislike look stupid.

Elon Musk (@elonmusk) on X

Long day at Twitter HQ with eng team. Two significant problems mostly addressed: 1. Fanout service for Following feed was getting overloaded when I tweeted, resulting in up to 95% of my tweets not getting delivered at all. Following is now pulling from search (aka Earlybird).

X (formerly Twitter)
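For anyone unfamiliar with the architecture the tweet refers to: "fanout" pushes each new tweet into every follower's timeline at write time, which gets expensive for an account with an enormous follower count, while "pulling from search" assembles the timeline at read time. A conceptual sketch of the contrast -- not Twitter's actual code, and all names invented:

```python
# Conceptual sketch of the two delivery models the quoted tweet contrasts:
# fanout-on-write does work proportional to follower count at post time;
# pull-on-read queries an author index (like Earlybird) at feed-read time.
from collections import defaultdict

followers = defaultdict(set)          # author -> set of follower ids
inboxes = defaultdict(list)           # user -> precomputed timeline (push model)
tweets_by_author = defaultdict(list)  # index queried in the pull model

def post_fanout_on_write(author: str, tweet: str) -> None:
    # Push: write the tweet into every follower's inbox immediately.
    # This per-follower loop is the step that can get overloaded.
    tweets_by_author[author].append(tweet)
    for f in followers[author]:
        inboxes[f].append((author, tweet))

def read_feed_pull(following: list[str]) -> list[tuple[str, str]]:
    # Pull: assemble the feed at read time by querying the author index,
    # so no per-follower work happens when the tweet is posted.
    return [(a, t) for a in following for t in tweets_by_author[a]]

followers["author_x"] = {"alice", "bob"}
post_fanout_on_write("author_x", "hello")
print(inboxes["alice"])               # push model: already delivered
print(read_feed_pull(["author_x"]))   # pull model: fetched on demand
```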