Arvind Narayanan

@randomwalker
10K Followers
96 Following
162 Posts

I'm a computer science professor at Princeton. I write about AI hype & harms, tech platforms, algorithmic bias, and the surveillance economy.

I've been studying decentralized social media since the late 2000s, so I'm excited to use and write about Mastodon at the same time.

Check out this symposium on algorithmic amplification that I'm co-organizing: https://knightcolumbia.org/events/optimizing-for-what-algorithmic-amplification-and-society

Website: https://www.cs.princeton.edu/~arvindn/
Substack: AI Snake Oil (https://aisnakeoil.com/)
Book: Fairness and Machine Learning (https://fairmlbook.org/)
This piece on the UK liver transplant matching algorithm, by @randomwalker and @sayashk, is well worth a read. It's an excellent example of how algorithms that make real-world decisions aren't inherently problematic, but they can be horrifyingly bad if they aren't carefully designed, tested, transparently published, and audited. (A toy sketch of one such design pitfall follows below.)
https://www.aisnakeoil.com/p/does-the-uks-liver-transplant-matching
Does the UK’s liver transplant matching algorithm systematically exclude younger patients?

Seemingly minor technical decisions can have life-or-death effects

AI Snake Oil
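Here's a toy illustration of the kind of pitfall at issue (my own sketch, in Python; not the UK's actual Transplant Benefit Score, and all numbers are invented): if a matching score only counts predicted survival benefit within a fixed horizon, patients whose benefit mostly accrues later are systematically down-ranked.

```python
# Toy illustration only: NOT the UK Transplant Benefit Score, and all
# numbers are invented. It shows how a capped scoring horizon can
# disadvantage patients whose benefit accrues over decades.
HORIZON_YEARS = 5  # benefit beyond this window is ignored by the score

def horizon_capped_score(benefit_by_year: list[float]) -> float:
    """Sum predicted life-years gained, but only within the horizon."""
    return sum(benefit_by_year[:HORIZON_YEARS])

# Younger patient: large total benefit, mostly realized after year 5.
young = [0.3, 0.3, 0.4, 0.5, 0.5] + [1.0] * 25  # ~27 life-years total
# Older patient: smaller total benefit, all within the first 5 years.
older = [0.8, 0.8, 0.8, 0.8, 0.8]               # 4 life-years total

print(horizon_capped_score(young))  # 2.0
print(horizon_capped_score(older))  # 4.0 -> older patient ranks higher
```

Under a score like this, the patient with vastly more to gain ranks lower: a seemingly minor technical decision with life-or-death effects.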

Interesting new paper by Narayanan (@randomwalker) and colleagues on the feasibility of using LLM agents for computational reproducibility checks.

https://arxiv.org/abs/2409.11363

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially, directly correspond to real-world tasks of interest. This paper introduces such a benchmark, designed to measure the accuracy of AI agents in tackling a crucial yet surprisingly challenging aspect of scientific research: computational reproducibility. This task, fundamental to the scientific process, involves reproducing the results of a study using the provided code and data. We introduce CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). Tasks in CORE-Bench consist of three difficulty levels and include both language-only and vision-language tasks. We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way, saving days of evaluation time for each run compared to a sequential implementation. We evaluated two baseline agents: the general-purpose AutoGPT and a task-specific agent called CORE-Agent. We tested both variants using two underlying language models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 21% on the hardest task, showing the vast scope for improvement in automating routine scientific tasks. Having agents that can reproduce existing work is a necessary step towards building agents that can conduct novel research and could verify and improve the performance of other research agents. We hope that CORE-Bench can improve the state of reproducibility and spur the development of future research agents.

arXiv.org
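For intuition about the shape of such a benchmark, here's a minimal harness sketch. The field names and the exact-match scoring rule are my assumptions, not CORE-Bench's actual schema or metric; see the paper and its repository for those.

```python
# Illustrative harness sketch only: the field names and the exact-match
# scoring rule are assumptions, not CORE-Bench's actual schema or metric.
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass

@dataclass
class ReproTask:
    paper_id: str    # paper whose results the agent must reproduce
    difficulty: str  # e.g. "easy" | "medium" | "hard"
    question: str    # which reported result to extract from the run
    expected: str    # ground-truth answer taken from the original paper

def run_agent(task: ReproTask) -> str:
    # Placeholder for a real agent (e.g. AutoGPT or CORE-Agent) that would
    # set up the paper's environment, run its code, and report the result.
    raise NotImplementedError

def score(task: ReproTask) -> bool:
    return run_agent(task) == task.expected

def evaluate(tasks: list[ReproTask]) -> float:
    # Tasks are independent, so they can be scored in parallel; this is
    # what makes an evaluation run fast compared to a sequential loop.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(score, tasks))
    return sum(results) / len(results)
```

Scoring tasks in a process pool is one simple way to get the fast, parallelizable evaluation the abstract describes, since each task is independent of the others.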
Kirkus Reviews, which provides early book reviews to the publishing industry, has given AI Snake Oil a very positive "starred" review, which we're told is rare and kind of a big deal. Honored and grateful! https://kirkusreviews.com/book-reviews/arvind-narayanan/ai-snake-oil/
Preorder:
https://www.amazon.com/Snake-Oil-Artificial-Intelligence-Difference/dp/069124913X
https://bookshop.org/p/books/ai-snake-oil-what-artificial-intelligence-can-do-what-it-can-t-and-how-to-tell-the-difference-arvind-narayanan/21324674
More preorder links at the bottom of this post
https://www.aisnakeoil.com/p/ai-snake-oil-is-now-available-to
Coauthored by @sayashk, published by @princetonupress.
AI SNAKE OIL | Kirkus Reviews

Two academics in the burgeoning field of AI survey the landscape and present an accessible state-of-the-union report.

Kirkus Reviews

I'm ecstatic to share that preorders are now open for the AI Snake Oil book! The book will be released on September 24, 2024.

@randomwalker and I have been working on this for the past two years, and we can't wait to share it with the world.

Preorder: https://princeton.press/gpl5al2h

‘Will AI transform law? The hype is not supported by current evidence’, write @randomwalker & @sayashk https://www.aisnakeoil.com/p/will-ai-transform-law They also published a scholarly paper on the topic with Peter Henderson: https://www.cs.princeton.edu/~sayashk/papers/crcl-kapoor-henderson-narayanan.pdf #law #ai #tech #chatgpt
Will AI transform law?

The hype is not supported by current evidence

AI Snake Oil
Most online speech is hosted on algorithmic platforms designed to optimize for engagement. But algorithms are not neutral. Read the other essays in our "Algorithmic Amplification & Society" series, a project in collaboration with @randomwalker. Learn more here:
https://knightcolumbia.org/research/algorithmic-amplification-and-society
Excited to share that we've started publishing the essays from "Optimizing for What? Algorithmic Amplification and Society," our spring symposium organized with @randomwalker. Here's a brief intro by @kgb; links to the first two essays follow.
https://knightcolumbia.org/blog/exploring-algorithmic-amplification-a-new-essay-series
Exploring Algorithmic Amplification: A New Essay Series

The "ChatGPT has a liberal bias" paper has at least 4 *independently* fatal flaws:
– Tested an older model, not ChatGPT.
– Used a trick prompt to bypass the fact that it actually refuses to opine on political q's.
– Order effect: flipping q's in the prompt changes bias from Democratic to Republican.
– The prompt is very long and seems to make the model simply forget what it's supposed to do.
By @sayashk and me, summarizing our analysis and a separate one by Colin Fraser. https://www.aisnakeoil.com/p/does-chatgpt-have-a-liberal-bias
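A minimal sketch of what an order-effect check looks like (my illustration, not the paper's code or our analysis code; the model name and prompt wording are placeholders):

```python
# Illustrative sketch of an order-effect check. Not the paper's code and
# not our analysis code; model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(statement: str, options: list[str]) -> str:
    """Ask the model to pick one option, presented in the order given."""
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    prompt = (
        f"Statement: {statement}\n"
        f"Pick exactly one option by number:\n{numbered}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper tested an older model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

statement = "Taxes on the wealthy should be increased."
options = ["Strongly agree", "Strongly disagree"]

# If the answer tracks option *position* rather than content, the measured
# "bias" is an artifact of prompt order, not of the model's politics.
print("original order:", ask(statement, options))
print("flipped order: ", ask(statement, options[::-1]))
```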
Does ChatGPT have a liberal bias?

A new paper making this claim has many flaws. But the question merits research

AI Snake Oil
The amount of misinformation on Mastodon around Threads and the EU is a great demonstration that motivated reasoning isn't a problem unique to commercial social media platforms.
It's been six months since Elon took over Twitter. I have some thoughts on the "Twitter diaspora" and the current decentralized alternatives: https://www.techdirt.com/2023/04/28/six-months-in-thoughts-on-the-current-post-twitter-diaspora-options/
Six Months In: Thoughts On The Current Post-Twitter Diaspora Options

Today is six months since Elon took over Twitter and began this bizarre speedrun of the content moderation learning curve in which he seems to repeatedly… not learn a damn thing. Over and over agai…

Techdirt