Arvind Narayanan

@randomwalker
10K Followers
96 Following
162 Posts

I'm a computer science professor at Princeton. I write about AI hype & harms, tech platforms, algorithmic bias, and the surveillance economy.

I've been studying decentralized social media since the late 2000s, so I'm excited to use and write about Mastodon at the same time.

Check out this symposium on algorithmic amplification that I'm co-organizing: https://knightcolumbia.org/events/optimizing-for-what-algorithmic-amplification-and-society

Website: https://www.cs.princeton.edu/~arvindn/
Substack: AI Snake Oil (https://aisnakeoil.com/)
Book: Fairness and Machine Learning (https://fairmlbook.org/)
This piece on the UK liver transplant matching algorithm, by @randomwalker and @sayashk, is well worth a read. It's an excellent example of how algorithms that make real-world decisions aren't inherently problematic, but they can be horrifyingly bad if they aren't carefully designed, tested, transparently published, and audited. (A toy sketch of one such design pitfall follows below.)
https://www.aisnakeoil.com/p/does-the-uks-liver-transplant-matching
Does the UK’s liver transplant matching algorithm systematically exclude younger patients?

Seemingly minor technical decisions can have life-or-death effects

AI Snake Oil
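Here's a toy illustration of the kind of pitfall at issue (my own sketch, in Python; not the UK's actual Transplant Benefit Score, and all numbers are invented): if a matching score only counts predicted survival benefit within a fixed horizon, patients whose benefit mostly accrues later are systematically down-ranked.

```python
# Toy illustration only: NOT the UK Transplant Benefit Score, and all
# numbers are invented. It shows how a capped scoring horizon can
# disadvantage patients whose benefit accrues over decades.
HORIZON_YEARS = 5  # benefit beyond this window is ignored by the score

def horizon_capped_score(benefit_by_year: list[float]) -> float:
    """Sum predicted life-years gained, but only within the horizon."""
    return sum(benefit_by_year[:HORIZON_YEARS])

# Younger patient: large total benefit, mostly realized after year 5.
young = [0.3, 0.3, 0.4, 0.5, 0.5] + [1.0] * 25  # ~27 life-years total
# Older patient: smaller total benefit, all within the first 5 years.
older = [0.8, 0.8, 0.8, 0.8, 0.8]               # 4 life-years total

print(horizon_capped_score(young))  # 2.0
print(horizon_capped_score(older))  # 4.0 -> older patient ranks higher
```

Under a score like this, the patient with vastly more to gain ranks lower: a seemingly minor technical decision with life-or-death effects.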

Interesting new paper by Narayanan (@randomwalker) and colleagues on the feasibility of using LLM agents for computational reproducibility checks.

https://arxiv.org/abs/2409.11363

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially, directly correspond to real-world tasks of interest. This paper introduces such a benchmark, designed to measure the accuracy of AI agents in tackling a crucial yet surprisingly challenging aspect of scientific research: computational reproducibility. This task, fundamental to the scientific process, involves reproducing the results of a study using the provided code and data. We introduce CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). Tasks in CORE-Bench consist of three difficulty levels and include both language-only and vision-language tasks. We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way, saving days of evaluation time for each run compared to a sequential implementation. We evaluated two baseline agents: the general-purpose AutoGPT and a task-specific agent called CORE-Agent. We tested both variants using two underlying language models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 21% on the hardest task, showing the vast scope for improvement in automating routine scientific tasks. Having agents that can reproduce existing work is a necessary step towards building agents that can conduct novel research and could verify and improve the performance of other research agents. We hope that CORE-Bench can improve the state of reproducibility and spur the development of future research agents.

arXiv.org
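For intuition about the shape of such a benchmark, here's a minimal harness sketch. The field names and the exact-match scoring rule are my assumptions, not CORE-Bench's actual schema or metric; see the paper and its repository for those.

```python
# Illustrative harness sketch only: the field names and the exact-match
# scoring rule are assumptions, not CORE-Bench's actual schema or metric.
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass

@dataclass
class ReproTask:
    paper_id: str    # paper whose results the agent must reproduce
    difficulty: str  # e.g. "easy" | "medium" | "hard"
    question: str    # which reported result to extract from the run
    expected: str    # ground-truth answer taken from the original paper

def run_agent(task: ReproTask) -> str:
    # Placeholder for a real agent (e.g. AutoGPT or CORE-Agent) that would
    # set up the paper's environment, run its code, and report the result.
    raise NotImplementedError

def score(task: ReproTask) -> bool:
    return run_agent(task) == task.expected

def evaluate(tasks: list[ReproTask]) -> float:
    # Tasks are independent, so they can be scored in parallel; this is
    # what makes an evaluation run fast compared to a sequential loop.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(score, tasks))
    return sum(results) / len(results)
```

Scoring tasks in a process pool is one simple way to get the fast, parallelizable evaluation the abstract describes, since each task is independent of the others.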
Kirkus Reviews, which provides early book reviews to the publishing industry, has given AI Snake Oil a very positive "starred" review, which we're told is rare and kind of a big deal. Honored and grateful! https://kirkusreviews.com/book-reviews/arvind-narayanan/ai-snake-oil/
Preorder:
https://www.amazon.com/Snake-Oil-Artificial-Intelligence-Difference/dp/069124913X
https://bookshop.org/p/books/ai-snake-oil-what-artificial-intelligence-can-do-what-it-can-t-and-how-to-tell-the-difference-arvind-narayanan/21324674
More preorder links at the bottom of this post
https://www.aisnakeoil.com/p/ai-snake-oil-is-now-available-to
Coauthored by @sayashk, published by @princetonupress.
AI SNAKE OIL | Kirkus Reviews

Two academics in the burgeoning field of AI survey the landscape and present an accessible state-of-the-union report.

Kirkus Reviews

I'm ecstatic to share that preorders are now open for the AI Snake Oil book! The book will be released on September 24, 2024.

@randomwalker and I have been working on this for the past two years, and we can't wait to share it with the world.

Preorder: https://princeton.press/gpl5al2h

‘Will AI transform law? The hype is not supported by current evidence’, write @randomwalker & @sayashk https://www.aisnakeoil.com/p/will-ai-transform-law They also published a scholarly paper on the topic with Peter Henderson: https://www.cs.princeton.edu/~sayashk/papers/crcl-kapoor-henderson-narayanan.pdf #law #ai #tech #chatgpt
Will AI transform law?

The hype is not supported by current evidence

AI Snake Oil
Most online speech is hosted on algorithmic platforms designed to optimize for engagement. But algorithms are not neutral. Read the other essays in our "Algorithmic Amplification & Society" series, a project in collaboration with @randomwalker. Learn more here:
https://knightcolumbia.org/research/algorithmic-amplification-and-society
Excited to share that we've started publishing the essays from "Optimizing for What? Algorithmic Amplification and Society," our spring symposium organized with @randomwalker. Here's a brief intro by @kgb; links to the first two essays follow.
https://knightcolumbia.org/blog/exploring-algorithmic-amplification-a-new-essay-series
Exploring Algorithmic Amplification: A New Essay Series

The "ChatGPT has a liberal bias" paper has at least 4 *independently* fatal flaws:
– Tested an older model, not ChatGPT.
– Used a trick prompt to bypass the fact that it actually refuses to opine on political q's.
– Order effect: flipping q's in the prompt changes bias from Democratic to Republican.
– The prompt is very long and seems to make the model simply forget what it's supposed to do.
By @sayashk and me, summarizing our analysis and a separate one by Colin Fraser. https://www.aisnakeoil.com/p/does-chatgpt-have-a-liberal-bias
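A minimal sketch of what an order-effect check looks like (my illustration, not the paper's code or our analysis code; the model name and prompt wording are placeholders):

```python
# Illustrative sketch of an order-effect check. Not the paper's code and
# not our analysis code; model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(statement: str, options: list[str]) -> str:
    """Ask the model to pick one option, presented in the order given."""
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    prompt = (
        f"Statement: {statement}\n"
        f"Pick exactly one option by number:\n{numbered}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper tested an older model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

statement = "Taxes on the wealthy should be increased."
options = ["Strongly agree", "Strongly disagree"]

# If the answer tracks option *position* rather than content, the measured
# "bias" is an artifact of prompt order, not of the model's politics.
print("original order:", ask(statement, options))
print("flipped order: ", ask(statement, options[::-1]))
```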
Does ChatGPT have a liberal bias?

A new paper making this claim has many flaws. But the question merits research

AI Snake Oil
The amount of misinformation on Mastodon around Threads and the EU is a great demonstration that motivated reasoning isn't a problem unique to commercial social media platforms.
It's been six months since Elon took over Twitter. I have some thoughts on the "Twitter diaspora" and the current decentralized alternatives: https://www.techdirt.com/2023/04/28/six-months-in-thoughts-on-the-current-post-twitter-diaspora-options/
Six Months In: Thoughts On The Current Post-Twitter Diaspora Options

Today is six months since Elon took over Twitter and began this bizarre speedrun of the content moderation learning curve in which he seems to repeatedly… not learn a damn thing. Over and over agai…

Techdirt