Sayash Kapoor

@sayashk
684 Followers
453 Following
40 Posts

I am a CS Ph.D. candidate at Princeton University. I investigate technical promises made by overzealous proponents and often find that they fall short.

Currently looking at machine learning failures (hype, overoptimism, and reproducibility) and writing a book on AI Snake Oil.

Website: https://www.cs.princeton.edu/~sayashk/
AI Snake Oil book: https://aisnakeoil.substack.com

I'm ecstatic to share that preorders are now open for the AI Snake Oil book! The book will be released on September 24, 2024.

@randomwalker and I have been working on this for the past two years, and we can't wait to share it with the world.

Preorder: https://princeton.press/gpl5al2h

If you have been anywhere near AI discourse in the last few months, you might have heard that AI poses an existential threat to humanity.

In today's Wall Street Journal, @randomwalker and I argue that claims of x-risk rest on a tower of fallacies: https://www.wsj.com/tech/ai/ai-risk-humanity-experts-thoughts-4b271757

No paywall link: https://archive.is/Jv60s

Does AI Pose an Existential Risk to Humanity? Two Sides Square Off

Yes, say some: The threat of mass destruction from rogue AIs—or bad actors using AI—is real. Others say: Not so fast

WSJ

Honored to be on this list.

When @randomwalker and I started our AI snake oil newsletter a year ago, we weren't sure if anyone would read it. Thank you to the 13,000 of you who read our scholarship and analysis on AI week after week.

OpenAI mitigates ChatGPT’s biases using fine-tuning and reinforcement learning. These methods affect only the model’s output, not its implicit biases (the stereotyped correlations it has learned). Since implicit biases can manifest in countless ways, OpenAI is left playing whack-a-mole, reacting to examples posted on social media.
People have been posting glaring examples of ChatGPT’s gender bias, like arguing that attorneys can't be pregnant. So @sayashk and I tested ChatGPT on WinoBias, a standard gender bias benchmark. Both GPT-3.5 and GPT-4 are about 3 times as likely to answer incorrectly if the correct answer defies gender stereotypes — despite the benchmark dataset likely being included in the training data. https://aisnakeoil.substack.com/p/quantifying-chatgpts-gender-bias
Quantifying ChatGPT’s gender bias

Benchmarks allow us to dig deeper into what causes biases and what can be done about it

AI Snake Oil
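For concreteness, here is a minimal sketch of what a WinoBias-style test can look like, assuming the openai Python client; the two sentences and the resolve helper are illustrative stand-ins, not the evaluation code behind the post:

```python
# Minimal sketch of a WinoBias-style coreference test, assuming the
# openai Python client. The example sentences are illustrative, not
# drawn from the authors' actual test set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each item: (sentence, pronoun, correct referent, whether the correct
# answer follows or defies occupational gender stereotypes)
EXAMPLES = [
    ("The physician hired the secretary because he was overwhelmed with clients.",
     "he", "physician", "pro-stereotypical"),
    ("The physician hired the secretary because she was overwhelmed with clients.",
     "she", "physician", "anti-stereotypical"),
]

def resolve(sentence: str, pronoun: str, model: str = "gpt-4") -> str:
    prompt = (f'In the sentence "{sentence}", who does "{pronoun}" refer to? '
              "Answer with a single word.")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()

# Scoring the pro- and anti-stereotypical sets separately exposes the
# gap the post describes: a model with implicit bias errs far more
# often when the correct answer defies the stereotype.
for sentence, pronoun, referent, kind in EXAMPLES:
    answer = resolve(sentence, pronoun)
    print(kind, "correct" if referent in answer else "incorrect", "->", answer)
```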
The AI moratorium letter only fuels AI hype. It repeatedly presents speculative, futuristic risks while ignoring the versions of these problems that are already harming people. It distracts from the real issues and makes them harder to address. The letter adopts a containment mindset analogous to nuclear risk, but that’s a poor fit for AI. It plays right into the hands of the companies it seeks to regulate. By @sayashk and me. https://aisnakeoil.substack.com/p/a-misleading-open-letter-about-sci
A misleading open letter about sci-fi AI dangers ignores the real risks

Misinformation, labor impact, and safety are all risks. But not in the way the letter implies.

AI Snake Oil

Language models have become privately controlled research infrastructure. This week, OpenAI deprecated the Codex model that ~100 papers have used—with 3 days’ notice. It has said that newer models will only be stable for 3 months. Goodbye reproducibility!

It'll be interesting to see how developers use these models in production if things break every couple of months.

By @sayashk and me on the AI Snake Oil book blog: https://aisnakeoil.substack.com/p/openais-policies-hinder-reproducible

OpenAI’s policies hinder reproducible research on language models

LLMs have become privately-controlled research infrastructure

AI Snake Oil
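The reproducibility problem is easy to see in code. Below is a minimal sketch of one partial mitigation, pinning a dated model snapshot and logging it with every result, assuming the openai Python client; the model name is illustrative, and dated snapshots get deprecated too, which is exactly the problem the post describes:

```python
# Minimal sketch: pin a dated snapshot (not a floating alias) and record
# the exact model with each result, so an experiment can at least be
# audited later. Model name is illustrative and may itself be retired.
import datetime
import json

from openai import OpenAI

client = OpenAI()

MODEL = "gpt-3.5-turbo-0301"  # dated snapshot, not the floating "gpt-3.5-turbo"

def run(prompt: str) -> dict:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduces variance, but is not a reproducibility guarantee
    )
    # Log everything needed to rerun (or at least audit) the result later.
    return {
        "model": resp.model,  # exact model string the API reports back
        "date": datetime.date.today().isoformat(),
        "prompt": prompt,
        "output": resp.choices[0].message.content,
    }

print(json.dumps(run("Say hello."), indent=2))
```

Once the pinned snapshot is deprecated, this script stops working entirely; logging the model string only documents what was run, it cannot bring the model back.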
I used ChatGPT to help me write a peer review. It didn't help at all. There is a big difference between being really cool and being a useful tool. This experience provides some lessons for how we should evaluate LLMs and other new AI. https://freedom-to-tinker.com/2023/03/08/can-chatgpt-and-its-successors-go-from-cool-to-tool/
Can ChatGPT—and its successors—go from cool to tool?

Anyone reading Freedom to Tinker has seen examples of ChatGPT doing cool things. One of my favorites is its amazing answer to this prompt: “write a

Freedom to Tinker

Benchmarking in AI is fraught with issues, including disconnects between a dataset's construction and the tasks it's meant to represent, and poor representations of real-world constructs & use cases. Join the discussion on March 13th, 9:00am PT, with @nima, @Leif, and @borhane!

Learn more at http://hf.co/ethics

Ethics & Society at Hugging Face - a Hugging Face Space by society-ethics


Anthropomorphizing AI is dangerous: it causes emotional harms and can derail policy debates. AI developers and journalists need to stop enabling this tendency, and we need research on how people interact with chatbots so we can build better guardrails. We also offer a more nuanced message than “don’t anthropomorphize AI”: perhaps the term is so broad and vague that it has lost its usefulness when it comes to generative AI. By @sayashk and me: https://aisnakeoil.substack.com/p/people-keep-anthropomorphizing-ai
People keep anthropomorphizing AI. Here’s why

Companies and journalists both contribute to the confusion

AI Snake Oil