@mmitchell_ai

7.4K Followers
138 Following
784 Posts
Interdisciplinary researcher working to shape AI toward long-term positive goals, with a focus on ML and ethics.
Opinions mine. =)
Here's me wearing jewelry she made, New Year's Eve 2016. =)
Yesterday my much-loved aunt Maggie passed away. She was a force of nature, brilliant and thoughtful. A Hollywood metal artist who defined an aesthetic you're probably familiar with. Sharing a little video about her work on Star Trek & beyond.
https://www.youtube.com/watch?v=Gqqk7B44oAg
Maggie Schpak: STAR TREK Metalwork Artist


I was invited to guest host a MAIHT3000!
Tune in noon PDT, May 20th, for a discussion with me and @emilymbender on facts & fantasy underlying the idea of "AI deception".
http://twitch.tv/dair_institute

https://dair-community.social/@emilymbender/112452295980744218


On the next Mystery AI Hype Theater 3000, guest host @mmitchell_ai and I will look into claims that "AI has learned to deceive humans". Join us live at noon Pacific, May 20, 2024:

twitch.tv/dair_institute

I'm officially holiday-cheered!
Hooray for @evangreer and Joy for putting this together!!
https://www.instagram.com/reel/C1JAyMCskZX/?igsh=czh4YjVjY3hmMnY4
Sorry to everyone hurt by Israel/Palestine attacks. Everyone who has lost people they love, cared for. All the people further hurt by rhetoric online. Death and 280-char tweets don't mix well; the realities here are too horrifying.
OH MY GOD. Word on the street is that there's been a 23andMe hack targeting Ashkenazi Jews?! Journalists, can anyone confirm this?!?!
Another paper on model unlearning (YAY), although they missed the opportunity for the title to have OBLIVIATE ✨ ! (H/T Joshua Lochner)
https://browse.arxiv.org/abs/2310.02238
Who's Harry Potter? Approximate Unlearning in LLMs

Large language models (LLMs) are trained on massive internet corpora that often contain copyrighted content. This poses legal and ethical challenges for the developers and users of these models, as well as the original authors and publishers. In this paper, we propose a novel technique for unlearning a subset of the training data from an LLM, without having to retrain it from scratch. We evaluate our technique on the task of unlearning the Harry Potter books from the Llama2-7b model (a generative language model recently open-sourced by Meta). While the model took over 184K GPU-hours to pretrain, we show that in about 1 GPU hour of finetuning, we effectively erase the model's ability to generate or recall Harry Potter-related content, while its performance on common benchmarks (such as WinoGrande, HellaSwag, ARC, BoolQ, and PIQA) remains almost unaffected. We make our fine-tuned model publicly available on Hugging Face for community evaluation. To the best of our knowledge, this is the first paper to present an effective technique for unlearning in generative language models. Our technique consists of three main components: First, we use a reinforced model that is further trained on the target data to identify the tokens that are most related to the unlearning target, by comparing its logits with those of a baseline model. Second, we replace idiosyncratic expressions in the target data with generic counterparts, and leverage the model's own predictions to generate alternative labels for every token. These labels aim to approximate the next-token predictions of a model that has not been trained on the target data. Third, we finetune the model on these alternative labels, which effectively erases the original text from the model's memory whenever it is prompted with its context.
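To make the logit-comparison step concrete: here's a minimal numpy sketch of one plausible way the "generic" alternative labels could be derived from the two models' logits, by suppressing tokens the reinforced (target-boosted) model favors more than the baseline. The exact combination rule, the `alpha` weight, and the toy vocabulary below are illustrative assumptions, not taken from the abstract; real use would involve two copies of Llama2-7b producing per-token logits.

```python
import numpy as np

def generic_logits(baseline, reinforced, alpha=1.0):
    """Suppress tokens the reinforced model boosts relative to the
    baseline, approximating next-token predictions of a model that
    never saw the unlearning target (illustrative formula)."""
    baseline = np.asarray(baseline, dtype=float)
    reinforced = np.asarray(reinforced, dtype=float)
    # Only penalize tokens where the reinforced model is MORE confident.
    return baseline - alpha * np.maximum(0.0, reinforced - baseline)

# Toy vocabulary: ["the", "Hogwarts", "school"]
baseline = [2.0, 0.5, 1.5]
reinforced = [2.0, 3.0, 1.4]   # reinforced model boosts "Hogwarts"
target = generic_logits(baseline, reinforced)
# "Hogwarts" is pushed down: 0.5 - (3.0 - 0.5) = -2.0
```

Finetuning on labels derived this way (step three in the abstract) would then steer the model toward the generic continuation wherever the target text's context appears.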

2b: And it does it in the sophisticated, "C2PA"-compliant way, which roughly means that it follows well-established rules on the right way to do this, based on years of work. (4/4)
2a: Some of you know about how important METADATA is in ethical AI work, to be shared alongside content, disclosing what the content is -- for example, if it's AI generated -- etc. THIS DOES THAT. (3/n)
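To illustrate the kind of disclosure C2PA standardizes: a real C2PA manifest is a cryptographically signed binary structure produced by SDK tooling, but the hand-written dict below sketches the shape of the core idea, namely an assertion recording that content was created by a trained algorithm. The tool name is hypothetical; the `c2pa.actions` / `c2pa.created` labels and the IPTC `trainedAlgorithmicMedia` source type are real vocabulary terms.

```python
import json

# Illustrative sketch only -- NOT a valid C2PA manifest.
manifest_sketch = {
    "claim_generator": "example-tool/1.0",  # hypothetical generator name
    "assertions": [
        {
            "label": "c2pa.actions",
            "data": {
                "actions": [
                    {
                        "action": "c2pa.created",
                        # IPTC term disclosing AI-generated media
                        "digitalSourceType": (
                            "http://cv.iptc.org/newscodes/"
                            "digitalsourcetype/trainedAlgorithmicMedia"
                        ),
                    }
                ]
            },
        }
    ],
}

serialized = json.dumps(manifest_sketch, indent=2)
```

The point of the thread above is exactly this: the disclosure travels alongside the content in a standardized, machine-readable form rather than as an ad-hoc caption.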