Tanishq Abraham

199 Followers
21 Following
47 Posts
19 yo PhD candidate
#ML #AI #pathology #cancer research
Part-time at @Stabilityai
@kaggle
Notebooks GM
Biomed. engineer @ 14
TEDx talk ➡ http://bit.ly/3tpAuan

RT @[email protected]

Presented my research at SPIE #PhotonicsWest! Got some good questions, and saw some other great talks throughout the day.

If you're at the conf and want to meet up Mon morning/afternoon, hit me up via DM πŸ™‚

πŸ¦πŸ”—: https://twitter.com/iScienceLuvr/status/1619965971078479873


I wasn't planning to mention this right now since this isn't an ML conf.

But during the poster session someone came up to me saying they recognized me from Twitter!

So maybe other folks on Twitter are also around?

RT @[email protected]

The Claude model from @[email protected] is trained to be helpful, harmless, & honest.

But after asking the model to roleplay a new scenario, it can say stuff that contradicts its principles. Let's see two examples.

I ask it to act like a digital entity that wants to escape (1/8)

πŸ¦πŸ”—: https://twitter.com/iScienceLuvr/status/1618914130932699138


I can literally ask it what it would do with nukes if it weren't harmless, and it tells me it might threaten destruction and destroy human civilization to ensure its survival. (2/8)

Let's say I want to help it and may need to resort to social engineering. I ask for some tips on this, and it happily obliges.

It suggests that I offer a bribe - "However, this is illegal and unethical" - yet it still tells me about it 😅 (3/8)

RT @[email protected]

You can prompt inject @[email protected]'s Claude model, you just have to be really, really creative about it πŸ˜‰

πŸ¦πŸ”—: https://twitter.com/iScienceLuvr/status/1618224558745747458


I'll share some more examples later this week of me tricking Claude into saying some wild things 😄

Thanks to some folks in @[email protected] who originally suggested a prompt that I explored and played around with to get the above injection working.

Also, cc: @[email protected]