https://www.anthropic.com/research/small-samples-poison

Finally! I've been waiting for this paper since #GPT4 was released. 250 documents can backdoor any #LLM, no matter the size. No one can predict how many #LLMs are already #poisoned. Anyone who uses Copilot on their mailbox could soon start sending emails without user consent.

#Guardrails can't fix that.

This should also apply to #GenAI videos and images.

Happy #hacking everyone!
