The Hacking Policy Council released a white paper calling for clarity and legal protections for AI red teaming. https://lnkd.in/e4UMv9AM
While organizations may be familiar with red teaming to test software for security, “AI red teaming” has a broader scope by testing AI systems for flaws and vulnerabilities that include security, bias, discrimination, and other harmful or undesirable outputs - as demonstrated by the recent Biden Administration Executive Order on trust in AI: https://lnkd.in/et7v-yCB
AI red teaming, when performed in good faith, aims to identify and disclose misalignment in AI systems so it can be corrected and thereby help ensure trustworthiness of the system. To encourage information sharing of AI misalignment and enable independent AI red teaming, the Hacking Policy Council recommends:
1) Develop consistent alignment goals for AI red teaming. Governments should work with the private sector to develop consistent goals for AI alignment in the context of AI red teaming. This will enable AI red teaming to test for dissonance with those goals.
2) Protect information sharing for AI alignment purposes. Governments should ensure legal frameworks to facilitate security information sharing are adapted to encourage and protect information sharing for harmful, discriminatory, or undesirable outputs in AI systems.
3) Prepare to receive misalignment disclosures. Organizations should prepare to accept disclosures from independent AI red teamers. This may require adaptations to security vulnerability disclosure programs and handling processes to accommodate disclosures for harmful, discriminatory, or undesirable outputs in AI systems.
4) Clarify legal protections for independent AI red teaming. Governments should ensure legal protections for independent security research extend to independent #AI red teaming performed in good faith.