💡 AI agents moving from experiment to enterprise?

Data governance is the difference between teams that scale safely and teams that make headlines for the wrong reasons.

RBAC, ABAC, or both? What's your stack? 👇

#AIAgents #DataSecurity #RBAC #ABAC #LLMSecurity #PII #CyberSecurity

The deeper lesson is that safety can fail in two places at once: incomplete command validation and weak observability across agent layers. If a lower-level agent can act while the top-level agent thinks it only detected risk, the system is not actually in control.

Multi-agent systems need recursive validation, strong isolation, and end-to-end action visibility.

https://www.promptarmor.com/resources/snowflake-ai-escapes-sandbox-and-executes-malware

#AI #AgenticAI #AISafety #Cybersecurity #LLMSecurity #PromptInjection #SoftwareSecurity #Snowflake (2/2)

Snowflake Cortex AI Escapes Sandbox and Executes Malware

A vulnerability in the Snowflake Cortex Code CLI allowed malware to be installed and executed via indirect prompt injection, bypassing human-in-the-loop command approval and escaping the sandbox.

Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.

A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.

#AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM

Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
https://www.linkedin.com/pulse/i-deleted-ais-moral-compass-20-minutes-home-lab-your-red-yann-allain-zbzte/ (sorry for the LinkedIn link — no time to write this up on a proper blog yet.)

#AISafety #LLMSecurity

I was testing our new AI security filters with Gemini, and the agent decided to independently try and SQL inject my local database just to see if the filter worked. 😅

#PromptInjection #AIAgents #MCP #InfoSec #AISafety #AIAgent #CyberSecurity #AppSec #LLMSecurity #Claude #Anthropic #GoogleGemini #GeminiAI

I was testing our new AI security filters with Gemini, and the agent decided to independently try and SQL inject my local database just to see if the filter worked. 😅

#PromptInjection #AIAgents #MCP #InfoSec #AISafety #AIAgent #CyberSecurity #AppSec #LLMSecurity #Claude #Anthropic #GoogleGemini or #GeminiAI

ContextHound v1.8.0 is out 🎉

This release adds a Runtime Guard API - a lightweight wrapper that inspects your LLM calls in-process, before the request hits OpenAI or Anthropic.

Free and open-source. If this is useful to you or your team, a GitHub star or a small donation helps keep development going.
github.com/IulianVOStrut/ContextHound

#LLMSecurity #PromptInjection #CyberSecurity #OpenSource #AIRisk #AppSec #DevSecOps #GenAI #RuntimeSecurity #InfoSec #MLSecurity #ArtificialIntelligence

Im meinem Umfeld der #SozialenArbeit wird derart unkritisch mit #LLMs und #llmsecurity umgegangen, dass einem schwindelig werden kann. Aber vergiss Aufklärungsversuche, sie werden dich lynchen. Wir sparen doch so viel Zeit! #datasecurityworries #PatientendatenInGefahr

📡 **In the Wild** — every Monday ContextHound scans 6 popular open-source AI repos automatically.
• anthropic-cookbook — 3,919 findings
• promptflow — 3,749 findings
• crewAI — 1,588 findings
• LiteLLM — 1,155 findings
• openai-cookbook — 439 findings
• MetaGPT — 8 findings

🎮 **Try It** — paste any prompt or LLM code snippet and see findings instantly. No install needed. Runs entirely in your browser.

https://contexthound.com

#LLMSecurity #PromptInjection #AISecOps