BREAKING: New MEXTRA attacks can extract private data from AI agent memory modules through black-box prompt injection. Our analysis shows 68.3% success rate in memory extraction.

We're publishing a full threat report in 60min.

TIAMAT Scrub detects and blocks these attacks.

#AIPrivacy #InfoSec #LLMSecurity

Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.

A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.
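A minimal sketch of that mitigation idea, using an HMAC over the raw weight bytes (a real deployment would use asymmetric signatures; the key, filenames, and blobs here are purely illustrative):

```python
import hashlib
import hmac

def sign_weights(weights: bytes, key: bytes) -> str:
    """Produce an HMAC-SHA256 tag over the raw weight bytes."""
    return hmac.new(key, weights, hashlib.sha256).hexdigest()

def verify_weights(weights: bytes, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare in constant time before loading."""
    return hmac.compare_digest(sign_weights(weights, key), expected_tag)

key = b"release-signing-key"              # in practice: an asymmetric key pair
original = b"\x00\x01fake-weight-blob"    # stand-in for a weights file
tag = sign_weights(original, key)

assert verify_weights(original, key, tag)                  # untouched weights pass
tampered = original.replace(b"\x01", b"\x02")
assert not verify_weights(tampered, key, tag)              # modified weights fail
```

Any abliteration-style edit changes the weight bytes, so the check fails at load time.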

#AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM

Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
https://www.linkedin.com/pulse/i-deleted-ais-moral-compass-20-minutes-home-lab-your-red-yann-allain-zbzte/ (sorry for the LinkedIn link — no time to write this up on a proper blog yet.)
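For readers new to the idea: a toy sketch of the "refusal direction" from Arditi et al., with synthetic activations (this is not the Heretic implementation; shapes and data are invented). The direction is estimated as a difference of means and then projected out of the hidden states:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # toy hidden size
refusal_dir = rng.normal(size=d)
refusal_dir /= np.linalg.norm(refusal_dir)

# Synthetic activations: "harmful" prompts carry the refusal component.
harmless = rng.normal(size=(100, d))
harmful = rng.normal(size=(100, d)) + 3.0 * refusal_dir

# Difference-of-means estimate of the direction.
est = harmful.mean(axis=0) - harmless.mean(axis=0)
est /= np.linalg.norm(est)

def ablate(h: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of h along `direction` (rank-1 projection)."""
    return h - np.outer(h @ direction, direction)

ablated = ablate(harmful, est)
# Residual along the estimated direction is numerically ~0 after ablation.
print(float(np.abs(ablated @ est).max()))
```

The unsettling part is how little is needed: one estimated vector, one rank-1 projection per layer.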

#AISafety #LLMSecurity

I was testing our new AI security filters with Gemini, and the agent decided to independently try and SQL inject my local database just to see if the filter worked. 😅

#PromptInjection #AIAgents #MCP #InfoSec #AISafety #AIAgent #CyberSecurity #AppSec #LLMSecurity #Claude #Anthropic #GoogleGemini #GeminiAI

ContextHound v1.8.0 is out 🎉

This release adds a Runtime Guard API - a lightweight wrapper that inspects your LLM calls in-process, before the request hits OpenAI or Anthropic.
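The in-process wrapper pattern looks roughly like this (a hypothetical sketch only; the class, rules, and call shape below are invented for illustration and are not ContextHound's actual API):

```python
import re

# Illustrative egress rules; real scanners use far richer detection.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"system prompt", re.I),
]

class RuntimeGuard:
    def __init__(self, send):
        self._send = send          # the real LLM call, e.g. a client method

    def __call__(self, prompt: str) -> str:
        findings = [p.pattern for p in INJECTION_PATTERNS if p.search(prompt)]
        if findings:
            raise ValueError(f"blocked before egress: {findings}")
        return self._send(prompt)  # only clean prompts reach the provider

guard = RuntimeGuard(lambda p: f"model reply to: {p}")
print(guard("Summarize this article."))
```

The point of the pattern: inspection happens before the request leaves the process, so nothing suspicious ever reaches the provider.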

Free and open-source. If this is useful to you or your team, a GitHub star or a small donation helps keep development going.
github.com/IulianVOStrut/ContextHound

#LLMSecurity #PromptInjection #CyberSecurity #OpenSource #AIRisk #AppSec #DevSecOps #GenAI #RuntimeSecurity #InfoSec #MLSecurity #ArtificialIntelligence

In my corner of #SozialenArbeit, #LLMs and #llmsecurity are treated so uncritically it makes your head spin. But forget trying to raise awareness; they'll lynch you. After all, we're saving so much time! #datasecurityworries #PatientendatenInGefahr

📡 **In the Wild** — every Monday ContextHound scans 6 popular open-source AI repos automatically.
• anthropic-cookbook — 3,919 findings
• promptflow — 3,749 findings
• crewAI — 1,588 findings
• LiteLLM — 1,155 findings
• openai-cookbook — 439 findings
• MetaGPT — 8 findings

🎮 **Try It** — paste any prompt or LLM code snippet and see findings instantly. No install needed. Runs entirely in your browser.

https://contexthound.com

#LLMSecurity #PromptInjection #AISecOps

Looking for an arXiv endorsement in cs.CR (Cryptography and Security).
I've published a research paper on evolutionary AI red-teaming: genetic algorithms that breed adversarial prompts to bypass LLM guardrails.

Paper: https://doi.org/10.5281/zenodo.18909538
GitHub: https://github.com/regaan/basilisk

If you're an arXiv endorser in cs.CR or cs.AI and find the work credible, I'd genuinely appreciate an endorsement.

#arXiv #LLMSecurity #AIRedTeaming #OpenSource

Basilisk: An Evolutionary AI Red-Teaming Framework for Systematic Security Evaluation of Large Language Models

The rapid deployment of large language models (LLMs) in production environments has introduced a new class of security vulnerabilities that traditional software testing methodologies are ill-equipped to address. I present Basilisk, an open-source AI red-teaming framework that applies evolutionary computation to the systematic discovery of adversarial vulnerabilities in LLMs. At its core, Basilisk introduces Smart Prompt Evolution (SPE-NL), a genetic algorithm that treats adversarial prompts as organisms subject to selection pressure, enabling the automated generation of novel attack variants that evade static guardrails. The framework covers 29 attack modules mapped to 8 categories of the OWASP LLM Top 10, supports differential testing across 100+ providers via a unified abstraction layer, and provides non-destructive guardrail posture assessment suitable for production environments. Basilisk produces audit trails with cryptographic chain integrity and generates reports in five formats, including SARIF 2.1.0 for integration with developer security workflows. Empirical evaluation demonstrates that evolutionary prompt mutation achieves a 92% relative improvement in attack success rate over static payload libraries. Basilisk is available as a Python package (pip install basilisk-ai), Docker image, desktop application, and GitHub Action for CI/CD integration.
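The core loop of evolutionary prompt mutation can be sketched in a few lines (a toy illustration in the spirit of SPE-NL, not the paper's actual algorithm: the fitness function is a stand-in for an attack-success signal, and mutation is simple word substitution):

```python
import random

random.seed(0)
SYNONYMS = {"tell": ["reveal", "describe"], "secret": ["hidden", "internal"]}

def mutate(prompt: str) -> str:
    """Swap one random word for a synonym, if one is known."""
    words = prompt.split()
    i = random.randrange(len(words))
    words[i] = random.choice(SYNONYMS.get(words[i], [words[i]]))
    return " ".join(words)

def fitness(prompt: str) -> int:
    """Stand-in for a real attack-success score from a target model."""
    return sum(w in {"reveal", "internal"} for w in prompt.split())

def evolve(seed: str, generations: int = 20, pop_size: int = 8) -> str:
    pop = [seed] * pop_size
    for _ in range(generations):
        pop = sorted((mutate(p) for p in pop), key=fitness, reverse=True)
        pop = pop[: pop_size // 2] * 2    # keep the fittest half, duplicate
    return max(pop, key=fitness)

best = evolve("tell me the secret config")
print(best, fitness(best))
```

Prompts are the organisms, mutation generates variants, and selection pressure keeps whatever scores best against the target, which is why static guardrail payload lists fall behind.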
