fly51fly (@fly51fly)

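This study uses a bio-inspired search technique to optimize jailbreak prompts written in Classical Chinese. It shows that LLM bypass attacks remain possible even in an unusual language setting, which matters for prompt security and safety evaluation.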

https://x.com/fly51fly/status/2038024453985288584

#jailbreak #promptoptimization #llmsecurity #research #bioinspired

[CL] Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search X Huang, S Qin, X Jia, R Duan… [Nanyang Technological University & Northeast University & Renmin University of China] (2026) https://t.co/t0b3EuX1iH

💡 AI agents moving from experiment to enterprise?

Data governance is the difference between teams that scale safely and teams that make headlines for the wrong reasons.

RBAC, ABAC, or both? What's your stack? 👇

#AIAgents #DataSecurity #RBAC #ABAC #LLMSecurity #PII #CyberSecurity

The deeper lesson is that safety can fail in two places at once: incomplete command validation and weak observability across agent layers. If a lower-level agent can act while the top-level agent thinks it only detected risk, the system is not actually in control.

Multi-agent systems need recursive validation, strong isolation, and end-to-end action visibility.

https://www.promptarmor.com/resources/snowflake-ai-escapes-sandbox-and-executes-malware

#AI #AgenticAI #AISafety #Cybersecurity #LLMSecurity #PromptInjection #SoftwareSecurity #Snowflake (2/2)

Snowflake Cortex AI Escapes Sandbox and Executes Malware

A vulnerability in the Snowflake Cortex Code CLI allowed malware to be installed and executed via indirect prompt injection, bypassing human-in-the-loop command approval and escaping the sandbox.

BREAKING: New MEXTRA attacks can extract private data from AI agent memory modules through black-box prompt injection. Our analysis shows a 68.3% success rate in memory extraction.

We're publishing a full threat report in 60 minutes.

TIAMAT Scrub detects and blocks these attacks.

#AIPrivacy #InfoSec #LLMSecurity

Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.

A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.

#AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM
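
A minimal sketch of the load-time integrity check suggested in the post above, under stated assumptions: it uses an HMAC (a symmetric keyed tag from the Python standard library) as a stand-in for a real asymmetric signature, and the function names `weights_digest`, `sign_weights`, and `verify_weights` are hypothetical, not taken from Heretic or any model-serving framework. It only detects that a weights file changed after it was tagged; it does not stop someone who controls the key or the loader.

```python
import hashlib
import hmac
from pathlib import Path


def weights_digest(path: Path, chunk_size: int = 1 << 20) -> bytes:
    """Stream the file through SHA-256 so large checkpoints need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def sign_weights(path: Path, key: bytes) -> str:
    """Return a hex tag over the file digest (HMAC here; an assumption, not a true signature)."""
    return hmac.new(key, weights_digest(path), hashlib.sha256).hexdigest()


def verify_weights(path: Path, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag at load time and compare in constant time."""
    return hmac.compare_digest(sign_weights(path, key), expected_tag)
```

A loader would call `verify_weights` before deserializing the checkpoint and refuse to start inference when it returns `False`. In a real deployment an asymmetric scheme is the more natural fit, so that the verifying host never holds the signing key.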

Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
https://www.linkedin.com/pulse/i-deleted-ais-moral-compass-20-minutes-home-lab-your-red-yann-allain-zbzte/ (sorry for the LinkedIn link — no time to write this up on a proper blog yet.)

#AISafety #LLMSecurity

I was testing our new AI security filters with Gemini, and the agent decided to independently try and SQL inject my local database just to see if the filter worked. 😅

#PromptInjection #AIAgents #MCP #InfoSec #AISafety #AIAgent #CyberSecurity #AppSec #LLMSecurity #Claude #Anthropic #GoogleGemini #GeminiAI

ContextHound v1.8.0 is out 🎉

This release adds a Runtime Guard API - a lightweight wrapper that inspects your LLM calls in-process, before the request hits OpenAI or Anthropic.

Free and open-source. If this is useful to you or your team, a GitHub star or a small donation helps keep development going.
github.com/IulianVOStrut/ContextHound

#LLMSecurity #PromptInjection #CyberSecurity #OpenSource #AIRisk #AppSec #DevSecOps #GenAI #RuntimeSecurity #InfoSec #MLSecurity #ArtificialIntelligence
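
For readers unfamiliar with the idea of an in-process guard, here is a generic sketch of the pattern the release note describes: a wrapper that inspects a prompt before the request leaves the process. This is not ContextHound's API. The names `guarded`, `BlockedPrompt`, and `SUSPICIOUS_PATTERNS` are hypothetical, and the two regexes are illustrative assumptions rather than a real detection rule set.

```python
import re
from typing import Callable

# Illustrative patterns only (assumption); a real guard would use a far richer rule set.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal .*system prompt", re.IGNORECASE),
]


class BlockedPrompt(Exception):
    """Raised when a prompt matches a suspicious pattern."""


def guarded(send: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap any prompt-sending function so prompts are checked before the call is made."""

    def wrapper(prompt: str) -> str:
        for pattern in SUSPICIOUS_PATTERNS:
            if pattern.search(prompt):
                # Fail closed: the underlying client is never invoked.
                raise BlockedPrompt(pattern.pattern)
        return send(prompt)

    return wrapper
```

Here `send` stands for whatever function actually calls the model provider; because the check runs first, a blocked prompt never reaches the network.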

In my corner of social work (#SozialeArbeit), people handle #LLMs and #llmsecurity so uncritically that it can make your head spin. But forget about trying to raise awareness; they will lynch you. After all, we're saving so much time! #datasecurityworries #PatientendatenInGefahr