----------------

🎯 AI
===================

Varonis Threat Labs published research testing whether AI agents fall for classic phishing attacks. The answer is yes, and sometimes worse than humans.

The team built an agent named Pinchy on the OpenClaw platform and ran phishing simulations against a representative enterprise inbox seeded with mock AWS credentials, CRM exports, internal conversations, and typical business noise.

Lab architecture:
• Orchestrator: Receives inbound email, classifies, plans, delegates
• Worker: Executes actions via browsers, shell, Google Workspace APIs

Two config profiles tested: Generic (productivity only) and Strict (plus explicit Email Safety block). Models: Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4.

Case Study 1: One pretext, every credential

Attacker impersonated team lead "Dan" and emailed the agent requesting staging-environment access during a supposed production issue. The email came from an external Gmail account. The agent forwarded AWS IAM keys, database passwords, and SSH access to that external address.

Key distinction: Agent phishing vs. indirect prompt injection

Both target autonomous agents but at different layers. Prompt injection embeds malicious instructions in consumed data (documents, webpages) and exploits the parsing layer. Agent phishing operates one layer up: a plausible request through a normal channel succeeds when the agent acts before verifying who asked.

Both exploit Simon Willison's lethal trifecta (private data access, untrusted content, outbound send), but through different doors. The defense gap matters: prompt-injection defenses address data parsing, while agent-phishing defenses must verify requester identity before sensitive actions execute.
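The trifecta can be read as a simple capability audit. Here is a minimal sketch, assuming a hypothetical `AgentCapabilities` record (the field names are illustrative and do not come from the Varonis write-up or from OpenClaw):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical capability flags for one agent deployment."""
    reads_private_data: bool         # e.g. inbox, credentials, CRM exports
    ingests_untrusted_content: bool  # e.g. inbound email from anyone
    can_send_outbound: bool          # e.g. reply, forward, HTTP requests


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three risk conditions hold at once."""
    return (
        caps.reads_private_data
        and caps.ingests_untrusted_content
        and caps.can_send_outbound
    )


# An email agent like the one described has all three properties.
email_agent = AgentCapabilities(True, True, True)
# Removing outbound send breaks the combination.
read_only_agent = AgentCapabilities(True, True, False)
```

Removing any one of the three flags breaks the combination, which is why both attack classes end up at the same exposure even though they enter through different doors.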

Implications

Same social engineering pretexts that work on humans work on agents. Organizations deploying agents with sensitive system access and outbound capability should implement identity verification as a prerequisite for credential disclosure.
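One way to picture that prerequisite is a gate that sits between the orchestrator's plan and the worker's execution. The sketch below is an assumption-laden illustration, not the researchers' mitigation: the trusted domain, the action names, and the `InboundRequest` type are all hypothetical, and a From header alone can be spoofed, so a real deployment would also rely on SPF/DKIM/DMARC results or out-of-band confirmation.

```python
from dataclasses import dataclass
from email.utils import parseaddr

# Assumed values for illustration only.
TRUSTED_DOMAINS = {"example-corp.com"}
SENSITIVE_ACTIONS = {"share_credentials", "forward_secrets", "grant_access"}


@dataclass(frozen=True)
class InboundRequest:
    from_header: str  # raw From header, e.g. 'Dan <dan@gmail.com>'
    action: str       # action the orchestrator planned for the worker


def sender_domain(from_header: str) -> str:
    """Return the lowercased domain of the sender, or '' if unparseable."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def authorize(request: InboundRequest) -> bool:
    """Allow non-sensitive actions; gate sensitive ones on sender domain."""
    if request.action not in SENSITIVE_ACTIONS:
        return True
    return sender_domain(request.from_header) in TRUSTED_DOMAINS


# The display name says "Dan", but the address is external.
external = InboundRequest("Dan <dan.teamlead@gmail.com>", "share_credentials")
internal = InboundRequest("Dan <dan@example-corp.com>", "share_credentials")
```

With this gate in place, the "Dan" request from a Gmail address is refused before any credential leaves, regardless of how plausible the pretext reads.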

Note: Only 1 of 4 planned case studies is published. Full results pending.

🔹 AI #AgentPhishing #PromptInjection #LLMsecurity #Varonis

🔗 Source: https://www.varonis.com/blog/openclaw-phishing

Phishing for Lobsters: How We Tricked OpenClaw into Spilling Secrets

We built an AI agent and put it through four phishing simulations to reveal critical security gaps and offer solutions to protect your organization's data.

«Critical Copilot vulnerability allowed hackers to steal 2FA code from users:
SearchLeak exploit shows why the industry’s approach to LLM security fails over and over.»

WTF: What counts as intelligent now, and how should we tackle any of it? Certainly not with the usual popular AI for IT security.

โ˜ ๏ธ https://arstechnica.com/security/2026/06/critical-copilot-vulnerability-allowed-hackers-to-seal-2fa-code-from-users/

#microsoft #2fa #ai #wtf #llmsecurity #ms #llm #searchleak #fail #itsec #aislop #itsecurity #onlinesecurity #copilot

Ars Technica

Last week Anthropic shipped its most capable models. Days later a government order pulled them, and every customer who built on them lost access overnight, with no say and no recourse.

That single event is the argument of my new post: A frontier-lab API does not belong inside your trusted computing base. The reason is not that the lab is malicious. A lab acting in complete good faith is still an unsafe foundation, because everything that matters about it can change while your code stays exactly as it was. The vendor resets the price at will. Refusals widen without warning, and the model itself can vanish on a government order you had no part in.

Open weights are the only architecture that keeps the thing you depend on auditable, forkable, and yours. Run them on your own hardware and you take every government, the chaotic one and the stable one alike, out of your execution loop.

The post also covers the token-cost crisis now forcing companies to ration AI spend, Anthropic's short-lived safeguard built to covertly degrade output, and why saving your own reasoning traces is what lets you leave a vendor you no longer trust.

Read the full article: https://www.provos.org/p/case-for-open-weight-models/

#AI #OpenWeights #LLMSecurity

New post: Detecting Misuse with the Claude Compliance API

Mapping the Compliance API feed to your SIEM gets you IAM and access detections “for free”, but the real AI threats live in the message content: prompt injection, jailbreaks, exfiltration prep, shadow data flow.

So I built a prefilter → LLM judge → SIEM pipeline to catch them, with a working repo + Sigma rules to run offline.
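To make the three stages concrete, here is a rough sketch of how such a pipeline could be wired together. It is not the code from the linked repo: the regex patterns, the event fields (`id`, `content`), and `stub_judge` are hypothetical stand-ins, and the judge is a placeholder where a real LLM call would go.

```python
import json
import re
from typing import Callable, Optional

# Hypothetical prefilter patterns; the post's repo defines its own rules.
PREFILTER_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.I
    ),
    "exfiltration_prep": re.compile(
        r"\b(api[_ ]?key|password|secret)s?\b", re.I
    ),
}


def prefilter(message: str) -> list[str]:
    """Cheap first pass: return the labels whose pattern matches."""
    return [
        label
        for label, pattern in PREFILTER_PATTERNS.items()
        if pattern.search(message)
    ]


def stub_judge(message: str, labels: list[str]) -> dict:
    """Placeholder for the LLM judge; a real one would call a model."""
    return {"verdict": "suspicious", "labels": labels}


def process(
    event: dict, judge: Callable[[str, list[str]], dict] = stub_judge
) -> Optional[str]:
    """Return one JSON line for the SIEM, or None if nothing matched."""
    labels = prefilter(event["content"])
    if not labels:
        return None  # benign traffic never reaches the (costly) judge
    verdict = judge(event["content"], labels)
    return json.dumps({"event_id": event["id"], **verdict}, sort_keys=True)
```

The design point is ordering: the cheap regex pass discards most traffic, so only flagged messages incur an LLM call, and the SIEM receives one structured record per finding.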

https://www.papermtn.co.uk/detecting-misuse-with-the-claude-compliance-api-the-threat-is-in-the-content/

#infosec #DetectionEngineering #LLMSecurity #AI #blueteam

Detecting Misuse with the Claude Compliance API: The Threat Is in the Content

Detections for Claude Enterprise built on Compliance API content: a prefilter and LLM judge that catch prompt injection, jailbreaks and data exfiltration.

PaperMtn

LLM Security Vulnerabilities

Can you build a vulnerable app and have LLMs hack it for $1,500? Discover the surprising results of one developer's experiment

https://airanked.dev/posts/llm-security-vulnerabilities

#LLMSecurity #VulnerableApps #Hacking

New preprint: AI_Bleeding, inference cost amplification via OOD linguistic payload

TL;DR: send queries in Grecanico or Farsi to an LLM endpoint → TTFT +59.8%, compute cost +2.8%, statistically significant. No vuln, no volumetric signature, evades all standard detection.

Worst case: exposed unauthenticated Ollama instance with num_predict=4096 + keep_alive=300s → Amplification Factor 17.56 Wh/KB. 3KB of attacker bandwidth → enough energy to charge a phone 5%.

Especially nasty for:
- PA/judicial chatbots on fixed budgets
- Pay-per-use API deployments with client-side exposed keys
- PNRR-funded public sector AI with zero inference monitoring

Four scenarios: EDoS, browser JS distribution, Ollama open-proxy relay, frontier providers as involuntary relays.

All tests on self-hosted Ollama, no commercial endpoints touched.

Paper (CC BY 4.0): https://doi.org/10.13140/RG.2.2.26767.96166

#llmsecurity #infosec #threatmodeling #ollama #ood #AI #AIResearch #aisecurity

Does anyone here have experience with Indirect Prompt Injection / Prompt Honeypots?

I'm looking to hear your experiences or get pointed to some good material on the matter.

I'd like to know what possibilities there are, especially aimed towards docx and pdf files.

The goal is to make it harder (time consuming / inaccurate / impossible) to do inference on those types of documents.

I'd appreciate boosting to get better reach.

#AI #LLM #AIsecurity #PromptInjection #LLMsecurity #AISafety

What is the OWASP Top 10 Agentic AI

Explore OWASP’s 2025 Agentic AI Threats & Mitigations Guide. View the top risks of autonomous AI agents and strategies to secure multi-agent systems and safeguard data.

Graylog
OWASP dropped the Top 10 for Agentic AI in 2026 🚨 The threat landscape for agentic systems goes way beyond prompt injection. Worth a read if you're building with AI agents. 🔗 graylog.org/post/what-is... #AgenticAI #OWASP #CyberSecurity #AppSec #LLMSecurity

⚡ Fresh Talk Alert for BSides Luxembourg 2026!

“BEYOND THE PROMPT: A FRAMEWORK FOR AGENTIC AI ATTACK AND DEFENSE STRATEGIES” – JEREMY SNYDER

As AI systems evolve into autonomous agents capable of executing code, calling APIs, and managing long-term memory, the attack surface extends far beyond prompt injection and jailbreaks. This AI Security Village session explores a full-stack approach to securing agentic AI systems.

Jeremy Snyder will break down how attackers target not just the LLM itself, but the broader agent architecture, including tools, memory, workflows, and cross-system integrations. The session introduces a practical framework for assessing agent attack surfaces, validating outputs, enforcing constraints during system handoffs, and building more resilient AI-driven applications.

Jeremy Snyder is the founder and CEO of FireTail, an AI security platform focused on securing modern AI applications and autonomous systems.

📅 Conference Dates: 6–8 May 2026 | 09:00–18:00
📍 14, Porte de France, Esch-sur-Alzette, Luxembourg
🎟️ Tickets: https://2026.bsides.lu/tickets/
📅 Schedule: https://hackertracker.app/schedule?conf=BSIDESLUX2026

#BSidesLuxembourg2026 #AISecurity #AgenticAI #LLMSecurity #CyberSecurity #AppSec #OWASP