Mythos for Offensive Security: XBOW's Evaluation

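Anthropic's Mythos Preview model shows a major advance over previous models in source code analysis and vulnerability detection. It performed especially well at source-code-based vulnerability discovery, native code analysis, and reverse engineering, but its performance degrades when interaction with the live site is restricted. According to XBOW's evaluation, Mythos Preview is exceptionally good at reading code, and vulnerability detection works best when code access is combined with a live site. However, its judgment tends to be somewhat conservative and literal, so it needs precise prompts and validation infrastructure.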

https://xbow.com/blog/mythos-offensive-security-xbow-evaluation

#llm #security #vulnerabilitydetection #sourcecodeanalysis #pentesting

XBOW - Mythos for Offensive Security: XBOW's Evaluation

We received early access to Mythos Preview for early capability testing a few weeks back. Today, we can finally share what we found.

OpenAI Unveils Daybreak to Automate Vulnerability Detection and Patching

Meet Daybreak, a game-changing cybersecurity tool from OpenAI that supercharges vulnerability detection and patching with cutting-edge AI, helping organizations stay one step ahead of attackers and making the world a safer place. By combining AI intelligence with advanced code analysis, Daybreak…

https://osintsights.com/openai-unveils-daybreak-to-automate-vulnerability-detection-and-patching?utm_source=mastodon&utm_medium=social

#VulnerabilityDetection #Patching #ArtificialIntelligence #Cybersecurity #AutomatedThreatResponse

Discover Daybreak, OpenAI's AI-powered tool that automates vulnerability detection and patching, and learn how to request access to protect your organization - read now and stay secure.

OSINTSights

Benchmarking Claude Opus 4.6 Vulnerability Detection

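Claude Opus 4.6 was evaluated on C/C++ vulnerability detection using the PrimeVul dataset (435 vulnerable/fixed pairs). Detection accuracy improved substantially as the model was required to provide progressively stricter justification (execution traces, state proofs), and adding a verification agent raised precision and CVE recall to 23.3% and 28.9%, respectively. The experiments covered four justification levels, each with and without the verification agent, and outperformed GPT-4 CoT. The study demonstrates empirically the importance of structured reasoning and verification in LLM-based vulnerability detection.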
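For readers unfamiliar with the two metrics named in this summary, here is a minimal sketch of how precision and recall can be computed over vulnerable/fixed pairs. It is not the benchmark's scoring code: the `Verdict` type, the `precision_recall` function, and the toy sample data are all hypothetical, and the linked repository may define its metrics differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One detector decision on one code sample (hypothetical structure)."""
    sample_id: str
    is_vulnerable: bool  # ground-truth label: True for the vulnerable half of a pair
    flagged: bool        # True if the detector reported the sample as vulnerable


def precision_recall(verdicts):
    """Return (precision, recall) using the standard definitions.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    Both fall back to 0.0 when their denominator is zero.
    """
    tp = fp = fn = 0
    for v in verdicts:
        if v.flagged and v.is_vulnerable:
            tp += 1          # correctly reported vulnerability
        elif v.flagged:
            fp += 1          # fixed code wrongly reported as vulnerable
        elif v.is_vulnerable:
            fn += 1          # vulnerability the detector missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (made up for illustration): three vulnerable/fixed pairs.
verdicts = [
    Verdict("pair1-vuln", True, True),
    Verdict("pair1-fix", False, True),
    Verdict("pair2-vuln", True, True),
    Verdict("pair2-fix", False, False),
    Verdict("pair3-vuln", True, False),
    Verdict("pair3-fix", False, True),
]

precision, recall = precision_recall(verdicts)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

On paired data, a detector that flags both halves of a pair collects a false positive for every true positive, which is one reason precision on such datasets tends to be low.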

https://github.com/ZeroPathAI/opus-benchmark

#llm #vulnerabilitydetection #security #anthropic #benchmark

GitHub - ZeroPathAI/opus-benchmark: Code for our opus 4.6 vulnerability detection benchmark

GitHub

CrowdStrike Tests Anthropic's Claude Mythos for Accelerated Vulnerability Detection

Imagine slashing the time between discovering a software flaw and fixing it - a new breed of large language models, like Anthropic's Claude Mythos, may hold the key. Early tests with CrowdStrike suggest that AI-powered vulnerability detection can accelerate discovery and bring broader situational…

https://osintsights.com/crowdstrike-tests-anthropics-claude-mythos-for-accelerated-vulnerability-detecti?utm_source=mastodon&utm_medium=social

#VulnerabilityDetection #Ai #LargeLanguageModel #GenerativeAi #SecurityOperations

Discover how CrowdStrike tests Anthropic's Claude Mythos for accelerated vulnerability detection, redefining security operations with AI-driven insights - learn more now.

OSINTSights

#Anthropic’s Frontier Red Team used #AIassisted #vulnerabilitydetection to identify over a dozen #securitybugs in #Firefox, which were quickly fixed. This collaboration highlights the potential of AI-assisted analysis in enhancing security, even for well-scrutinised codebases like Firefox. Mozilla is integrating this technique into its security workflows. https://blog.mozilla.org/en/firefox/hardening-firefox-anthropic-red-team/?eicker.news #tech #media #news

Hardening Firefox with Anthropic’s Red Team | The Mozilla Blog

For more than two decades, Firefox has been one of the most scrutinized and security-hardened codebases on the web. Open source means our code is visible,

Itamar Golan (@ItakGol)

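VulnLLM-R-7B, released by UCSB-SURFI, is a 7B reasoning model based on Qwen2.5-7B. It traces data and control flow to detect vulnerabilities in C/C++/Python/Java code, explains why the code is risky, and helps with fixes. It is released under the Apache-2.0 license.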

https://x.com/ItakGol/status/2020420095223500996

#appsec #vulnerabilitydetection #vulnllm #qwen #opensource

Itamar Golan 🤓 (@ItakGol) on X

Meet VulnLLM-R-7B (UCSB-SURFI). A 7B reasoning model for vulnerability detection: follows data + control flow, explains why code is risky, and helps triage fixes. C/C++/Python/Java. Apache-2.0. Based on Qwen2.5-7B. #AppSec #SecureCoding

X (formerly Twitter)

👨‍💻🚀 Oh joy, someone managed to remove a GitHub label! Truly groundbreaking stuff in the realm of IDE detection. Meanwhile, AI is building apps and finding vulnerabilities, but let's focus on those label changes, shall we? 🎉🔖
https://github.com/google-gemini/gemini-cli/issues/16728 #GitHubLabelRemoval #IDEdetection #AIBuildingApps #VulnerabilityDetection #TechHumor #HackerNews #ngated

jetbrains ide detection · Issue #16728 · google-gemini/gemini-cli

What would you like to be added? Adds native recognition for JetBrains IDE as a supported IDE environment. Why is this needed? Currently, Gemini CLI restricts IDE integration features to environmen...

GitHub

Only 5️⃣ more days until DIMVA’25!

We kickstart the conference on Wednesday with our welcome event, exploring the old town of Graz during a city tour. See you there!

#DIMVA25 #Conference #WebSecurity #Vulnerability #VulnerabilityDetection #SideChannels #Obfuscation #OS #Network #AndroidPatches #AI #ML #ResilientSystems

Microsoft's AI Revolution in Cybersecurity: A New Era of Protection

Explore Microsoft's AI-driven cybersecurity innovations enhancing vulnerability detection and protection strategies.

The DefendOps Diaries

Join ICRC’s project YALTF @ hackathon.lu!

During this two-day in-person #hackathon organized by @circl, we will work with developers to enhance & extend YALTF, especially on #VulnerabilityDetection & compatibility with other systems.

YALTF is designed to scan & identify software licenses across multiple remote systems. It connects via SSH & collects info about packages & their associated licenses.

More info:
https://hackathon.lu
Click here: https://hackathon.lu/practical/ & join YALTF
https://github.com/yaltf/yaltf