- *Big* update on AI model interpretability from Anthropic: https://www.anthropic.com/research/natural-language-autoencoders with open-weight models: https://github.com/kitft/natural_language_autoencoders

- Plus a dream state for Agents to clean up memories: https://platform.claude.com/docs/en/managed-agents/dreams

- Firefox writeup validates Mythos is helping it find lots of bugs: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/

- and naturally AI is being used by hackers, so probably don't use freshly released packages: https://xeiaso.net/blog/2026/abstain-from-install/

#AI #AINews #anthropic #mechinterp #dreams #cybersecurity

Natural Language Autoencoders

Turning Claude's thoughts into text

The Qwen team is releasing more interpretability work. I haven't had time to play with sparse autoencoders, but they're very neat. Each learned feature can be thought of like a GPIO pin that goes high when certain feature-related patterns of neurons are active. If you had a feature trained for "fediverse", you might see that pin go high during a conversation about other adjacent social networks or protocols, for example.
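To make the GPIO analogy concrete, here is a toy sketch of the encoder half of a sparse autoencoder, using one common formulation (a linear map followed by ReLU). Everything in it is an assumption for illustration: the weights are hand-picked rather than trained, the 3-dimensional "activations" and the feature names are hypothetical, and none of it comes from the Qwen release. Real SAEs learn thousands of features from actual model activations.

```python
# Toy sparse-autoencoder encoder: feature = ReLU(w . activation + b).
# All numbers below are hand-picked for illustration, not trained values.

# Hypothetical feature directions over a 3-dimensional activation vector.
# Row 0 plays the role of a "fediverse" feature, row 1 a "cooking" feature.
W_ENC = [
    [1.0, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
B_ENC = [-0.2, -0.2]  # negative bias keeps features off for weak signals


def sae_feature_activations(activation, w_enc, b_enc):
    """Return one non-negative activation per feature (linear map + ReLU)."""
    return [
        max(0.0, sum(w * a for w, a in zip(row, activation)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def pins_high(features, threshold=0.5):
    """Treat each feature like a GPIO pin: high if it exceeds the threshold."""
    return [f > threshold for f in features]


if __name__ == "__main__":
    # Hypothetical activations for three kinds of text.
    examples = {
        "mastodon post": [0.9, 0.1, 0.0],
        "adjacent social protocol": [0.6, 0.4, 0.0],
        "soup recipe": [0.0, 0.1, 0.9],
    }
    for name, activation in examples.items():
        features = sae_feature_activations(activation, W_ENC, B_ENC)
        print(name, pins_high(features))
```

In this toy setup the "fediverse" pin goes high for both the Mastodon-like input and the adjacent-protocol input, which is the behavior described above: a feature responds to a direction in activation space, not to one exact word.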

https://qwen.ai/blog?id=qwen-scope

#llm #mechinterp #qwen

Qwen Studio

Qwen Studio offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.

- Google CLI: control gmail, calendar, drive, API: what could go wrong?: https://github.com/googleworkspace/cli

- Paperclip pitch: "Open-source orchestration for 0-human companies" for Openclaw, Claude Code, Codex: https://github.com/paperclipai/paperclip

2 from early Feb:
- Latent Space+Goodfire on Mechanistic Interpretability + real-time Steering 🤯: https://www.latent.space/p/goodfire

- 1Password on Openclaw security issues: https://1password.com/blog/from-magic-to-malware-how-openclaws-agent-skills-become-an-attack-surface

#AI #AInews #google #openclaw #claudecode #codex #security #mechinterp #opensource

GitHub - googleworkspace/cli: Google Workspace CLI - one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Google Discovery Service. Includes AI agent skills.

aidan ewart (@aidanprattewart)

Suggests that people are mostly aware of this thanks to arXiv papers such as 2501.17727, and that this vibe is part of what caused GDM's mechanistic interpretability team and the broader interp research space to recently shift focus to 'pragmatic' interp.

https://x.com/aidanprattewart/status/2009320890140668006

#interpretability #mechinterp #arxiv #research

aidan ewart (@aidanprattewart) on X

@sebkrier I think people mostly are aware of this thanks to papers like https://t.co/LNQORidB9m (and also eg older stats results) and this kind of vibe is part of what caused GDM's mech interp team (and much of the broader interp space) to switch focus recently to 'pragmatic' interp.

X (formerly Twitter)

Michelle Bakels (@MichelleBakels)

Day 21: Kath Korevec of Google Labs presented 'Proactive Agents'. The 'State of MechInterp' session covered recent research and topics in mechanistic interpretability, including SAEs in production, circuit tracing, AI4Science applications, and more practical ('Pragmatic' Interp) approaches.

https://x.com/MichelleBakels/status/2008560287067369589

#proactiveagents #mechinterp #ai4science #googlelabs

Michelle Bakels (@MichelleBakels) on X

@aiDotEngineer @latentspacepod Day 21 Proactive Agents – Kath Korevec, Google Labs https://t.co/MUklTHWRo2 [State of MechInterp] SAEs in Production, Circuit Tracing, AI4Science, "Pragmatic" Interp - Goodfire https://t.co/f0ynRX8JYK

X (formerly Twitter)