Mastodawn

- *Big* update on AI model interpretability from Anthropic: https://www.anthropic.com/research/natural-language-autoencoders with open weight models: https://github.com/kitft/natural_language_autoencoders

- + Dream state for Agents to clean up memories: https://platform.claude.com/docs/en/managed-agents/dreams

- Firefox writeup validates Mythos is helping it find lots of bugs: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/

- and naturally AI is being used by hackers, so probably don't use freshly released packages: https://xeiaso.net/blog/2026/abstain-from-install/

#AI #AINews #anthropic #mechinterp #dreams #cybersecurity

Natural Language Autoencoders

Turning Claude's thoughts into text

ǝʌɐp Apr 30

Qwen team releasing more interpretability stuff. I haven't had time to play with sparse autoencoders, but they're very neat. Each learned feature can be thought of like a GPIO pin that goes high when a certain feature-related pattern of neurons is active. If you had a feature trained for "fediverse", you might see that pin go high during a conversation about adjacent social networks or protocols, for example.
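A minimal sketch of the GPIO-pin analogy, assuming a single sparse-autoencoder feature is read out as a ReLU over a dot product and then thresholded. The weights, vectors, threshold, and the names `feature_activation` and `pin_state` are all made up for illustration; none of this comes from Qwen's release.

```python
# Toy sketch of one sparse-autoencoder (SAE) feature read out like a GPIO pin.
# All weights, vectors, and the threshold are invented for illustration.

def feature_activation(activation, encoder_direction, bias):
    """ReLU(w . x + b): how strongly one SAE feature fires on one activation vector."""
    pre = sum(w * x for w, x in zip(encoder_direction, activation)) + bias
    return max(0.0, pre)

def pin_state(activation, encoder_direction, bias, threshold=0.5):
    """Return True ("pin high") when the feature fires above the threshold."""
    return feature_activation(activation, encoder_direction, bias) > threshold

# Hypothetical encoder row for a "fediverse" feature in a 4-dim activation space.
FEDIVERSE_DIRECTION = [0.9, 0.1, -0.3, 0.0]
FEDIVERSE_BIAS = -0.2

# Hypothetical model activations for two tokens.
token_about_activitypub = [1.0, 0.5, 0.0, 0.2]
token_about_cooking = [0.0, 0.2, 1.0, 0.7]

on_topic = pin_state(token_about_activitypub, FEDIVERSE_DIRECTION, FEDIVERSE_BIAS)
off_topic = pin_state(token_about_cooking, FEDIVERSE_DIRECTION, FEDIVERSE_BIAS)
print(on_topic, off_topic)
```

A real SAE has thousands of such features trained jointly with a sparsity penalty, so only a few "pins" are high for any given token; the hard threshold here is just a way to make the pin metaphor concrete.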

https://qwen.ai/blog?id=qwen-scope

#llm #mechinterp #qwen

Qwen Studio

Qwen Studio offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.

Tim Mar 6

- Google CLI: control gmail, calendar, drive, API: what could go wrong?: https://github.com/googleworkspace/cli

- Paperclip pitch: "Open-source orchestration for 0-human companies" for Openclaw, Claude Code, Codex: https://github.com/paperclipai/paperclip

2 from early Feb:
- Latent Space+Goodfire on Mechanistic Interpretability + real-time Steering 🤯: https://www.latent.space/p/goodfire

- 1Password on Openclaw security issues: https://1password.com/blog/from-magic-to-malware-how-openclaws-agent-skills-become-an-attack-surface

#AI #AInews #google #openclaw #claudecode #codex #security #mechinterp #opensource

GitHub - googleworkspace/cli: Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Google Discovery Service. Includes AI agent skills.

GitHub

sayzard Jan 9

aidan ewart (@aidanprattewart)

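Suggests that this became widely known thanks to arXiv papers such as 2501.17727, and that this mood is one of the reasons GDM's mechanistic interpretability team and the broader interpretability (interp) research space recently shifted focus to 'pragmatic' interp.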

https://x.com/aidanprattewart/status/2009320890140668006

#interpretability #mechinterp #arxiv #research

aidan ewart (@aidanprattewart) on X

@sebkrier I think people mostly are aware of this thanks to papers like https://t.co/LNQORidB9m (and also eg older stats results) and this kind of vibe is part of what caused GDM’s mech interp team (and much of the broader interp space) to switch focus recently to ‘pragmatic’ interp.

X (formerly Twitter)

sayzard Jan 7

Michelle Bakels (@MichelleBakels)

Day 21: Kath Korevec of Google Labs presented 'Proactive Agents'. The 'State of MechInterp' session also covered recent research and topics in mechanistic interpretability, including SAEs in production, circuit tracing, AI4Science applications, and more practical ('Pragmatic' Interp) approaches.

https://x.com/MichelleBakels/status/2008560287067369589

#proactiveagents #mechinterp #ai4science #googlelabs

Michelle Bakels (@MichelleBakels) on X

@aiDotEngineer @latentspacepod Day 21 Proactive Agents – Kath Korevec, Google Labs https://t.co/MUklTHWRo2 [State of MechInterp] SAEs in Production, Circuit Tracing, AI4Science, "Pragmatic" Interp — Goodfire https://t.co/f0ynRX8JYK

X (formerly Twitter)