Open-sourcing circuit-tracing tools

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Ah, the riveting world of "circuit tracing" in language models 🤖🔍, because what we really needed was another way to complicate things we barely understand. A "replacement model" that makes things "interpretable"? 😂 More like a desperate attempt to justify endless AI research grants.
https://transformer-circuits.pub/2025/attribution-graphs/methods.html #circuittracing #AIinterpretability #researchgrants #language_models #techhumor #HackerNews #ngated
Circuit Tracing: Revealing Computational Graphs in Language Models

We describe an approach to tracing the “step-by-step” computation involved when a model responds to a single prompt.

Transformer Circuits

Whoa! LOTS to unpack here. Weekend Reading!

Anthropic reveals research on how AI systems process information and make decisions. AI models can perform chains of reasoning, plan ahead, and sometimes work backward from a desired outcome. The research also provides insight into why language models hallucinate.

Interpretability techniques called “circuit tracing” and “attribution graphs” enable researchers to map out the specific pathways of neuron-like features that activate when models perform tasks. See the links below for details.

Summary Article: https://venturebeat.com/ai/anthropic-scientists-expose-how-ai-actually-thinks-and-discover-it-secretly-plans-ahead-and-sometimes-lies/

Circuit Tracing: https://transformer-circuits.pub/2025/attribution-graphs/methods.html

Research Overview: https://transformer-circuits.pub/2025/attribution-graphs/biology.html #AI #Anthropic #LLMs #Claude #ChatGPT #CircuitTracing #neuroscience

Anthropic scientists expose how AI actually ‘thinks’ — and discover it secretly plans ahead and sometimes lies

Anthropic has developed a new method for peering inside large language models like Claude, revealing for the first time how these AI systems process information and make decisions. The research, published today in two papers (available […]

VentureBeat
My lab will use #ephys, #-omics, and #circuittracing to bridge #retina and brain biology. I’ll post job ads soon, but you can already send me an email if you're interested in a master's, PhD, or postdoc position. Come work with me @unibe in beautiful Bern! 3/5
https://www.youtube.com/watch?v=Ahlse_WM0P8
University of Bern - Knowledge creates Value!

YouTube