Mastodawn

Shared Geometry of Neural Networks

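A recent study, 'Manifold Steering', analyzes the causal relationship between a neural network's internal representations and its behavioral outcomes by intervening along the activation manifold. The method recovers natural network behavior better than conventional linear interventions, and it was validated on both LLMs and video world models. The study suggests that neural geometry is a core mechanism for controlling and debugging network behavior, with implications for understanding how networks operate internally and for advancing AI model control techniques.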

https://twitter.com/TheAITimeline/status/2053712704104206729

#neuralnetworks #manifold #llm #modelinterpretability #representationlearning

The AI Timeline (@TheAITimeline) on X

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior Author's Explanation: https://t.co/ax8HuhezQw Overview: Manifold steering investigates the causal link between internal representations and behavioral outcomes by intervening along

X (formerly Twitter)

sayzard 3d ago

Natural Language Autoencoders

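Neuronpedia is an open-source interpretability platform for exploring, visualizing, and manipulating the internals of AI models. It offers tools and features such as natural language autoencoders, circuit tracing, and the assistant axis, and it collaborates with researchers at Google DeepMind, Anthropic, OpenAI, and others to publish recent research results and model interpretation tools. Through its API and libraries, developers can analyze a model's internal state and control model behavior by manipulating activations. In particular, it supports activation data and metadata on the scale of tens of petabytes, and it includes a range of LLMs and Sparse Autoencoder-based interpretation tools.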

https://www.neuronpedia.org

#modelinterpretability #autoencoder #llm #opensource #neuralnetworks

Neuronpedia

Open Source Interpretability Platform

sayzard Feb 6

Deedy (@deedydas)

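The startup Goodfire announced that it has raised funding at a $1.25B valuation, with the goal of understanding and steering AI models directly from model weights. Anthropic founder Dario describes this as an 'MRI for AI': a way to reliably detect and adjust problematic tendencies such as lying and deception.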

https://x.com/deedydas/status/2019453156393119871

#goodfire #modelinterpretability #funding #aisafety

Deedy (@deedydas) on X

Excited to announce that Goodfire just raised at $1.25B to understand and steer AI models directly from model weights! We don't really understand how AI works today. Anthropic founder Dario says, an "MRI for AI" that can reliably detect problematic tendencies (lying/deception,

sayzard Jan 21

fly51fly (@fly51fly)

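Researchers at Google DeepMind (J. Kramár et al.) propose a method for building 'production-ready probes' to analyze Gemini models. They present a practical pipeline and best practices for reliably inspecting and monitoring internal model representations, offering practical guidance for applying large-scale model interpretation and verification work in industry settings (arXiv:2601.11516).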

https://x.com/fly51fly/status/2013730352901279902

#gemini #probing #deepmind #modelinterpretability

fly51fly (@fly51fly) on X

[LG] Building Production-Ready Probes For Gemini J Kramár, J Engels, Z Wang, B Chughtai... [Google DeepMind] (2026) https://t.co/u63iRswBiS

tejiri Dec 4

OpenAI acquires Neptune to supercharge AI model visibility and research tools #AIresearch #MachineLearning #OpenAI

OpenAI's acquisition of Neptune aims to enhance model interpretability by providing researchers with deeper insights into model behavior, facilitating more efficient experimentation and training processes. Neptune's technology will be integrated into OpenAI's existing infrastructure, enabling more effective...

#OpenAI #Neptune #ModelInterpretability #MachineLearning