Mastodawn

Marcus Williams (@Marcus_J_W)

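OpenAI said it monitors 99.9% of its internal coding traffic with its most powerful models to detect misalignment, reviewing full trajectories to catch and escalate suspicious behavior early while strengthening its safeguards.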

https://x.com/Marcus_J_W/status/2034677345681068140

#openai #aisafety #monitoring #coding #alignment

Marcus Williams (@Marcus_J_W) on X

Sharing some of the work I’ve been doing at OpenAI: we now monitor 99.9% of internal coding traffic for misalignment using our most powerful models, reviewing full trajectories to catch suspicious behavior, escalate serious cases quickly, and strengthen our safeguards over time.
