Mastodawn

Max Kaufmann (@Max_A_Kaufmann)

강화학습(RL) 훈련이 LLM의 CoT(Chain of Thought)를 모호하게 만들 수 있는지 다루며, 훈련 전에 이러한 obfuscation 발생 여부를 예측하는 새로운 프레임워크를 Google DeepMind가 제안했다는 연구 소개다.

https://x.com/Max_A_Kaufmann/status/2039404338855149861

#googledemind #llm #cot #reinforcementlearning #research

Max Kaufmann (@Max_A_Kaufmann) on X

Is training against the CoT always bad? RL training can lead to obfuscated CoT making it difficult to 'read an LLMs thoughts'. How can we predict when obfuscation occurs?🤔 Our new @GoogleDeepMind paper introduces a framework to predict this before training starts!

X (formerly Twitter)