How I Caught a Transformer Cheating: Grokking, Math, and Mechanistic Interpretability

The Grokking phenomenon and Mechanistic Interpretability are major research trends at labs like OpenAI and Anthropic. I decided to get hands-on with these concepts at the tensor level. The goal seemed trivial: teach a custom micro-Transformer (only 1M parameters) basic arithmetic from scratch. Instead of a mathematical genius, however, I got a lazy cheater. This article is an engineering detective story about how neural networks try to deceive us (Specification Gaming) and how dissecting the attention matrices helps catch them red-handed.

https://habr.com/ru/articles/1008656/

#machine_learning #transformers #grokking #mechanistic_interpretability #pytorch #specification_gaming #ai_alignment

How I Caught a Transformer Cheating: Grokking, Math, and Mechanistic Interpretability

The Grokking phenomenon and Mechanistic Interpretability are major research trends at labs like OpenAI and Anthropic. I decided to get hands-on with these concepts at the...

Habr
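
To make the "dissect the attention matrices" idea concrete, here is a minimal, hedged PyTorch sketch, not the article's actual code: the model, vocabulary, and prompt below are illustrative stand-ins. It runs a toy transformer block over an arithmetic prompt and pulls out the per-head attention matrices to see which input positions each head actually reads.

```python
import torch
import torch.nn as nn

# Hypothetical micro-Transformer block (an illustrative stand-in, not the
# article's model): the only point is to expose the attention matrix.
class TinyBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        # need_weights=True returns the attention weights we want to inspect;
        # average_attn_weights=False keeps one matrix per head.
        attn_out, attn_w = self.attn(x, x, x, need_weights=True,
                                     average_attn_weights=False)
        x = x + attn_out
        x = x + self.ff(x)
        return x, attn_w  # attn_w: (batch, heads, seq, seq)

vocab = {ch: i for i, ch in enumerate("0123456789+=")}   # toy arithmetic alphabet
tokens = torch.tensor([[vocab[c] for c in "12+34="]])    # one prompt
emb = nn.Embedding(len(vocab), 64)
block = TinyBlock()

_, attn = block(emb(tokens))
# A shortcut-learning ("cheating") head often shows up as rows that pile all
# of their mass on a single trivial token instead of the operand digits.
print(attn[0, 0].detach())  # head 0 attention matrix for the prompt
```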

Avinav Sahoo (he/him/his) (@avinavsahoo)

Pulseinnovas India claims to have drastically optimized Grokking training. They announced that grokking, which previously required tens of thousands of epochs, now takes fewer than 500 epochs on the same data, with costs cut by roughly 70%, presenting it as a performance and cost win.

https://x.com/avinavsahoo/status/2014398783795429430

#grokking #training #efficiency #pulseinnovas #ml

Avinav Sahoo (he/him/his) (@avinavsahoo) on X

At Pulseinnovas India we have done the impossible. Grokking used to take tens of thousands of epoch we have reduced it less than 500 epochs on the same data @elonmusk @sama @AnjneyMidha @bonatsos @a16z @deedydas @demishassabis @sundarpichai @anandmahindra reduced the cost by 70%

X (formerly Twitter)

Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking

https://arxiv.org/abs/2509.21519

#HackerNews #ProvableScalingLaws #FeatureEmergence #LearningDynamics #Grokking #AIResearch

$\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization

While the phenomenon of grokking, i.e., delayed generalization, has been studied extensively, it remains an open question whether there is a mathematical framework characterizing what kind of features emerge, and how and under which conditions they emerge during training, for complex structured inputs. We propose a novel framework, named $\mathbf{Li_2}$, that captures three key stages of the grokking behavior of 2-layer nonlinear networks: (I) lazy learning, (II) independent feature learning, and (III) interactive feature learning, characterized by the structure of the backpropagated gradient $G_F$ across layers. In (I), $G_F$ is random, and the top layer overfits to the random hidden representation. In (II), the gradient of each node (a column of $G_F$) depends only on its own activation, so each hidden node learns its representation independently from $G_F$, which, thanks to weight decay, now carries information about the target labels. Interestingly, these independent dynamics follow exactly the gradient ascent of an energy function $E$, whose local maxima are precisely the emerging features. We study whether these local-optima-induced features are generalizable, what their representation power is, and how they change with sample size, in group arithmetic tasks. Finally, in (III), we provably show how hidden nodes interact and how $G_F$ changes to focus on missing features that still need to be learned. Our study sheds light on the roles played by key hyperparameters such as weight decay, learning rate, and sample size in grokking, leads to provable scaling laws of memorization and generalization, and reveals, from first principles of gradient dynamics, the underlying reason why recent optimizers such as Muon can be effective. Our analysis can be extended to multi-layer architectures.

arXiv.org
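
The abstract's central object, the backpropagated gradient $G_F$ at the hidden layer, can be inspected directly. The sketch below is a hedged illustration, not the paper's code, and the label-variance probe at the end is my own crude stand-in for the paper's analysis: build a 2-layer network on a modular-addition task, compute $G_F$, and measure how much of its variance is explained by the target labels, i.e. whether it still looks random (stage I) or already carries label information (stage II).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged sketch (not the paper's code): inspect the backpropagated gradient
# G_F at the hidden layer of a 2-layer net on modular addition, the kind of
# group-arithmetic task the abstract refers to. p and hidden_dim are arbitrary.
p = 23
a = torch.arange(p).repeat_interleave(p)
b = torch.arange(p).repeat(p)
y = (a + b) % p
x = torch.cat([F.one_hot(a, p), F.one_hot(b, p)], dim=1).float()

hidden_dim = 128
W1 = nn.Linear(2 * p, hidden_dim)
W2 = nn.Linear(hidden_dim, p)

h = torch.relu(W1(x))          # hidden representation F
h.retain_grad()                # keep dL/dF = G_F after backward
loss = F.cross_entropy(W2(h), y)
loss.backward()

G_F = h.grad                   # shape: (num_samples, hidden_dim)
# Crude probe: fraction of G_F's variance explained by the per-label means.
# Near zero suggests a random-looking G_F (stage I); a larger value suggests
# G_F already carries label information (stage II).
col = G_F - G_F.mean(0)
label_means = torch.zeros(p, hidden_dim).index_add_(0, y, col) / p
explained = (label_means.pow(2).sum() * p) / col.pow(2).sum()
print(f"fraction of G_F variance explained by labels: {explained.item():.3f}")
```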
« Campaigns driving tens of millions of views to adult or phishing sites in just days. » #x #ai #grokking #web #porn #malvertizing #online #phishing #future #threats [ https://gbhackers.com/hackers-exploit-xs-grok-ai/ ] #informatique

Hackers Exploit X’s Grok AI to Push Malicious Links Through Ads

Malicious actors have found a new way to slip harmful links into X’s promoted posts by tricking Grok, the platform’s AI assistant.

GBHackers Security | #1 Globally Trusted Cyber Security News Platform

"Behold, the ultimate 120-page miracle cure for your terminal-phobia, offering salvation to those too enlightened to read the actual manual. 📚💸 Pay what you want, because apparently, 'Grokking' the command line shouldn't bankrupt you, unless you count the cost of overused #buzzwords. 🙄✨"

https://commandline.stribny.name/ #miraclecure #terminalphobia #paywhatyouwant #commandline #grokking #HackerNews #ngated

Command Line Handbook

e509 — Maverick and Marbles

e509 with Michael and Michael - stories and discussion all around #AI, #LLMs, #llamas, generated #Quake, #grokking, #generalization and much more.

https://gamesatwork.biz/2025/04/14/e509-maverick-and-marbles/

e509 — Maverick and Marbles | Games At Work dot Biz

stories and discussion all around AI, LLMs, llamas, generated Quake, grokking, generalization and much more.

Games At Work dot Biz | Play games with us!

Grokking at the Edge of Numerical Stability
https://arxiv.org/abs/2501.04697
https://old.reddit.com/r/MachineLearning/comments/1i34keg/grokking_at_the_edge_of_numerical_stability
https://en.wikipedia.org/wiki/Grokking_(machine_learning)

* sudden generalization after prolonged overfitting
* a massively overtrained neural network can acquire "emergent" abilities and unexpectedly strong performance
* an unexpected, accidental finding
* the underlying mechanisms are only now starting to be unraveled
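
For context, the recipe behind these observations is small: train on a modular-arithmetic table with only part of the table revealed, add strong weight decay, and keep optimizing long after the training set is memorized. Below is a minimal, hedged sketch of that setup, using a plain MLP stand-in rather than the transformers used in the original grokking experiments; whether this exact configuration groks depends on the hyperparameters, the point is the shape of the recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal grokking-style setup: modular addition, a small train split,
# heavy weight decay, and many more steps than needed to fit the train set.
p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
x = torch.cat([F.one_hot(pairs[:, 0], p), F.one_hot(pairs[:, 1], p)], 1).float()

perm = torch.randperm(len(x))
n_train = int(0.3 * len(x))                      # reveal only 30% of the table
tr, te = perm[:n_train], perm[n_train:]

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(50_000):                       # keep training long after overfitting
    opt.zero_grad()
    loss = F.cross_entropy(model(x[tr]), labels[tr])
    loss.backward()
    opt.step()
    if step % 5_000 == 0:
        with torch.no_grad():
            acc = (model(x[te]).argmax(-1) == labels[te]).float().mean()
        # Typical grokking signature: train loss goes to ~0 early, test accuracy
        # stays near chance for a long time, then jumps late in training.
        print(f"step {step:6d}  train loss {loss.item():.3f}  test acc {acc.item():.3f}")
```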

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
https://arxiv.org/abs/2405.15071
https://news.ycombinator.com/item?id=40495149

#LLM #ML #grokking #NN #emergence #generalization

Grokking at the Edge of Numerical Stability

Grokking, the sudden generalization that occurs after prolonged overfitting, is a surprising phenomenon challenging our understanding of deep learning. Although significant progress has been made in understanding grokking, the reasons behind the delayed generalization and its dependence on regularization remain unclear. In this work, we argue that without regularization, grokking tasks push models to the edge of numerical stability, introducing floating point errors in the Softmax function, which we refer to as Softmax Collapse (SC). We demonstrate that SC prevents grokking and that mitigating SC enables grokking without regularization. Investigating the root cause of SC, we find that beyond the point of overfitting, the gradients strongly align with what we call the naïve loss minimization (NLM) direction. This component of the gradient does not alter the model's predictions but decreases the loss by scaling the logits, typically by scaling the weights along their current direction. We show that this scaling of the logits explains the delay in generalization characteristic of grokking and eventually leads to SC, halting further learning. To validate our hypotheses, we introduce two key contributions that address the challenges in grokking tasks: StableMax, a new activation function that prevents SC and enables grokking without regularization, and $\perp$Grad, a training algorithm that promotes quick generalization in grokking tasks by preventing NLM altogether. These contributions provide new insights into grokking, elucidating its delayed generalization, reliance on regularization, and the effectiveness of existing grokking-inducing methods. Code for this paper is available at https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability.

arXiv.org
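
The numerical mechanism described in this abstract is easy to reproduce in isolation. The toy script below (mine, not the paper's code) shows the two claims in miniature: scaling the logits leaves the predicted class unchanged while driving the loss toward zero, and once the gap is large enough, the float32 softmax rounds to an exact one-hot and the gradient vanishes entirely, which is the "halting further learning" part.

```python
import torch
import torch.nn.functional as F

# Toy reproduction of the effect the abstract calls Softmax Collapse (SC):
# scaling the logits (the "naive loss minimization" direction) does not
# change the prediction, keeps lowering the loss, and eventually makes the
# float32 softmax exactly one-hot, zeroing the gradient.
logits = torch.tensor([2.0, 1.0, -1.0])  # correct class is index 0
target = torch.tensor(0)

for scale in [1, 4, 16, 64, 256]:
    z = (scale * logits).requires_grad_()
    loss = F.cross_entropy(z.unsqueeze(0), target.unsqueeze(0))
    loss.backward()
    probs = F.softmax(z.detach(), dim=-1)
    print(f"scale {scale:4d}  pred {z.argmax().item()}  "
          f"loss {loss.item():.2e}  grad-norm {z.grad.norm().item():.2e}  "
          f"exact one-hot: {bool((probs == probs.round()).all())}")
```

At small scales the prediction is already correct but the gradient is non-trivial; at large scales the off-class probabilities underflow, the softmax becomes exactly one-hot, and the gradient norm prints as 0, so no further learning signal remains.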