LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language?

https://dnhkng.github.io/posts/rys-ii/

In Part 1, I described how duplicating a block of seven middle layers in Qwen2-72B — no weight changes, no training — produced the #1 model on the HuggingFace Open LLM Leaderboard. The method, which I called RYS (Repeat Your Self), was discovered using nothing but hard math probes and EQ-Bench on a pair of RTX 4090s.
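To make the idea concrete, here is a minimal sketch of what such a self-merge could look like with Hugging Face transformers. The layer range below (39–46) is a hypothetical placeholder, not the block used for the leaderboard model; the point is only that the copied block reuses the original weights unchanged.

```python
import copy
import torch
from transformers import AutoModelForCausalLM

# Load the base model; Qwen2-72B has 80 decoder layers.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-72B-Instruct", torch_dtype=torch.bfloat16
)

layers = model.model.layers   # nn.ModuleList of decoder blocks
start, end = 39, 46           # hypothetical seven-layer middle block

# Insert a copy of layers[start:end] right after the original block.
# No weights are modified; the same parameters simply run twice.
duplicated = [copy.deepcopy(layers[i]) for i in range(start, end)]
model.model.layers = torch.nn.ModuleList(
    list(layers[:end]) + duplicated + list(layers[end:])
)
model.config.num_hidden_layers = len(model.model.layers)

# Recent transformers versions key the KV cache on each layer's index,
# so reindex the attention modules after the insertion.
for idx, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = idx
```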

David Noel Ng
Apologies if I missed this in the article (or in the first article in the series) - what happens if you add two copies of the layer set? Does performance improve over adding one copy of the layer set?

Author here: that was covered in this blog post. I started with the best re-layer configs and, over a long beam search, iteratively added more blocks, including repeating the same block multiple times.

It turns out this does not help (somewhat surprisingly).
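For readers curious what that search loop might look like, here is a hedged sketch, not the author's actual code. `evaluate(config)` is a hypothetical callback that builds the re-layered model for a given sequence of layer indices and returns its benchmark score; configs are lists of layer indices in execution order, with repeats allowed.

```python
def beam_search(seed_configs, evaluate, n_layers=80,
                block_width=7, beam_width=4, steps=5):
    """Beam search over layer orderings.

    Each step tries re-inserting every contiguous block of
    `block_width` layers into every config on the beam; the same
    block may be chosen again in later steps.
    """
    beam = sorted(seed_configs, key=evaluate, reverse=True)[:beam_width]
    for _ in range(steps):
        candidates = {tuple(c) for c in beam}
        for config in beam:
            for s in range(n_layers - block_width + 1):
                block = list(range(s, s + block_width))
                # Duplicate the block after its first occurrence.
                pos = config.index(block[-1]) + 1
                candidates.add(tuple(config[:pos] + block + config[pos:]))
        ranked = sorted(candidates, key=lambda c: evaluate(list(c)),
                        reverse=True)
        beam = [list(c) for c in ranked[:beam_width]]
    return beam[0]

# Usage: start from the stock 80-layer ordering.
# best = beam_search([list(range(80))], evaluate)
```

In practice `evaluate` is the expensive part (each call loads and benchmarks a re-layered model), so caching scores per config would be essential.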

Actually, I'm not surprised.
I'd guess this works for the same reason “say it twice” [1] works: because LLMs are trained as causal language models, past tokens cannot attend to future tokens.
A single extra copy of the layer set already solves this.
[1] Prompt Repetition Improves Non-Reasoning LLMs, https://arxiv.org/html/2512.14982v1
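As a concrete illustration of the trick in [1] (my example, not from the paper): repeating the prompt lets tokens in the second copy attend to the entire question in the first copy, which a single causal pass over one copy cannot do.

```python
question = "A train leaves at 9:00 and travels 120 km at 80 km/h. When does it arrive?"
# Tokens in the second copy can attend to every token of the first.
prompt = f"{question}\n\n{question}"
```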