LLM Neuroanatomy II: Duplicating Transformer Layers Without Training Produces Top Models

2026-03-24

The RYS Method: Repeat Your Self

In Part 1 of the LLM Neuroanatomy series, the author demonstrated that duplicating a block of seven middle layers in Qwen2-72B, with no weight changes and no training, produced the #1 model on the Hugging Face Open LLM Leaderboard. The method was discovered using math probes and EQ-Bench on a pair of RTX 4090s.
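To make the idea of duplicating a block concrete, here is a minimal sketch of the layer bookkeeping in Python. It is an illustration under stated assumptions, not the author's released code: the function names rys_layer_order and relayer are hypothetical, the toy stack has 10 layers, and the block indices are arbitrary. The source does not say which seven layers were repeated in Qwen2-72B.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rys_layer_order(n_layers: int, start: int, end: int) -> List[int]:
    """Execution order of layer indices after repeating layers [start, end) once.

    Hypothetical helper: the stack runs up to the end of the block, then
    jumps back to `start` and continues to the final layer.
    """
    if not 0 <= start < end <= n_layers:
        raise ValueError("need 0 <= start < end <= n_layers")
    return list(range(end)) + list(range(start, n_layers))


def relayer(layers: Sequence[T], start: int, end: int) -> List[T]:
    """Build the new stack by reference, so repeated entries share weights."""
    return [layers[i] for i in rys_layer_order(len(layers), start, end)]


# Toy example: a 10-layer stack, repeating the three layers 4, 5, 6.
order = rys_layer_order(10, 4, 7)
print(order)  # [0, 1, 2, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9]
```

Because relayer copies references rather than weights, the repeated layers are the same objects as the originals. That matches the "no weight changes" property described above: the model gets deeper, but no new parameters are created.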

Part 2 asks the critical question: was RYS a fluke of Qwen2-72B, or is it a general property of Transformers?

The Answer: It's General

After testing on modern models including Qwen3.5-27B, MiniMax, and GLM-4.7, the answer is clear: relayering survives as a general technique. The full analysis required 3,024 beam search candidates, a surrogate model scoring 2 million configurations, and a unified validation sweep.

Evidence of a Universal "Thinking Space"

The most striking finding comes from a cross-lingual similarity experiment. The author computed the cosine similarity of hidden states across layers for English and Chinese sentences, comparing pairs that share or differ in language and in content:

Cross-language, same content: 0.920 mean similarity
Same-language, different content: 0.882
Cross-language, different content: 0.852

In the middle layers of the model, two sentences about the same topic are more similar than two sentences in the same language about different topics — even when one is in English and the other is Chinese. The model's internal representation cares more about what you're saying than what language you're saying it in.
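The comparison above can be sketched in a few lines of plain Python. This is a hedged illustration, not the author's scanning code: it assumes each sentence has already been reduced to one hidden-state vector at a given layer (how the author pooled token states is not stated in the source), and the names cosine and mean_cosine, as well as the toy vectors, are hypothetical placeholders rather than real model activations.

```python
import math
from statistics import mean
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_cosine(pairs: Iterable[Tuple[Vector, Vector]]) -> float:
    """Average cosine similarity over a collection of sentence-vector pairs."""
    return mean(cosine(a, b) for a, b in pairs)


# Toy placeholder vectors (NOT real hidden states), one per sentence.
en_topic_a = [0.9, 0.1, 0.2]
zh_topic_a = [0.8, 0.2, 0.2]
en_topic_b = [0.1, 0.9, 0.3]

cross_lang_same_content = mean_cosine([(en_topic_a, zh_topic_a)])
same_lang_diff_content = mean_cosine([(en_topic_a, en_topic_b)])
print(cross_lang_same_content > same_lang_diff_content)
```

Running the same three-way comparison at every layer of a real model would yield one curve per category, which is the kind of per-layer view the experiment describes.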

The Three-Phase Hypothesis Confirmed

The research confirms a three-phase structure in Transformers:

Early layers (encoding): Rapid convergence across all inputs
Middle layers (reasoning): Near-perfect similarity in a format-agnostic "thinking space"
Late layers (decoding): Divergence back to surface form (language, style)

Practical Implications

The RYS technique works because the middle reasoning layers can be duplicated without disrupting the encode-decode pipeline. This opens up possibilities for:

Free performance gains on existing models without retraining
Model architecture optimization through targeted layer duplication
Understanding LLM internals through systematic probing

The author has released scanning code and new RYS models for the community.
