LLM Neuroanatomy II: Duplicating Transformer Layers Without Training Produces Top Models

2026-03-24 · 2 min read
The RYS Method: Repeat Your Self

In Part 1 of the LLM Neuroanatomy series, the author demonstrated that duplicating a block of seven middle layers in Qwen2-72B — with no weight changes and no training — produced the #1 model on the HuggingFace Open LLM Leaderboard. The method was discovered using math probes and EQ-Bench on a pair of RTX 4090s.
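
As a rough illustration of what "duplicating a block of layers with no weight changes" means, the sketch below builds the order in which the original layers would be visited. It is a minimal sketch, not the author's code: the function names `rys_layer_order` and `run_stack`, the 10-layer toy stack, and the block position are all hypothetical, and the source does not say which seven layers were duplicated.

```python
def rys_layer_order(num_layers, start, block_len):
    """Return the visiting order of original layer indices after one block is repeated.

    Layers start .. start + block_len - 1 run twice in a row. No weights are
    copied or changed; the same layers are simply visited a second time.
    """
    if start < 0 or block_len <= 0 or start + block_len > num_layers:
        raise ValueError("block must lie inside the layer stack")
    end = start + block_len
    first_pass = list(range(0, end))        # layers up to and including the block
    second_pass = list(range(start, end))   # the block again
    remainder = list(range(end, num_layers))
    return first_pass + second_pass + remainder


def run_stack(layers, order, x):
    """Apply the layers to x in the given order (each layer is any callable)."""
    for index in order:
        x = layers[index](x)
    return x


# Toy stand-in for a model: each "layer" just records that it ran.
# (Hypothetical values: 10 layers, a 3-layer block starting at layer 4.)
toy_layers = [lambda trace, k=k: trace + [k] for k in range(10)]
order = rys_layer_order(num_layers=10, start=4, block_len=3)
trace = run_stack(toy_layers, order, [])
print(order)  # [0, 1, 2, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9]
```

Because the toy layers only log their own index, `trace` comes out identical to `order`, which makes it easy to see that the repeated block reuses the same layers rather than adding new ones.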

Part 2 asks the critical question: was RYS a fluke of Qwen2-72B, or is it a general property of Transformers?

The Answer: It's General

After testing on modern models including Qwen3.5-27B, MiniMax, and GLM-4.7, the answer is clear: relayering survives as a general technique. The full analysis required 3,024 beam search candidates, a surrogate model scoring 2 million configurations, and a unified validation sweep.

Evidence of a Universal "Thinking Space"

The most fascinating finding comes from a cross-lingual similarity experiment. The author computed cosine similarity of hidden states across layers for sentences in English and Chinese about the same topics:

In the middle layers of the model, two sentences about the same topic are more similar than two sentences in the same language about different topics — even when one is in English and the other is Chinese. The model's internal representation cares more about what you're saying than what language you're saying it in.
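
The comparison described above can be sketched as follows. This is a hedged illustration, not the author's scanning code: it assumes each sentence has already been reduced to one vector per layer (for example by averaging token hidden states, which the source does not specify), and the vectors below are made-up placeholders rather than measurements.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def layerwise_similarity(states_a, states_b):
    """Compare two sentences layer by layer.

    states_a and states_b are lists with one vector per layer.
    """
    return [cosine_similarity(a, b) for a, b in zip(states_a, states_b)]


# Placeholder per-layer vectors (2 layers, 3 dimensions); NOT real hidden states.
english_topic_a = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
chinese_topic_a = [[0.0, 1.0, 0.0], [0.5, 0.4, 0.1]]
english_topic_b = [[1.0, 0.1, 0.0], [0.0, 0.2, 0.9]]

same_topic = layerwise_similarity(english_topic_a, chinese_topic_a)
same_language = layerwise_similarity(english_topic_a, english_topic_b)

# The claim in the text corresponds to same_topic[i] > same_language[i]
# for layer indices i in the middle of the stack.
for layer, (topic, language) in enumerate(zip(same_topic, same_language)):
    print(layer, round(topic, 3), round(language, 3))
```

Plotting the two lists against layer index would show where, if anywhere, the same-topic curve rises above the same-language curve.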

The Three-Phase Hypothesis Confirmed

The research confirms a three-phase structure in Transformers:

  1. Early layers (encoding): Rapid convergence across all inputs
  2. Middle layers (reasoning): Near-perfect similarity in a format-agnostic "thinking space"
  3. Late layers (decoding): Divergence back to surface form (language, style)

Practical Implications

The RYS technique works because the middle reasoning layers can be duplicated without disrupting the encode-decode pipeline.

The author has released scanning code and new RYS models for the community.
