Darkness Visible: GPT-2's Final MLP Layer Decoded as a 27-Neuron Exception Handler

2026-04-07T17:17:45.600Z·1 min read
A remarkable mechanistic interpretability study has fully decoded the final MLP layer of GPT-2 Small into 27 named neurons organized as a three-tier exception handler. The research reveals how a tiny neural circuit routes knowledge without storing it.

The Discovery

The final MLP (multi-layer perceptron) in GPT-2 Small's last layer isn't storing knowledge; it's routing it. Of the layer's 3,072 neurons, the study decomposes the functionally significant behavior into 27 named neurons:

| Component | Count | Function |
| --- | --- | --- |
| Core neurons | 5 | Reset vocabulary toward function words |
| Differentiators | 10 | Suppress wrong candidates |
| Specialists | 5 | Detect structural boundaries |
| Consensus neurons | 7 | Monitor distinct linguistic dimensions |

The Exception Handler Model

The neurons work as a three-tier system:

  1. Default path — Core neurons establish baseline function word usage
  2. Exception detection — Specialists identify when structural boundaries require different handling
  3. Consensus voting — 7 neurons each monitor a distinct linguistic dimension; the crossover point (4-5 of 7 agreeing) sharply determines whether MLP intervention helps or harms
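The voting step can be sketched as a simple quorum check. This is an illustrative assumption, not the study's actual code: the function name, the activation threshold, and the quorum value of 4 are placeholders standing in for the reported 4-5-of-7 crossover.

```python
def consensus_gate(activations, threshold=0.0, quorum=4):
    """Count how many of the 7 consensus neurons fire above a
    threshold and decide whether MLP intervention is predicted
    to help (quorum reached) or harm (quorum missed).

    `threshold` and `quorum` are illustrative assumptions.
    """
    votes = sum(1 for a in activations if a > threshold)
    return votes >= quorum, votes

# 5 of the 7 hypothetical consensus neurons fire above threshold,
# so the gate predicts the MLP intervention helps.
helps, votes = consensus_gate([0.8, 1.2, -0.3, 0.5, 0.9, -0.1, 0.4])
```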

Key Insight: Routing, Not Storage

The study challenges the popular "knowledge neurons" concept. The so-called knowledge neurons at layer 11 of GPT-2 function as routing infrastructure rather than fact storage. They amplify or suppress signals already present in the residual stream from attention layers.
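The routing-versus-storage distinction can be made concrete with a toy sketch: the MLP rescales a direction already present in the residual stream rather than writing new content into it. All shapes, values, and the `route` helper below are illustrative assumptions, not the study's implementation.

```python
import numpy as np

d_model = 8
rng = np.random.default_rng(0)
residual = rng.normal(size=d_model)              # signal from attention layers
direction = residual / np.linalg.norm(residual)  # a direction already present

def route(residual, direction, gain):
    """Amplify (gain > 1) or suppress (gain < 1) an existing
    direction in the residual stream; no new content is written."""
    coeff = residual @ direction                 # how strongly it's present
    return residual + (gain - 1.0) * coeff * direction

amplified = route(residual, direction, gain=2.0)   # doubles the signal
suppressed = route(residual, direction, gain=0.0)  # removes it entirely
```

The key property of this toy: if a direction is absent from the residual stream (`coeff == 0`), the routing step contributes nothing, which is the sense in which the circuit cannot "store" a fact on its own.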

Practical Implications

↗ Original source · 2026-04-07T00:00:00.000Z