Darkness Visible: GPT-2's Final MLP Layer Decoded as a 27-Neuron Exception Handler
A remarkable mechanistic interpretability study has fully decoded the final MLP layer of GPT-2 Small into 27 named neurons organized as a three-tier exception handler. The research reveals how a tiny neural circuit routes knowledge without storing it.
The Discovery
The final MLP (multi-layer perceptron) in GPT-2 Small's last layer isn't storing knowledge — it's routing it. Of the layer's 3,072 neurons, the study attributes its behavior to just 27, decomposed as follows:
| Component | Count | Function |
|---|---|---|
| Core neurons | 5 | Reset vocabulary toward function words |
| Differentiators | 10 | Suppress wrong candidates |
| Specialists | 5 | Detect structural boundaries |
| Consensus neurons | 7 | Monitor distinct linguistic dimensions |
The Exception Handler Model
The neurons work as a three-tier system:
- Default path — Core neurons establish baseline function word usage
- Exception detection — Specialists identify when structural boundaries require different handling
- Consensus voting — seven neurons each monitor a distinct linguistic dimension; the crossover point (four or five of seven agreeing) sharply determines whether MLP intervention helps or harms
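The three-tier logic above can be sketched as a small decision function. This is purely illustrative: the function name `route`, the boolean vote encoding, and the threshold constant are assumptions for the sketch, not code from the study.

```python
CONSENSUS_THRESHOLD = 4  # the reported crossover sits at 4-5 of 7 dimensions

def route(consensus_votes, boundary_detected):
    """Illustrative sketch: decide whether the final MLP intervenes.

    consensus_votes: list of 7 booleans, one per monitored linguistic dimension.
    boundary_detected: True if a specialist neuron fired on a structural boundary.
    """
    if boundary_detected:
        # Exception path: specialists override the default handling
        return "exception"
    if sum(consensus_votes) >= CONSENSUS_THRESHOLD:
        # Majority of monitored dimensions favor MLP intervention
        return "intervene"
    # Default path: core neurons keep the function-word baseline
    return "default"

print(route([True] * 5 + [False] * 2, boundary_detected=False))  # intervene
```

The point of the sketch is the control flow, not the numbers: exceptions short-circuit the vote, and below the consensus threshold the layer falls back to its default behavior.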
Key Insight: Routing, Not Storage
The study challenges the popular "knowledge neurons" concept. The so-called knowledge neurons at layer 11 of GPT-2 function as routing infrastructure rather than fact storage. They amplify or suppress signals already present in the residual stream from attention layers.
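The amplify-or-suppress claim has a simple geometric reading: a routing neuron rescales a direction already present in the residual stream rather than writing a new one. A toy NumPy sketch (dimensions, gain, and the single read direction are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8                          # toy residual-stream width

residual = rng.normal(size=d_model)  # signal written by earlier attention layers
read_dir = residual / np.linalg.norm(residual)  # direction the neuron reads

# A routing neuron rescales an existing direction instead of adding a new one:
gain = 1.5                           # >1 amplifies, <1 suppresses
mlp_out = (gain - 1.0) * np.dot(residual, read_dir) * read_dir
routed = residual + mlp_out

# The result stays parallel to the original signal: no new "fact" direction.
cos = np.dot(routed, residual) / (np.linalg.norm(routed) * np.linalg.norm(residual))
print(round(cos, 6))  # 1.0 (same direction, larger magnitude)
```

A fact-storing neuron would instead add a component orthogonal to the incoming signal; here the cosine similarity stays at 1, which is the signature of pure routing.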
Practical Implications
- Model editing — Understanding routing infrastructure enables more precise interventions
- Efficiency — The entire routing program uses only 27 neurons out of the layer's 3,072
- Garden-path reversal — Experiments show the system can dynamically reverse its processing direction