Hallucination Basins: Geometric Framework Explains When LLMs Hallucinate Using Dynamical Systems Theory
A new paper applies dynamical systems theory to understand LLM hallucinations, finding that hallucinations arise from task-dependent "basin structures" in the model's latent space. The framework enables geometry-aware steering to reduce hallucinations without retraining.
The Key Insight
LLM hallucinations aren't random — they have geometric structure. By analyzing autoregressive hidden-state trajectories, the researchers found:
- Factoid tasks — clearest separation between hallucination and factual "basins" in latent space
- Summarization tasks — less stable; basins frequently overlap
- Misconception-heavy tasks — least stable, with significant basin overlap
What Are Hallucination Basins?
Think of the model's latent space as a landscape of valleys (basins). When the hidden-state trajectory enters a "factual basin," the model produces correct outputs; when it enters a "hallucination basin," it generates fluent but incorrect content.
The key finding: basin separability is task-dependent, not universal. The same model can have well-separated basins for some tasks and overlapping basins for others.
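The basin picture can be made concrete with a toy dynamical system. This is not the paper's model, just an illustration: gradient flow on a double-well potential V(x) = (x² − 1)², whose two minima at x = ±1 stand in for a "factual" and a "hallucination" basin. Which basin a trajectory settles into depends only on which side of the ridge at x = 0 it starts on.

```python
def settle(x0, steps=200, lr=0.05):
    """Follow the gradient flow of V(x) = (x^2 - 1)^2.

    The two minima at x = +1 and x = -1 play the role of a
    'factual' and a 'hallucination' basin; the ridge at x = 0
    is the basin boundary.
    """
    x = x0
    for _ in range(steps):
        x -= lr * 4 * x * (x**2 - 1)  # lr * dV/dx
    return x

# Initial states on either side of the ridge end up in
# different basins.
print(settle(0.3))   # converges near +1 ("factual")
print(settle(-0.3))  # converges near -1 ("hallucination")
```

Overlapping basins in the paper's sense correspond to the case where the ridge is shallow or blurred, so small perturbations of the initial state flip which valley the trajectory falls into.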
Formal Results
- Task-complexity theorem — More complex tasks lead to less separable basins
- Multi-basin theorem — Characterizes how multiple basins emerge in L-layer transformers
- Geometry-aware steering — Can reduce hallucination probability without retraining
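The summary doesn't spell out the paper's steering construction, but the basic idea can be sketched. In this minimal illustration (all names and the centroid-based direction are assumptions, not the paper's method), a hidden state is nudged along the direction from the hallucination-basin centroid toward the factual-basin centroid:

```python
import numpy as np

def steer(hidden, factual_centroid, halluc_centroid, alpha=0.5):
    """Hypothetical geometry-aware steering: shift a hidden state
    along the unit vector pointing from the hallucination-basin
    centroid to the factual-basin centroid, scaled by alpha."""
    direction = factual_centroid - halluc_centroid
    direction = direction / np.linalg.norm(direction)
    return hidden + alpha * direction

rng = np.random.default_rng(0)
fact = rng.normal(1.0, 0.1, size=8)    # stand-in basin centroids
hall = rng.normal(-1.0, 0.1, size=8)
h = hall + rng.normal(0, 0.05, size=8)  # state near the bad basin
h_steered = steer(h, fact, hall, alpha=2.0)

# After steering, the state is closer to the factual centroid.
print(np.linalg.norm(h_steered - fact) < np.linalg.norm(h - fact))
```

Because the intervention only adds a vector at inference time, it requires no retraining, which is the practical appeal of this family of methods.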
Practical Implications
This geometric understanding opens new approaches to hallucination control:
- Task-specific calibration — Adjust intervention strategies based on task type
- Steering vectors — Manipulate hidden states to keep processing in factual basins
- Diagnostic tooling — Identify which tasks a model is most prone to hallucinate on
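A diagnostic along these lines could score basin separability per task. The sketch below is illustrative only: it assumes hidden states have already been labelled factual or hallucinated by some external judge, and uses a simple Fisher-style ratio (centroid distance over within-basin spread) rather than whatever measure the paper defines.

```python
import numpy as np

def basin_separability(factual_states, halluc_states):
    """Toy separability score: distance between the two basin
    centroids divided by the mean within-basin spread. Higher
    means the basins are easier to tell apart."""
    cf = factual_states.mean(axis=0)
    ch = halluc_states.mean(axis=0)
    spread = 0.5 * (factual_states.std() + halluc_states.std())
    return np.linalg.norm(cf - ch) / spread

rng = np.random.default_rng(1)
# Simulated "factoid" task: tight, well-separated basins.
factoid = basin_separability(rng.normal(1.0, 0.2, (50, 16)),
                             rng.normal(-1.0, 0.2, (50, 16)))
# Simulated "summarization" task: broad, overlapping basins.
summar = basin_separability(rng.normal(0.2, 0.8, (50, 16)),
                            rng.normal(-0.2, 0.8, (50, 16)))
print(factoid > summar)  # factoid basins separate more cleanly
```

Ranking tasks by such a score would flag where a given model is most hallucination-prone, matching the factoid > summarization > misconception ordering reported above.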