Self-Correcting AI Agents Boost Insurance Document Extraction Accuracy from 80% to 95% Without Model Changes
FurtherAI has published an engineering deep-dive on building a self-correcting extraction system for insurance "loss runs", the messy, high-stakes claim history reports used in commercial underwriting. The system improved row-count accuracy from 80% to 95% without changing the underlying extraction model.
The Problem: Loss Runs
Loss runs are claim history reports used by insurers to price commercial policies — essentially the "credit report" for a business's insurance risk. The challenge:
- Sources: Hundreds of different carriers, each with unique formats
- Size: 1 to 200+ pages per document
- Complexity: ~30 fields per claim need structured extraction
- Inconsistency: No two loss runs look alike
Why Standard Extraction Fails
Real-world patterns that break standard pipelines:
- Cross-table joins: 4 tables across 5 pages describing the same 10 claims, with claim number as the join key
- Distant headers: Policy number appears once on page 5 as a section header, then ~100 claims follow with no repetition. Miss it and every claim that follows gets the wrong metadata
- Summary rows masquerading as claims: Rows with the same column structure as claims that are actually totals, inflating claim counts by 15%
- Implicit blank cells: Blank doesn't mean "empty" — it means "same as the row above"
- Ambiguous $0 claims: Could be claims closed without payment or data-entry placeholders
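
Several of the patterns above can be handled with plain post-processing once the raw rows are available. The following is a minimal sketch, not FurtherAI's implementation. It assumes rows arrive as dictionaries, that a section header is represented as a row with a hypothetical `section_policy` key, and that totals rows can be recognized by a label such as "Total" in the claim number cell. All function and field names are illustrative.

```python
# Assumed labels that mark a totals row; real carriers will vary.
SUMMARY_MARKERS = ("total", "subtotal", "grand total")


def is_summary_row(row: dict) -> bool:
    """Heuristic: no claim number, or a totals label in the claim number cell."""
    claim_no = (row.get("claim_number") or "").strip().lower()
    return claim_no == "" or claim_no.startswith(SUMMARY_MARKERS)


def normalize_rows(raw_rows: list[dict], ditto_fields=("status",)) -> list[dict]:
    """Carry section headers forward, fill 'same as above' blanks, drop totals."""
    claims = []
    current_policy = None
    previous = {}
    for row in raw_rows:
        if "section_policy" in row:          # distant header: remember it
            current_policy = row["section_policy"]
            previous = {}                    # do not carry values across sections
            continue
        if is_summary_row(row):              # summary row masquerading as a claim
            continue
        claim = dict(row)
        for field in ditto_fields:           # blank means "same as the row above"
            if not claim.get(field) and previous.get(field):
                claim[field] = previous[field]
        claim["policy_number"] = current_policy
        claims.append(claim)
        previous = claim
    return claims


def join_on_claim_number(*tables: list[dict]) -> list[dict]:
    """Merge rows describing the same claim across tables, keyed by claim number."""
    merged: dict[str, dict] = {}
    for table in tables:
        for row in table:
            non_blank = {k: v for k, v in row.items() if v not in ("", None)}
            merged.setdefault(row["claim_number"], {}).update(non_blank)
    return list(merged.values())
```

Rules like these are carrier-specific, which is why a fixed rule set tends to break on the next unfamiliar format. The ambiguous $0 case is deliberately left out, since resolving it requires reading the surrounding document context.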
The Solution: Self-Correction Loop
Instead of improving the extraction model, FurtherAI built an agent that checks and fixes its own output:
- A 10-line validation function outperformed weeks of prompt engineering
- Agent uses tools to verify extracted data against document context
- Iterative self-correction catches edge cases the initial extraction misses
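
The loop described above can be sketched as a validator plus a retry wrapper. This is an illustrative sketch under stated assumptions, not the function FurtherAI describes. It assumes an `expected_count` is available from somewhere independent of the extraction (for example, a claim count printed on the document) and that the hypothetical `extract` callable accepts the previous round's problems as feedback.

```python
from typing import Callable


def validate_row_count(claims: list[dict], expected_count: int) -> list[str]:
    """Return human-readable problems; an empty list means the check passes."""
    problems = []
    if len(claims) != expected_count:
        problems.append(
            f"extracted {len(claims)} claims but the document reports {expected_count}"
        )
    seen = set()
    for claim in claims:
        number = claim.get("claim_number")
        if not number:
            problems.append("claim with missing claim_number")
        elif number in seen:
            problems.append(f"duplicate claim_number {number}")
        seen.add(number)
    return problems


def extract_with_self_correction(
    extract: Callable[[str, list[str]], list[dict]],  # hypothetical extractor
    document: str,
    expected_count: int,
    max_rounds: int = 3,
) -> tuple[list[dict], list[str]]:
    """Re-run extraction with validator feedback until it passes or rounds run out."""
    feedback: list[str] = []
    claims: list[dict] = []
    for _ in range(max_rounds):
        claims = extract(document, feedback)
        feedback = validate_row_count(claims, expected_count)
        if not feedback:
            break
    return claims, feedback
```

Returning the remaining problems alongside the claims lets a caller route documents that never pass validation to human review instead of silently accepting them.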
Key Insight
"OCR was never the bottleneck — the real challenge is that every carrier has their own conventions, and the only way to handle them is to reason about what the document is actually trying to communicate."
Implications for Enterprise AI
This pattern, wrapping an unchanged model in agentic self-correction rather than improving the model itself, has appeal well beyond insurance:
- Works with any base model (GPT-4, Claude, open-source)
- Cheaper than fine-tuning
- More robust than single-pass extraction
- Applicable to medical records, legal filings, financial documents