QED-Nano: A 4-Billion Parameter Model That Proves Hard Olympiad-Level Math Theorems
A new paper introduces QED-Nano, a 4-billion parameter language model that can prove Olympiad-level mathematical theorems — matching or surpassing models 30x its size. The work demonstrates that mathematical reasoning ability doesn't necessarily require massive scale.
The Achievement
QED-Nano is a 4B parameter model post-trained specifically for Olympiad-level proofs. Despite its small size, it:
- Surpasses much larger open models including Nomos-1 and GPT-OSS-120B
- Approaches the performance of proprietary models like Gemini 3 Pro
- Operates at a fraction of the inference cost
- Is fully open-source with released code, models, and datasets
The Training Recipe
The three-stage training pipeline is the key innovation:
- Supervised Fine-Tuning (SFT) — Distills proof-writing style from DeepSeek-Math-V2, teaching the model good mathematical argumentation
- Reinforcement Learning (RL) — Uses rubric-based rewards to improve proof quality iteratively
- Reasoning Cache RL — Decomposes long proofs into summarize-and-refine cycles, enabling stronger test-time reasoning
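Stages 2 and 3 can be sketched in a few lines. This is a minimal illustration under my own assumptions: the rubric items, the `summarize`/`refine` stand-ins, and all function names below are hypothetical placeholders, not the paper's actual reward model or cache format.

```python
def rubric_reward(proof: str, rubric) -> float:
    """Stage-2 style reward: weighted fraction of rubric criteria satisfied.

    rubric maps criterion name -> (weight, check_fn), where check_fn
    returns a score in [0, 1] for the proof text.
    """
    total = sum(w for w, _ in rubric.values())
    return sum(w * check(proof) for w, check in rubric.values()) / total

# Toy string checks standing in for model- or verifier-based judges.
toy_rubric = {
    "states_claim":   (1.0, lambda p: float("we show" in p.lower())),
    "has_conclusion": (1.0, lambda p: float("qed" in p.lower())),
}

def summarize(text: str) -> str:
    # Placeholder "cache" step: keep only the first sentence.
    return text.split(".")[0] + "."

def refine(problem: str, cache: str) -> str:
    # Placeholder "refine" step: extend the cached summary.
    return cache + " Refined. QED."

def summarize_and_refine(problem: str, draft: str, rounds: int = 3) -> str:
    """Stage-3 style loop: compress the proof so far, then continue from
    the compressed cache instead of the full transcript."""
    for _ in range(rounds):
        cache = summarize(draft)        # short summary of reasoning so far
        draft = refine(problem, cache)  # keep proving from the summary
    return draft

print(rubric_reward("We show n is even. QED.", toy_rubric))  # 1.0
print(summarize_and_refine("prove n is even", "We show n is even."))
```

The point of the stage-3 loop is that each refinement conditions on a short summary rather than the ever-growing proof, which is what lets a small model sustain long chains of reasoning at test time.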
Why This Matters
Efficiency
- 4B parameters vs. 120B+ for comparable performance
- Dramatically lower inference costs
- Can run on consumer hardware
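The consumer-hardware claim follows from simple arithmetic (mine, not the paper's): 4 billion parameters at common inference precisions fit comfortably in a single consumer GPU's memory.

```python
# Back-of-envelope weight-memory estimate for a 4B-parameter model.
params = 4e9
bytes_per_param = {"fp16/bf16": 2, "int8": 1, "int4": 0.5}

for fmt, b in bytes_per_param.items():
    gib = params * b / 2**30  # bytes -> GiB
    print(f"{fmt}: ~{gib:.1f} GiB of weights")
```

At fp16 that is roughly 7.5 GiB of weights (activations and KV cache add more), and 4-bit quantization brings it under 2 GiB, well within reach of ordinary desktop GPUs.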
Reproducibility
- Proprietary systems (Gemini, GPT) have undisclosed training pipelines
- QED-Nano releases everything: model weights, training code, datasets, evaluation code
- Enables the research community to study and improve mathematical reasoning
Open Source Impact
- Released components: QED-Nano model, QED-Nano-SFT model, FineProofs-SFT dataset, FineProofs-RL dataset, training and evaluation code
- Complete pipeline for future research on open mathematical reasoning
The Bigger Picture
This work challenges the assumption that bigger is always better for AI reasoning. A well-designed training pipeline focused on the right skills (proof writing, iterative refinement) can achieve remarkable results with small models.