QED-Nano: A 4-Billion Parameter Model That Proves Hard Olympiad-Level Math Theorems

2026-04-07

A new paper introduces QED-Nano, a 4-billion parameter language model that can prove Olympiad-level mathematical theorems — matching or surpassing models 30x its size. The work demonstrates that mathematical reasoning ability doesn't necessarily require massive scale.

The Achievement

QED-Nano is a 4B-parameter model post-trained specifically for Olympiad-level proofs. Despite its small size, it matches or surpasses models roughly 30x larger at this task.

The Training Recipe

The three-stage training pipeline is the key innovation:

  1. Supervised Fine-Tuning (SFT) — Distills proof-writing style from DeepSeek-Math-V2, teaching the model good mathematical argumentation
  2. Reinforcement Learning (RL) — Uses rubric-based rewards to improve proof quality iteratively
  3. Reasoning Cache RL — Decomposes long proofs into summarize-and-refine cycles, enabling stronger test-time reasoning (see the sketch after this list)
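
The paper's actual implementation isn't reproduced here, but the summarize-and-refine idea is straightforward to sketch. The Python below is a minimal illustration under stated assumptions: `model` is any text-in/text-out inference call, and `ReasoningCache`, `score_with_rubric`, `summarize`, and the toy rubric criteria are invented for this example, not the authors' API. It shows how a bounded cache lets a small model iterate on a long proof without carrying the full transcript, and how a rubric-style scorer (in the spirit of stage 2) can select the best attempt.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReasoningCache:
    """Compressed record of the attempt so far, kept short enough that
    every refinement round fits in the model's context window."""
    summary: str = ""
    best_proof: str = ""
    best_score: float = float("-inf")

def score_with_rubric(proof: str) -> float:
    """Toy stand-in for a rubric-based reward (stage 2): each criterion
    contributes a partial score in [0, 1]. The paper's real rubric is not
    reproduced here."""
    text = proof.lower()
    rubric = {
        "states_the_claim": 1.0 if "we show" in text else 0.0,
        "justifies_steps": min(text.count("since") / 3.0, 1.0),
        "reaches_conclusion": 1.0 if "therefore" in text else 0.0,
    }
    return sum(rubric.values()) / len(rubric)

def summarize(text: str, max_chars: int = 400) -> str:
    """Placeholder summarizer; a real system would ask the model itself
    to compress its progress into the cache."""
    return text[-max_chars:]

def prove(model: Callable[[str], str], theorem: str, rounds: int = 4) -> str:
    """Summarize-and-refine loop: each round conditions the model on the
    compressed cache instead of the full proof transcript."""
    cache = ReasoningCache()
    for _ in range(rounds):
        prompt = (
            f"Theorem: {theorem}\n"
            f"Progress so far: {cache.summary}\n"
            "Write or refine a complete proof."
        )
        attempt = model(prompt)
        score = score_with_rubric(attempt)
        if score > cache.best_score:
            cache.best_score, cache.best_proof = score, attempt
        # Fold the new attempt back into the bounded cache for the next round.
        cache.summary = summarize(cache.summary + "\n" + attempt)
    return cache.best_proof
```

With a real inference call substituted for `model`, the prompt stays roughly constant in length across rounds; that bounded context is what makes long proofs tractable for a small model's test-time budget.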

Why This Matters

Efficiency

Reproducibility

Open Source Impact

The Bigger Picture

This work challenges the assumption that bigger is always better for AI reasoning. A well-designed training pipeline focused on the right skills (proof writing, iterative refinement) can achieve remarkable results with small models.
