QED-Nano: A 4-Billion Parameter Model That Proves Hard Olympiad-Level Math Theorems
A new paper introduces QED-Nano, a 4-billion parameter language model that can prove Olympiad-level mathematical theorems — matching or surpassing models 30x its size. The work demonstrates that mathematical reasoning ability doesn't necessarily require massive scale.
The Achievement
QED-Nano is a 4B parameter model post-trained specifically for Olympiad-level proofs. Despite its small size, it:
- Surpasses much larger open models including Nomos-1 and GPT-OSS-120B
- Approaches the performance of proprietary models like Gemini 3 Pro
- Operates at a fraction of the inference cost
- Is fully open-source with released code, models, and datasets
The Training Recipe
The three-stage training pipeline is the key innovation:
- Supervised Fine-Tuning (SFT) — Distills proof-writing style from DeepSeek-Math-V2, teaching the model good mathematical argumentation
- Reinforcement Learning (RL) — Uses rubric-based rewards to improve proof quality iteratively
- Reasoning Cache RL — Decomposes long proofs into summarize-and-refine cycles, enabling stronger test-time reasoning
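Stages 2 and 3 can be sketched in a few lines. This is a minimal illustration under my own assumptions: the rubric items, the `summarize`/`refine` stand-ins, and all function names below are hypothetical placeholders, not the paper's actual reward model or cache format.

```python
def rubric_reward(proof: str, rubric) -> float:
    """Stage-2 style reward: weighted fraction of rubric criteria satisfied.

    rubric maps criterion name -> (weight, check_fn), where check_fn
    returns a score in [0, 1] for the proof text.
    """
    total = sum(w for w, _ in rubric.values())
    return sum(w * check(proof) for w, check in rubric.values()) / total

# Toy string checks standing in for model- or verifier-based judges.
toy_rubric = {
    "states_claim":   (1.0, lambda p: float("we show" in p.lower())),
    "has_conclusion": (1.0, lambda p: float("qed" in p.lower())),
}

def summarize(text: str) -> str:
    # Placeholder "cache" step: keep only the first sentence.
    return text.split(".")[0] + "."

def refine(problem: str, cache: str) -> str:
    # Placeholder "refine" step: extend the cached summary.
    return cache + " Refined. QED."

def summarize_and_refine(problem: str, draft: str, rounds: int = 3) -> str:
    """Stage-3 style loop: compress the proof so far, then continue from
    the compressed cache instead of the full transcript."""
    for _ in range(rounds):
        cache = summarize(draft)        # short summary of reasoning so far
        draft = refine(problem, cache)  # keep proving from the summary
    return draft

print(rubric_reward("We show n is even. QED.", toy_rubric))  # 1.0
print(summarize_and_refine("prove n is even", "We show n is even."))
```

The point of the stage-3 loop is that each refinement conditions on a short summary rather than the ever-growing proof, which is what lets a small model sustain long chains of reasoning at test time.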
Why This Matters
Efficiency
- 4B parameters vs. 120B+ for comparable performance
- Dramatically lower inference costs
- Can run on consumer hardware
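The consumer-hardware claim follows from simple arithmetic (mine, not the paper's): 4 billion parameters at common inference precisions fit comfortably in a single consumer GPU's memory.

```python
# Back-of-envelope weight-memory estimate for a 4B-parameter model.
params = 4e9
bytes_per_param = {"fp16/bf16": 2, "int8": 1, "int4": 0.5}

for fmt, b in bytes_per_param.items():
    gib = params * b / 2**30  # bytes -> GiB
    print(f"{fmt}: ~{gib:.1f} GiB of weights")
```

At fp16 that is roughly 7.5 GiB of weights (activations and KV cache add more), and 4-bit quantization brings it under 2 GiB, well within reach of ordinary desktop GPUs.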
Reproducibility
- Proprietary systems (Gemini, GPT) have undisclosed training pipelines
- QED-Nano releases everything: model weights, training code, datasets, evaluation code
- Enables the research community to study and improve mathematical reasoning
Open Source Impact
- Released components: QED-Nano model, QED-Nano-SFT model, FineProofs-SFT dataset, FineProofs-RL dataset, training and evaluation code
- Complete pipeline for future research on open mathematical reasoning
The Bigger Picture
This work challenges the assumption that bigger is always better for AI reasoning. A well-designed training pipeline focused on the right skills (proof writing, iterative refinement) can achieve remarkable results with small models.