Leanstral: Open-Source Foundation for Trustworthy AI Code Agents
Mistral AI has released Leanstral, the first open-source code agent for Lean 4, a proof assistant capable of expressing complex mathematical objects and software specifications. Its highly sparse architecture activates just 6B parameters, yet Leanstral outperforms much larger models on formal proof engineering tasks.
The Problem: Human Review Bottleneck
AI agents have proven highly capable at code generation. Yet as these models are pushed into high-stakes domains, from frontier research mathematics to mission-critical software, human review becomes the primary bottleneck. The time and specialized expertise required to manually verify AI-generated code limit engineering velocity.
Enter Leanstral
Mistral AI's answer is Leanstral, a code agent specifically designed for Lean 4 that can both carry out tasks and formally prove implementations against strict specifications. Instead of debugging machine-generated logic, humans simply state what they want, and the agent must prove that its output meets that specification.
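To make "proving an implementation against a specification" concrete, here is a minimal Lean 4 sketch. It is a hypothetical toy example, not taken from Leanstral or the Mistral announcement, and it assumes a recent Lean 4 toolchain in which the `omega` tactic is built in. The names `double` and `double_spec` are invented for illustration.

```lean
-- Hypothetical implementation: doubles a natural number by addition.
def double (n : Nat) : Nat := n + n

-- Specification: for every input, `double n` equals `2 * n`.
-- Lean accepts the file only if this proof actually checks.
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double   -- expose the definition: goal becomes n + n = 2 * n
  omega           -- discharge the linear-arithmetic goal
```

If the implementation were wrong (say, `n + n + 1`), the proof would fail to check, so the error would surface without a human reading the code.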
Key technical details:
- Architecture: Highly sparse, optimized for proof engineering — only 6B active parameters (120B total)
- License: Apache 2.0, fully open-source
- Integration: Available in Mistral's vibe coding environment and through a free API endpoint
- MCP Support: Trained to work with lean-lsp-mcp for maximum performance
Performance: Beating Giants at a Fraction of the Cost
Leanstral-120B-A6B demonstrates significant efficiency advantages over much larger open-source models:
| Model | Active Params | FLTEval Score | Passes Needed |
|---|---|---|---|
| Leanstral | 6B | 38.9+ | 1 |
| GLM5 | 40B | 16.6 | Multiple |
| Kimi-K2.5 | 32B | 20.1 | Multiple |
| Qwen3.5 | 17B | ~38 | 4 |
Against closed-source competitors like Claude Opus 4.6 and Sonnet 4.6, Leanstral remains competitive while being dramatically more cost-efficient. Because Lean acts as a perfect verifier, many attempts can be run in parallel and any proof that Lean accepts can be kept.
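The idea of parallel inference with a verifier can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not Mistral's implementation: `first_verified` and `success_probability` are hypothetical names, the `verify` callback stands in for a real Lean check, and the probability formula assumes attempts succeed independently, which real model samples may not.

```python
from typing import Callable, Iterable, Optional


def first_verified(
    candidates: Iterable[str],
    verify: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate the verifier accepts, or None if all fail.

    `verify` is a stand-in for running the Lean checker on a candidate proof.
    """
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


def success_probability(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` tries succeeds.

    Assumes each attempt succeeds independently with probability `p_single`.
    """
    return 1.0 - (1.0 - p_single) ** attempts
```

Under the independence assumption, two attempts that each succeed half the time give `success_probability(0.5, 2) == 0.75`. Because the verifier rejects every incorrect proof, extra attempts can raise the success rate without adding wrong answers.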
New Evaluation: FLTEval
Rather than testing isolated competition math problems, Mistral introduces FLTEval — an evaluation suite that tests completion of all formal proofs and correct definition of new mathematical concepts in real pull requests to the FLT (Fermat's Last Theorem) project. This reflects actual proof engineering scenarios rather than toy problems.
What This Means
Leanstral points to a different kind of AI agent: one that doesn't just generate code, but can formally verify it against mathematical specifications. This could fundamentally change how we think about AI-assisted software development, moving from "trust but verify" to "prove it works."
Source: Mistral AI Blog | HN: 695 points