RESCORE: LLM Agents Automatically Recover Simulations from Research Papers at 10x Human Speed
A new agentic framework called RESCORE uses LLM agents to automatically reconstruct numerical simulations described in control systems research papers, achieving a 10x speedup over manual human replication.
The Problem
Reproducing research results is a cornerstone of science, but:
- Parameters are underspecified — Papers omit implementation details
- Ambiguous descriptions — Mathematical notation interpreted differently
- Manual effort — Expert humans spend days or weeks per paper
- Reproducibility crisis — Many published results cannot be verified
RESCORE Framework
A three-component agentic pipeline:
- Analyzer — Reads and understands the paper, extracts simulation specifications
- Coder — Generates executable code to reproduce the simulation
- Verifier — Runs the code, compares outputs against paper figures using visual comparison
The system uses iterative execution feedback — when the simulation doesn't match, it analyzes what went wrong and tries again.
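The loop above can be sketched as a minimal Python stub. All function names, the toy "simulation" (a linear trajectory), and the correction heuristic are hypothetical stand-ins for illustration; the paper's agents call LLMs and compare figures visually, neither of which is shown here.

```python
def analyze(paper_text):
    """Analyzer: extract a simulation spec from the paper.
    Toy stand-in for LLM-based extraction; returns a fixed spec."""
    return {"gain": 2.0, "steps": 5}

def generate_code(spec, correction=0.0):
    """Coder: emit executable simulation code, applying any
    correction signal fed back from the Verifier."""
    gain = spec["gain"] + correction
    return f"result = [{gain} * t for t in range({spec['steps']})]"

def verify(code, target):
    """Verifier: run the generated code and compare its output
    against the target trajectory (stand-in for visual figure
    comparison). Returns (matched, signed residual)."""
    scope = {}
    exec(code, scope)
    residual = scope["result"][-1] - target[-1]
    return abs(residual) < 1e-9, residual

def rescore_loop(paper_text, target, max_iters=5):
    """Iterative execution feedback: on a mismatch, derive a
    correction from the residual and retry."""
    spec = analyze(paper_text)
    correction = 0.0
    for attempt in range(1, max_iters + 1):
        code = generate_code(spec, correction)
        ok, residual = verify(code, target)
        if ok:
            return code, attempt
        # Crude repair heuristic: adjust the gain by the per-step error
        correction -= residual / (spec["steps"] - 1)
    return None, max_iters
```

In this toy run the Analyzer misreads the gain (2.0 instead of 3.0), the Verifier measures the mismatch, and the second attempt converges; real failures (missing parameters, wrong solver settings) require the LLM to reason about the discrepancy rather than apply a numeric update.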
Results
- 40.7% success rate — Task-coherent simulation recovery on a 500-paper benchmark
- 10x speedup — Over manual human replication
- IEEE CDC benchmark — 500 papers from the Conference on Decision and Control
Why 40.7% Is Impressive
Given the complexity of recovering simulations from paper descriptions (missing parameters, implicit assumptions, notation differences), a fully automated system succeeding on over 40% of papers is remarkable.
Implications
- Reproducibility — Automated verification of published results at scale
- Literature review — Quickly test if published methods work as described
- Research acceleration — Build on verified results faster
- Benchmark and code will be open sourced