Agent-CoEvo: Code and Tests Should Evolve Together — Multi-Agent Framework Outperforms on SWE-bench
A new multi-agent framework called Agent-CoEvo argues that software repair should not be framed as optimizing code under fixed tests, but as coevolving code and tests together — achieving state-of-the-art results on SWE-bench Lite and SWT-bench Lite.
The Problem With Current AI Code Repair
Most LLM-based repair systems use a linear pipeline:
Bug Report → Generate Fix → Run Tests → Pass/Fail
Tests are treated as immutable correctness oracles. But real software engineers don't work this way — they routinely discover that the tests themselves contain bugs, encode missing assumptions, or misinterpret the failure conditions they were meant to capture.
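The fixed-oracle pipeline above can be sketched as a simple loop. All names here (`generate_patch`, `run_tests`) are illustrative stand-ins, not any real system's API:

```python
# Sketch of the conventional linear repair loop: the test suite is a
# fixed oracle that never changes while patches are retried.

def linear_repair(bug_report, tests, generate_patch, run_tests, max_iters=5):
    """Propose patches until the fixed test suite passes, or give up."""
    for _ in range(max_iters):
        patch = generate_patch(bug_report)   # LLM proposes a candidate fix
        if run_tests(patch, tests):          # tests are treated as ground truth
            return patch                     # accepted the moment tests pass
    return None                             # exhausted budget; tests never questioned
```

Note that nothing in this loop can recover if the tests themselves are wrong — the failure mode the paper targets.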
The Insight
"Repository-level issue resolution is fundamentally not optimization under fixed tests, but search over evolving behavioral constraints."
When fixing code, the behavioral constraints (tests) should evolve alongside the fix.
Agent-CoEvo Framework
A coevolutionary multi-agent system where:
- Code agents propose and refine patches
- Test agents propose and refine test modifications
- Mutual evaluation — patches are scored against candidate tests, and tests against candidate patches
- Semantic recombination — the strongest elements of competing candidates are merged
- Iterative refinement — Both code and tests improve together
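The loop described by these bullets can be sketched as follows. This is a minimal illustration in the spirit of the framework, not the paper's implementation: every function name (`propose_patch`, `propose_tests`, `mutual_score`, `recombine`) is an assumed placeholder for an agent or evaluator:

```python
import random

# Hypothetical coevolutionary loop: code candidates and test candidates
# refine each other, are mutually scored, and the best are recombined.

def coevolve(issue, code_pool, test_pool, propose_patch, propose_tests,
             mutual_score, recombine, generations=10):
    """Evolve patches and tests together; return the best (patch, tests) pair."""
    best, best_score = None, float("-inf")
    for _ in range(generations):
        # 1. Each population refines itself against the other's current state.
        code_pool = [propose_patch(issue, p, test_pool) for p in code_pool]
        test_pool = [propose_tests(issue, t, code_pool) for t in test_pool]
        # 2. Mutual evaluation: score every (patch, tests) pairing.
        scored = [((p, t), mutual_score(p, t))
                  for p in code_pool for t in test_pool]
        (p, t), score = max(scored, key=lambda x: x[1])
        if score > best_score:
            best, best_score = (p, t), score
        # 3. Semantic recombination: fold the winners back into both pools.
        code_pool.append(recombine(p, random.choice(code_pool)))
        test_pool.append(recombine(t, random.choice(test_pool)))
    return best
```

The key structural difference from the linear pipeline is that `test_pool` is mutable state inside the search, so a flawed test can be outcompeted rather than blocking every patch.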
Results
| Evaluation | Agent-CoEvo result |
|---|---|
| SWE-bench Lite repair success | State of the art |
| SWT-bench Lite repair success | State of the art |
| Test reproduction quality | Improved |
Why This Matters
- Real-world relevance — Matches how actual engineers fix bugs
- AI code tools — Directly applicable to Claude Code, Codex, Copilot
- SWE-bench — The standard benchmark for AI code repair
- Paradigm shift — From "code-only optimization" to "coevolution of implementation and specification"