Toward Automated Verification of Unreviewed AI-Generated Code
A developer proposes shifting from "I must always review AI-generated code" to "I must always verify AI-generated code" — using property-based testing, mutation testing, and constraint enforcement as alternatives to line-by-line human review.
The Experiment
Peter Lavigne had a coding agent generate a FizzBuzz solution, then iteratively verified it against four constraints — without ever reading the code line by line:
- Property-based tests — Using Hypothesis to verify correctness across randomized inputs, including edge cases and large values
- Mutation testing — Using mutmut to ensure the code couldn't be subtly altered without tests failing
- No side effects — Enforcing pure functions
- Static analysis — Type-checking and linting
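The first constraint can be sketched concretely. Below is a minimal, hypothetical example of property-based testing with Hypothesis: a `fizzbuzz` function (the name and implementation are illustrative, not Lavigne's actual generated code) checked against independent properties over randomized inputs, including very large values. Note the properties restate the spec from different angles rather than re-deriving the output, so a subtle bug is unlikely to satisfy all of them.

```python
from hypothesis import given, strategies as st


def fizzbuzz(n: int) -> str:
    """Return 'FizzBuzz', 'Fizz', 'Buzz', or str(n) per the classic rules."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# Randomized inputs, deliberately including very large values.
positive_ints = st.integers(min_value=1, max_value=10**18)


@given(positive_ints)
def test_fizz_iff_divisible_by_three(n):
    # "Fizz" appears in the output exactly when n is divisible by 3.
    assert ("Fizz" in fizzbuzz(n)) == (n % 3 == 0)


@given(positive_ints)
def test_buzz_iff_divisible_by_five(n):
    # "Buzz" appears in the output exactly when n is divisible by 5.
    assert ("Buzz" in fizzbuzz(n)) == (n % 5 == 0)


@given(positive_ints)
def test_plain_number_otherwise(n):
    # When n is divisible by neither 3 nor 5, the output is just str(n).
    if n % 3 != 0 and n % 5 != 0:
        assert fizzbuzz(n) == str(n)
```

Running these under pytest (or calling the decorated functions directly) exercises hundreds of generated cases per property. The remaining constraints layer on top: mutmut mutates `fizzbuzz` and requires these tests to catch every surviving mutant, while a type checker such as mypy enforces the annotations statically.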
The Key Insight
The experiment shifted his mindset: instead of reading every line, he established the code's correctness through machine-enforceable constraints. The space of incorrect programs that still pass every check exists, but it is small and hard to land in by accident.
Maintainability Doesn't Matter
Perhaps controversially, Lavigne argues that maintainability and readability aren't relevant for verified AI-generated code: "We should treat the output like compiled code." If the constraints prove correctness, you can regenerate the code from scratch whenever changes are needed.
Current Limitations
Today, the overhead of setting up these constraints outweighs the cost of simply reading the code. But it establishes a baseline that will improve as agents and tooling mature.
Implications
This approach suggests a future where AI-generated code is trusted through automated verification rather than human review — potentially enabling much faster software development for well-constrained problems.
Source: Peter Lavigne | HN: 56 points