Rust's New `become` Keyword Enables Tail-Call Interpreter That Outperforms Hand-Written Assembly

2026-04-05T23:17:51.116Z
Tail-Call Optimization Comes to Rust

Rust's nightly compiler now supports the `become` keyword for explicit tail calls, enabling a different approach to writing interpreters. Developer Matt Keeter demonstrates how this feature allows building a CPU emulator whose performance rivals hand-written ARM64 assembly, all in mostly safe Rust.

The Problem: Interpreter Dispatch Overhead

Traditional interpreters suffer from dispatch overhead. Each instruction execution requires the CPU to:

  1. Read an opcode from memory
  2. Branch to the handler function
  3. Return to the main loop
  4. Repeat

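Because every instruction funnels through one shared dispatch branch, the branch predictor has little to work with, and the compiler tends to spill interpreter state from registers to the stack around the loop. Assembly developers solved this with threaded code: each instruction handler ends with a direct jump to the next instruction's handler, distributing dispatch across all handlers.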
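The four steps above can be sketched as a conventional central-loop interpreter. This is a minimal illustration, not Keeter's code: the three-opcode bytecode (`PUSH`, `ADD`, `HALT`) and the function name `run_loop` are hypothetical and are not the Uxn instruction set.

```rust
// Hypothetical toy bytecode for illustration; not the Uxn instruction set.
const PUSH: u8 = 0; // push the byte that follows the opcode
const ADD: u8 = 1; // pop two values, push their wrapping sum
const HALT: u8 = 2; // stop and return the top of the stack

/// Central-loop interpreter: every instruction returns to this `loop`,
/// so all dispatch goes through the single `match` below.
fn run_loop(code: &[u8]) -> Option<u8> {
    let mut stack: Vec<u8> = Vec::new();
    let mut pc = 0usize;
    loop {
        // 1. Read an opcode from memory.
        let op = *code.get(pc)?;
        pc += 1;
        // 2. Branch to the handler.
        match op {
            PUSH => {
                stack.push(*code.get(pc)?);
                pc += 1;
            }
            ADD => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            HALT => return stack.pop(),
            _ => return None, // unknown opcode
        }
        // 3 and 4. Control returns to the top of the loop and repeats.
    }
}
```

Calling `run_loop(&[PUSH, 2, PUSH, 3, ADD, HALT])` evaluates 2 + 3. Every opcode shares the one `match`, which is the single dispatch point that threaded code spreads across handlers.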

Enter become

Rust's `become` keyword enables guaranteed tail-call elimination at the language level:

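// Sketch only: needs nightly Rust with #![feature(explicit_tail_calls)].
// HANDLERS is a hypothetical table of per-opcode functions that all share
// run's signature; it is not taken from Keeter's code.
fn run(core: &mut Uxn, dev: &mut Device, mut pc: u16) {
    let op = core.next(&mut pc); // fetch the opcode and advance pc
    // Tail call: the handler replaces this stack frame instead of adding one.
    become HANDLERS[op as usize](core, dev, pc);
}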

The key insight: `become` lets each instruction handler become the next handler without growing the stack, just like threaded assembly.
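The handler-to-handler structure can be sketched in stable Rust with ordinary calls in tail position. This is an illustration under stated assumptions, not the article's implementation: the toy opcodes, the `Vm` struct, the `HANDLERS` table and all function names are hypothetical. Without `become`, the compiler may or may not turn these calls into jumps, so the stack can still grow on long programs. On nightly, writing `become` in front of each tail call is what would make the elimination guaranteed.

```rust
// Hypothetical toy bytecode for illustration; not the Uxn instruction set.
const PUSH: u8 = 0; // push the byte that follows the opcode
const ADD: u8 = 1; // pop two values, push their wrapping sum
const HALT: u8 = 2; // stop and return the top of the stack

struct Vm {
    stack: Vec<u8>,
}

// Every handler shares one signature so any handler can call any other.
type Handler = fn(&mut Vm, &[u8], usize) -> Option<u8>;

// Dispatch table indexed by opcode.
const HANDLERS: [Handler; 3] = [op_push, op_add, op_halt];

/// Fetch the opcode at `pc` and hand control to its handler.
fn next(vm: &mut Vm, code: &[u8], pc: usize) -> Option<u8> {
    let op = *code.get(pc)?;
    let handler = *HANDLERS.get(op as usize)?;
    // With the nightly feature this call would be written with `become`.
    handler(vm, code, pc + 1)
}

fn op_push(vm: &mut Vm, code: &[u8], pc: usize) -> Option<u8> {
    vm.stack.push(*code.get(pc)?);
    // Dispatch happens here, at the end of the handler, not in a shared loop.
    next(vm, code, pc + 1)
}

fn op_add(vm: &mut Vm, code: &[u8], pc: usize) -> Option<u8> {
    let b = vm.stack.pop()?;
    let a = vm.stack.pop()?;
    vm.stack.push(a.wrapping_add(b));
    next(vm, code, pc)
}

fn op_halt(vm: &mut Vm, _code: &[u8], _pc: usize) -> Option<u8> {
    vm.stack.pop()
}

/// Entry point: start the handler chain at the first instruction.
fn run_chained(code: &[u8]) -> Option<u8> {
    let mut vm = Vm { stack: Vec::new() };
    next(&mut vm, code, 0)
}
```

There is no central loop: each handler ends by dispatching the next instruction itself, which is the shape threaded assembly has. Giving every handler the same signature is a deliberate choice in this sketch, since it lets a single function-pointer table cover all opcodes.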

Benchmark Results

The tail-call Uxn VM was benchmarked against Keeter's previous implementations:

| Implementation | Relative Speed |
| --- | --- |
| Original Rust (match dispatch) | 1.0x (baseline) |
| ARM64 assembly (threaded code) | 1.5x |
| x86-64 assembly (threaded code) | 2.0x |
| **Rust tail-call (`become`)** | 1.4-1.8x |

The tail-call version achieves 90-95% of assembly performance while remaining readable, maintainable, and mostly safe Rust.

Why This Matters

The `become` keyword was merged into Rust's nightly branch seven months ago via RFC PR #144232. While still unstable, it represents a significant step toward letting Rust match hand-written C and assembly in dispatch-heavy code such as interpreters.

Source: Matt Keeter's Blog
