# Flow Map Language Models: Generate Coherent Text in a Single Forward Pass with 8x Speedup
Researchers have demonstrated that continuous flow-based language models can generate coherent, high-quality text in a single forward pass — achieving 8x speedup over the best distilled discrete diffusion baselines.
## The Problem
Autoregressive LLMs (e.g., ChatGPT, Claude) generate text one token at a time:
- N tokens = N sequential forward passes
- Sequential bottleneck — Cannot parallelize within a generation
- Latency — Even with fast inference, long outputs take time
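The sequential bottleneck can be sketched with a toy stand-in model (hypothetical `toy_forward`; a real LLM would be a transformer attending over the prefix), just to make the N-tokens-means-N-passes cost concrete:

```python
import numpy as np

VOCAB_SIZE = 100

def toy_forward(tokens, rng):
    # Stand-in for one transformer forward pass: returns next-token logits.
    # (Hypothetical toy model; a real LLM conditions on `tokens`.)
    return rng.normal(size=VOCAB_SIZE)

def autoregressive_generate(n_tokens, seed=0):
    """Generate n_tokens one at a time: n_tokens sequential forward passes."""
    rng = np.random.default_rng(seed)
    tokens, passes = [], 0
    for _ in range(n_tokens):
        logits = toy_forward(tokens, rng)   # one full forward pass per token
        tokens.append(int(np.argmax(logits)))
        passes += 1                          # cannot be parallelized: step i needs step i-1
    return tokens, passes

tokens, passes = autoregressive_generate(16)
print(passes)  # 16 forward passes for 16 tokens
```

Each iteration depends on the previous one's output, which is why latency grows linearly with output length no matter how fast a single pass is.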
Discrete diffusion models promised parallel generation, but their quality collapses when the number of sampling steps is reduced to one or two.
## The Solution: Continuous Flows
The key insight: replace discrete jumps with continuous flows on the probability simplex.
| Approach | Steps Needed | Quality at 1 Step | Speedup |
|---|---|---|---|
| Autoregressive | N steps | N/A (sequential) | 1x |
| Discrete diffusion | 50-1000 steps | Poor | Varies |
| Continuous flow (FMLM) | 1 step | High quality | 8x |
## How It Works
- Start from random noise on the probability simplex (vocabulary distribution)
- One forward pass through the model transforms noise → coherent text
- Classification, not regression — The denoiser predicts discrete tokens, not continuous values
- Flow maps act as "teleportation" from noise to the final answer
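The steps above can be sketched end to end. Everything here is a hypothetical toy (a linear `denoiser_logits` stands in for the paper's transformer denoiser), meant only to show the shape of a one-step flow-map sample: simplex noise in, discrete token IDs out, one forward pass total:

```python
import numpy as np

VOCAB, SEQ_LEN = 50, 8

def denoiser_logits(x, W):
    # Stand-in denoiser: maps simplex-valued inputs to per-position token logits.
    # (Hypothetical linear map; the actual model is a neural network.)
    return x @ W  # shape (SEQ_LEN, VOCAB)

def one_step_sample(seed=0):
    rng = np.random.default_rng(seed)
    # 1. Start from random noise on the probability simplex, one
    #    vocabulary distribution per position (each row sums to 1).
    noise = rng.dirichlet(np.ones(VOCAB), size=SEQ_LEN)
    W = rng.normal(size=(VOCAB, VOCAB))
    # 2. One forward pass: the flow map "teleports" noise to the endpoint.
    logits = denoiser_logits(noise, W)
    # 3. Classification, not regression: pick discrete token IDs.
    return np.argmax(logits, axis=-1)

tokens = one_step_sample()
print(tokens.shape)  # (8,) — one token ID per position, from a single pass
```

All positions are produced simultaneously, which is where the parallelism (and the speedup) comes from.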
## Technical Innovation
- Simplex-based flow — Operates on the probability simplex (vocab distributions per position)
- Discrete denoiser — Uses classification (predicting token IDs) rather than regression
- Continuous paths — Smooth interpolation between noise and real text avoids the discretization artifacts that plague discrete diffusion
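Why continuous paths avoid discretization artifacts can be seen from a small check: a convex combination of two points on the simplex stays on the simplex, so every intermediate state along the path is still a valid vocabulary distribution. This sketch assumes a simple linear interpolation path; the paper's exact path construction may differ:

```python
import numpy as np

VOCAB = 10

def one_hot(token_id, vocab=VOCAB):
    e = np.zeros(vocab)
    e[token_id] = 1.0
    return e

rng = np.random.default_rng(0)
noise = rng.dirichlet(np.ones(VOCAB))   # random point on the simplex
target = one_hot(3)                      # clean data: a one-hot token

# A convex combination of simplex points stays on the simplex for all t in [0, 1]:
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    x_t = (1 - t) * noise + t * target
    assert np.isclose(x_t.sum(), 1.0) and (x_t >= 0).all()
```

Discrete diffusion, by contrast, moves through hard token swaps, so there is no smooth intermediate state for a few-step sampler to exploit.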
## Results
- 8x faster than best distilled discrete diffusion baselines
- Matches 8-step discrete diffusion quality in just 1 step
- High-quality output — Coherent text generation, not gibberish
## Why It Matters
If this approach scales to large models:
- Real-time generation — 8x faster text generation for chatbots, code assistants
- Parallel processing — All token positions computed simultaneously
- Cost reduction — Fewer forward passes = less compute = cheaper inference
- Architecture shift — May eventually challenge the autoregressive paradigm that dominates current LLMs