Gemini 3.1 Flash-Lite: Built for intelligence at scale
Google launches Gemini 3.1 Flash-Lite, its fastest and most cost-efficient Gemini 3 series model, priced at $0.25 per 1M input tokens and offering a 2.5X faster Time to First Answer Token than Gemini 2.5 Flash.
Cost-Efficiency Without Compromise
Priced at $0.25 per million input tokens and $1.50 per million output tokens, 3.1 Flash-Lite costs a fraction of what larger models cost. Its reported speed gains over 2.5 Flash and its benchmark scores:
- 2.5X faster Time to First Answer Token
- 45% increase in output speed (Artificial Analysis benchmark)
- Elo score of 1432 on Arena.ai Leaderboard
- 86.9% on GPQA Diamond and 76.8% on MMMU Pro, surpassing larger prior-generation models
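To make the pricing concrete, here is a minimal sketch of the arithmetic using only the two list prices quoted above. The function name and the sample workload (10,000 requests of 2,000 input and 500 output tokens each) are illustrative assumptions, and the calculation ignores any discounts, caching, or billing details not covered in this summary.

```python
# List prices quoted in the announcement, in USD per one million tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate spend from token counts at the quoted list prices.

    Hypothetical helper for illustration; real billing may differ.
    """
    total_micro_usd = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total_micro_usd / 1_000_000


# Assumed workload: 10,000 requests, each with 2,000 input and 500 output tokens.
requests = 10_000
batch_cost = estimate_cost_usd(requests * 2_000, requests * 500)
print(f"Estimated batch cost: ${batch_cost:.2f}")
```

In this assumed workload the output tokens account for more of the bill than the input tokens despite being a quarter of the volume, because the output rate is six times the input rate.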
Adaptive Intelligence for Developers
3.1 Flash-Lite comes with configurable thinking levels, giving developers control over how much reasoning the model applies per task. This makes it versatile for both simple and complex workloads:
- High-volume tasks: Translation, content moderation, sorting
- Complex workloads: Generating UIs, dashboards, simulations, instruction following
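One way a developer might use this control is to route each request to a thinking level based on its task category. The sketch below is an assumption-laden illustration, not the Gemini API: the level names ("low", "high"), the task labels, and the `pick_thinking_level` helper are all hypothetical, and the actual parameter name and accepted values should be taken from the Gemini API documentation.

```python
from enum import Enum


class ThinkingLevel(str, Enum):
    # Hypothetical level names; the real API may expose different values.
    LOW = "low"
    HIGH = "high"


# Task labels are illustrative, loosely mirroring the two groups listed above.
HIGH_VOLUME_TASKS = {"translation", "content_moderation", "sorting"}
COMPLEX_TASKS = {"ui_generation", "dashboard", "simulation", "instruction_following"}


def pick_thinking_level(task: str) -> ThinkingLevel:
    """Choose a reasoning budget for a task category (hypothetical helper)."""
    if task in COMPLEX_TASKS:
        return ThinkingLevel.HIGH
    # High-volume and unrecognized tasks default to the cheaper, faster setting.
    return ThinkingLevel.LOW


for task in ("translation", "dashboard", "unknown_task"):
    print(task, "->", pick_thinking_level(task).value)
```

Defaulting unrecognized tasks to the lower level keeps latency and cost predictable; a team that prefers quality over cost could flip that default.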
Early adopters including Latitude, Cartwheel, and Whering are already using 3.1 Flash-Lite at scale.
Availability
3.1 Flash-Lite is available in preview to developers through the Gemini API in Google AI Studio and to enterprises through Vertex AI.
Source: Google DeepMind Blog