Gemini 3.1 Flash-Lite: Built for intelligence at scale

2026-03-17T02:59:31.000Z

Google has launched Gemini 3.1 Flash-Lite, its fastest and most cost-efficient Gemini 3 series model. It is priced at $0.25 per 1M input tokens, delivers a 2.5X faster Time to First Answer Token than 2.5 Flash, and posts state-of-the-art benchmark scores for its tier.

Cost-Efficiency Without Compromise

Priced at $0.25 per million input tokens and $1.50 per million output tokens, 3.1 Flash-Lite delivers enhanced performance at a fraction of the cost of larger models. It outperforms 2.5 Flash on speed and posts strong benchmark results:

2.5X faster Time to First Answer Token
45% increase in output speed (Artificial Analysis benchmark)
Elo score of 1432 on Arena.ai Leaderboard
86.9% on GPQA Diamond and 76.8% on MMMU Pro, surpassing larger prior-generation models
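
To make the pricing and speed figures above concrete, here is a minimal Python sketch of the arithmetic. The per-token prices and the 2.5X and 45% figures come from the article. The function names, the sample traffic, and the baseline latency values are illustrative assumptions, and reading "2.5X faster" as dividing the time to first token by 2.5 is an interpretation rather than a published definition.

```python
# Prices quoted in the article, in USD per one million tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50

# Speed figures quoted in the article, relative to 2.5 Flash.
TTFT_SPEEDUP = 2.5        # assumed meaning: time to first token divided by 2.5
OUTPUT_SPEED_GAIN = 1.45  # a 45% increase in output tokens per second


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of a workload at the listed prices."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


def response_time_s(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Simple latency model: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s


def projected_response_time_s(
    baseline_ttft_s: float, baseline_tokens_per_s: float, output_tokens: int
) -> float:
    """Apply the article's relative speed figures to a hypothetical baseline."""
    return response_time_s(
        baseline_ttft_s / TTFT_SPEEDUP,
        baseline_tokens_per_s * OUTPUT_SPEED_GAIN,
        output_tokens,
    )


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, 800 input and 200 output tokens each.
    requests = 10_000
    cost = estimate_cost_usd(requests * 800, requests * 200)
    print(f"Estimated cost: ${cost:.2f}")

    # Hypothetical baseline: 1.0 s to first token, 100 tokens/s, 290-token reply.
    print(f"Baseline response time:  {response_time_s(1.0, 100.0, 290):.2f} s")
    print(f"Projected response time: {projected_response_time_s(1.0, 100.0, 290):.2f} s")
```

In this hypothetical workload, input tokens account for $2.00 and output tokens for $3.00, so output volume dominates the bill even though it is a quarter of the input volume.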

Adaptive Intelligence for Developers

3.1 Flash-Lite comes with configurable thinking levels, giving developers control over how much reasoning the model applies per task. This makes it versatile for both simple and complex workloads:

High-volume tasks: Translation, content moderation, sorting
Complex workloads: Generating UIs, dashboards, simulations, instruction following
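
One way to use configurable thinking levels is to pick a level per request based on the task type. The sketch below assumes two level names, "low" and "high", and a small dataclass as the request plan. Neither is taken from the article or from the Gemini API reference, so check the official documentation for the actual parameter names and accepted values before relying on them.

```python
from dataclasses import dataclass

# Task categories named in the article, mapped to a hypothetical thinking level.
# The level names "low" and "high" are assumptions made for illustration.
THINKING_LEVEL_BY_TASK = {
    # High-volume tasks
    "translation": "low",
    "content_moderation": "low",
    "sorting": "low",
    # Complex workloads
    "ui_generation": "high",
    "dashboard": "high",
    "simulation": "high",
    "instruction_following": "high",
}

# Assumption: unknown task types fall back to the cheaper, faster level.
DEFAULT_LEVEL = "low"


@dataclass(frozen=True)
class RequestPlan:
    """Hypothetical container; not the shape of a real API request."""
    task: str
    prompt: str
    thinking_level: str


def plan_request(task: str, prompt: str) -> RequestPlan:
    """Choose a thinking level for a task, defaulting when the task is unknown."""
    level = THINKING_LEVEL_BY_TASK.get(task, DEFAULT_LEVEL)
    return RequestPlan(task=task, prompt=prompt, thinking_level=level)


if __name__ == "__main__":
    for task in ("translation", "dashboard", "summarization"):
        plan = plan_request(task, "example prompt")
        print(f"{plan.task}: thinking_level={plan.thinking_level}")
```

Keeping the mapping in one table makes it easy to adjust a task's level after measuring quality, latency, and cost on real traffic.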

Early adopters including Latitude, Cartwheel, and Whering are already using 3.1 Flash-Lite at scale.

Availability

3.1 Flash-Lite is available in preview to developers via the Gemini API in Google AI Studio and to enterprises via Vertex AI.

Source: Google DeepMind Blog
