Google announced Gemini 3.1 Flash Lite on Tuesday, introducing a lower-cost, higher-speed variant in its Gemini 3 family aimed at developers and enterprise customers. The model is accessible now in preview form through the Gemini API in Google AI Studio and is also offered to enterprise users on Vertex AI.
Pricing for Gemini 3.1 Flash Lite is set at $0.25 per million input tokens and $1.50 per million output tokens. Google characterized the model as its quickest and most cost-efficient option in the Gemini 3 series.
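At those rates, per-request costs are easy to estimate. The helper below is an illustrative sketch (the function name and token counts are hypothetical, not part of Google's announcement); only the two published per-million-token rates come from the article.

```python
# Published Gemini 3.1 Flash Lite rates (USD per 1M tokens), per Google's announcement.
INPUT_RATE_PER_M = 0.25
OUTPUT_RATE_PER_M = 1.50

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the published rates."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# Example: a request with 10,000 input tokens and 2,000 output tokens
# costs 0.0025 + 0.0030 = $0.0055.
print(estimate_cost(10_000, 2_000))
```

For high-volume workloads the asymmetry matters: output tokens cost six times as much as input tokens, so prompt-heavy, short-answer tasks (classification, moderation verdicts) are cheapest per call.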
On performance, Google said Gemini 3.1 Flash Lite outperforms 2.5 Flash in internal benchmarking, reporting a 2.5x faster Time to First Answer Token and a 45% increase in output speed on the Artificial Analysis benchmark. The company also cited an Elo score of 1432 on the Arena.ai Leaderboard, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro.
Google also stated that Gemini 3.1 Flash Lite surpasses some larger Gemini models from prior generations, including 2.5 Flash, on reasoning and multimodal understanding benchmarks. The company highlighted a set of dynamic thinking capabilities in the model that let developers tune how much processing the model applies to a given task.
Google describes the dynamic thinking features as suited both to high-frequency workloads, such as high-volume translation and content moderation, and to more complex, compute-intensive tasks, such as generating user interfaces and creating simulations.
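Google's announcement does not spell out the API surface for these controls. As a sketch only: in the Gemini REST API's generateContent endpoint, thinking effort is capped via a thinking budget, and the snippet below builds request bodies with that field. Whether Gemini 3.1 Flash Lite exposes the same knob, and the model ID used here, are assumptions.

```python
def build_request(prompt: str, thinking_budget: int) -> dict:
    """Sketch of a generateContent request body with a thinking budget.

    The "thinkingConfig"/"thinkingBudget" field names follow the Gemini REST
    API's existing thinking controls; applying them to 3.1 Flash Lite is an
    assumption, since the announcement does not document the parameter.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

# High-frequency task: spend little or no compute on reasoning.
moderation = build_request("Classify this comment as safe/unsafe.", thinking_budget=0)

# Complex task: allow a larger reasoning budget.
ui_gen = build_request("Generate a settings-page UI layout.", thinking_budget=8192)
```

The design intent, per Google's framing, is that the same model serves both ends of the cost-latency spectrum by varying this one parameter per request rather than by switching models.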
Several companies are already piloting the model. Google named Latitude, Cartwheel, and Whering as early users of Gemini 3.1 Flash Lite, and early testers cited by the company reported that it handles complex inputs with precision comparable to larger-tier models while maintaining adherence to instructions.
Key takeaways:
- Gemini 3.1 Flash Lite is available in preview to developers via the Gemini API in Google AI Studio and to enterprises via Vertex AI.
- The model is priced at $0.25 per million input tokens and $1.50 per million output tokens, and is promoted as the fastest, most cost-efficient in the Gemini 3 line.
- Benchmark results supplied by Google include a 2.5x faster Time to First Answer Token and a 45% faster output speed versus 2.5 Flash, plus leaderboard and benchmark scores cited above.
Risks and uncertainties:
- Performance claims rely on benchmark results presented by Google; the announcement does not detail independent verification of those metrics, which matters for enterprise procurement and benchmarking of cloud AI services.
- Early adopter feedback is limited to the companies named; how the model behaves at enterprise scale, including instruction adherence and precision under diverse production workloads, remains to be seen, with implications for operations that rely on translation and content moderation.