Google announced Gemini 3.1 Flash Lite on Tuesday, introducing a lower-cost, higher-speed variant in its Gemini 3 family aimed at developers and enterprise customers. The model is accessible now in preview form through the Gemini API in Google AI Studio and is also offered to enterprise users on Vertex AI.
Pricing for Gemini 3.1 Flash Lite is set at $0.25 per million input tokens and $1.50 per million output tokens. Google characterized the model as its quickest and most cost-efficient option in the Gemini 3 series.
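At those rates, per-request costs are easy to estimate. The helper below is an illustrative sketch (the function name and token counts are hypothetical, not part of Google's announcement); only the two published per-million-token rates come from the article.

```python
# Published Gemini 3.1 Flash Lite rates (USD per 1M tokens), per Google's announcement.
INPUT_RATE_PER_M = 0.25
OUTPUT_RATE_PER_M = 1.50

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the published rates."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# Example: a request with 10,000 input tokens and 2,000 output tokens
# costs 0.0025 + 0.0030 = $0.0055.
print(estimate_cost(10_000, 2_000))
```

For high-volume workloads the asymmetry matters: output tokens cost six times as much as input tokens, so prompt-heavy, short-answer tasks (classification, moderation verdicts) are cheapest per call.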
On performance, Google said Gemini 3.1 Flash Lite outperforms 2.5 Flash in internal benchmarking, reporting a 2.5x faster Time to First Answer Token and a 45% increase in output speed on the Artificial Analysis benchmark. The company also cited an Elo score of 1432 on the Arena.ai Leaderboard, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro.
Google also stated that Gemini 3.1 Flash Lite surpasses some larger Gemini models from prior generations, including 2.5 Flash, on reasoning and multimodal understanding benchmarks. The company highlighted a set of dynamic thinking capabilities in the model that let developers tune how much processing the model applies to a given task.
Google describes the dynamic thinking features as suited both to high-frequency workloads, such as high-volume translation and content moderation, and to more complex, compute-intensive tasks, such as generating user interfaces and creating simulations.
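Google's announcement does not spell out the API surface for these controls. As a sketch only: in the Gemini REST API's generateContent endpoint, thinking effort is capped via a thinking budget, and the snippet below builds request bodies with that field. Whether Gemini 3.1 Flash Lite exposes the same knob, and the model ID used here, are assumptions.

```python
def build_request(prompt: str, thinking_budget: int) -> dict:
    """Sketch of a generateContent request body with a thinking budget.

    The "thinkingConfig"/"thinkingBudget" field names follow the Gemini REST
    API's existing thinking controls; applying them to 3.1 Flash Lite is an
    assumption, since the announcement does not document the parameter.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

# High-frequency task: spend little or no compute on reasoning.
moderation = build_request("Classify this comment as safe/unsafe.", thinking_budget=0)

# Complex task: allow a larger reasoning budget.
ui_gen = build_request("Generate a settings-page UI layout.", thinking_budget=8192)
```

The design intent, per Google's framing, is that the same model serves both ends of the cost-latency spectrum by varying this one parameter per request rather than by switching models.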
Several companies are already piloting the model. Google named Latitude, Cartwheel, and Whering as early users of Gemini 3.1 Flash Lite, and early testers cited by the company reported that it handles complex inputs with precision comparable to larger-tier models while maintaining adherence to instructions.
Key takeaways:
- Gemini 3.1 Flash Lite is available in preview to developers via the Gemini API in Google AI Studio and to enterprises via Vertex AI.
- The model is priced at $0.25 per million input tokens and $1.50 per million output tokens, and is promoted as the fastest, most cost-efficient in the Gemini 3 line.
- Benchmark results supplied by Google include a 2.5x faster Time to First Answer Token and a 45% faster output speed versus 2.5 Flash, plus leaderboard and benchmark scores cited above.
Risks and uncertainties:
- Performance claims rely on benchmark results presented by Google; the announcement does not detail independent verification of those metrics, which matters for enterprise procurement and benchmarking of cloud AI services.
- Early adopter feedback is limited to the companies named; how the model behaves at enterprise scale, including instruction adherence and precision under diverse production workloads, remains to be seen, with implications for operations that rely on translation and content moderation.