
Google advances Gemini with low-cost Flash-Lite 2.5

The cloud provider's model release targets businesses looking for performance, cost-effectiveness and precision.

With the general availability of Google's Gemini 2.5 Flash-Lite, enterprise developers now have another option when looking for a model that balances performance and low cost.

Gemini 2.5 Flash-Lite has lower latency than the tech giant's 2.0 Flash-Lite and 2.0 Flash. It's also Google's lowest-cost model, with a price of $0.10 per million input tokens and $0.40 per million output tokens. In comparison, Gemini 2.5 Flash costs $0.30 per million input tokens and $2.50 per million output tokens. Gemini 2.5 Flash-Lite also gives developers access to a million-token context window, controllable thinking budgets and support for tools like Grounding with Google Search, Code Execution and URL Context, Google said.
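As a back-of-the-envelope illustration of the price gap, the published per-token rates can be turned into a simple cost estimate. This is a rough sketch using the rates cited above; the sample token counts are illustrative, and real billing may include other factors such as caching or tool usage.

```python
# Per-million-token rates in USD, as cited in the article.
PRICING = {
    "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40},
    "gemini-2.5-flash": {"input": 0.30, "output": 2.50},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload from token counts."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Illustrative workload: 500,000 input tokens and 100,000 output tokens.
lite_cost = request_cost("gemini-2.5-flash-lite", 500_000, 100_000)   # $0.09
flash_cost = request_cost("gemini-2.5-flash", 500_000, 100_000)       # $0.40
```

At these rates, the sample workload costs roughly $0.09 on Flash-Lite versus $0.40 on Flash, a savings of more than 4x for output-light tasks.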

Gemini 2.5 Flash-Lite is an example of how generative AI vendors, including Google competitors OpenAI and Anthropic, are continuing to improve each of their main model families to fit enterprise needs.

Google released a preview of Gemini 2.5 Flash-Lite in June.

Striking a balance

With the model, Google is responding to the needs of enterprise developers, who must balance accuracy, speed and cost in the models they use to build their AI applications, said Arun Chandrasekaran, an analyst at Gartner.

"It is impossible to get all three right simultaneously," Chandrasekaran said. "If a model is very accurate, it's likely to be more expensive and slower."


Chandrasekaran said Google is betting on applications such as content generation, summarization and coding, for which customers would prefer a smaller model because of its speed and cost.

"I would like to believe that a lot of the use cases for this model might be in the language domain and perhaps in the coding domain," he said. "Google is making incremental releases with every model release."

While Google positions this model as a balance of performance and speed, enterprises should be more critical, said Rowan Curran, an analyst at Forrester Research.

"The way for enterprises to look at it is one tool in a toolkit when you're trying to look at the performance of various types of applications that rely on large language models," Curran said.

He added that there are times when an enterprise needs a high-quality text response, or high-quality image or video output. For those, the Gemini 2.5 Pro models or Veo 3 might be a good fit. Or, if an enterprise developer is trying to do something quicker, Veo 2 might be the best option.

"The continued improvements on each of these tiers of capabilities are really important for enterprises to continue to see moving forward because there is a varying landscape of needs for models depending upon what the use case or what part of the use case [is]," Curran continued. "Enterprises should be seeking models that fit their use case."

Esther Shittu is an Informa TechTarget news writer and podcast host covering AI software and systems.
