Google releases stable version of Gemini 2.5 Flash-Lite
Investing.com -- Google (NASDAQ:GOOGL) has released the stable version of Gemini 2.5 Flash-Lite, completing its lineup of 2.5 models ready for production use. The new model is designed to balance performance and cost while maintaining quality for latency-sensitive tasks like translation and classification.
Gemini 2.5 Flash-Lite offers lower latency than the previous 2.0 Flash-Lite and 2.0 Flash models across a broad range of prompts. It is priced at $0.10 per million input tokens and $0.40 per million output tokens, making it Google’s lowest-cost 2.5 model. The company has also reduced audio input pricing by 40% from the preview launch.
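For a rough sense of what those rates mean per request, the sketch below estimates cost at the listed prices; the token counts used are illustrative and not from the announcement.

```python
# Illustrative cost estimate at the announced Gemini 2.5 Flash-Lite rates.
INPUT_PRICE_PER_M = 0.10   # USD per 1 million input tokens
OUTPUT_PRICE_PER_M = 0.40  # USD per 1 million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 2,000-token prompt with a 500-token response costs about $0.0004.
print(f"${request_cost(2_000, 500):.6f}")
```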
Despite its cost efficiency, the model demonstrates higher quality than 2.0 Flash-Lite across benchmarks including coding, math, science, reasoning, and multimodal understanding. Users get access to a 1 million-token context window, controllable thinking budgets, and support for native tools like Grounding with Google Search, Code Execution, and URL Context.
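As a rough illustration of the controllable thinking budget, the snippet below caps the tokens the model may spend on internal reasoning. It assumes the google-genai Python SDK; the API key placeholder and the budget value of 512 are illustrative, not details from the announcement.

```python
# Sketch: capping the thinking budget via the google-genai SDK (assumed).
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Classify this support ticket: 'My invoice total looks wrong.'",
    config=types.GenerateContentConfig(
        # Limit the tokens spent on internal reasoning (set 0 to disable thinking).
        thinking_config=types.ThinkingConfig(thinking_budget=512),
    ),
)
print(response.text)
```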
Several companies have already implemented the new model. Satlyt, a decentralized space computing platform, has seen a 45% reduction in latency for critical onboard diagnostics and a 30% decrease in power consumption. HeyGen uses the model to automate video planning and translate videos into over 180 languages. DocsHound leverages it to process long videos and extract screenshots with low latency, while Evertune uses it to analyze how brands are represented across AI models.
Developers can start using Gemini 2.5 Flash-Lite by specifying "gemini-2.5-flash-lite" in their code. Google plans to remove the preview alias on August 25, 2025. The model is available in Google AI Studio and Vertex AI.
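A minimal getting-started sketch follows, assuming the google-genai Python SDK and a placeholder API key; only the "gemini-2.5-flash-lite" model ID comes from the announcement.

```python
# Minimal call to the stable Gemini 2.5 Flash-Lite model (google-genai SDK assumed).
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # stable model ID named in the announcement
    contents="Translate to French: 'The package shipped this morning.'",
)
print(response.text)
```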
This article was generated with the support of AI and reviewed by an editor. For more information see our T&C.