Google has released Gemma 4, a new family of open artificial intelligence models designed to run directly on local devices, as the company expands its push into developer tools and reduces reliance on cloud-based AI services.
The models, announced April 2, are built for what Google describes as “advanced reasoning and agentic workflows,” signalling a shift beyond chatbots toward systems that can execute tasks and integrate into software environments. In its release, Google said “Gemma 4 delivers an unprecedented level of intelligence per parameter,” enabling developers to run capable AI systems without the heavy computing costs typically associated with large models.
Gemma 4 comes in four sizes, from lightweight 2B and 4B models for phones and edge devices to 26B and 31B models that fit on a single 80GB GPU. The smaller models offer responsive offline AI, while the larger versions deliver reasoning and memory rivalling much bigger proprietary systems. For everyday workflows, this can speed up real tasks like coding locally, processing long documents, or handling audio and visual inputs without relying on external servers.
1. It’s Google’s most capable open model yet
Google describes the Gemma 4 family as its “most intelligent open models to date,” built for advanced reasoning and agentic workflows. The company says it delivers “an unprecedented level of intelligence-per-parameter,” meaning it can handle complex tasks without requiring massive compute. This positions it as a more efficient alternative to larger models. It also builds on strong adoption, with over 400 million downloads of earlier Gemma versions. For developers, this signals a maturing open AI ecosystem backed by Google.
2. It’s designed to run on your own hardware
Unlike many AI models that rely on cloud infrastructure, Gemma 4 is built to run locally across a range of devices. These include smartphones, laptops, and GPUs, depending on the model size. This makes it easier to build AI tools that don’t depend on constant internet access. It also reduces latency and improves speed for real-time tasks. For teams handling sensitive data, it offers more control and privacy.
3. It comes in four model sizes for different use cases
Gemma 4 is available in four variants: 2B, 4B, 26B, and 31B. The smaller models are optimised for mobile and edge devices, while the larger ones deliver more advanced reasoning on higher-end hardware. Google says the 31B model ranks among the top open models globally, outperforming much larger systems. This gives developers flexibility depending on their needs and resources. It also lowers the barrier to entry for building with AI.
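To make the size-to-hardware trade-off concrete, here is a minimal sketch of a helper that picks the largest variant fitting in a device's available memory. The variant names and memory thresholds below are illustrative assumptions (only the 80GB figure for the largest model comes from the article), not official system requirements:

```python
# Illustrative only: variant names and memory figures are assumptions,
# not official Gemma 4 requirements. The 80 GB entry reflects the
# article's "single 80GB GPU" claim for the largest model.
VARIANTS = [
    ("gemma-4-2b", 4),    # assumed: ~4 GB for the 2B edge/mobile model
    ("gemma-4-4b", 8),    # assumed: ~8 GB for the 4B model
    ("gemma-4-26b", 48),  # assumed: ~48 GB for the 26B model
    ("gemma-4-31b", 80),  # per the article: fits on a single 80GB GPU
]

def pick_variant(available_gb: float) -> str:
    """Return the largest (assumed) variant that fits in memory."""
    chosen = None
    for name, needed_gb in VARIANTS:
        if needed_gb <= available_gb:
            chosen = name  # list is sorted ascending, so keep the largest fit
    if chosen is None:
        raise ValueError("No variant fits in the available memory")
    return chosen
```

A laptop with 8 GB free would land on the assumed 4B variant, while an 80GB datacenter GPU could host the 31B model.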
4. It goes beyond chat to power AI agents and workflows
Gemma 4 is built for more than just text generation. It supports multi-step reasoning, structured outputs, and function calling, enabling developers to build AI agents that can execute tasks. This includes automating workflows, interacting with APIs, and handling complex processes. It also supports multimodal inputs like images, video, and audio. The goal is to make AI more useful in real-world applications, not just conversations.
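Function calling generally works by having the model emit a structured (often JSON) description of a tool invocation, which the host application parses and executes before feeding the result back. The sketch below shows that dispatch loop with a stub standing in for an actual Gemma 4 inference call; the tool name and JSON shape are assumptions for illustration, not Google's documented format:

```python
import json

# Hypothetical tool the agent may call; the name is illustrative.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub: a real tool would query a weather API

TOOLS = {"get_weather": get_weather}

def run_model(prompt: str) -> str:
    # Stub standing in for a local Gemma 4 call. A real model would be
    # prompted to respond with a JSON tool call in an agreed-upon schema.
    return json.dumps({"tool": "get_weather", "arguments": {"city": "Lagos"}})

def dispatch(model_output: str) -> str:
    """Parse the model's structured output and execute the named tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["tool"]]          # look up the requested tool
    return fn(**call["arguments"])    # invoke it with the model's arguments

result = dispatch(run_model("What's the weather in Lagos?"))
```

In a real agent loop, `result` would be appended to the conversation so the model can compose a final answer or chain further calls.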
5. It’s open and can be freely modified
Gemma 4 is released under an Apache 2.0 license, meaning developers can use, modify, and deploy it freely, including commercially, with only light conditions such as preserving the license notice. This gives teams full control over how they integrate AI into their products. It also avoids vendor lock-in, a key concern with proprietary models. Google says this approach is meant to support a more collaborative AI ecosystem. For developers, it opens up more flexibility to experiment and build at scale.
