What is Gemma 4? Google’s Open Model (April 2026)
Gemma 4 is Google DeepMind’s most capable family of open-weight AI models, released on April 2, 2026 under the Apache 2.0 license. It comes in four sizes — two edge models (E2B, E4B), a 26B MoE, and a 31B dense model — all natively multimodal and backed by first-class tooling across Ollama, vLLM, MLX, and Hugging Face.
Last verified: April 19, 2026
The quick answer
Gemma 4 is the best open-weight model you can run on your own hardware in April 2026. The 31B model ranks #3 on the open-model Arena leaderboard, the 26B MoE offers the best quality-per-compute in its class, and the E2B/E4B edge models can run on phones and laptops. License is full Apache 2.0 with no commercial restrictions.
Why it matters
Open-weights AI has been split between:
- Meta’s Llama — largest, but locked behind a 700M-MAU community license
- Alibaba’s Qwen — strong coding, but some China-aligned safety tuning
- Mistral / DeepSeek / others — niche strengths, smaller ecosystems
Gemma 4 is the first model that is simultaneously: Apache 2.0, state-of-the-art per parameter, natively multimodal, and backed by Google’s ecosystem (Vertex AI, Kaggle, AI Studio, Android AICore).
That combination makes it the default open model for most new projects.
The four Gemma 4 sizes
| Size | Params (active) | Context | Min VRAM (Q4) | Target hardware |
|---|---|---|---|---|
| E2B | 2B (0.5B active) | 128K | 3 GB | Phones, any Mac |
| E4B | 4B (1B active) | 128K | 5 GB | 16 GB Mac, RTX 4060 |
| 26B | 26B (4B active) MoE | 256K | 16 GB | RTX 4090, M4 Pro 48GB |
| 31B | 31B Dense | 256K | 20 GB | RTX 4090, M3/M4 Max 64GB |
All four sizes ship in both base and instruction-tuned variants.
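The "Min VRAM (Q4)" column follows from simple arithmetic: at 4-bit quantization each weight takes half a byte, plus headroom for the KV cache, activations, and runtime buffers. A back-of-envelope sketch (my own estimate, not an official sizing formula; the 4 GB overhead figure is an assumption):

```python
def q4_vram_gb(total_params_b: float, overhead_gb: float = 4.0) -> float:
    """Rough VRAM estimate for a Q4-quantized model.

    total_params_b: total parameter count in billions. Note: MoE weights
    must all be resident, so use total params, not active params.
    overhead_gb: assumed allowance for KV cache, activations, and buffers.
    """
    weights_gb = total_params_b * 0.5  # 4 bits = 0.5 bytes per weight
    return weights_gb + overhead_gb

# 31B dense: ~15.5 GB of weights + ~4 GB overhead, close to the 20 GB row
print(round(q4_vram_gb(31), 1))  # 19.5
# 26B MoE: ~13 GB of weights stay resident even though only 4B are active
print(round(q4_vram_gb(26), 1))  # 17.0
```

The key point the table encodes: MoE reduces compute per token, not memory, so the 26B MoE needs nearly as much VRAM as a 26B dense model would.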
Benchmarks
| Benchmark | Gemma 4 31B | Qwen 3.5 35B | Llama 4 400B | DeepSeek V4 |
|---|---|---|---|---|
| AIME 2026 | 89.2% | 86.7% | 88.3% | 42.5% |
| LiveCodeBench v6 | 80.0% | 82.4% | 77.1% | 68.0% |
| MMLU-Pro | 82.1% | 80.8% | 81.5% | 78.6% |
| GPQA Diamond | 75.3% | 73.6% | 74.8% | 70.1% |
| MMMU (vision) | 76.9% | 72.1% | 70.4% | — |
| Arena rank (open) | #3 | #4 | #5 | #7 |
Gemma 4 31B leads on math, multimodal, and general reasoning; Qwen 3.5 keeps the coding crown; Llama 4 400B stays competitive only by spending more than ten times the parameters.
Key features
- Apache 2.0 — fully open for commercial use, no MAU cap
- Multimodal: text, images, and audio all first-class inputs
- Long context: 128K on edge, 256K on 26B/31B
- Day-one tooling: official support in Ollama, vLLM, SGLang, MLX, llama.cpp, Hugging Face Transformers, PyTorch, JAX, Keras
- Enterprise-ready: Google ships Gemma 4 through Vertex AI with security, audit, and compliance controls
- MoE efficiency: the 26B-A4B variant uses 4B active parameters but performs like a 26B dense model
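The MoE bullet is worth unpacking. In a mixture-of-experts layer, a small router scores all experts and activates only the top-k per token, so most weights sit idle on any given forward pass. A toy sketch of top-k routing (illustrative only; Gemma 4's actual router design and expert count are not described in this article):

```python
import math

def top_k_route(logits: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax their logits.

    Returns {expert_index: weight}; every other expert is skipped
    entirely, which is why active params << total params.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    z = sum(exps)
    return {i: e / z for i, e in zip(top, exps)}

# Hypothetical router scores for 8 experts; only 2 fire for this token
weights = top_k_route([0.3, 2.1, -0.7, 0.9, -1.2, 0.0, 1.5, -0.4], k=2)
print(len(weights))                            # 2
print(abs(sum(weights.values()) - 1) < 1e-9)   # True
```

Compute scales with k experts rather than all of them, while the full expert set still has to be held in memory, matching the VRAM figures in the sizes table.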
How to run Gemma 4
Ollama (easiest)
ollama run gemma4:31b
# or the MoE
ollama run gemma4:26b-a4b
# or the tiny edge model
ollama run gemma4:e4b
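Beyond the CLI, Ollama exposes a local HTTP API on port 11434, so the same models can back an application. A minimal sketch of the request body for the `/api/generate` endpoint, using a model tag from the commands above; the commented-out lines show where the actual POST would go (it requires a running Ollama server):

```python
import json

# Request body for Ollama's /api/generate endpoint
payload = {
    "model": "gemma4:26b-a4b",  # tag pulled with `ollama run` above
    "prompt": "Summarize Apache 2.0 in one sentence.",
    "stream": False,            # return one JSON object instead of chunks
}
body = json.dumps(payload)
print(body)

# To actually send it against a local Ollama server:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```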
MLX (Apple Silicon)
pip install mlx-lm
mlx_lm.generate --model mlx-community/gemma-4-31b-4bit --prompt "Explain MoE"
vLLM (production GPU)
vllm serve google/gemma-4-31b-it --max-model-len 262144
Hugging Face Transformers
from transformers import pipeline

# Instruction-tuned variant; device_map="auto" requires the accelerate package
pipe = pipeline("text-generation", model="google/gemma-4-31b-it", device_map="auto")
print(pipe("Summarize Apache 2.0.", max_new_tokens=200)[0]["generated_text"])
Google AI Studio / Vertex AI
- Free in AI Studio (rate-limited)
- Paid via Vertex AI at Google’s standard open-model rates
Gemma 4 vs Gemini 3.1 Pro
| Dimension | Gemma 4 | Gemini 3.1 Pro |
|---|---|---|
| License | Apache 2.0 | Closed, API only |
| Weights | Downloadable | Not released |
| Max size | 31B | Undisclosed (frontier) |
| Context | 256K | 1M |
| Best at | Local / edge / self-hosted | Full frontier, long-context, video |
Gemma 4 is Google’s open line; Gemini is the closed frontier line. They target different users: Gemma 4 is for people who want to self-host; Gemini is for people who want the absolute best and are happy to pay API fees.
When to choose Gemma 4
- ✅ You need an open-weights model for self-hosting
- ✅ Commercial deployment without license friction
- ✅ Local / edge / on-device AI
- ✅ Privacy-sensitive workloads (no data leaves your server)
- ✅ Multimodal apps with limited budget
- ✅ Apple Silicon / consumer GPU target hardware
When to choose something else
- ❌ Pure coding assistant → Qwen 3.5 Coder 32B
- ❌ Maximum open scale → Llama 4 400B
- ❌ 1M+ context window → Gemini 3.1 Pro or Llama 4
- ❌ Best closed frontier → Claude Opus 4.7 / GPT-5.4
Bottom line
Gemma 4 is the new default open-weight model for April 2026 and beyond. It is Apache 2.0 licensed, natively multimodal, runs on consumer hardware, matches or beats everything in its size class, and is backed by Google’s ecosystem. If you are starting a new self-hosted project — whether that is a local RAG app, a private agent, or an on-device assistant — start with Gemma 4.