What is Gemma 4? Google’s Open Model (April 2026)
Gemma 4 is Google DeepMind’s most capable family of open-weight AI models, released on April 2, 2026 under the Apache 2.0 license. It comes in four sizes — two edge models (E2B, E4B), a 26B MoE, and a 31B dense model — all natively multimodal and backed by first-class tooling across Ollama, vLLM, MLX, and Hugging Face.
Last verified: April 19, 2026
The quick answer
Gemma 4 is the best open-weight model you can run on your own hardware in April 2026. The 31B model ranks #3 on the open-model Arena leaderboard, the 26B MoE offers the best quality-per-compute in its class, and the E2B/E4B edge models can run on phones and laptops. License is full Apache 2.0 with no commercial restrictions.
Why it matters
Open-weights AI has been split between:
- Meta’s Llama — largest, but locked behind a 700M-MAU community license
- Alibaba’s Qwen — strong coding, but some China-aligned safety tuning
- Mistral / DeepSeek / others — niche strengths, smaller ecosystems
Gemma 4 is the first model that is simultaneously: Apache 2.0, state-of-the-art per parameter, natively multimodal, and backed by Google’s ecosystem (Vertex AI, Kaggle, AI Studio, Android AICore).
That combination makes it the default open model for most new projects.
The four Gemma 4 sizes
| Size | Params (active) | Context | Min VRAM (Q4) | Target hardware |
|---|---|---|---|---|
| E2B | 2B (0.5B active) | 128K | 3 GB | Phones, any Mac |
| E4B | 4B (1B active) | 128K | 5 GB | 16 GB Mac, RTX 4060 |
| 26B | 26B (4B active) MoE | 256K | 16 GB | RTX 4090, M4 Pro 48GB |
| 31B | 31B Dense | 256K | 20 GB | RTX 4090, M3/M4 Max 64GB |
All four sizes ship in both base and instruction-tuned variants.
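The "Min VRAM (Q4)" column follows from simple arithmetic: at 4-bit quantization each weight takes half a byte, plus headroom for the KV cache, activations, and runtime buffers. A back-of-envelope sketch (my own estimate, not an official sizing formula; the 4 GB overhead figure is an assumption):

```python
def q4_vram_gb(total_params_b: float, overhead_gb: float = 4.0) -> float:
    """Rough VRAM estimate for a Q4-quantized model.

    total_params_b: total parameter count in billions. Note: MoE weights
    must all be resident, so use total params, not active params.
    overhead_gb: assumed allowance for KV cache, activations, and buffers.
    """
    weights_gb = total_params_b * 0.5  # 4 bits = 0.5 bytes per weight
    return weights_gb + overhead_gb

# 31B dense: ~15.5 GB of weights + ~4 GB overhead, close to the 20 GB row
print(round(q4_vram_gb(31), 1))  # 19.5
# 26B MoE: ~13 GB of weights stay resident even though only 4B are active
print(round(q4_vram_gb(26), 1))  # 17.0
```

The key point the table encodes: MoE reduces compute per token, not memory, so the 26B MoE needs nearly as much VRAM as a 26B dense model would.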
Benchmarks
| Benchmark | Gemma 4 31B | Qwen 3.5 35B | Llama 4 400B | DeepSeek V4 |
|---|---|---|---|---|
| AIME 2026 | 89.2% | 86.7% | 88.3% | 42.5% |
| LiveCodeBench v6 | 80.0% | 82.4% | 77.1% | 68.0% |
| MMLU-Pro | 82.1% | 80.8% | 81.5% | 78.6% |
| GPQA Diamond | 75.3% | 73.6% | 74.8% | 70.1% |
| MMMU (vision) | 76.9% | 72.1% | 70.4% | — |
| Arena rank (open) | #3 | #4 | #5 | #7 |
Gemma 4 31B leads on math, multimodal, and general reasoning; Qwen 3.5 keeps the coding crown; Llama 4 400B stays competitive only by spending more than ten times the parameters.
Key features
- Apache 2.0 — fully open for commercial use, no MAU cap
- Multimodal: text, images, and audio all first-class inputs
- Long context: 128K on edge, 256K on 26B/31B
- Day-one tooling: official support in Ollama, vLLM, SGLang, MLX, llama.cpp, Hugging Face Transformers, PyTorch, JAX, Keras
- Enterprise-ready: Google ships Gemma 4 through Vertex AI with security, audit, and compliance controls
- MoE efficiency: the 26B-A4B variant uses 4B active parameters but performs like a 26B dense model
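The MoE bullet is worth unpacking. In a mixture-of-experts layer, a small router scores all experts and activates only the top-k per token, so most weights sit idle on any given forward pass. A toy sketch of top-k routing (illustrative only; Gemma 4's actual router design and expert count are not described in this article):

```python
import math

def top_k_route(logits: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax their logits.

    Returns {expert_index: weight}; every other expert is skipped
    entirely, which is why active params << total params.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    z = sum(exps)
    return {i: e / z for i, e in zip(top, exps)}

# Hypothetical router scores for 8 experts; only 2 fire for this token
weights = top_k_route([0.3, 2.1, -0.7, 0.9, -1.2, 0.0, 1.5, -0.4], k=2)
print(len(weights))                            # 2
print(abs(sum(weights.values()) - 1) < 1e-9)   # True
```

Compute scales with k experts rather than all of them, while the full expert set still has to be held in memory, matching the VRAM figures in the sizes table.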
How to run Gemma 4
Ollama (easiest)
ollama run gemma4:31b
# or the MoE
ollama run gemma4:26b-a4b
# or the tiny edge model
ollama run gemma4:e4b
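Beyond the CLI, Ollama exposes a local HTTP API on port 11434, so the same models can back an application. A minimal sketch of the request body for the `/api/generate` endpoint, using a model tag from the commands above; the commented-out lines show where the actual POST would go (it requires a running Ollama server):

```python
import json

# Request body for Ollama's /api/generate endpoint
payload = {
    "model": "gemma4:26b-a4b",  # tag pulled with `ollama run` above
    "prompt": "Summarize Apache 2.0 in one sentence.",
    "stream": False,            # return one JSON object instead of chunks
}
body = json.dumps(payload)
print(body)

# To actually send it against a local Ollama server:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```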
MLX (Apple Silicon)
pip install mlx-lm
mlx_lm.generate --model mlx-community/gemma-4-31b-4bit --prompt "Explain MoE"
vLLM (production GPU)
vllm serve google/gemma-4-31b-it --max-model-len 262144
Hugging Face Transformers
from transformers import pipeline

# Instruction-tuned variant; device_map="auto" requires the accelerate package
pipe = pipeline("text-generation", model="google/gemma-4-31b-it", device_map="auto")
print(pipe("Summarize Apache 2.0.", max_new_tokens=200)[0]["generated_text"])
Google AI Studio / Vertex AI
- Free in AI Studio (rate-limited)
- Paid via Vertex AI at Google’s standard open-model rates
Gemma 4 vs Gemini 3.1 Pro
| Dimension | Gemma 4 | Gemini 3.1 Pro |
|---|---|---|
| License | Apache 2.0 | Closed, API only |
| Weights | Downloadable | Not released |
| Max size | 31B | Undisclosed (frontier) |
| Context | 256K | 1M |
| Best at | Local / edge / self-hosted | Full frontier, long-context, video |
Gemma 4 is Google’s open line; Gemini is the closed frontier line. They target different users: Gemma 4 is for people who want to self-host; Gemini is for people who want the absolute best and are happy to pay API fees.
When to choose Gemma 4
- ✅ You need an open-weights model for self-hosting
- ✅ Commercial deployment without license friction
- ✅ Local / edge / on-device AI
- ✅ Privacy-sensitive workloads (no data leaves your server)
- ✅ Multimodal apps with limited budget
- ✅ Apple Silicon / consumer GPU target hardware
When to choose something else
- ❌ Pure coding assistant → Qwen 3.5 Coder 32B
- ❌ Maximum open scale → Llama 4 400B
- ❌ 1M+ context window → Gemini 3.1 Pro or Llama 4
- ❌ Best closed frontier → Claude Opus 4.7 / GPT-5.4
Bottom line
Gemma 4 is the new default open-weight model for April 2026 and beyond. It is Apache 2.0 licensed, natively multimodal, runs on consumer hardware, matches or beats everything in its size class, and is backed by Google’s ecosystem. If you are starting a new self-hosted project — whether that is a local RAG app, a private agent, or an on-device assistant — start with Gemma 4.