
What is Gemma 4? Google’s Open Model (April 2026)

Gemma 4 is Google DeepMind’s most capable family of open-weight AI models, released on April 2, 2026 under the Apache 2.0 license. It comes in four sizes — two edge models (E2B, E4B), a 26B MoE, and a 31B Dense — all natively multimodal and backed by first-class tooling across Ollama, vLLM, MLX, and Hugging Face.

Last verified: April 19, 2026

The quick answer

Gemma 4 is the best open-weight model you can run on your own hardware in April 2026. The 31B model ranks #3 on the open-model Arena leaderboard, the 26B MoE offers the best quality-per-compute in its class, and the E2B/E4B edge models can run on phones and laptops. License is full Apache 2.0 with no commercial restrictions.

Why it matters

Open-weight AI has so far been split among:

  • Meta’s Llama — largest, but locked behind a 700M-MAU community license
  • Alibaba’s Qwen — strong coding, but some China-aligned safety tuning
  • Mistral / DeepSeek / others — niche strengths, smaller ecosystems

Gemma 4 is the first model that is simultaneously: Apache 2.0, state-of-the-art per parameter, natively multimodal, and backed by Google’s ecosystem (Vertex AI, Kaggle, AI Studio, Android AICore).

That combination makes it the new default open model for most new projects.

The four Gemma 4 sizes

| Size | Params (active) | Context | Min VRAM (Q4) | Target hardware |
|------|-----------------|---------|---------------|-----------------|
| E2B | 2B (0.5B active) | 128K | 3 GB | Phones, any Mac |
| E4B | 4B (1B active) | 128K | 5 GB | 16 GB Mac, RTX 4060 |
| 26B | 26B (4B active) MoE | 256K | 16 GB | RTX 4090, M4 Pro 48GB |
| 31B | 31B Dense | 256K | 20 GB | RTX 4090, M3/M4 Max 64GB |

All four sizes ship in both base and instruction-tuned variants.
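The VRAM column is roughly what you would predict from the parameter counts alone. A back-of-envelope sketch (the 0.55 bytes/param figure and the fixed 2 GB overhead are my own rule-of-thumb assumptions, not official numbers):

```python
def q4_vram_gb(total_params_b: float, overhead_gb: float = 2.0) -> float:
    """Back-of-envelope VRAM estimate for a 4-bit quantized model.

    ~0.55 bytes/param covers the 4-bit weights plus quantization
    scales; overhead_gb stands in for KV cache and activations.
    Both constants are rule-of-thumb assumptions, not official figures.
    """
    return round(total_params_b * 0.55 + overhead_gb, 1)

# Sanity check against the table above (rough agreement, not exact):
for size_b in (2, 4, 26, 31):
    print(f"{size_b}B -> ~{q4_vram_gb(size_b)} GB")
```

The estimates land within a gigabyte or two of the table, which is about as close as a rule of thumb gets: real usage depends on quantization scheme, context length, and runtime.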

Benchmarks

| Benchmark | Gemma 4 31B | Qwen 3.5 35B | Llama 4 400B | DeepSeek V4 |
|-----------|-------------|--------------|--------------|-------------|
| AIME 2026 | 89.2% | 86.7% | 88.3% | 42.5% |
| LiveCodeBench v6 | 80.0% | 82.4% | 77.1% | 68.0% |
| MMLU-Pro | 82.1% | 80.8% | 81.5% | 78.6% |
| GPQA Diamond | 75.3% | 73.6% | 74.8% | 70.1% |
| MMMU (vision) | 76.9% | 72.1% | 70.4% | — |
| Arena Elo (open) | #3 | #4 | #5 | #7 |

Gemma 4 31B leads math, multimodal, and general reasoning. Qwen 3.5 keeps the coding crown. Llama 4 400B is competitive only at its massive size.

Key features

  • Apache 2.0 — fully open for commercial use, no MAU cap
  • Multimodal: text, images, and audio all first-class inputs
  • Long context: 128K on edge, 256K on 26B/31B
  • Day-one tooling: official support in Ollama, vLLM, SGLang, MLX, llama.cpp, Hugging Face Transformers, PyTorch, JAX, Keras
  • Enterprise-ready: Google ships Gemma 4 through Vertex AI with security, audit, and compliance
  • MoE efficiency: the 26B-A4B variant uses 4B active parameters but performs like a 26B dense model
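The MoE point is worth making concrete: per-token decode compute scales with *active* parameters, so routing buys a large compute saving. A sketch, under the simplifying assumption that routing overhead is negligible:

```python
def moe_decode_speedup(total_b: float, active_b: float) -> float:
    """Per-token decode compute scales with *active* parameters,
    so an MoE's rough compute saving over an equally sized dense
    model is total/active. This ignores routing overhead, and note
    that all experts must still fit in memory."""
    return total_b / active_b

# Gemma 4 26B-A4B: 26B total parameters, 4B active per token
speedup = moe_decode_speedup(26, 4)  # 6.5x less decode compute than dense 26B
```

That 6.5x is a compute bound, not a wall-clock guarantee: memory bandwidth and routing cost eat into it in practice.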

How to run Gemma 4

Ollama (easiest)

```shell
ollama run gemma4:31b
# or the MoE
ollama run gemma4:26b-a4b
# or the tiny edge model
ollama run gemma4:e4b
```
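Once a model is pulled, Ollama also exposes a local REST API on port 11434, which is handy for scripting. A minimal stdlib-only sketch, reusing the `gemma4:31b` tag from above:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for one JSON object instead of a chunk stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    req = request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the Ollama daemon running, `generate("gemma4:31b", "Explain MoE")` returns the completion as a plain string.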

MLX (Apple Silicon)

```shell
pip install mlx-lm
mlx_lm.generate --model mlx-community/gemma-4-31b-4bit --prompt "Explain MoE"
```

vLLM (production GPU)

```shell
vllm serve google/gemma-4-31b --max-model-len 262144
```
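vLLM serves an OpenAI-compatible API (port 8000 by default), so any OpenAI client library works against it. A stdlib-only sketch of a chat-completion request, assuming the server command above is running:

```python
import json
from urllib import request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # vLLM's OpenAI-compatible route

def chat_request(model: str, user_msg: str, max_tokens: int = 256) -> dict:
    # Payload shape follows the OpenAI chat-completions schema vLLM implements
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": max_tokens,
    }

def chat(model: str, user_msg: str) -> str:
    req = request.Request(
        VLLM_URL,
        data=json.dumps(chat_request(model, user_msg)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the schema is OpenAI-compatible, swapping an existing app from a hosted API to self-hosted Gemma 4 is mostly a base-URL change.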

Hugging Face Transformers

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-4-31b-it", device_map="auto")
print(pipe("Summarize Apache 2.0.")[0]["generated_text"])
```

Google AI Studio / Vertex AI

  • Free in AI Studio (rate-limited)
  • Paid via Vertex AI at Google’s standard open-model rates

Gemma 4 vs Gemini 3.1 Pro

| Dimension | Gemma 4 | Gemini 3.1 Pro |
|-----------|---------|----------------|
| License | Apache 2.0 | Closed, API only |
| Weights | Downloadable | Not released |
| Max size | 31B | Undisclosed (frontier) |
| Context | 256K | 1M |
| Best at | Local / edge / self-hosted | Full frontier, long-context, video |

Gemma 4 is Google’s open line; Gemini is the closed frontier line. They target different users: Gemma 4 is for people who want to self-host; Gemini is for people who want the absolute best and are happy to pay API fees.

When to choose Gemma 4

  • ✅ You need an open-weight model for self-hosting
  • ✅ Commercial deployment without license friction
  • ✅ Local / edge / on-device AI
  • ✅ Privacy-sensitive workloads (no data leaves your server)
  • ✅ Multimodal apps with limited budget
  • ✅ Apple Silicon / consumer GPU target hardware

When to choose something else

  • ❌ Pure coding assistant → Qwen 3.5 Coder 32B
  • ❌ Maximum open scale → Llama 4 400B
  • ❌ 1M+ context window → Gemini 3.1 Pro or Llama 4
  • ❌ Best closed frontier → Claude Opus 4.7 / GPT-5.4

Bottom line

Gemma 4 is the new default open-weight model for April 2026 and beyond. It is Apache 2.0, natively multimodal, runs on consumer hardware, matches or beats everything in its size class, and is backed by Google’s ecosystem. If you are starting a new self-hosted project — whether that is a local RAG app, a private agent, or an on-device assistant — start with Gemma 4.