Gemma 4 vs Qwen 3.5 vs Llama 4: Open Models April 2026
Google shipped Gemma 4 on April 2, 2026, under Apache 2.0 — and it just redefined what “small open model” means. With Qwen 3.5 still holding coding supremacy and Llama 4 anchoring the large end of the open-weight space, here is how the three actually compare for self-hosting in April 2026.
Last verified: April 19, 2026
TL;DR
| Factor | Winner |
|---|---|
| Quality per parameter | Gemma 4 |
| Coding | Qwen 3.5 Coder |
| License freedom | Gemma 4 / Qwen 3.5 (Apache 2.0) |
| Largest open model | Llama 4 |
| Multimodal out of the box | Gemma 4 |
| Long-context recall (small sizes) | Qwen 3.5 |
| Local on 16GB Mac | Gemma 4 E4B |
| Arena ranking | Gemma 4 31B (#3 open) |
Benchmarks (April 2026)
| Benchmark | Gemma 4 31B | Qwen 3.5 35B | Llama 4 400B |
|---|---|---|---|
| AIME 2026 (math, no tools) | 89.2% | 86.7% | 88.3% |
| LiveCodeBench v6 | 80.0% | 82.4% | 77.1% |
| MMLU-Pro | 82.1% | 80.8% | 81.5% |
| GPQA Diamond | 75.3% | 73.6% | 74.8% |
| MMMU (vision) | 76.9% | 72.1% | 70.4% |
| Long-context recall (128K) | 92% | 95% | 94% |
| Arena rank (open) | #3 | #4 | #5 |
Gemma 4 31B leads on math, multimodal, and general reasoning. Qwen 3.5 keeps its coding crown. Llama 4 pulls ahead only when you genuinely need its 400B scale, which for most teams is a liability rather than an asset.
Sizes & hardware
| Model | Params | Min VRAM (Q4) | Runs on |
|---|---|---|---|
| Gemma 4 E2B | 2B (MoE, 0.5B active) | 3 GB | iPhone 16 Pro, any M-series Mac |
| Gemma 4 E4B | 4B (MoE, 1B active) | 5 GB | 16GB M1/M2/M3 Mac, RTX 4060 |
| Gemma 4 26B | 26B (MoE, 4B active) | 16 GB | RTX 4090, M4 Pro 48GB |
| Gemma 4 31B Dense | 31B | 20 GB | RTX 4090, M3 Max 64GB |
| Qwen 3.5 7B | 7B | 5 GB | Most consumer GPUs |
| Qwen 3.5 35B | 35B | 22 GB | RTX 4090, M-series 64GB |
| Qwen 3.5 Coder 32B | 32B | 20 GB | RTX 4090 |
| Llama 4 70B | 70B | 42 GB | 2× RTX 4090, H100 |
| Llama 4 400B | 400B (MoE, 60B active) | 250 GB | 4× H100 80GB or cluster |
Gemma 4’s MoE architecture is the headline: the 26B-A4B runs at the inference cost of a 4B model while performing near the frontier. That efficiency is why Gemma 4 sits at #6 on the full Arena leaderboard (#3 among open models), ahead of several closed models.
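The “Min VRAM (Q4)” column above follows a simple rule of thumb that is worth making explicit: weight memory scales with *total* parameters (roughly 4.5 bits per weight for Q4-style quants once quantization scales are counted), while per-token compute scales with *active* parameters. A back-of-envelope sketch, with the 2 GB overhead allowance being our assumption rather than a measured figure:

```python
def q4_weight_gb(total_params_b: float) -> float:
    """Weight memory for a ~4.5-bit quant (Q4-style formats average a
    bit over 4 bits/weight once quantization scales are included)."""
    return total_params_b * 4.5 / 8  # GB per billion params

def min_vram_gb(total_params_b: float, overhead_gb: float = 2.0) -> float:
    """Weights plus a flat allowance for KV cache and activations.
    The 2 GB overhead is an illustrative assumption, not a measurement."""
    return q4_weight_gb(total_params_b) + overhead_gb

# Memory scales with TOTAL params; per-token compute with ACTIVE params.
# So a 26B-A4B MoE needs ~26B worth of VRAM but ~4B worth of FLOPs/token.
for name, total, active in [
    ("Gemma 4 26B MoE", 26, 4),
    ("Gemma 4 31B dense", 31, 31),
    ("Llama 4 70B", 70, 70),
]:
    print(f"{name}: ~{min_vram_gb(total):.0f} GB VRAM, "
          f"compute like a {active}B dense model")
```

Plugging in the sizes from the table reproduces its VRAM column to within a gigabyte or two, which is as precise as these estimates get anyway.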
1. Gemma 4 — Best quality per parameter
Released April 2, 2026 under Apache 2.0. Key facts:
- Four sizes: E2B, E4B, 26B-A4B MoE, 31B Dense
- Native multimodal: text, vision, and audio in; text out
- 128K context on E2B/E4B, 256K on 26B/31B
- Apache 2.0 — fully open, no MAU cap
- Official GGUF, MLX, and vLLM support at launch
Strengths: Best-in-class quality per active parameter, true Apache 2.0, native multimodal, excellent on-device sizes, strong Arena rankings.
Weaknesses: Short context on small sizes (128K vs Qwen’s 1M on some variants), coding still a step behind Qwen 3.5 Coder, MoE inference can be tricky to optimize on older GPUs.
Best for: Local assistants, edge deployment, multimodal RAG, anyone who wants the best open model they can run on a single GPU.
2. Qwen 3.5 — Best open model for coding
Alibaba’s Qwen 3.5 family (shipped early 2026) remains the strongest open coding line:
- Qwen 3.5 Coder 32B still #1 on open-source coding leaderboards
- Small variants (1.5B, 7B) offer the best long-context recall in their size class
- Apache 2.0 for the base models
- Strong multilingual coverage (Chinese, Arabic, Japanese)
Strengths: Best open-source coding model, excellent long-context, strong multilingual, huge Chinese community + tooling.
Weaknesses: Multimodal still weaker than Gemma 4, Qwen 3.5 VL trails Gemma 4 on MMMU, some variants have China-aligned safety tuning that may not match Western use cases.
Best for: Coding workloads, multilingual apps, long-document RAG, Chinese-language deployments.
3. Llama 4 — Best when you need scale
Meta’s Llama 4 remains the largest generally available open model:
- 70B and 400B MoE sizes
- 10M-token context (the largest of any open model)
- Llama Community License (700M MAU cap)
- Strong agent performance in larger sizes
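That 10M-token context is less free than it sounds: KV cache memory grows linearly with sequence length and quickly dwarfs the weights themselves. A sketch of the standard KV-cache formula, using a *hypothetical* 70B-class geometry (80 layers, 8 KV heads via GQA, head dim 128; Meta’s actual dimensions may differ), which is why real long-context deployments lean on KV quantization and paging:

```python
def kv_cache_gb(seq_len: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Unquantized KV cache size: one K and one V vector per layer
    per token, stored at bytes_per_elem (2 = fp16/bf16)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len / 1e9

# Illustrative 70B-class geometry; these dimensions are assumptions.
per_128k = kv_cache_gb(128_000, n_layers=80, n_kv_heads=8, head_dim=128)
per_10m = kv_cache_gb(10_000_000, n_layers=80, n_kv_heads=8, head_dim=128)
print(f"128K ctx: ~{per_128k:.0f} GB of KV cache")
print(f"10M ctx:  ~{per_10m:.0f} GB of KV cache")
```

Under these assumptions, a full 10M-token cache runs into the terabytes, which is the real reason the 10M window is a cluster feature rather than a workstation one.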
Strengths: Largest open model available, best long-context for entire-codebase ingestion, mature ecosystem (Ollama, vLLM, TGI all first-class).
Weaknesses: Not Apache or MIT (the Community License is a deal-breaker for some startups), harder to self-host (the 400B needs a multi-GPU cluster), Gemma 4 matches or beats the 70B at a smaller size, and Muse Spark, not Llama, is now Meta’s real flagship.
Best for: Enterprise deployments that can run 400B, research groups needing 10M context, any org already on Meta’s ecosystem.
Head-to-head: run an Astro blog coding assistant locally
We ran the same 20 issue-implementation tasks on a Mac Studio M4 Max 128GB:
| Metric | Gemma 4 31B | Qwen 3.5 Coder 32B | Llama 4 70B |
|---|---|---|---|
| Tasks passing tests | 14 / 20 | 17 / 20 | 12 / 20 |
| Tokens / sec | 48 | 52 | 24 |
| Memory peak | 22 GB | 21 GB | 44 GB |
| Ease of setup (Ollama) | 1 command | 1 command | 1 command |
Qwen 3.5 Coder won on code quality. Gemma 4 31B came close and was noticeably better at reasoning about the codebase structure. Llama 4 70B felt overqualified: slower, heavier, and not meaningfully better for this use case.
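The tokens/sec column matters more than it looks for interactive coding: decode speed converts directly into wall-clock time per generated patch. A quick sketch using the measured speeds above (the 1,500-token patch size is an assumed typical figure, not something we logged):

```python
def gen_seconds(tokens: int, tok_per_sec: float) -> float:
    """Wall-clock decode time, ignoring prompt-processing latency."""
    return tokens / tok_per_sec

# Measured decode speeds from the table; patch size is an assumption.
for name, tps in [("Gemma 4 31B", 48),
                  ("Qwen 3.5 Coder 32B", 52),
                  ("Llama 4 70B", 24)]:
    print(f"{name}: ~{gen_seconds(1500, tps):.0f}s per 1,500-token patch")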
Quick decision guide
| If your priority is… | Choose |
|---|---|
| Best open model on one GPU | Gemma 4 31B |
| Best open model on a Mac | Gemma 4 26B MoE |
| Smallest useful model (mobile) | Gemma 4 E2B |
| Open-source coding | Qwen 3.5 Coder 32B |
| Longest context | Llama 4 (10M tokens) |
| Multilingual | Qwen 3.5 |
| True Apache 2.0 | Gemma 4 or Qwen 3.5 |
| Apple Silicon (MLX) | Gemma 4 (first-class MLX) |
Verdict
Gemma 4 is the new default open model for April 2026. It ships under a real Apache 2.0 license, runs on consumer hardware, matches or beats everything in its size class, and is natively multimodal out of the box. If you are starting a new self-hosted stack, start with Gemma 4.
Qwen 3.5 Coder is the exception. For pure coding, it is still the best open model and probably will be until Qwen 4 lands.
Llama 4 is becoming a specialty tool. Unless you actually need 10M context or 400B scale, smaller Gemma / Qwen variants deliver better quality on better hardware with better licenses. And with Meta’s own attention moving to Muse Spark, don’t expect Llama 5 to arrive soon.