Best Open-Weights Coding Models May 2026: Top 5 Ranked
Five Chinese open-weights coding models lead the May 2026 stack, all within 5-7 percentage points of Claude Opus 4.7 on SWE-Bench Pro at 5-15% of the cost. Chosen well, they can replace 70-80% of frontier-model API spend. Here's how to choose between Kimi K2.6, DeepSeek V4 Pro Max, GLM-5.1, Qwen 3.6 Plus, and MiniMax M2.7, and when to escalate to Claude Opus 4.7 or Mythos Preview.
Last verified: May 5, 2026
The ranking
| # | Model | Best For | SWE-Bench Pro | License | Output $/1M |
|---|---|---|---|---|---|
| 1 | Kimi K2.6 | Best price/performance | 58.6% | Modified MIT | $0.95 |
| 2 | DeepSeek V4 Pro Max | Best agentic / long-horizon | ~58% | DeepSeek License | ~$1.50 |
| 3 | GLM-5.1 | Best self-hosted enterprise | 58.4% | MIT | ~$1.20 |
| 4 | Qwen 3.6 Plus | Best multilingual + tool use | ~57% | Apache 2.0 | ~$1.00 |
| 5 | MiniMax M2.7 | Best native multimodal + voice | — | Restrictive | ~$1.10 |
Sources: BenchLM Chinese leaderboard (April 2026), Atlas Cloud comparison (April 2026), artificialanalysis.ai (April 2026), Arena.ai Code Arena (April 26, 2026).
#1: Kimi K2.6 (Moonshot AI)
Released: April 20, 2026.
Why #1: Best price/performance. SWE-Bench Pro 58.6%, Code Arena WebDev #6 at 1,529 Elo, $0.95 per 1M output tokens. A sensible default for the ~80% of teams running cost-sensitive coding agents.
Strengths:
- Cheapest top-tier open-weights model.
- Mature ecosystem (broad provider availability: Atlas Cloud, Together AI, DeepInfra, OpenRouter).
- 256K context window.
- Self-hostable on 4× H200.
Weaknesses:
- Modified MIT license (some commercial-use ambiguity vs pure MIT).
- Trails DeepSeek V4 Pro Max on agentic GDPval-AA score (1484 vs 1554).
- 256K context vs Opus 4.7’s 1M.
Pick if: You want the best balance of capability, price, and ecosystem maturity for general coding workloads.
Detailed comparison: Kimi K2.6 vs Claude Opus 4.7
#2: DeepSeek V4 Pro Max (DeepSeek)
Released: April 2026 (V4 Pro Max, V4 Pro, V4 Flash variants).
Why #2: Best agentic real-world performance. GDPval-AA score of 1554 leads all open-weights models. DeepSeek V4 Pro Max ties Kimi K2.6 on the BenchLM aggregate at 87.
Strengths:
- Best GDPval-AA score among open weights (1554) — leads on agentic real-world tasks.
- 1M-token context window matches Opus 4.7 and GPT-5 Pro.
- Three variants (Pro Max, Pro, Flash) for cost/speed trade-offs.
- Strong fine-tune ecosystem.
Weaknesses:
- DeepSeek License has more commercial restrictions than MIT.
- Pricing varies more by host than Kimi or GLM.
- SWE-Bench Pro score is variant-dependent.
Pick if: You're running long-horizon agentic workflows (multi-step coding agents, autonomous loops, RAG over large codebases) where real-world agentic performance matters more than per-ticket SWE-Bench Pro scores.
#3: GLM-5.1 (Z.ai / Zhipu AI)
Released: April 7, 2026.
Why #3: Best self-hosted enterprise option. Pure MIT license, 754B parameter MoE, SWE-Bench Pro 58.4%.
Strengths:
- Pure MIT license — most enterprise-friendly licensing in the open-weights space.
- 754B parameters (largest by parameter count).
- Trained on Huawei Ascend chips (NVIDIA-independent stack).
- GDPval-AA 1535 (strong agentic).
Weaknesses:
- 754B parameters require substantial inference infrastructure.
- BenchLM aggregate (83) trails Kimi K2.6 and DeepSeek V4 Pro Max (both 87).
- Less mature ecosystem than Kimi or DeepSeek.
Pick if: You need self-hosted deployment for enterprise / sovereign / air-gapped use cases and prioritize maximum legal clarity (MIT license).
#4: Qwen 3.6 Plus (Alibaba)
Released: Q1 2026 (Qwen 3.6 family with Max-Preview and Plus variants).
Why #4: Best multilingual coding + tool use. Strong on agentic coding workloads with Apache 2.0 licensing.
Strengths:
- Apache 2.0 license (very enterprise-friendly).
- Best multilingual support (especially Chinese, Korean, Japanese).
- Strong tool-use behavior in agent loops.
- 3.6-35B-A3B + 3.6 Plus variants (MoE for efficiency).
Weaknesses:
- BenchLM Chinese leaderboard score (~79) trails the top 3.
- Smaller community than DeepSeek or Kimi.
- SWE-Bench Pro slightly below 58%.
Pick if: You need multilingual coding (especially CJK) or you want Apache 2.0 licensing for the cleanest legal posture.
#5: MiniMax M2.7 (MiniMax)
Released: Q1 2026.
Why #5: Best native multimodal + voice. The only top-5 open-weights model with strong native voice and multimodal handling.
Strengths:
- Native multimodal (text + image + voice) without separate models.
- MiniMax CLI agent tooling.
- Strong on voice-first applications.
Weaknesses:
- Open weights but more restrictive commercial license.
- Pure-coding benchmarks below the top 4.
- Smaller ecosystem outside MiniMax-hosted infrastructure.
Pick if: You’re building voice-first AI applications, multimodal agents (image + code), or products where MiniMax’s CLI tooling is a fit.
How to choose
A simple decision tree:
- Need self-hosted with MIT license? → GLM-5.1
- Need the cheapest top-tier coding model? → Kimi K2.6
- Running long agentic loops? → DeepSeek V4 Pro Max
- Multilingual or Apache 2.0 required? → Qwen 3.6 Plus
- Voice / multimodal first? → MiniMax M2.7
- None of the above? → Default to Kimi K2.6
For most teams, Kimi K2.6 is the default; deviate only when a specific constraint (license, agent depth, multimodal) drives a different choice. The sketch below encodes this tree in code.
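As a minimal sketch, the tree collapses to a few ordered checks. The model IDs and flag names here are illustrative placeholders, not real API identifiers:

```python
# Hypothetical sketch of the decision tree above. Model IDs are
# illustrative strings, not confirmed provider slugs.

def pick_open_weights_model(
    needs_mit_self_hosted: bool = False,
    cheapest_top_tier: bool = False,
    long_agentic_loops: bool = False,
    multilingual_or_apache: bool = False,
    voice_or_multimodal: bool = False,
) -> str:
    """Return the model matching the first constraint that fires."""
    if needs_mit_self_hosted:
        return "glm-5.1"
    if cheapest_top_tier:
        return "kimi-k2.6"
    if long_agentic_loops:
        return "deepseek-v4-pro-max"
    if multilingual_or_apache:
        return "qwen-3.6-plus"
    if voice_or_multimodal:
        return "minimax-m2.7"
    return "kimi-k2.6"  # article's default when no constraint applies
```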
How they compare to closed frontier models
| Capability | Best Open-Weights | Closed Frontier | Gap |
|---|---|---|---|
| SWE-Bench Pro | Kimi K2.6 (58.6%) | Opus 4.7 (64.3%), Mythos Preview (~77.8%) | 5.7 / 19.2 points |
| Code Arena WebDev | Kimi K2.6 (1,529 Elo) | Opus 4.7 (1,565 Elo) | 36 Elo |
| 1M context | DeepSeek V4 Pro / GLM-5.1 | Opus 4.7, GPT-5 Pro | comparable |
| Tool use reliability | Strong | Best (Opus 4.7) | small |
| Output price ($/1M) | $0.95 (Kimi) | $75 (Opus 4.7) | 79x |
For 70-80% of coding workflows, open-weights are sufficient. For the hardest 10-20%, escalate to Opus 4.7 or Mythos Preview via a router pattern.
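A minimal version of that router pattern might look like the sketch below. Everything here is an assumption: you supply your own `complete` client and your own success check (for example, a CI run against the generated patch); the model IDs are placeholders, not a real SDK.

```python
# Router sketch: default to the cheap open-weights model, escalate to the
# frontier model only when the open model fails repeatedly. The `complete`
# callable and the result dict shape are hypothetical, supplied by you.

OPEN_DEFAULT = "kimi-k2.6"
FRONTIER = "claude-opus-4.7"

def route_coding_task(prompt: str, complete, max_open_attempts: int = 2) -> str:
    """Try the open-weights default first; escalate only for the hard residue."""
    for _ in range(max_open_attempts):
        result = complete(model=OPEN_DEFAULT, prompt=prompt)
        if result.get("tests_passed"):  # your own success signal, e.g. CI
            return result["patch"]
    # Escalation path: ~79x the output-token price, but only for the
    # hardest 10-20% of tasks that the open model could not close out.
    result = complete(model=FRONTIER, prompt=prompt)
    return result["patch"]
```

The design choice that matters is the success check: without a cheap, automatic signal (tests, linters, a verifier model), the router cannot tell which tasks belong in the expensive tier.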
Bundled access via OpenCode Go
The cheapest entry point in May 2026 is OpenCode Go:
- $5 first month, $10/month thereafter.
- 5-hour rate limits across GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen 3.5 Plus, Qwen 3.6 Plus, MiniMax M2.5, MiniMax M2.7.
- Best for individual developers experimenting with the open-weights stack.
Per-token APIs from Atlas Cloud, Together AI, DeepInfra, and OpenRouter are better for production at scale.
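For production use, these hosts expose OpenAI-compatible endpoints, so switching models is a one-line change. A sketch using OpenRouter's endpoint; the model slug "moonshotai/kimi-k2.6" is a guess at the naming convention, not a confirmed identifier, so check your provider's model list:

```python
# Per-token access through an OpenAI-compatible endpoint (OpenRouter shown).
# The model slug below is a hypothetical example, not a verified ID.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

resp = client.chat.completions.create(
    model="moonshotai/kimi-k2.6",  # hypothetical slug; verify with your host
    messages=[{"role": "user", "content": "Refactor this function for clarity: ..."}],
)
print(resp.choices[0].message.content)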
Self-hosting cost analysis
Rough breakeven points for self-hosting vs hosted API (May 2026):
| Model | Hardware | Hourly cost | Breakeven |
|---|---|---|---|
| Kimi K2.6 | 4× H200 | ~$8 | ~30B tokens/month |
| GLM-5.1 | 8× H200 | ~$16 | ~50B tokens/month |
| DeepSeek V4 Pro | 8× H200 | ~$16 | ~50B tokens/month |
| Qwen 3.6 Plus | 4× H200 | ~$8 | ~30B tokens/month |
Above the breakeven, self-hosting wins on cost and gives you full data control. Below, hosted APIs are simpler.
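To adapt the table to your own workload, the arithmetic is just monthly hardware cost divided by your blended per-token price. The blended price below (~$0.19/1M, implying a heavy input-token skew) is an assumption reverse-engineered to match the table's Kimi row; substitute your actual input/output mix.

```python
# Breakeven sketch: monthly self-host cost vs hosted per-token spend.
# The blended $/1M price is an assumption, not a published figure.

HOURS_PER_MONTH = 730

def breakeven_tokens(hourly_hw_cost: float, blended_price_per_1m: float) -> float:
    """Monthly token volume at which self-hosting matches hosted API cost."""
    monthly_hw = hourly_hw_cost * HOURS_PER_MONTH
    return monthly_hw / blended_price_per_1m * 1_000_000

# e.g. Kimi K2.6 on 4x H200 at ~$8/hr with an assumed ~$0.19/1M blended
# price lands near the table's ~30B tokens/month figure:
print(f"{breakeven_tokens(8.0, 0.19) / 1e9:.1f}B tokens/month")  # -> 30.7B
```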
Bottom line
In May 2026, Kimi K2.6 is the default open-weights coding model, with DeepSeek V4 Pro Max as the agentic specialist, GLM-5.1 as the self-hosted enterprise pick, Qwen 3.6 Plus as the multilingual specialist, and MiniMax M2.7 as the multimodal specialist. All five are within 5-7 percentage points of Claude Opus 4.7 on SWE-Bench Pro at 5-15% of the cost. For high-volume coding workloads, the right answer is a router that defaults to open-weights and escalates to Opus 4.7 or Mythos Preview only for the hardest tasks.
Sources: BenchLM.ai Chinese leaderboard (April 2026), Atlas Cloud comparison (April 2026), artificialanalysis.ai (April 2026), DeepLearning.AI The Batch (April 2026), Arena.ai Code Arena (April 26, 2026), opencode.ai/go pricing (May 2026), llm-stats.com (May 2026).