Best Open-Weights Coding Models May 2026: Top 5 Ranked
Five Chinese open-weights coding models lead the May 2026 stack, all within 5-7 percentage points of Claude Opus 4.7 on SWE-Bench Pro at 5-15% of the cost. Chosen well, they can replace 70-80% of frontier-model API spend. Here's how to choose between Kimi K2.6, DeepSeek V4 Pro Max, GLM-5.1, Qwen 3.6 Plus, and MiniMax M2.7, and when to escalate to Claude Opus 4.7 or Mythos Preview.
Last verified: May 5, 2026
The ranking
| # | Model | Best For | SWE-Bench Pro | License | Output $/1M |
|---|---|---|---|---|---|
| 1 | Kimi K2.6 | Best price/performance | 58.6% | Modified MIT | $0.95 |
| 2 | DeepSeek V4 Pro Max | Best agentic / long-horizon | ~58% | DeepSeek License | ~$1.50 |
| 3 | GLM-5.1 | Best self-hosted enterprise | 58.4% | MIT | ~$1.20 |
| 4 | Qwen 3.6 Plus | Best multilingual + tool use | ~57% | Apache 2.0 | ~$1.00 |
| 5 | MiniMax M2.7 | Best native multimodal + voice | — | Restrictive | ~$1.10 |
Sources: BenchLM Chinese leaderboard (April 2026), Atlas Cloud comparison (April 2026), artificialanalysis.ai (April 2026), Arena.ai Code Arena (April 26, 2026).
#1: Kimi K2.6 (Moonshot AI)
Released: April 20, 2026.
Why #1: Best price/performance. SWE-Bench Pro 58.6%, Code Arena WebDev #6 at 1,529 Elo, $0.95 per 1M output tokens. A sensible default for the ~80% of teams running cost-sensitive coding agents.
Strengths:
- Cheapest top-tier open-weights model.
- Mature ecosystem (broad provider availability: Atlas Cloud, Together AI, DeepInfra, OpenRouter).
- 256K context window.
- Self-hostable on 4× H200.
Weaknesses:
- Modified MIT license (some commercial-use ambiguity vs pure MIT).
- Trails DeepSeek V4 Pro Max on agentic GDPval-AA score (1484 vs 1554).
- 256K context vs Opus 4.7’s 1M.
Pick if: You want the best balance of capability, price, and ecosystem maturity for general coding workloads.
Detailed comparison: Kimi K2.6 vs Claude Opus 4.7
#2: DeepSeek V4 Pro Max (DeepSeek)
Released: April 2026 (V4 Pro Max, V4 Pro, V4 Flash variants).
Why #2: Best agentic real-world performance. GDPval-AA score of 1554 leads all open-weights models. DeepSeek V4 Pro Max ties Kimi K2.6 on the BenchLM aggregate at 87.
Strengths:
- Best GDPval-AA score among open weights (1554) — leads on agentic real-world tasks.
- 1M-token context window matches Opus 4.7 and GPT-5 Pro.
- Three variants (Pro Max, Pro, Flash) for cost/speed trade-offs.
- Strong fine-tune ecosystem.
Weaknesses:
- DeepSeek License has more commercial restrictions than MIT.
- Pricing varies more by host than Kimi or GLM.
- SWE-Bench Pro score is variant-dependent.
Pick if: You're running long-horizon agentic workflows (multi-step coding agents, autonomous loops, RAG over large codebases) where real-world agentic performance matters more than per-ticket SWE-Bench Pro scores.
#3: GLM-5.1 (Z.ai / Zhipu AI)
Released: April 7, 2026.
Why #3: Best self-hosted enterprise option. Pure MIT license, 754B parameter MoE, SWE-Bench Pro 58.4%.
Strengths:
- Pure MIT license — most enterprise-friendly licensing in the open-weights space.
- 754B parameters (largest by parameter count).
- Trained on Huawei Ascend chips (NVIDIA-independent stack).
- GDPval-AA 1535 (strong agentic).
Weaknesses:
- 754B parameters require substantial inference infrastructure.
- BenchLM aggregate (83) trails Kimi K2.6 and DeepSeek V4 Pro Max (both 87).
- Less mature ecosystem than Kimi or DeepSeek.
Pick if: You need self-hosted deployment for enterprise / sovereign / air-gapped use cases and prioritize maximum legal clarity (MIT license).
#4: Qwen 3.6 Plus (Alibaba)
Released: Q1 2026 (Qwen 3.6 family with Max-Preview and Plus variants).
Why #4: Best multilingual coding + tool use. Strong on agentic coding workloads with Apache 2.0 licensing.
Strengths:
- Apache 2.0 license (very enterprise-friendly).
- Best multilingual support (especially Chinese, Korean, Japanese).
- Strong tool-use behavior in agent loops.
- 3.6-35B-A3B + 3.6 Plus variants (MoE for efficiency).
Weaknesses:
- BenchLM Chinese leaderboard score (~79) trails the top 3.
- Smaller community than DeepSeek or Kimi.
- SWE-Bench Pro slightly below 58%.
Pick if: You need multilingual coding (especially CJK) or you want Apache 2.0 licensing for the cleanest legal posture.
#5: MiniMax M2.7 (MiniMax)
Released: Q1 2026.
Why #5: Best native multimodal + voice. The only top-5 open-weights model with strong native voice and multimodal handling.
Strengths:
- Native multimodal (text + image + voice) without separate models.
- MiniMax CLI agent tooling.
- Strong on voice-first applications.
Weaknesses:
- Open weights but more restrictive commercial license.
- Pure-coding benchmarks below the top 4.
- Smaller ecosystem outside MiniMax-hosted infrastructure.
Pick if: You’re building voice-first AI applications, multimodal agents (image + code), or products where MiniMax’s CLI tooling is a fit.
How to choose
A simple decision tree:
- Need self-hosted with MIT license? → GLM-5.1
- Need the cheapest top-tier coding model? → Kimi K2.6
- Running long agentic loops? → DeepSeek V4 Pro Max
- Multilingual or Apache 2.0 required? → Qwen 3.6 Plus
- Voice / multimodal first? → MiniMax M2.7
- None of the above? → Default to Kimi K2.6
For most teams, Kimi K2.6 is the default; deviate only when a specific constraint (license, agent depth, multimodal) drives a different choice. The sketch below encodes this tree in code.
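As a minimal sketch, the tree collapses to a few ordered checks. The model IDs and flag names here are illustrative placeholders, not real API identifiers:

```python
# Hypothetical sketch of the decision tree above. Model IDs are
# illustrative strings, not confirmed provider slugs.

def pick_open_weights_model(
    needs_mit_self_hosted: bool = False,
    cheapest_top_tier: bool = False,
    long_agentic_loops: bool = False,
    multilingual_or_apache: bool = False,
    voice_or_multimodal: bool = False,
) -> str:
    """Return the model matching the first constraint that fires."""
    if needs_mit_self_hosted:
        return "glm-5.1"
    if cheapest_top_tier:
        return "kimi-k2.6"
    if long_agentic_loops:
        return "deepseek-v4-pro-max"
    if multilingual_or_apache:
        return "qwen-3.6-plus"
    if voice_or_multimodal:
        return "minimax-m2.7"
    return "kimi-k2.6"  # article's default when no constraint applies
```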
How they compare to closed frontier models
| Capability | Best Open-Weights | Closed Frontier | Gap |
|---|---|---|---|
| SWE-Bench Pro | Kimi K2.6 (58.6%) | Opus 4.7 (64.3%), Mythos Preview (~77.8%) | 5.7 / 19.2 points |
| Code Arena WebDev | Kimi K2.6 (1,529 Elo) | Opus 4.7 (1,565 Elo) | 36 Elo |
| 1M context | DeepSeek V4 Pro / GLM-5.1 | Opus 4.7, GPT-5 Pro | comparable |
| Tool use reliability | Strong | Best (Opus 4.7) | small |
| Output price ($/1M) | $0.95 (Kimi) | $75 (Opus 4.7) | 79x |
For 70-80% of coding workflows, open-weights are sufficient. For the hardest 10-20%, escalate to Opus 4.7 or Mythos Preview via a router pattern.
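A minimal version of that router pattern might look like the sketch below. Everything here is an assumption: you supply your own `complete` client and your own success check (for example, a CI run against the generated patch); the model IDs are placeholders, not a real SDK.

```python
# Router sketch: default to the cheap open-weights model, escalate to the
# frontier model only when the open model fails repeatedly. The `complete`
# callable and the result dict shape are hypothetical, supplied by you.

OPEN_DEFAULT = "kimi-k2.6"
FRONTIER = "claude-opus-4.7"

def route_coding_task(prompt: str, complete, max_open_attempts: int = 2) -> str:
    """Try the open-weights default first; escalate only for the hard residue."""
    for _ in range(max_open_attempts):
        result = complete(model=OPEN_DEFAULT, prompt=prompt)
        if result.get("tests_passed"):  # your own success signal, e.g. CI
            return result["patch"]
    # Escalation path: ~79x the output-token price, but only for the
    # hardest 10-20% of tasks that the open model could not close out.
    result = complete(model=FRONTIER, prompt=prompt)
    return result["patch"]
```

The design choice that matters is the success check: without a cheap, automatic signal (tests, linters, a verifier model), the router cannot tell which tasks belong in the expensive tier.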
Bundled access via OpenCode Go
The cheapest entry point in May 2026 is OpenCode Go:
- $5 first month, $10/month thereafter.
- 5-hour rate limits across GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen 3.5 Plus, Qwen 3.6 Plus, MiniMax M2.5, MiniMax M2.7.
- Best for individual developers experimenting with the open-weights stack.
Per-token APIs from Atlas Cloud, Together AI, DeepInfra, and OpenRouter are better for production at scale.
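For production use, these hosts expose OpenAI-compatible endpoints, so switching models is a one-line change. A sketch using OpenRouter's endpoint; the model slug "moonshotai/kimi-k2.6" is a guess at the naming convention, not a confirmed identifier, so check your provider's model list:

```python
# Per-token access through an OpenAI-compatible endpoint (OpenRouter shown).
# The model slug below is a hypothetical example, not a verified ID.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

resp = client.chat.completions.create(
    model="moonshotai/kimi-k2.6",  # hypothetical slug; verify with your host
    messages=[{"role": "user", "content": "Refactor this function for clarity: ..."}],
)
print(resp.choices[0].message.content)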
Self-hosting cost analysis
Rough breakeven points for self-hosting vs hosted API (May 2026):
| Model | Hardware | Hourly cost | Breakeven |
|---|---|---|---|
| Kimi K2.6 | 4× H200 | ~$8 | ~30B tokens/month |
| GLM-5.1 | 8× H200 | ~$16 | ~50B tokens/month |
| DeepSeek V4 Pro | 8× H200 | ~$16 | ~50B tokens/month |
| Qwen 3.6 Plus | 4× H200 | ~$8 | ~30B tokens/month |
Above the breakeven, self-hosting wins on cost and gives you full data control. Below, hosted APIs are simpler.
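To adapt the table to your own workload, the arithmetic is just monthly hardware cost divided by your blended per-token price. The blended price below (~$0.19/1M, implying a heavy input-token skew) is an assumption reverse-engineered to match the table's Kimi row; substitute your actual input/output mix.

```python
# Breakeven sketch: monthly self-host cost vs hosted per-token spend.
# The blended $/1M price is an assumption, not a published figure.

HOURS_PER_MONTH = 730

def breakeven_tokens(hourly_hw_cost: float, blended_price_per_1m: float) -> float:
    """Monthly token volume at which self-hosting matches hosted API cost."""
    monthly_hw = hourly_hw_cost * HOURS_PER_MONTH
    return monthly_hw / blended_price_per_1m * 1_000_000

# e.g. Kimi K2.6 on 4x H200 at ~$8/hr with an assumed ~$0.19/1M blended
# price lands near the table's ~30B tokens/month figure:
print(f"{breakeven_tokens(8.0, 0.19) / 1e9:.1f}B tokens/month")  # -> 30.7B
```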
Bottom line
In May 2026, Kimi K2.6 is the default open-weights coding model, with DeepSeek V4 Pro Max as the agentic specialist, GLM-5.1 as the self-hosted enterprise pick, Qwen 3.6 Plus as the multilingual specialist, and MiniMax M2.7 as the multimodal specialist. All five are within 5-7 percentage points of Claude Opus 4.7 on SWE-Bench Pro at 5-15% of the cost. For high-volume coding workloads, the right answer is a router that defaults to open-weights and escalates to Opus 4.7 or Mythos Preview only for the hardest tasks.
Sources: BenchLM.ai Chinese leaderboard (April 2026), Atlas Cloud comparison (April 2026), artificialanalysis.ai (April 2026), DeepLearning.AI The Batch (April 2026), Arena.ai Code Arena (April 26, 2026), opencode.ai/go pricing (May 2026), llm-stats.com (May 2026).