Kimi K2.6 vs GLM-5.1 vs DeepSeek V4 Pro Coding (May 2026)

Quick Answer

Three open-weights Chinese coding models lead the May 2026 stack: Kimi K2.6 (Moonshot, April 20), GLM-5.1 (Z.ai, April 7), and DeepSeek V4 Pro (DeepSeek, April 2026). All three sit in BenchLM’s Tier-A range (83-87), all three have permissive licenses, and all three are at least 80% as capable as Claude Opus 4.7 on coding benchmarks at 5-15% of the cost. Here’s how they compare.

Last verified: May 5, 2026

At-a-glance comparison

| Model | BenchLM Aggregate | SWE-Bench Pro | GDPval-AA | License | Best For |
|---|---|---|---|---|---|
| Kimi K2.6 | 87 | 58.6% | 1484 | Modified MIT | Best price/perf for coding |
| DeepSeek V4 Pro (Max) | 87 | ~58% | 1554 | DeepSeek License (commercial) | Agentic / long-horizon agents |
| GLM-5.1 | 83 | 58.4% | 1535 | MIT | Self-hosted enterprise |
| Claude Opus 4.7 (reference) | — | 64.3% | — | Closed | Ceiling capability |

Sources: BenchLM Chinese leaderboard (April 2026), Atlas Cloud comparison post (April 2026), artificialanalysis.ai DeepSeek V4 article (April 2026), Arena.ai Code Arena WebDev (April 26, 2026).

Kimi K2.6 (Moonshot AI, April 20, 2026)

Strengths:

  • Best aggregate Chinese-leaderboard score at 87 (tied with DeepSeek V4 Pro).
  • 58.6% SWE-Bench Pro — slightly ahead of GLM-5.1 (58.4%) on this specific benchmark.
  • 6th place on Arena.ai Code Arena WebDev at 1,529 Elo (April 26, 2026).
  • ~$0.95 per 1M output tokens on most hosted-API providers — best raw price.
  • Open weights under Modified MIT license.

Weaknesses:

  • Below Claude Opus 4.7 on hardest coding tasks.
  • Modified MIT (vs pure MIT) creates some commercial-use ambiguity.
  • Agent-loop performance (GDPval-AA 1484) trails DeepSeek V4 Pro (1554).

Pick Kimi K2.6 if: Cost matters and you’re running coding edits, code review, or moderate-length agentic tasks. Best price/performance ratio in the open-weights category.

GLM-5.1 (Z.ai / Zhipu AI, April 7, 2026)

Strengths:

  • Pure MIT license — the most enterprise-friendly licensing among Chinese open-weights leaders.
  • 754B parameters with MoE routing — largest by parameter count among the three.
  • 58.4% SWE-Bench Pro — within rounding of Kimi K2.6.
  • GDPval-AA 1535 — solid agentic performance.
  • Trained on Huawei Ascend chips — strategically important for sovereignty / non-NVIDIA stacks.

Weaknesses:

  • BenchLM aggregate score (83) trails Kimi K2.6 and DeepSeek V4 Pro (both 87).
  • 754B parameters require substantial inference infrastructure for self-hosting.
  • Less mature ecosystem (fewer fine-tunes, smaller community) than Kimi or DeepSeek.

Pick GLM-5.1 if: You need self-hosted deployment with maximum legal clarity (MIT license) and you have the GPU infrastructure to run a 754B MoE model. Best for sovereign / air-gapped enterprise.

DeepSeek V4 Pro (Max) (April 2026)

Strengths:

  • Best agentic real-world score at 1554 GDPval-AA — leads all open-weights models.
  • Tied for top BenchLM aggregate at 87.
  • 1M-token context window — matches Claude Opus 4.7 and GPT-5 Pro on long-context tasks.
  • Multiple variants (V4 Pro Max, V4 Pro, V4 Flash) for different cost/speed trade-offs.
  • Mature ecosystem — DeepSeek has the largest fine-tune and tooling ecosystem among Chinese labs.

Weaknesses:

  • DeepSeek License has more restrictions than MIT (some commercial use cases require approval).
  • Pricing varies more by host than Kimi or GLM.
  • SWE-Bench Pro score is variant-dependent and not always the headline number.

Pick DeepSeek V4 Pro if: You’re running long-horizon agentic workflows (multi-step coding agents, RAG over large codebases, autonomous loops) where GDPval-AA-style real-world performance matters more than SWE-Bench Pro on isolated tickets.

Pricing (May 2026)

Hosted-API pricing varies by provider; representative rates:

| Model | Input ($/1M) | Output ($/1M) | Notes |
|---|---|---|---|
| Kimi K2.6 | ~$0.30 | ~$0.95 | Cheapest top-tier |
| GLM-5.1 | ~$0.40 | ~$1.20 | MIT, viable self-host |
| DeepSeek V4 Pro Max | ~$0.60 | ~$1.50 | Best for agents |
| Claude Opus 4.7 (reference) | $15 | $75 | 25-80x more expensive |

For comparison, at these rates Claude Opus 4.7 is roughly 25-50x more expensive on input and 50-80x more expensive on output, depending on the model. The cost savings are dramatic for high-volume workloads.
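To make the gap concrete, here is a back-of-the-envelope monthly cost comparison using the representative rates in the table above. The 5B-input / 1B-output traffic mix is an assumption for illustration, not a measured workload:

```python
# Illustrative monthly-cost comparison using the representative $/1M rates above.
# The traffic mix below (5B input, 1B output tokens per month) is an assumption.

def monthly_cost(input_tokens_b: float, output_tokens_b: float,
                 in_price: float, out_price: float) -> float:
    """Dollar cost given token volumes in billions and prices in $/1M tokens."""
    return input_tokens_b * 1_000 * in_price + output_tokens_b * 1_000 * out_price

workload = dict(input_tokens_b=5.0, output_tokens_b=1.0)

kimi = monthly_cost(**workload, in_price=0.30, out_price=0.95)   # ~$2,450/month
opus = monthly_cost(**workload, in_price=15.0, out_price=75.0)   # ~$150,000/month

print(f"Kimi K2.6: ${kimi:,.0f}/month, Claude Opus 4.7: ${opus:,.0f}/month "
      f"({opus / kimi:.0f}x)")
```

For this particular mix the blended ratio works out to roughly 60x; shift the mix toward output tokens and the multiple climbs further.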

Bundled access via OpenCode Go

OpenCode Go (opencode.ai/go) bundles access to GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen 3.5 Plus, Qwen 3.6 Plus, MiniMax M2.5, and MiniMax M2.7 for $5 first month / $10/month thereafter with 5-hour rate limits. For individual developers experimenting across the open-weights stack, this is the cheapest entry point in May 2026.

Self-hosting cost analysis

For enterprises considering self-hosting:

Inference cost (rough estimates, May 2026; a worked break-even sketch follows the list):

  • Kimi K2.6 on 4× H200: ~$8/hour fully loaded, breaks even with API at ~30B tokens/month.
  • GLM-5.1 on 8× H200: ~$16/hour fully loaded, breaks even at ~50B tokens/month.
  • DeepSeek V4 Pro on 8× H200: similar to GLM-5.1.
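The break-even figures are simple arithmetic: fully loaded GPU cost per month divided by the blended API rate you would otherwise pay. A minimal sketch, assuming ~730 hours per month and an illustrative input-heavy blended rate (your traffic mix will shift the number):

```python
# Rough self-hosting break-even estimate. The hourly GPU cost and blended
# API rate are illustrative assumptions; substitute your own numbers.

HOURS_PER_MONTH = 730  # average hours in a calendar month

def breakeven_tokens_per_month(gpu_cost_per_hour: float,
                               api_price_per_million: float) -> float:
    """Monthly token volume at which self-hosting matches hosted-API spend."""
    monthly_gpu_cost = gpu_cost_per_hour * HOURS_PER_MONTH
    return monthly_gpu_cost / api_price_per_million * 1_000_000

# Kimi K2.6 on 4x H200 at ~$8/hour vs an assumed blended rate of ~$0.20 per 1M
# tokens -> roughly 29B tokens/month, in line with the ~30B estimate above.
tokens = breakeven_tokens_per_month(gpu_cost_per_hour=8.0, api_price_per_million=0.20)
print(f"Break-even: {tokens / 1e9:.1f}B tokens/month")
```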

Self-hosting becomes economically rational for:

  1. High-volume workloads (>50B tokens/month).
  2. Data-sovereignty requirements (EU AI Act, GDPR, defense use cases).
  3. Air-gapped deployment where API access is impossible.
  4. Long-running fine-tunes requiring custom variants.

For everyone else, hosted APIs (Atlas Cloud, Together AI, DeepInfra, OpenCode Go) are simpler and often cheaper.
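Calling any of these models through a hosted API usually looks just like calling OpenAI, since most of these providers expose OpenAI-compatible endpoints. A minimal sketch; the base URL and model ID below are placeholders you would swap for your provider’s actual values:

```python
# Minimal sketch of an OpenAI-compatible chat completion call.
# Base URL and model ID are placeholders, not confirmed identifiers.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-host.com/v1",  # your provider's endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2.6",  # hypothetical ID; check your host's catalog
    messages=[
        {"role": "system", "content": "You are a code-review assistant."},
        {"role": "user", "content": "Review this diff for off-by-one errors:\n..."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```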

How they compare to Claude Opus 4.7

The Chinese open-weights stack closes most of the gap to Claude Opus 4.7:

| Task type | Best open-weights | Gap to Opus 4.7 |
|---|---|---|
| SWE-Bench Pro | Kimi K2.6 (58.6%) | -5.7 points |
| Code Arena WebDev | Kimi K2.6 (1529 Elo) | -36 Elo |
| Agentic loops (GDPval-AA) | DeepSeek V4 Pro (1554) | unknown* |
| 1M context | DeepSeek V4 Pro / GLM-5.1 | comparable |
| Cost | Kimi K2.6 ($0.95/1M output) | 50-75x cheaper |

*Anthropic doesn’t publish GDPval-AA scores publicly.

For 80% of coding workflows, open-weights are now sufficient. For the hardest 20% (complex multi-file refactors, novel architecture work, debugging at the limit of model capability), Claude Opus 4.7 still wins — and Mythos Preview (~77.8% SWE-Bench Pro) extends that lead further.

Decision framework

Pick by primary use case:

  1. Cheapest top-tier coding → Kimi K2.6. Best price, strong SWE-Bench Pro, broad availability.
  2. Self-hosted enterprise → GLM-5.1. MIT license, mature, supports sovereign infrastructure.
  3. Agentic / long-horizon work → DeepSeek V4 Pro Max. Best GDPval-AA, mature ecosystem, 1M context.
  4. Maximum capability regardless of cost → Claude Opus 4.7 or Mythos Preview.
  5. Multi-model fallback → Use OpenCode Go for $10/month bundled access (a simple routing sketch follows this list).
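For option 5, a fallback chain can be as simple as trying models in cost order and escalating on failure. A minimal sketch reusing the OpenAI-compatible client pattern from the pricing section; the model IDs are illustrative assumptions, not confirmed catalog names:

```python
# Simple multi-model fallback chain: try the cheapest model first,
# escalate on errors. Client setup and model IDs are assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://api.example-host.com/v1", api_key="YOUR_API_KEY")

FALLBACK_CHAIN = [
    "moonshotai/kimi-k2.6",      # cheapest top-tier coding
    "deepseek/deepseek-v4-pro",  # stronger agentic behaviour
    "zai/glm-5.1",               # MIT-licensed alternative
]

def complete_with_fallback(messages: list[dict]) -> str:
    """Try each model in order; return the first successful completion."""
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            resp = client.chat.completions.create(model=model, messages=messages)
            return resp.choices[0].message.content
        except Exception as exc:  # rate limits, provider outages, etc.
            last_error = exc
    raise RuntimeError("All models in the fallback chain failed") from last_error
```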

What’s coming next

Three releases to watch through Q2 2026:

  • DeepSeek V5 — rumored for late Q2 2026 with improved reasoning and tool use.
  • Kimi K3 — Moonshot has hinted at a Q3 2026 release with substantial scale increase.
  • GLM-6 — Z.ai roadmap suggests H2 2026.

The open-weights coding gap to Western frontier models will likely narrow further through 2026.

Bottom line

In May 2026, Kimi K2.6 is the best price/performance pick, DeepSeek V4 Pro Max is the best agentic / long-horizon pick, and GLM-5.1 is the best self-hosted enterprise pick with its MIT license. All three are within 5-7 percentage points of Claude Opus 4.7 on SWE-Bench Pro at 5-15% of the cost. For cost-sensitive coding workloads, the Chinese open-weights stack is now competitive enough that not running it as a fallback is leaving money on the table.

Sources: BenchLM.ai Chinese LLM leaderboard (April 2026), Atlas Cloud “Kimi K2.6 vs GLM 5.1 vs Qwen 3.6 Plus vs MiniMax M2.7” (April 2026), artificialanalysis.ai “DeepSeek V4 Pro and Flash” (April 2026), deeplearning.ai “Kimi K2.6 Matches Open Qwen3.6 Max” (April 2026), Towards AI “I Tested Kimi K2.6 vs GLM-5.1 on 15 Real Coding Tasks” (April 2026), opencode.ai/go pricing (May 2026), AkitaOnRails LLM Coding Benchmark April 2026.