
Best Open-Weights Coding Models May 2026: Top 5 Ranked

Quick Answer

Five Chinese open-weights coding models lead the May 2026 stack, all within a 5-7 percentage point gap to Claude Opus 4.7 on SWE-Bench Pro at 5-15% of the cost. Picked correctly, they replace 70-80% of frontier-model API spend. Here’s how to choose between DeepSeek V4 Pro, Kimi K2.6, GLM-5.1, Qwen 3.6 Plus, and MiniMax M2.7 — and when to escalate to Claude Opus 4.7 or Mythos Preview.

Last verified: May 5, 2026

The ranking

| # | Model | Best For | SWE-Bench Pro | License | Output $/1M |
|---|-------|----------|---------------|---------|-------------|
| 1 | Kimi K2.6 | Best price/performance | 58.6% | Modified MIT | $0.95 |
| 2 | DeepSeek V4 Pro Max | Best agentic / long-horizon | ~58% | DeepSeek License | ~$1.50 |
| 3 | GLM-5.1 | Best self-hosted enterprise | 58.4% | MIT | ~$1.20 |
| 4 | Qwen 3.6 Plus | Best multilingual + tool use | ~57% | Apache 2.0 | ~$1.00 |
| 5 | MiniMax M2.7 | Best native multimodal + voice | n/a | Restrictive | ~$1.10 |

Sources: BenchLM Chinese leaderboard (April 2026), Atlas Cloud comparison (April 2026), artificialanalysis.ai (April 2026), Arena.ai Code Arena (April 26, 2026).

#1: Kimi K2.6 (Moonshot AI)

Released: April 20, 2026.

Why #1: Best price/performance. SWE-Bench Pro 58.6%, Code Arena WebDev #6 at 1,529 Elo, $0.95 per 1M output tokens. Fits the cost-sensitive coding-agent default for 80% of teams.

Strengths:

  • Cheapest top-tier open-weights model.
  • Mature ecosystem (broad provider availability: Atlas Cloud, Together AI, DeepInfra, OpenRouter).
  • 256K context window.
  • Self-hostable on 4× H200.

Weaknesses:

  • Modified MIT license (some commercial-use ambiguity vs pure MIT).
  • Trails DeepSeek V4 Pro Max on agentic GDPval-AA score (1484 vs 1554).
  • 256K context vs Opus 4.7’s 1M.

Pick if: You want the best balance of capability, price, and ecosystem maturity for general coding workloads.

Detailed comparison: Kimi K2.6 vs Claude Opus 4.7

#2: DeepSeek V4 Pro Max (DeepSeek)

Released: April 2026 (V4 Pro Max, V4 Pro, V4 Flash variants).

Why #2: Best agentic real-world performance. GDPval-AA score of 1554 leads all open-weights models. DeepSeek V4 Pro Max ties Kimi K2.6 on the BenchLM aggregate at 87.

Strengths:

  • Best GDPval-AA score among open weights (1554) — leads on agentic real-world tasks.
  • 1M-token context window matches Opus 4.7 and GPT-5 Pro.
  • Three variants (Pro Max, Pro, Flash) for cost/speed trade-offs.
  • Strong fine-tune ecosystem.

Weaknesses:

  • DeepSeek License has more commercial restrictions than MIT.
  • Pricing varies more by host than Kimi or GLM.
  • SWE-Bench Pro score is variant-dependent.

Pick if: You’re running long-horizon agentic workflows (multi-step coding agents, autonomous loops, RAG over large codebases) where real-world agentic performance matters more than benchmark SWE-Bench Pro scores on individual tickets.

#3: GLM-5.1 (Z.ai / Zhipu AI)

Released: April 7, 2026.

Why #3: Best self-hosted enterprise option. Pure MIT license, 754B parameter MoE, SWE-Bench Pro 58.4%.

Strengths:

  • Pure MIT license — most enterprise-friendly licensing in the open-weights space.
  • 754B parameters (largest by parameter count).
  • Trained on Huawei Ascend chips (NVIDIA-independent stack).
  • GDPval-AA 1535 (strong agentic).

Weaknesses:

  • 754B parameters require substantial inference infrastructure.
  • BenchLM aggregate (83) trails Kimi K2.6 and DeepSeek V4 Pro (both 87).
  • Less mature ecosystem than Kimi or DeepSeek.

Pick if: You need self-hosted deployment for enterprise / sovereign / air-gapped use cases and prioritize maximum legal clarity (MIT license).

#4: Qwen 3.6 Plus (Alibaba)

Released: Q1 2026 (Qwen 3.6 family with Max-Preview and Plus variants).

Why #4: Best multilingual coding + tool use. Strong on agentic coding workloads with Apache 2.0 licensing.

Strengths:

  • Apache 2.0 license (very enterprise-friendly).
  • Best multilingual support (especially Chinese, Korean, Japanese).
  • Strong tool-use behavior in agent loops.
  • 3.6-35B-A3B + 3.6 Plus variants (MoE for efficiency).

Weaknesses:

  • BenchLM Chinese leaderboard score (~79) trails the top 3.
  • Smaller community than DeepSeek or Kimi.
  • SWE-Bench Pro slightly below 58%.

Pick if: You need multilingual coding (especially CJK) or you want Apache 2.0 licensing for the cleanest legal posture.

#5: MiniMax M2.7 (MiniMax)

Released: Q1 2026.

Why #5: Best native multimodal + voice. The only top-5 open-weights model with strong native voice and multimodal handling.

Strengths:

  • Native multimodal (text + image + voice) without separate models.
  • MiniMax CLI agent tooling.
  • Strong on voice-first applications.

Weaknesses:

  • Open weights but more restrictive commercial license.
  • Pure-coding benchmarks below the top 4.
  • Smaller ecosystem outside MiniMax-hosted infrastructure.

Pick if: You’re building voice-first AI applications, multimodal agents (image + code), or products where MiniMax’s CLI tooling is a fit.

How to choose

A simple decision tree:

Need self-hosted with MIT license? → GLM-5.1
Need cheapest top-tier coding? → Kimi K2.6
Running long agentic loops? → DeepSeek V4 Pro Max
Multilingual or Apache 2.0 required? → Qwen 3.6 Plus
Voice / multimodal first? → MiniMax M2.7
None of the above? → Default to Kimi K2.6

For most teams, Kimi K2.6 is the default and you only deviate if a specific constraint (license, agent depth, multimodal) drives a different choice.
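The decision tree above can be sketched as a small helper. The model names come from this article; the `Constraints` flags and the `pick_model` function are illustrative, not a real library:

```python
from dataclasses import dataclass

@dataclass
class Constraints:
    """Workload constraints mirroring the decision tree (illustrative flags)."""
    self_hosted_mit: bool = False       # self-hosted with MIT license required
    long_agentic_loops: bool = False    # long-horizon agentic workflows
    multilingual_or_apache: bool = False  # CJK support or Apache 2.0 required
    voice_or_multimodal: bool = False   # voice-first / multimodal product

def pick_model(c: Constraints) -> str:
    """Walk the decision tree top to bottom; first matching constraint wins."""
    if c.self_hosted_mit:
        return "GLM-5.1"
    if c.long_agentic_loops:
        return "DeepSeek V4 Pro Max"
    if c.multilingual_or_apache:
        return "Qwen 3.6 Plus"
    if c.voice_or_multimodal:
        return "MiniMax M2.7"
    # Cheapest top-tier pick doubles as the default.
    return "Kimi K2.6"
```

First-match-wins ordering encodes the same priority as the tree: licensing constraints dominate, then workload shape, with Kimi K2.6 as the fallback.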

How they compare to closed frontier models

| Capability | Best Open-Weights | Closed Frontier | Gap |
|---|---|---|---|
| SWE-Bench Pro | Kimi K2.6 (58.6%) | Opus 4.7 (64.3%), Mythos Preview (~77.8%) | 5.7 / 19.2 points |
| Code Arena WebDev | Kimi K2.6 (1,529 Elo) | Opus 4.7 (1,565 Elo) | 36 Elo |
| 1M context | DeepSeek V4 Pro / GLM-5.1 | Opus 4.7, GPT-5 Pro | comparable |
| Tool-use reliability | Strong | Best (Opus 4.7) | small |
| Output price ($/1M) | $0.95 (Kimi) | $75 (Opus 4.7) | 79x |

For 70-80% of coding workflows, open-weights models are sufficient. For the hardest 10-20%, escalate to Opus 4.7 or Mythos Preview via a router pattern.
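A minimal sketch of that router pattern, assuming you supply your own inference client and validation step: `call_model` and `passes_checks` are placeholders (tests, linting, or a self-evaluation pass), and the model IDs are just labels from this article, not real API identifiers.

```python
from typing import Callable

DEFAULT_MODEL = "kimi-k2.6"           # open-weights default (~$0.95/1M output)
ESCALATION_MODEL = "claude-opus-4.7"  # frontier fallback (~$75/1M output)

def route(
    prompt: str,
    call_model: Callable[[str, str], str],   # (model, prompt) -> answer
    passes_checks: Callable[[str], bool],    # validation of the cheap attempt
) -> tuple[str, str]:
    """Return (model_used, answer), escalating to the frontier model at most once."""
    answer = call_model(DEFAULT_MODEL, prompt)
    if passes_checks(answer):
        return DEFAULT_MODEL, answer
    # Hard case: pay frontier prices only when the cheap attempt fails.
    return ESCALATION_MODEL, call_model(ESCALATION_MODEL, prompt)
```

At a ~79x output-price gap, even a router that escalates 20% of requests cuts spend dramatically relative to defaulting to the frontier model.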

Bundled access via OpenCode Go

The cheapest entry point in May 2026 is OpenCode Go:

  • $5 first month, $10/month thereafter.
  • 5-hour rate limits across GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen 3.5 Plus, Qwen 3.6 Plus, MiniMax M2.5, MiniMax M2.7.
  • Best for individual developers experimenting with the open-weights stack.

Per-token APIs from Atlas Cloud, Together AI, DeepInfra, and OpenRouter are better for production at scale.

Self-hosting cost analysis

Rough breakeven points for self-hosting vs hosted API (May 2026):

| Model | Hardware | Hourly cost | Breakeven |
|---|---|---|---|
| Kimi K2.6 | 4× H200 | ~$8 | ~30B tokens/month |
| GLM-5.1 | 8× H200 | ~$16 | ~50B tokens/month |
| DeepSeek V4 Pro | 8× H200 | ~$16 | ~50B tokens/month |
| Qwen 3.6 Plus | 4× H200 | ~$8 | ~30B tokens/month |

Above the breakeven, self-hosting wins on cost and gives you full data control. Below, hosted APIs are simpler.

Bottom line

In May 2026, Kimi K2.6 is the default open-weights coding model, with DeepSeek V4 Pro Max as the agentic specialist, GLM-5.1 as the self-hosted enterprise pick, Qwen 3.6 Plus as the multilingual specialist, and MiniMax M2.7 as the multimodal specialist. All five are within 5-7 percentage points of Claude Opus 4.7 on SWE-Bench Pro at 5-15% of the cost. For high-volume coding workloads, the right answer is a router that defaults to open-weights models and escalates to Opus 4.7 or Mythos Preview only for the hardest tasks.

Sources: BenchLM.ai Chinese leaderboard (April 2026), Atlas Cloud comparison (April 2026), artificialanalysis.ai (April 2026), DeepLearning.AI The Batch (April 2026), Arena.ai Code Arena (April 26, 2026), opencode.ai/go pricing (May 2026), llm-stats.com (May 2026).