MiniMax M2.7 vs Kimi K2.6 vs GLM-5.1 vs DeepSeek V4 (May 2026)

Quick Answer

Four Chinese AI labs released open-weights coding models inside a 12-day window in late April 2026 — Z.ai’s GLM-5.1, MiniMax M2.7, Moonshot’s Kimi K2.6, and DeepSeek V4 — all landing at roughly the same capability ceiling on agentic engineering at meaningfully lower cost than Western frontier models. This is the single most important development in the open-weights coding space this year. Here’s how they compare in May 2026.

Last verified: May 6, 2026

The four models at a glance

| Model | Released | Parameters | SWE-Bench Pro | Output price | Best for |
|---|---|---|---|---|---|
| GLM-5.1 (Z.ai) | April 2026 | 360B MoE | 58.4% | ~$1.10/1M | Strongest benchmark scores, hosted API |
| Kimi K2.6 (Moonshot) | April 2026 | ~1T MoE | 58.6% | ~$0.95/1M | Best ecosystem, self-host friendly |
| MiniMax M2.7 | April 2026 | 229B | 56.22% | $1.20/1M | Self-evolving agentic training |
| DeepSeek V4 (Pro / Flash) | April 2026 | ~700B MoE | 57-58% | $0.30-1.50/1M | Cheapest API, broadest availability |

For comparison: Claude Opus 4.7 outputs at $75/1M tokens — 50-250x more expensive than this Chinese open-weights cluster.

What each model is best at

GLM-5.1 (Z.ai)

The benchmark leader. Strongest reported scores on SWE-Bench Pro and Terminal Bench 2 among the four. Z.ai ships first-class hosted API access, with aggressive pricing in its Coding Plan tier (favorable for high-volume API customers).

  • Strength: Top-tier benchmark numbers, strong reasoning on hard tasks.
  • Weakness: Smaller third-party tooling ecosystem than Kimi or DeepSeek.
  • Pick if: You want maximum capability among open-weights options and you’re API-first.

Kimi K2.6 (Moonshot AI)

The default starting point. Strong benchmarks (58.6% SWE-Bench Pro), the most mature self-host story among the four, and the broadest tooling support — Cline, Roo Code, Aider, OpenCode, and most other coding harnesses ship Kimi K2.6 presets.

  • Strength: Ecosystem support, balance of capability/cost, self-host viability.
  • Weakness: Slightly behind GLM-5.1 on raw benchmark numbers.
  • Pick if: You want one model to default to and may eventually self-host.

MiniMax M2.7

The agentic specialist. Released in April 2026 with a "self-evolution" training approach: the model learns from a curated library of agentic engineering trajectories. It hits 62.7% on MiniMax's internal complex-task benchmark, which MiniMax claims is close to Claude Sonnet 4.6 performance.

  • Strength: Strong on long-horizon agentic tasks, $1.20/1M output is competitive.
  • Weakness: Text-only (no multimodal); benchmark scores trail GLM-5.1 / Kimi K2.6 on standard suites.
  • Pick if: You’re building agent loops with long contexts and many tool calls.
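To make "agent loop with many tool calls" concrete, here is a minimal sketch of the pattern against an OpenAI-compatible chat endpoint. The base URL, the model identifier, and the single run_shell tool are placeholders for illustration, not MiniMax's documented values:

```python
# Minimal agent loop against an OpenAI-compatible endpoint.
# Placeholders (not from MiniMax docs): base_url, model id, run_shell tool.
import json
import subprocess

from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def run_shell(command: str) -> str:
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=60)
    return (proc.stdout + proc.stderr)[-4000:]  # cap what flows back into context

messages = [{"role": "user", "content": "Run the test suite and fix the first failure."}]
for _ in range(20):  # hard cap so a confused model can't loop forever
    resp = client.chat.completions.create(
        model="minimax-m2.7",  # placeholder model id
        messages=messages,
        tools=TOOLS,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:  # no more tool calls: the model is done
        print(msg.content)
        break
    for call in msg.tool_calls:  # execute each requested call, feed results back
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_shell(**args),
        })
```

The "tool-call stability" and "context recovery" differences the field reports mention (see the benchmark section below) show up exactly in loops like this, where one malformed tool call can derail a 20-step run.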

DeepSeek V4 (Pro and Flash)

The cost leader. Two variants: V4 Flash at $0.30/1M output is the cheapest credible coding model on the market, while V4 Pro at $1.50/1M output competes with GLM-5.1 and Kimi K2.6 on capability. DeepSeek also has the broadest cloud availability of the four (every major aggregator, including OpenRouter, Together, and Fireworks).

  • Strength: Price, availability, mature provider ecosystem.
  • Weakness: V4 Flash trails the others on hard tasks; it needs a router pattern (see below) to compete on capability.
  • Pick if: You’re cost-sensitive or running batch / high-volume workloads.

Benchmark comparison

Numbers reported as of May 2026 (sources: BestBlogs comparison, MiniMax internal, Artificial Analysis):

| Benchmark | GLM-5.1 | Kimi K2.6 | MiniMax M2.7 | DeepSeek V4 |
|---|---|---|---|---|
| SWE-Bench Pro | 58.4 | 58.6 | 56.22 | 57-58 |
| Terminal Bench 2 | ~57 | ~58 | 57.0 | ~56 |
| VIBE-Pro | ~56 | ~57 | 55.6 | ~55 |
| GDPval-AA (Elo) |  |  | 1495 |  |

The headline: all four cluster within 2-3 points of each other. Field reports (per the BestBlogs comparison) say the actual production differences come from tool-call stability, context recovery on long agent loops, and prompt-format compatibility — not raw benchmark scores.

Pricing in detail (May 2026)

Hosted API rates per 1M tokens (input, output, and cached input):

| Model | Input | Output | Cached input |
|---|---|---|---|
| DeepSeek V4 Flash | $0.07 | $0.30 | ~$0.014 |
| Kimi K2.6 | $0.20 | ~$0.95 | ~$0.04 |
| GLM-5.1 | $0.25 | ~$1.10 | ~$0.05 |
| MiniMax M2.7 | $0.30 | $1.20 | $0.06 |
| DeepSeek V4 Pro | $0.40 | $1.50 | ~$0.08 |
| Claude Opus 4.7 (reference) | $15 | $75 | $1.50 |

Self-hosting on a single 8x H100 node costs roughly $25-40/hour depending on cloud and supports any of the four models at production throughput. That works out to about $18-29k of node time per month, so against ~$1/1M hosted output pricing the break-even sits around 18-29 billion output tokens/month: self-hosting pays off only for very heavy or compliance-bound users.
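The arithmetic behind that break-even, as a sketch. The node rate and the ~$1/1M blended output price are the figures above; the 730 hours/month and the full-utilization assumption are mine:

```python
# Break-even: self-hosted 8x H100 node vs hosted API, output tokens only.
HOURS_PER_MONTH = 730  # assumes the node runs (and is billed) 24/7

def breakeven_m_tokens(node_usd_per_hour: float, api_usd_per_m_output: float) -> float:
    """Monthly output tokens (in millions) where node cost equals API cost."""
    monthly_node_cost = node_usd_per_hour * HOURS_PER_MONTH
    return monthly_node_cost / api_usd_per_m_output

for rate in (25, 40):
    m_tokens = breakeven_m_tokens(rate, 1.00)  # ~$1/1M blended output price
    print(f"${rate}/hr node: break-even at ~{m_tokens / 1000:.0f}B output tokens/month")
# $25/hr node: break-even at ~18B output tokens/month
# $40/hr node: break-even at ~29B output tokens/month
```

Input tokens (which usually dominate in coding workloads) would push the real break-even lower, but the order of magnitude stands.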

Tool / harness support

Which coding harnesses ship presets for each model in May 2026:

| Harness | GLM-5.1 | Kimi K2.6 | MiniMax M2.7 | DeepSeek V4 |
|---|---|---|---|---|
| Cline | ✅ | ✅ | ✅ | ✅ |
| Roo Code | ✅ | ✅ | ✅ | ✅ |
| Aider | ⚠️ Partial | ✅ | ✅ | ✅ |
| OpenCode | ✅ | ✅ | ✅ | ✅ |
| Claude Code | Custom | Custom | Custom | Custom |
| Cursor | ⚠️ Manual | ✅ | ⚠️ Manual | ✅ |

Kimi K2.6 has the deepest tooling integration; DeepSeek V4 is close behind. GLM-5.1 and MiniMax M2.7 are catching up but require manual configuration in some tools.
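"Manual configuration" usually just means pointing an OpenAI-compatible client at the provider's endpoint yourself. A minimal sketch; the base URL and model identifier are placeholders, so check each provider's docs for the real values:

```python
from openai import OpenAI

# Any OpenAI-compatible provider works the same way: swap base_url and model.
# Both values below are placeholders, not documented endpoints.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",
    api_key="YOUR_KEY",
)

resp = client.chat.completions.create(
    model="glm-5.1",  # placeholder model identifier
    messages=[{"role": "user", "content": "Write a unit test for parse_date()."}],
)
print(resp.choices[0].message.content)
```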

When to pick each

Pick GLM-5.1 if:

  • You want the highest open-weights capability score.
  • You’re API-first and don’t plan to self-host.
  • You’re already on Z.ai’s Coding Plan or willing to add it.

Pick Kimi K2.6 if:

  • You want one model to standardize on across your team.
  • You may eventually self-host for compliance or cost.
  • You use Cline, Roo Code, Aider, or OpenCode as your primary harness.

Pick MiniMax M2.7 if:

  • You’re building long-horizon agent loops with many tool calls.
  • You prefer the OpenAI-compatible API surface MiniMax provides.
  • The self-evolution training approach (a research-flavored angle) fits your use case.

Pick DeepSeek V4 if:

  • Cost is the primary constraint.
  • You’re running batch / high-volume workloads.
  • You want the most provider availability (OpenRouter, Together, Fireworks all ship DeepSeek V4).

Don’t pick just one — use a router

The most common production pattern in May 2026 is multi-tier routing across these models:

Tier 1 (~70% traffic):  DeepSeek V4 Flash    $0.30/1M
Tier 2 (~25% traffic):  Kimi K2.6 or GLM-5.1 ~$1.00/1M
Tier 3 (~5% traffic):   Claude Opus 4.7      $75/1M

This blended pattern saves 85-95% on coding API costs vs running everything on Opus 4.7, with under 10% reported quality loss for typical workloads.
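A minimal sketch of that tiering logic. The 0-1 difficulty score and the thresholds are illustrative assumptions (production routers typically classify by task type, diff size, or a cheap classifier model), and the model identifiers are placeholders:

```python
from dataclasses import dataclass

# Tier prices from the pricing table above; the difficulty score and
# thresholds are illustrative assumptions, not a published policy.
@dataclass
class Tier:
    name: str
    model: str
    usd_per_m_output: float

TIERS = [
    Tier("bulk",     "deepseek-v4-flash", 0.30),   # ~70% of traffic
    Tier("standard", "kimi-k2.6",         0.95),   # ~25%
    Tier("frontier", "claude-opus-4.7",   75.00),  # hardest ~5%
]

def route(task_difficulty: float) -> Tier:
    """Map a 0-1 difficulty score to a tier (thresholds are tunable)."""
    if task_difficulty < 0.70:
        return TIERS[0]
    if task_difficulty < 0.95:
        return TIERS[1]
    return TIERS[2]

print(route(0.40).model)  # deepseek-v4-flash
print(route(0.99).model)  # claude-opus-4.7
```

With the traffic split above, the blended cost is about 0.7 x $0.30 + 0.25 x $1.00 + 0.05 x $75 ≈ $4.20/1M, which is where the ~94% savings figure comes from.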

What’s next

Three things to watch:

  1. DeepSeek V5 rumored for Q2-Q3 2026. Expected to push Tier 1 capability higher, making the router pattern even cheaper.
  2. Kimi K3 rumored for Q3 2026. Expected to close the remaining gap to Western frontier on hard reasoning.
  3. Western frontier response. OpenAI’s GPT-OSS line and Meta’s Llama 5 are expected to push back on the Chinese open-weights lead through 2026.

Bottom line

In May 2026, the Chinese open-weights cluster (GLM-5.1, Kimi K2.6, MiniMax M2.7, DeepSeek V4) is functionally a single capability tier, 25-250x cheaper than Claude Opus 4.7 or GPT-5.5. Within the cluster, GLM-5.1 leads on benchmarks, Kimi K2.6 on ecosystem, MiniMax M2.7 on agentic tasks, and DeepSeek V4 on cost. Most teams should pick one as a default (Kimi K2.6 is the safest bet) and use a router to escalate the hardest 5% to a Western frontier model. The era of "one frontier model for everything" is over.

Sources: State of AI: May 2026 (Air Street Press, Nathan Benaich), BestBlogs Chinese flagship LLM comparison (April 2026), MiniMax M2.7 release notes, OpenCode model documentation (May 2026), Together AI model pages (May 2026), DEV Community late-April 2026 Chinese LLM stack analysis.