What is Kimi K2.7 Code and when did it launch?

Kimi K2.7 Code is Moonshot AI's open-weight coding model released on June 12, 2026. It is a 1-trillion-parameter Mixture-of-Experts (MoE) model with 32 billion parameters active per token across 384 experts (8 selected + 1 shared), a 256K-token context window, and a Modified MIT license. Weights are on Hugging Face as moonshotai/Kimi-K2.7-Code (~595 GB on disk). It is tuned for long-horizon agentic software engineering and uses roughly 30% fewer reasoning tokens than Kimi K2.6 while posting higher scores on Moonshot's own benchmarks. API pricing via the Kimi platform is $0.95 per million input tokens and $4.00 per million output tokens.

How does Kimi K2.7 Code compare to Claude Opus 4.7 on coding?

Moonshot's published comparison is against Claude Opus 4.8 rather than 4.7. On those company-reported benchmarks, Opus 4.8 beats K2.7 Code on most coding evals: Kimi Code Bench v2 (67.4 vs 62.0), Program Bench (63.8 vs 53.6), and MLS Bench Lite (42.8 vs 35.1). The exception is MCP Mark Verified, where K2.7 Code scores 81.1 to Opus 4.8's 76.4. Independent third-party benchmarks (SWE-bench Verified, GPQA, BrowseComp) are not yet available for K2.7 Code as of June 18, 2026. Practically: Claude Opus 4.7 / 4.8 is still ahead on quality for the hardest production coding work, but K2.7 Code closes the gap meaningfully and is roughly 5x cheaper to run at the API level. If you can accept quality somewhat short of Claude at about one-fifth the cost, K2.7 Code is the strongest open-weight option available today.

How does Kimi K2.7 Code compare to GPT-5.5 on coding?

GPT-5.5 leads on Moonshot's reported benchmarks: Kimi Code Bench v2 (69.0 vs 62.0), Program Bench (69.1 vs 53.6), and MCP Mark Verified (92.9 vs 81.1). GPT-5.5 is the strongest pure-coding closed model on these evals. K2.7 Code's argument against it is not quality at the top — it's cost, openness, and self-hosting. GPT-5.5 is API-only and priced like a frontier model. K2.7 Code is open-weight under a Modified MIT license, $0.95 / $4.00 per million tokens hosted, and can be deployed on your own infrastructure with vLLM, SGLang, or KTransformers. For teams that need data residency, regulated workloads, or just predictable inference cost, K2.7 Code is the most compelling option since DeepSeek V4.

Is Kimi K2.7 Code a good fit for production agents in 2026?

Yes, with caveats. K2.7 Code is designed for long-horizon agentic coding: its 256K context, mandatory thinking mode, and strong MCP tool-use scores make it well-suited to agents that iterate over a repo, call tools, and self-correct. The caveats: (1) thinking mode cannot be disabled and sampling is locked to temperature 1.0 / top_p 0.95, which rules out agent designs that need deterministic outputs; (2) the model weights are ~595 GB, so self-hosting requires serious infrastructure; (3) all benchmark gains are company-reported as of June 18, 2026, so wait for independent SWE-bench Verified numbers before betting a critical workload on it. For agents where you control the loop and can absorb the inference latency of forced thinking, it's a serious production option. For cost-sensitive, throughput-heavy agents, it's the best open-weight pick available right now.
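If your agent framework lets callers pass sampling parameters, it can help to enforce the locked values on the client side so a stray temperature of 0 fails loudly. Here is a minimal sketch of that guard. The model id and the OpenAI-style payload shape are assumptions for illustration, not something confirmed by Moonshot's documentation; only the two sampling values come from the caveats above.

```python
# Hypothetical sketch: ASSUMED_MODEL_ID and the payload shape are placeholders.
LOCKED_SAMPLING = {"temperature": 1.0, "top_p": 0.95}  # values stated above
ASSUMED_MODEL_ID = "kimi-k2.7-code"  # placeholder; check your provider's model list


def build_chat_request(messages, **overrides):
    """Build an OpenAI-style chat payload, rejecting sampling overrides.

    Raises ValueError if the caller asks for a temperature or top_p
    other than the locked values.
    """
    for key, locked_value in LOCKED_SAMPLING.items():
        if key in overrides and overrides[key] != locked_value:
            raise ValueError(
                f"{key} is locked to {locked_value}; got {overrides[key]}"
            )
    payload = {"model": ASSUMED_MODEL_ID, "messages": list(messages)}
    payload.update(overrides)        # e.g. max_tokens, tools
    payload.update(LOCKED_SAMPLING)  # always send the locked values
    return payload
```

Calling build_chat_request(msgs, max_tokens=2048) returns a payload you can hand to whatever HTTP client you already use; build_chat_request(msgs, temperature=0.0) raises before any request is sent.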

Should I switch from Claude or GPT to Kimi K2.7 Code?

Switch in three cases: (1) Cost is the constraint — at $0.95 / $4.00 per million tokens, K2.7 Code is roughly 5x cheaper than Claude Opus 4.8 and significantly cheaper than GPT-5.5; (2) Open weights matter — regulated industries, sovereign-AI deployments, or teams that want to fine-tune the base model; (3) Long-context agentic loops dominate your usage — K2.7 Code's 30% reasoning-token reduction directly lowers your bill on agent workloads. Don't switch if you need maximum quality on the hardest SWE-bench-style tasks, deterministic outputs, or if your team is already deeply integrated with Claude Code or OpenAI's Agents SDK. The cleanest pattern in mid-2026 is to use Claude or GPT-5.5 for the hardest 10% of tasks and K2.7 Code for the long tail.
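The "hardest 10% to a closed model, long tail to K2.7 Code" pattern can be sketched as a small routing function. Everything here is illustrative: the Task fields, the model labels, and the 0.9 threshold are assumptions, and the threshold only maps to "the hardest 10%" if your difficulty scores are spread roughly evenly between 0 and 1.

```python
from dataclasses import dataclass

# Placeholder labels, not real API model ids.
CLOSED_MODEL = "closed-frontier"  # stand-in for Claude or GPT-5.5
OPEN_MODEL = "kimi-k2.7-code"     # stand-in for K2.7 Code


@dataclass
class Task:
    description: str
    difficulty: float              # 0.0 (trivial) to 1.0 (hardest); how you score this is up to you
    needs_determinism: bool = False


def pick_model(task, hard_threshold=0.9):
    """Route hard or determinism-sensitive tasks to the closed model."""
    if task.needs_determinism:
        # K2.7 Code locks sampling, so deterministic pipelines go elsewhere.
        return CLOSED_MODEL
    if task.difficulty >= hard_threshold:
        return CLOSED_MODEL
    return OPEN_MODEL
```

In practice the difficulty score is the hard part. A cheap heuristic (files touched, test failures so far, retries) is usually enough to start, and you can tune the threshold against your own bill.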

How does Kimi K2.7 Code compare to DeepSeek V4?

Both are Chinese open-weight frontier models, both are MoE, and both are aimed at the same value proposition: frontier coding quality at a fraction of closed-model cost. As of June 18, 2026, K2.7 Code is the newer release (June 12), but its Moonshot-reported gains are measured against K2.6, not against DeepSeek V4. DeepSeek V4 still has the stronger benchmark track record, because its SWE-bench Verified, AIME, and GPQA scores have been independently confirmed while K2.7 Code's numbers are so far company-reported. The honest answer: pick K2.7 Code if you need long agentic loops with strong MCP tool use; pick DeepSeek V4 if you need maximum verified quality and don't need K2.7's 256K context. Both are dramatically cheaper than Claude or GPT and worth holding in your model routing layer.

Quick Answer

Kimi K2.7 Code vs Claude Opus 4.7 vs GPT-5.5 for Coding (June 2026)

Published: June 18, 2026

Moonshot AI’s Kimi K2.7 Code dropped on Hugging Face on June 12, 2026 — a 1-trillion-parameter open-weight coding model with a 256K context window, ~30% fewer reasoning tokens than K2.6, and pricing roughly 5x cheaper than Claude Opus 4.8. It’s the strongest open-weight coding model of 2026 so far. Here’s the honest comparison against the two closed frontier models most teams are already paying for.

Last verified: June 18, 2026.

TL;DR

Kimi K2.7 Code: Open-weight, 1T MoE (32B active), 256K context, $0.95 / $4.00 per M tokens. Best open option in mid-2026.
Claude Opus 4.7 / 4.8: Closed, highest quality on hardest tasks, ~5x more expensive at the API level.
GPT-5.5: Closed, top scores on Moonshot’s reported coding benchmarks, API-only, frontier pricing.
Cost vs quality: GPT-5.5 ≥ Claude Opus 4.8 > Kimi K2.7 Code on quality. Kimi is ~5x cheaper.
Caveat: All K2.7 Code benchmark gains are Moonshot-reported. Independent SWE-bench Verified scores not yet available.

Specs at a glance

Feature	Kimi K2.7 Code	Claude Opus 4.7 / 4.8	GPT-5.5
Released	June 12, 2026	Opus 4.7 ~Feb 2026; 4.8 May 2026	Early 2026
Architecture	1T MoE, 32B active, 384 experts	Closed (undisclosed)	Closed (undisclosed)
Context window	256K	200K	~256K
License	Modified MIT, open weights	Proprietary	Proprietary
Self-hosting	Yes (~595 GB weights)	No	No
Thinking mode	Mandatory, cannot disable	Optional	Optional
Input price (per 1M tokens)	$0.95	~$15.00	~$10.00
Output price (per 1M tokens)	$4.00	~$75.00	~$30.00
MCP / tool use	Strong	Strong	Strong
Vision input	Yes (MoonViT 400M)	Yes	Yes

Pricing is approximate; check provider sites for current rates. Kimi pricing is from Moonshot’s Kimi platform.

Benchmark picture (Moonshot-reported, June 12, 2026)

These are the numbers Moonshot published on the K2.7 Code model card. They are company-reported, not independently verified. Treat as direction-of-travel, not as ground truth.

Benchmark	Kimi K2.6	Kimi K2.7 Code	GPT-5.5	Claude Opus 4.8
Kimi Code Bench v2	50.9	62.0	69.0	67.4
Program Bench	48.3	53.6	69.1	63.8
MLS Bench Lite	26.7	35.1	35.5	42.8
Kimi Claw 24/7 Bench	42.9	46.9	52.8	50.4
MCP Atlas	69.4	76.0	79.4	81.3
MCP Mark Verified	72.8	81.1	92.9	76.4

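The pattern is consistent: GPT-5.5 leads on raw coding benchmarks, Claude Opus 4.8 leads on tool-use and end-to-end ML engineering tasks (MLS Bench Lite, MCP Atlas), and K2.7 Code trails the leader on each benchmark by roughly 5 to 15 points, at about one-fifth the cost. The one place it beats a closed model is MCP Mark Verified, where its 81.1 tops Claude Opus 4.8's 76.4.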
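To check the size of the gap yourself, the table is small enough to work through in a few lines. This sketch copies the Moonshot-reported scores from the table above and computes, for each benchmark, which model leads and how far K2.7 Code sits behind it. The column labels and function name are just illustrative.

```python
# Scores copied from the Moonshot-reported table above.
SCORES = {
    "Kimi Code Bench v2":   {"K2.6": 50.9, "K2.7 Code": 62.0, "GPT-5.5": 69.0, "Opus 4.8": 67.4},
    "Program Bench":        {"K2.6": 48.3, "K2.7 Code": 53.6, "GPT-5.5": 69.1, "Opus 4.8": 63.8},
    "MLS Bench Lite":       {"K2.6": 26.7, "K2.7 Code": 35.1, "GPT-5.5": 35.5, "Opus 4.8": 42.8},
    "Kimi Claw 24/7 Bench": {"K2.6": 42.9, "K2.7 Code": 46.9, "GPT-5.5": 52.8, "Opus 4.8": 50.4},
    "MCP Atlas":            {"K2.6": 69.4, "K2.7 Code": 76.0, "GPT-5.5": 79.4, "Opus 4.8": 81.3},
    "MCP Mark Verified":    {"K2.6": 72.8, "K2.7 Code": 81.1, "GPT-5.5": 92.9, "Opus 4.8": 76.4},
}


def gap_to_leader(row, model="K2.7 Code"):
    """Return (leading model, points by which `model` trails it)."""
    leader = max(row, key=row.get)
    return leader, round(row[leader] - row[model], 1)


if __name__ == "__main__":
    for name, row in SCORES.items():
        leader, gap = gap_to_leader(row)
        print(f"{name}: leader {leader}, K2.7 Code trails by {gap}")
```

With the table's numbers, the gap to the leader runs from 5.3 points (MCP Atlas) to 15.5 points (Program Bench).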

What’s not in the table: SWE-bench Verified, BrowseComp, GPQA Diamond. These are the benchmarks practitioners actually trust, and as of June 18, 2026 there are no independent K2.7 Code numbers on any of them. Expect those scores to land in the next 2–4 weeks; revisit your decision when they do.

When Kimi K2.7 Code is the right choice

You’re paying a lot for API calls. At $0.95 / $4.00, the math against Claude Opus 4.8 is roughly 5x. For agent workloads burning tens of millions of tokens per day, that compounds fast.
You need long agentic loops. 256K context plus 30% fewer reasoning tokens than K2.6 means agentic runs are both longer and cheaper per step.
You need open weights. Regulated industries, sovereign-AI requirements, fine-tuning of the base model, or just freedom from vendor lock-in.
You want a strong MCP tool-use model. K2.7 Code’s 81.1 on MCP Mark Verified is the best open-weight number on that eval as of June 18, 2026.
You’re already on DeepSeek V4 and want to A/B. Same open-weight value proposition; K2.7 has stronger long-context characteristics.
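To put the cost points above into dollars, here is a rough calculator using the Kimi platform prices quoted in this article. The example workload is made up, and the savings helper assumes reasoning tokens are billed at the output rate, which you should confirm against your provider's billing docs.

```python
# Kimi platform list prices quoted in this article (USD per 1M tokens).
KIMI_INPUT_PER_M = 0.95
KIMI_OUTPUT_PER_M = 4.00


def daily_cost(input_tokens, output_tokens,
               input_per_m=KIMI_INPUT_PER_M, output_per_m=KIMI_OUTPUT_PER_M):
    """Dollar cost of one day's traffic at the given per-million prices."""
    return input_tokens / 1e6 * input_per_m + output_tokens / 1e6 * output_per_m


def reasoning_savings(reasoning_tokens, reduction=0.30,
                      output_per_m=KIMI_OUTPUT_PER_M):
    """Dollars saved if reasoning tokens drop by `reduction`.

    Assumption: reasoning tokens are billed as output tokens.
    """
    return reasoning_tokens * reduction / 1e6 * output_per_m


if __name__ == "__main__":
    # Hypothetical workload: 40M input and 10M output tokens per day.
    print(f"${daily_cost(40_000_000, 10_000_000):.2f} per day")
```

For that hypothetical 40M-in / 10M-out day the bill comes to about $78. To compare against another provider, pass its current list prices as input_per_m and output_per_m rather than relying on the approximate figures in the specs table.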

When Claude Opus 4.7 / 4.8 is still the right choice

You need the hardest end-to-end ML engineering performance. Claude Opus 4.8’s 42.8 on MLS Bench Lite is the top score in the table.
You’re building with Claude Code or Claude Agent SDK. The tooling, sub-agents, skills, and Cascade-replacement workflows are tuned for Claude.
Deterministic outputs matter. K2.7 Code locks sampling to temperature 1.0 / top_p 0.95 and forces thinking mode on. Claude lets you control both.
You’re inside the Claude Max billing model. If your usage already fits Pro or Max, the marginal cost story flips toward Claude.

When GPT-5.5 is the right choice

You want top scores on pure coding benchmarks. GPT-5.5’s 69.0 on Kimi Code Bench v2 and 92.9 on MCP Mark Verified are the highest in the table.
You’re building on OpenAI Agents SDK. Mature, production-tested, broad tool ecosystem (Operator, Code Interpreter, file search).
You want multi-provider flexibility with OpenAI as your primary. OpenAI’s tool ecosystem and OpenAI-compatible API standard remain the de facto baseline.

Honest caveats

All benchmark numbers above are Moonshot-published. Independent SWE-bench Verified numbers for K2.7 Code do not exist yet. Don’t bet a critical workload on these scores alone.
Self-hosting K2.7 Code is heavy. ~595 GB weights. Realistically you’re using a hosted endpoint (Kimi platform, Hyperbolic, Together, CometAPI) unless you have multi-H100 / H200 capacity available.
Forced thinking mode is real. If your agent design needs determinism or non-thinking responses, K2.7 Code is the wrong pick.
Geopolitics. Kimi is a Moonshot AI release (Beijing-based). For some buyers, particularly US government, defense, or regulated finance, that’s a non-starter. Open weights mitigate this somewhat — you can self-host — but the org-level approval process still applies.
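For a feel of what "heavy" means in the self-hosting caveat, a back-of-envelope calculation helps. This sketch uses the ~595 GB and 1T-parameter figures from this article plus the standard 80 GB (H100) and 141 GB (H200) memory sizes. It counts weights only, so treat the result as a floor: KV cache, activations, and framework overhead all need extra memory on top.

```python
import math

WEIGHTS_GB = 595     # approximate on-disk size quoted in this article
TOTAL_PARAMS = 1e12  # 1 trillion parameters


def bits_per_param(weights_gb=WEIGHTS_GB, params=TOTAL_PARAMS):
    """Average storage per parameter implied by the checkpoint size."""
    return weights_gb * 1e9 * 8 / params


def min_gpus_for_weights(gpu_memory_gb, weights_gb=WEIGHTS_GB):
    """Lower bound on GPU count: enough memory to hold the weights alone."""
    return math.ceil(weights_gb / gpu_memory_gb)


if __name__ == "__main__":
    print(f"{bits_per_param():.2f} bits per parameter")
    print("H100 80 GB :", min_gpus_for_weights(80), "GPUs minimum")
    print("H200 141 GB:", min_gpus_for_weights(141), "GPUs minimum")
```

That works out to a floor of 8 H100s or 5 H200s before a single token of context is cached. The implied ~4.76 bits per parameter suggests the checkpoint is stored in a low-precision format, but that is an inference from the arithmetic, not a statement from the model card.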

Recommendation by use case

Use case	Pick
Highest-quality production coding agent	Claude Opus 4.7 / 4.8
Top-end coding benchmark performance	GPT-5.5
Cost-optimized long agentic loops	Kimi K2.7 Code
Open weights / sovereign / regulated	Kimi K2.7 Code or DeepSeek V4
Fine-tunable coding base model	Kimi K2.7 Code
MCP-heavy multi-tool agent	All three; Kimi cheapest, GPT-5.5 strongest
Determinism-required pipeline	Claude Opus or GPT-5.5 (Kimi locks sampling)

Sources

Codersera, “Kimi K2.7 Code: The Complete Guide — Benchmarks, Pricing & How to Use,” June 12, 2026.
FelloAI, “Kimi K2.7 Code: Specs, Benchmarks and Price,” June 15, 2026.
Kingy AI, “Kimi K2.7 Code Released: Benchmarks, Specs, and How It Compares,” June 12, 2026.
CometAPI, “Kimi K2.7 Code: Benchmarks, Architecture, Pricing & Access,” June 15, 2026.
Lushbinary, “Kimi K2.7 Code Developer Guide,” June 13, 2026.
Hugging Face: moonshotai/Kimi-K2.7-Code model card.

This page will be updated when independent SWE-bench Verified scores for Kimi K2.7 Code are published.
