What is Cohere's North Mini Code?

North Mini Code is Cohere's first model purpose-built for developers, released on June 9, 2026 under the Apache 2.0 license. It is a 30B-parameter Mixture-of-Experts model with 3B active parameters, designed for agentic software engineering, terminal-based agentic tasks, and high-quality code generation. The model uses a decoder-only Transformer with 128 experts (8 activated per token), interleaved sliding-window attention with RoPE and global attention. Context length is 256K with 64K max generation. Minimum inference hardware is 1x H100 at FP8. On the Artificial Analysis Coding Index it scores 33.4, competitive with Qwen 3.6 35B-A3B at 35.2 and significantly above GLM-4.7-Flash at 25.9.
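To make "128 experts, 8 activated per token" concrete, here is a minimal sketch of generic top-k expert routing. It is not Cohere's router: the softmax-over-selected-logits normalization, the random logits, and the function name `top_k_route` are assumptions chosen for illustration only.

```python
import math
import random

NUM_EXPERTS = 128  # expert count quoted in the model description
TOP_K = 8          # experts activated per token, also from the description


def top_k_route(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    This normalization is an illustrative assumption, not a published detail.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for one token; a real router computes these
# from the token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = top_k_route(logits)
print(len(routes), "of", NUM_EXPERTS, "experts active for this token")
print("fraction of experts used per token:", TOP_K / NUM_EXPERTS)
```

Only 8 of 128 experts (1/16) run for any given token, which is why per-token compute tracks the 3B active parameters rather than the 30B total.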

How does North Mini Code compare to Devstral Small 2?

North Mini Code wins on speed and modern architecture; Devstral wins on incumbency. Cohere reports North Mini Code achieves up to 2.8x higher output throughput than Devstral Small 2 under identical concurrency and hardware. North Mini Code is a 30B MoE with 3B active; Devstral Small 2 is a 24B dense model. Both are Apache 2.0 licensed. North Mini Code scores 33.4 on the Artificial Analysis Coding Index, materially ahead of Devstral Small 2. Devstral's advantage is ecosystem maturity — it has been the small-coding-model default in OpenCode and similar harnesses for months. North Mini Code is days old. For new deployments, North Mini Code is the better default. For existing Devstral-based pipelines, the migration cost has to be weighed.

How does North Mini Code compare to Qwen 3.6 35B-A3B?

Qwen 3.6 35B-A3B narrowly leads on raw coding capability: 35.2 vs 33.4 on the Artificial Analysis Coding Index. North Mini Code's advantages are off-benchmark: training across multiple agent scaffolds for robustness (Cohere built it to be agent-harness-agnostic), better speed under typical concurrency, native OpenCode harness integration, and Cohere's sovereign-AI positioning for buyers who need a non-Chinese small coding model. For pure capability per parameter, Qwen 3.6 is still the small-MoE coding leader. For agent-harness robustness and Western sovereign deployments, North Mini Code is the better fit. The gap on the index is small enough that either will work for most workloads; the deciding factors are ecosystem and procurement.

Should I use small coding models in production?

Yes, in two patterns. Pattern one: routing — use a small model (North Mini Code, Devstral Small 2, Qwen 3.6 35B-A3B) for the 70-80% of agentic coding tasks where small-model quality is sufficient, and route only the hardest 10-20% to a frontier model (Claude Fable 5, GLM-5.2, GPT-5.5). This pattern dominates cost-conscious production stacks in June 2026. Pattern two: edge deployment — small models with 3B-6B active parameters can self-host on single-GPU instances, fitting use cases that frontier models cannot economically serve (per-tenant isolation, sovereign edge, offline coding agents). North Mini Code's 3B active makes it the smallest credible frontier-coding model as of June 2026.
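Pattern one can be sketched as a simple threshold router. This is a minimal illustration, not a production design: the `difficulty` score, the 0.8 threshold, and the tier labels are hypothetical, and how you estimate difficulty (heuristics, a classifier, a retry after a failed small-model attempt) is left open.

```python
from dataclasses import dataclass

# Hypothetical tier labels. Substitute whichever small and frontier
# models your stack actually uses.
SMALL_TIER = "small-coding-model"
FRONTIER_TIER = "frontier-model"


@dataclass
class Task:
    prompt: str
    difficulty: float  # 0.0 (trivial) to 1.0 (hardest); an assumed, externally supplied score


def pick_tier(task: Task, threshold: float = 0.8) -> str:
    """Send only tasks at or above the threshold to the frontier tier."""
    return FRONTIER_TIER if task.difficulty >= threshold else SMALL_TIER


tasks = [
    Task("rename a variable across a module", 0.10),
    Task("add a unit test for a pure function", 0.30),
    Task("debug a cross-service race condition", 0.95),
]

assignments = {t.prompt: pick_tier(t) for t in tasks}
for prompt, tier in assignments.items():
    print(f"{tier:>20}  <-  {prompt}")
```

Tuning `threshold` is what sets the split: raising it sends more traffic to the small tier, so it should be calibrated against measured small-model success rates on your own tasks.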

Quick Answer

North Mini Code vs Devstral Small 2 vs Qwen 3.6: Small Coding Models

Published: June 19, 2026

Cohere shipped North Mini Code on June 9, 2026. It is the company's first model purpose-built for developers: a 30B/3B MoE under Apache 2.0. It enters a class of small coding models that has become surprisingly competitive in 2026: Devstral Small 2, Qwen 3.6 35B-A3B, gpt-oss-20B, GLM-4.7-Flash, Gemma 4 26B-A4B. Here is how the three serious contenders (North Mini Code, Devstral Small 2, and Qwen 3.6 35B-A3B) stack up.

Last verified: June 19, 2026.

TL;DR

North Mini Code (Cohere, June 9): 30B/3B MoE, Apache 2.0, agent-harness-robust, fastest. 33.4 on AA Coding Index.
Devstral Small 2 (Mistral): 24B dense, Apache 2.0, OpenCode incumbent, slower per Cohere’s testing.
Qwen 3.6 35B-A3B (Alibaba): 35B/~3B MoE, top capability per parameter at 35.2 on AA Coding Index.
Best pattern: Pick North Mini Code for new deployments; keep Qwen 3.6 for raw capability; migrate from Devstral if speed matters.

Direct comparison

Spec	North Mini Code	Devstral Small 2	Qwen 3.6 35B-A3B
Lab	Cohere	Mistral	Alibaba
Release	June 9, 2026	Active 2026	Active 2026
Architecture	MoE (128 experts, 8 active)	Dense	MoE
Total parameters	30B	24B	35B
Active parameters	3B	24B (dense)	~3B
Context length	256K	128K-256K	256K
Max generation	64K	varies	varies
License	Apache 2.0	Apache 2.0	Apache 2.0
AA Coding Index	33.4	Below 33.4	35.2
AA Intelligence Index	27.6	not directly reported	not directly reported
Speed (Cohere reported)	~199 tok/s	1x baseline	comparable
Relative output throughput (Cohere reported)	Up to 2.8x Devstral Small 2	1x baseline	comparable
Min inference hardware	1x H100 @ FP8	1x H100 @ FP16	1x H100 @ FP8
Vision input	No	No	No
Reasoning capability	Yes	Limited	Yes
Tool calling	Yes	Yes	Yes
Structured outputs	Yes	Yes	Yes
OpenCode native	Yes	Yes	Yes
Cohere Model Vault	Yes	No	No
Hugging Face weights	Yes (bf16/fp8/w4a16)	Yes	Yes

When North Mini Code wins

New small-coding-model deployments. Best speed, modern MoE architecture, agent-harness-robust training.
OpenCode-based agentic stacks. Native integration and Cohere’s design for OpenCode as the reference harness.
Sovereign Western procurement. Canadian lab, Apache 2.0, ITAR-friendly profile vs Chinese alternatives.
You need Cohere Model Vault deployment. Managed inference environment for Cohere customers.
Throughput matters at production concurrency. Cohere reports up to 2.8x the output throughput of Devstral Small 2 on equivalent hardware.

When Devstral Small 2 wins

You already run Devstral in production. Migration cost is non-trivial; if it works, don’t change it.
Dense model preference. Some inference setups handle dense models more predictably than MoE.
Mistral ecosystem integration. Mistral’s hosted offerings remain mature and well-priced.

When Qwen 3.6 35B-A3B wins

Pure coding-index capability. 35.2 vs 33.4 is a real, if small, gap.
Alibaba Cloud / Chinese deployment. First-party hosting on Alibaba Cloud, deep Qwen ecosystem.
You already have a Qwen pipeline. Qwen 3.6 is a drop-in upgrade from Qwen 3.5.

Where small coding models fit in the routing stack

Workload	Recommended tier
Hardest 10-20% agentic coding	Claude Fable 5, GLM-5.2, GPT-5.5
Bulk agentic coding (60-70%)	DeepSeek V4 Pro, Kimi K2.7 Code
Routine code generation (15-25%)	North Mini Code, Qwen 3.6 35B-A3B, Devstral Small 2
Edge / single-GPU coding	North Mini Code, Qwen 3.6 35B-A3B
Per-tenant isolated coding	North Mini Code, Qwen 3.6 35B-A3B

The small-coding tier has become competitive in mid-2026. For a 30B/3B MoE such as North Mini Code, all 30B weights must be resident in memory (about 30 GB at FP8, which fits on a single 80 GB H100), while only 3B parameters are active per token, which keeps per-token compute low. Deployment cost is roughly 1/10 of a frontier model run. For routine code generation (boilerplate, refactors, single-function edits, structured output workflows), this tier handles the work at a fraction of the price.
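The cost claim can be checked with a short calculation. The sketch below assumes a two-tier stack, takes the roughly 1/10 cost ratio from the paragraph above, and uses the 70-80% small-model share quoted for the routing pattern; the function name `blended_cost` is illustrative, and a mid tier would need its own cost ratio, which is not given here.

```python
def blended_cost(small_share: float, small_cost_ratio: float) -> float:
    """Cost of a two-tier stack relative to sending everything to the frontier model.

    small_share:      fraction of tasks routed to the small model (0 to 1)
    small_cost_ratio: small-model cost per task as a fraction of frontier cost
    """
    if not 0.0 <= small_share <= 1.0:
        raise ValueError("small_share must be between 0 and 1")
    # Frontier cost per task is normalized to 1.0.
    return small_share * small_cost_ratio + (1.0 - small_share) * 1.0


for share in (0.7, 0.8):
    cost = blended_cost(share, small_cost_ratio=0.1)
    print(f"{share:.0%} routed to small tier -> {cost:.2f}x cost, {1 - cost:.0%} saved")
```

Under these assumptions the blended bill is 0.37x to 0.28x of the all-frontier baseline. The savings are capped by the share of traffic that still goes to the frontier model, so pushing the small-model share up matters more than shaving the small model's price further.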

The Cohere strategy read

Cohere has been positioning aggressively for the sovereign-AI market: enterprise procurement, non-Chinese provenance, Apache 2.0 licensing, harness-agnostic training. North Mini Code is consistent with that strategy. The pitch is not “we beat Qwen 3.6 on raw coding-index points” (it does not; North Mini Code trails by 1.8 points). The pitch is: “you can deploy us in environments where Qwen, DeepSeek, GLM, or Kimi face procurement friction, and our speed is materially better than Devstral.”

For Western enterprise buyers building sovereign coding agents in June 2026, that is the actual competitive landscape. North Mini Code earns a top-3 default-choice slot for this segment.

How to try it

OpenCode: Free in your harness of choice. Best way to evaluate against your real coding tasks.
Cohere API: Dashboard at dashboard.cohere.com.
Hugging Face weights: bf16, fp8, w4a16 variants under CohereLabs/North-Mini-Code-1.0.
OpenRouter / Model Vault: Managed inference at variable pricing.
Self-hosted: 1x H100 at FP8 is the published minimum.
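As a rough sanity check on the self-hosting minimum, the weight-only footprint of the three published variants can be estimated from bytes per parameter. This is a back-of-envelope sketch: it assumes an 80 GB H100, counts 1 GB as 1e9 bytes, and ignores quantization scales, activations, and KV cache, so real headroom is smaller than shown.

```python
# Approximate bytes per weight for each published variant.
# w4a16 means 4-bit weights with 16-bit activations.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "w4a16": 0.5}
H100_MEMORY_GB = 80  # assumed 80 GB H100 variant


def weight_memory_gb(total_params_billions: float, precision: str) -> float:
    """Approximate weight-only footprint in GB (1 GB = 1e9 bytes)."""
    return total_params_billions * BYTES_PER_PARAM[precision]


# All 30B parameters must be loaded, even though only 3B are active per token.
for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(30, precision)
    headroom = H100_MEMORY_GB - gb
    print(f"30B @ {precision}: ~{gb:.0f} GB weights, ~{headroom:.0f} GB left before KV cache")
```

At bf16 the weights alone take about 60 GB, which leaves little room for a long-context KV cache on one card; FP8 halves that, which is consistent with FP8 being the stated single-H100 minimum.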

The honest read

North Mini Code is not the most capable small coding model in the world; Qwen 3.6 35B-A3B holds that title by 1.8 index points. By Cohere's reported numbers it is the fastest of the three. It is also the most agent-harness-robust, the only one available through Cohere's Model Vault, and the cleanest sovereign-Western Apache 2.0 small coding model shipped in 2026. For most new deployments inside the small-coding tier, that combination is enough to make it the default pick. For existing Qwen or Devstral pipelines, the migration math is workload-specific.

Either way, the small-coding tier in June 2026 is competitive enough that a production stack routing everything to a frontier model is overpaying substantially: with 70-80% of tasks sent to a small model at roughly 1/10 the cost, the blended model bill falls by about 63-72%.