North Mini Code vs Devstral Small 2 vs Qwen 3.6: Small Coding Models

Cohere shipped North Mini Code on June 9, 2026. It is the company's first developer-focused model: a mixture-of-experts (MoE) model with 30B total and 3B active parameters, released under Apache 2.0. It enters a class of small coding models that has become surprisingly competitive in 2026: Devstral Small 2, Qwen 3.6 35B-A3B, gpt-oss-20B, GLM-4.7-Flash, Gemma 4 26B-A4B. Here is how the three serious contenders (North Mini Code, Devstral Small 2, and Qwen 3.6 35B-A3B) stack up.

Last verified: June 19, 2026.

TL;DR

  • North Mini Code (Cohere, June 9): MoE with 30B total / 3B active parameters, Apache 2.0, agent-harness-robust, fastest of the three per Cohere’s testing. 33.4 on AA Coding Index.
  • Devstral Small 2 (Mistral): 24B dense, Apache 2.0, OpenCode incumbent, slower per Cohere’s testing.
  • Qwen 3.6 35B-A3B (Alibaba): MoE with 35B total / ~3B active parameters, highest AA Coding Index score of the three at 35.2.
  • Best pattern: Pick North Mini Code for new deployments; keep Qwen 3.6 for raw capability; migrate from Devstral if speed matters.

Direct comparison

| Spec | North Mini Code | Devstral Small 2 | Qwen 3.6 35B-A3B |
|---|---|---|---|
| Lab | Cohere | Mistral | Alibaba |
| Release | June 9, 2026 | Active 2026 | Active 2026 |
| Architecture | MoE (128 experts, 8 active) | Dense | MoE |
| Total parameters | 30B | 24B | 35B |
| Active parameters | 3B | 24B (dense) | ~3B |
| Context length | 256K | 128K-256K | 256K |
| Max generation | 64K | varies | varies |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 |
| AA Coding Index | 33.4 | Below 33.4 | 35.2 |
| AA Intelligence Index | 27.6 | not directly reported | not directly reported |
| Speed (Cohere reported) | ~199 tok/s | 1x baseline | comparable |
| Cohere relative throughput | 2.8x Devstral | 1x baseline | comparable |
| Min inference hardware | 1x H100 @ FP8 | 1x H100 @ FP16 | 1x H100 @ FP8 |
| Vision input | No | No | No |
| Reasoning capability | Yes | Limited | Yes |
| Tool calling | Yes | Yes | Yes |
| Structured outputs | Yes | Yes | Yes |
| OpenCode native | Yes | Yes | Yes |
| Cohere Model Vault | Yes | No | No |
| Hugging Face weights | Yes (bf16/fp8/w4a16) | Yes | Yes |
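
The "Min inference hardware" row follows from simple arithmetic on total (not active) parameters, since every expert's weights have to be resident in GPU memory. A rough sketch, assuming an 80 GB H100 and counting weights only (KV cache, activations, and runtime overhead are ignored; the function and constant names are illustrative, not from any library):

```python
# Approximate bytes per parameter for the weight formats named in the table.
# w4a16 stores weights in 4 bits, so roughly half a byte per parameter.
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "w4a16": 0.5}

H100_MEMORY_GB = 80.0  # assumption: the 80 GB H100 variant


def weight_memory_gb(total_params_billions: float, fmt: str) -> float:
    """Weights-only footprint in decimal GB (1e9 params * bytes / 1e9)."""
    return total_params_billions * BYTES_PER_PARAM[fmt]


def fits_on_one_h100(total_params_billions: float, fmt: str) -> bool:
    """True if the weights alone fit; real deployments need extra headroom."""
    return weight_memory_gb(total_params_billions, fmt) < H100_MEMORY_GB


for name, params, fmt in [
    ("North Mini Code", 30, "fp8"),
    ("Devstral Small 2", 24, "fp16"),
    ("Qwen 3.6 35B-A3B", 35, "fp8"),
]:
    gb = weight_memory_gb(params, fmt)
    print(f"{name}: ~{gb:.0f} GB of weights at {fmt}, "
          f"fits on one H100: {fits_on_one_h100(params, fmt)}")
```

The remaining memory is what is left for KV cache at long context, so the practical limit on a 256K-token window will be tighter than this weights-only estimate suggests.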

When North Mini Code wins

  • New small-coding-model deployments. Best speed, modern MoE architecture, agent-harness-robust training.
  • OpenCode-based agentic stacks. Native integration, and Cohere designed the model with OpenCode as the reference harness.
  • Sovereign Western procurement. Canadian lab, Apache 2.0, ITAR-friendly profile vs Chinese alternatives.
  • You need Cohere Model Vault deployment. Managed inference environment for Cohere customers.
  • Throughput matters at production concurrency. Cohere reports 2.8x Devstral’s throughput on equivalent hardware.

When Devstral Small 2 wins

  • You already run Devstral in production. Migration cost is non-trivial; if it works, don’t change it.
  • Dense model preference. Some inference setups handle dense models more predictably than MoE.
  • Mistral ecosystem integration. Mistral’s hosted offerings remain mature and well-priced.

When Qwen 3.6 35B-A3B wins

  • Pure coding-index capability. 35.2 vs 33.4 is a real, if small, gap.
  • Alibaba Cloud / Chinese deployment. First-party hosting on Alibaba Cloud, deep Qwen ecosystem.
  • You already have a Qwen pipeline. Qwen 3.6 is a drop-in upgrade from Qwen 3.5.

Where small coding models fit in the routing stack

| Workload | Recommended tier |
|---|---|
| Hardest 10-20% agentic coding | Claude Fable 5, GLM-5.2, GPT-5.5 |
| Bulk agentic coding (60-70%) | DeepSeek V4 Pro, Kimi K2.7 Code |
| Routine code generation (15-25%) | North Mini Code, Qwen 3.6 35B-A3B, Devstral Small 2 |
| Edge / single-GPU coding | North Mini Code, Qwen 3.6 35B-A3B |
| Per-tenant isolated coding | North Mini Code, Qwen 3.6 35B-A3B |

The small-coding tier in mid-2026 has become surprisingly competitive. The MoE models here hold 30-35B total parameters, which fit on a single H100 at FP8, and activate only about 3B per token, so deployment cost is roughly 1/10 of a frontier model run. For routine code generation (boilerplate, refactors, single-function edits, structured output workflows), this tier handles the work at a fraction of the price.
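
To see what the routing split means for spend, here is a back-of-the-envelope sketch. The workload shares are midpoints of the ranges in the routing table, and the small tier's 0.1 relative cost is the "roughly 1/10" figure above. The mid-tier cost is not given in this article, so the 0.3 below is a placeholder to replace with your own pricing; the tier labels are likewise illustrative.

```python
# Workload shares: midpoints of the ranges in the routing table.
# Costs are per unit of work, with a frontier model normalized to 1.0.
TIERS = {
    "hardest_agentic": {"share": 0.15, "tier": "frontier"},
    "bulk_agentic": {"share": 0.65, "tier": "mid"},
    "routine_codegen": {"share": 0.20, "tier": "small"},
}


def blended_cost(relative_cost: dict) -> float:
    """Cost of the routed stack relative to sending everything to frontier."""
    total_share = sum(t["share"] for t in TIERS.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("workload shares must sum to 1.0")
    return sum(t["share"] * relative_cost[t["tier"]] for t in TIERS.values())


# "mid" is a placeholder assumption, not a figure from this article.
costs = {"frontier": 1.0, "mid": 0.3, "small": 0.1}
routed = blended_cost(costs)
print(f"routed cost: {routed:.3f}x frontier-only, savings: {1 - routed:.1%}")
```

The result is sensitive to the mid-tier figure and to how much work truly needs a frontier model, so it is worth rerunning with measured shares from your own traffic.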

The Cohere strategy read

Cohere has been positioning aggressively for the sovereign-AI market: enterprise procurement, non-Chinese provenance, Apache 2.0 licensing, harness-agnostic training. North Mini Code is consistent with that strategy. The pitch is not “we beat Qwen 3.6 on raw coding-index points” (it trails by 1.8 points, 33.4 vs 35.2). The pitch is: “you can deploy us in environments where Qwen, DeepSeek, GLM, or Kimi face procurement friction, and our speed is materially better than Devstral.”

For Western enterprise buyers building sovereign coding agents in June 2026, that is the actual competitive landscape. North Mini Code earns a top-3 default-choice slot for this segment.

How to try it

  • OpenCode: Free in your harness of choice. Best way to evaluate against your real coding tasks.
  • Cohere API: Dashboard at dashboard.cohere.com.
  • Hugging Face weights: bf16, fp8, w4a16 variants under CohereLabs/North-Mini-Code-1.0.
  • OpenRouter / Model Vault: Managed inference at variable pricing.
  • Self-hosted: 1x H100 at FP8 is the published minimum.
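
Whichever route you pick, the comparison that matters is on your own tasks. A minimal evaluation sketch follows: `generate` stands in for whatever client you wire up (OpenCode, an API, or a self-hosted server), and the task list and checks are hypothetical examples. It counts output characters rather than tokens because the tokenizer depends on the model.

```python
import time
from typing import Callable, Dict, List, Tuple

# A task is a prompt plus a function that decides whether the output passes.
Task = Tuple[str, Callable[[str], bool]]


def evaluate(generate: Callable[[str], str], tasks: List[Task]) -> Dict[str, float]:
    """Run each task once and report pass rate, output size, and generation time."""
    passed = 0
    output_chars = 0
    seconds = 0.0
    for prompt, check in tasks:
        start = time.perf_counter()
        output = generate(prompt)
        seconds += time.perf_counter() - start  # time generation only, not the check
        output_chars += len(output)
        if check(output):
            passed += 1
    return {
        "pass_rate": passed / len(tasks) if tasks else 0.0,
        "output_chars": float(output_chars),
        "seconds": seconds,
    }


def stub_model(prompt: str) -> str:
    """Placeholder model; swap in a real client call here."""
    return "def add(a, b):\n    return a + b\n"


tasks: List[Task] = [
    ("Write add(a, b) in Python.", lambda out: "def add" in out),
    ("Write sub(a, b) in Python.", lambda out: "def sub" in out),
]
print(evaluate(stub_model, tasks))
```

Running the same task list against each candidate model gives a like-for-like pass rate and latency, which is a more useful signal for migration decisions than index scores alone.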

The honest read

North Mini Code is not the most capable small coding model in the world; Qwen 3.6 35B-A3B holds that title by 1.8 index points. It is the fastest of the three by Cohere’s reported numbers, the most agent-harness-robust, the only one available through Cohere Model Vault, and the cleanest sovereign-Western Apache 2.0 small-coding model shipped in 2026. For most new deployments inside the small-coding tier, that combination is enough to make it the default pick. For existing Qwen or Devstral pipelines, the migration math is workload-specific.

Either way, the small-coding tier in June 2026 is now competitive enough that any production stack that routes everything to a frontier model is leaving 80-95% of model cost on the table.