What Is North Mini Code? Cohere's First Developer Model Explained
What Is North Mini Code? Cohere’s First Developer Model Explained
Cohere released North Mini Code on June 9, 2026 — a 30-billion-parameter MoE model with 3 billion active parameters, the first model in Cohere’s new “North” family, designed for agentic software engineering. It is Apache 2.0 open-weight and ships with explicit OpenCode harness integration. Here is what it is, how it performs, and where it fits.
Last verified: June 19, 2026.
TL;DR
- North Mini Code is Cohere’s first developer-purpose model — Mixture-of-Experts, 30B total / 3B active.
- Released June 9, 2026, Apache 2.0 license.
- 256K context length, 64K max generation, text-only.
- Minimum inference hardware: 1x H100 at FP8.
- 33.4 on AA Coding Index — top-3 in the small-coding-model tier.
- Up to 2.8x higher throughput than Devstral Small 2 on Cohere’s testing.
- First Cohere model designed harness-agnostic for OpenCode and similar agentic frameworks.
The headline specs
| Spec | North Mini Code |
|---|---|
| Lab | Cohere |
| Release | June 9, 2026 |
| Architecture | MoE (128 experts, 8 active per token) |
| Total parameters | 30 billion |
| Active parameters | 3 billion |
| Attention | Interleaved sliding-window (RoPE) + global (no positional) at 3:1 ratio |
| Activation | SwiGLU |
| Router | Sigmoid + top-k |
| Context length | 256K |
| Max generation | 64K |
| Vision input | No |
| License | Apache 2.0 |
| AA Coding Index | 33.4 |
| AA Intelligence Index | 27.6 |
| GDPval-AA | 14% |
| Tau2-Bench Telecom | 37% |
| Throughput (Cohere reported) | ~199 output tokens/sec |
| Throughput vs Devstral Small 2 | Up to 2.8x |
| Min inference hardware | 1x H100 @ FP8, 1x H100 @ FP4 |
What Cohere built it for
North Mini Code is the first model in Cohere’s announced “next generation” model family. The thesis is explicit in Cohere’s positioning:
- Small enough to deploy anywhere. 3B active parameters fits cleanly on a single H100. This is the “sovereign developer ecosystem” tier — environments where multi-GPU inference is impractical, but you still need real coding capability.
- Agent-harness-robust. Cohere trained across multiple scaffolds rather than over-fitting to one. The pitch is that production coding agents fail when models are over-optimized for a single harness; North Mini Code is designed to work well across OpenCode, custom MCP setups, terminal-based agents, and so on.
- Speed-first. Output throughput targets matter as much as raw capability when the workload is many parallel agentic tasks on shared infrastructure.
- Western sovereign provenance. Cohere is Canadian; Apache 2.0 license; no Chinese-lab procurement friction. This matters for some enterprise and government buyers.
Architecture in plain English
North Mini Code is a decoder-only Transformer with a sparse Mixture-of-Experts feed-forward block. Each layer has 128 expert sub-networks; for each token, the router selects 8 of them. This is the standard 2026 MoE recipe and gives 30B total parameter capacity with only 3B activated per token.
The attention block interleaves two patterns at a 3:1 ratio:
- Sliding-window attention with RoPE (3 of every 4 layers): efficient over long contexts, captures local dependencies.
- Global attention with no positional embeddings (1 of every 4 layers): captures long-range dependencies, no positional bias.
The single dense feed-forward layer before the sparse MoE layers is a deliberate choice for routing stability.
Where it fits in the routing stack
Production agentic coding stacks in mid-2026 typically use 3-4 model tiers:
| Tier | Workload share | Best models |
|---|---|---|
| Frontier closed | 10-20% (hardest tasks) | Claude Fable 5, GPT-5.5, GLM-5.2 (open top) |
| Open-weight top | 50-70% (bulk agentic) | DeepSeek V4 Pro, Kimi K2.7 Code, GLM-5.2 |
| Small coding | 15-25% (routine) | North Mini Code, Qwen 3.6 35B-A3B, Devstral Small 2 |
| Edge / on-device | Variable | North Mini Code (FP4), Qwen 3.6, gpt-oss-20B |
North Mini Code’s natural slot is the bottom two tiers. The 3B active parameter footprint means deployment cost is roughly 1/10 of a frontier model run, and the speed advantage matters at high concurrency.
How to try North Mini Code
| Path | Best for | Notes |
|---|---|---|
| OpenCode | Real-world coding agent evaluation | Free in your harness of choice |
| Cohere API | Production deployment via Cohere’s hosted service | Dashboard at dashboard.cohere.com |
| Cohere Model Vault | Dedicated managed inference for enterprise | Single-tenant, custom SLAs |
| OpenRouter | Drop-in for OpenAI-compatible stacks | Available |
| Hugging Face | Self-host, fine-tune | bf16, fp8, w4a16 variants |
| Single H100 self-host | Solo dev, isolated deployments | FP8 fits 1x H100 |
For most evaluation work, OpenCode is the fastest path. For production at scale, the Cohere API or Cohere Model Vault are the managed options. For sovereign or air-gapped deployments, self-host the Hugging Face weights.
What North Mini Code is not
- Not a general-purpose chatbot. Trained for coding; non-coding agentic scores are weaker (14% on GDPval-AA, 37% on Tau2-Bench Telecom).
- Not the smartest coding model in its size class by every benchmark. Qwen 3.6 35B-A3B edges it on raw AA Coding Index (35.2 vs 33.4).
- Not multimodal. No vision input — for screenshot-to-code workflows, route elsewhere.
- Not a frontier-replacement. This is the routine-coding tier, not the hardest-tasks tier.
The strategic read
North Mini Code is Cohere’s strongest competitive move in 18 months. By targeting the small-coding-model tier with a fast, agent-harness-robust, Western-sovereign offering, Cohere claims a slot that DeepSeek, Qwen, GLM, and Kimi cannot easily reach for some Western enterprise buyers — specifically those where Chinese-lab provenance is a procurement blocker.
For most developers, the choice between North Mini Code, Qwen 3.6 35B-A3B, and Devstral Small 2 is workload-specific and small. For Western enterprise procurement, North Mini Code is now the default pick at this tier.
Either way, the existence of a Cohere developer model materially raises the floor of the small-coding-model tier and makes routing patterns that bypass frontier models for the majority of tasks more credible than they were a month ago.