How does North Mini Code perform on benchmarks?

Strong for its size class, not the absolute leader. On the Artificial Analysis Coding Index (a weighted average of Terminal-Bench Hard and SciCode), North Mini Code scores 33.4, significantly above GLM-4.7-Flash at 25.9 and competitive with Qwen 3.6 35B-A3B at 35.2. On the Artificial Analysis Intelligence Index, it scores 27.6, above gpt-oss-20B at 24.5 and just below Mistral Small 4 (119B/6.5B active) at 27.8. On non-coding agentic tasks it scores lower (14% on GDPval-AA, 37% on Tau2-Bench Telecom), which Cohere acknowledges — the model is explicitly tuned for coding, not general agentic work.

What is special about North Mini Code's training?

Two things. First, Cohere trained the model across multiple agent scaffolds rather than optimizing for a single harness. The pitch is that real-world coding agents need to run robustly inside many different orchestration frameworks (OpenCode, custom MCP harnesses, terminal agents), and a model trained on one harness often fails on others. North Mini Code is trained to be harness-agnostic. Second, the speed-throughput optimization: Cohere reports up to 2.8x higher output throughput than Devstral Small 2 under identical concurrency and hardware. This matters at production scale where many parallel agentic tasks share GPU capacity.

Where should I use North Mini Code in a production stack?

Best fit: the routine-coding tier in a multi-model routing setup. For agentic coding workloads, route the hardest 10-20% of tasks to a frontier model (Claude Fable 5, GLM-5.2, or GPT-5.5), the bulk 60-70% to a top open-weight model (DeepSeek V4 Pro, Kimi K2.7 Code), and the routine 15-25% (boilerplate generation, single-function edits, structured-output workflows, edge deployments) to a small coding model. North Mini Code is one of the top three picks for that bottom tier, alongside Qwen 3.6 35B-A3B and Devstral Small 2. For Western sovereign procurement and OpenCode harness integration, North Mini Code is the default choice.

Quick Answer

What Is North Mini Code? Cohere's First Developer Model Explained

Q: What is North Mini Code?

North Mini Code is Cohere's first model purpose-built for developers, released on June 9, 2026 under the Apache 2.0 license. It is a 30-billion-parameter Mixture-of-Experts (MoE) model with 3 billion active parameters, designed specifically for agentic software engineering tasks. The architecture uses 128 experts (8 activated per token) with interleaved sliding-window and global attention. Context length is 256K with 64K maximum generation. Minimum inference hardware is one H100 at FP8. North Mini Code marks the launch of Cohere's 'North' model family and positions the company as a Western-aligned, sovereign-AI provider in the small-coding-model tier.

Published: June 19, 2026

What Is North Mini Code? Cohere’s First Developer Model Explained

Cohere released North Mini Code on June 9, 2026 — a 30-billion-parameter MoE model with 3 billion active parameters, the first model in Cohere’s new “North” family, designed for agentic software engineering. It is Apache 2.0 open-weight and ships with explicit OpenCode harness integration. Here is what it is, how it performs, and where it fits.

Last verified: June 19, 2026.

TL;DR

North Mini Code is Cohere’s first developer-purpose model — Mixture-of-Experts, 30B total / 3B active.
Released June 9, 2026, Apache 2.0 license.
256K context length, 64K max generation, text-only.
Minimum inference hardware: 1x H100 at FP8.
33.4 on AA Coding Index — top-3 in the small-coding-model tier.
Up to 2.8x higher throughput than Devstral Small 2 on Cohere’s testing.
First Cohere model designed harness-agnostic for OpenCode and similar agentic frameworks.

The headline specs

Spec	North Mini Code
Lab	Cohere
Release	June 9, 2026
Architecture	MoE (128 experts, 8 active per token)
Total parameters	30 billion
Active parameters	3 billion
Attention	Interleaved sliding-window (RoPE) + global (no positional) at 3:1 ratio
Activation	SwiGLU
Router	Sigmoid + top-k
Context length	256K
Max generation	64K
Vision input	No
License	Apache 2.0
AA Coding Index	33.4
AA Intelligence Index	27.6
GDPval-AA	14%
Tau2-Bench Telecom	37%
Throughput (Cohere reported)	~199 output tokens/sec
Throughput vs Devstral Small 2	Up to 2.8x
Min inference hardware	1x H100 @ FP8, 1x H100 @ FP4

What Cohere built it for

North Mini Code is the first model in Cohere’s announced “next generation” model family. The thesis is explicit in Cohere’s positioning:

Small enough to deploy anywhere. 3B active parameters fits cleanly on a single H100. This is the “sovereign developer ecosystem” tier — environments where multi-GPU inference is impractical, but you still need real coding capability.
Agent-harness-robust. Cohere trained across multiple scaffolds rather than over-fitting to one. The pitch is that production coding agents fail when models are over-optimized for a single harness; North Mini Code is designed to work well across OpenCode, custom MCP setups, terminal-based agents, and so on.
Speed-first. Output throughput targets matter as much as raw capability when the workload is many parallel agentic tasks on shared infrastructure.
Western sovereign provenance. Cohere is Canadian; Apache 2.0 license; no Chinese-lab procurement friction. This matters for some enterprise and government buyers.

Architecture in plain English

North Mini Code is a decoder-only Transformer with a sparse Mixture-of-Experts feed-forward block. Each layer has 128 expert sub-networks; for each token, the router selects 8 of them. This is the standard 2026 MoE recipe and gives 30B total parameter capacity with only 3B activated per token.

The attention block interleaves two patterns at a 3:1 ratio:

Sliding-window attention with RoPE (3 of every 4 layers): efficient over long contexts, captures local dependencies.
Global attention with no positional embeddings (1 of every 4 layers): captures long-range dependencies, no positional bias.

The single dense feed-forward layer before the sparse MoE layers is a deliberate choice for routing stability.

Where it fits in the routing stack

Production agentic coding stacks in mid-2026 typically use 3-4 model tiers:

Tier	Workload share	Best models
Frontier closed	10-20% (hardest tasks)	Claude Fable 5, GPT-5.5, GLM-5.2 (open top)
Open-weight top	50-70% (bulk agentic)	DeepSeek V4 Pro, Kimi K2.7 Code, GLM-5.2
Small coding	15-25% (routine)	North Mini Code, Qwen 3.6 35B-A3B, Devstral Small 2
Edge / on-device	Variable	North Mini Code (FP4), Qwen 3.6, gpt-oss-20B

North Mini Code’s natural slot is the bottom two tiers. The 3B active parameter footprint means deployment cost is roughly 1/10 of a frontier model run, and the speed advantage matters at high concurrency.

How to try North Mini Code

Path	Best for	Notes
OpenCode	Real-world coding agent evaluation	Free in your harness of choice
Cohere API	Production deployment via Cohere’s hosted service	Dashboard at dashboard.cohere.com
Cohere Model Vault	Dedicated managed inference for enterprise	Single-tenant, custom SLAs
OpenRouter	Drop-in for OpenAI-compatible stacks	Available
Hugging Face	Self-host, fine-tune	bf16, fp8, w4a16 variants
Single H100 self-host	Solo dev, isolated deployments	FP8 fits 1x H100

For most evaluation work, OpenCode is the fastest path. For production at scale, the Cohere API or Cohere Model Vault are the managed options. For sovereign or air-gapped deployments, self-host the Hugging Face weights.

What North Mini Code is not

Not a general-purpose chatbot. Trained for coding; non-coding agentic scores are weaker (14% on GDPval-AA, 37% on Tau2-Bench Telecom).
Not the smartest coding model in its size class by every benchmark. Qwen 3.6 35B-A3B edges it on raw AA Coding Index (35.2 vs 33.4).
Not multimodal. No vision input — for screenshot-to-code workflows, route elsewhere.
Not a frontier-replacement. This is the routine-coding tier, not the hardest-tasks tier.

The strategic read

North Mini Code is Cohere’s strongest competitive move in 18 months. By targeting the small-coding-model tier with a fast, agent-harness-robust, Western-sovereign offering, Cohere claims a slot that DeepSeek, Qwen, GLM, and Kimi cannot easily reach for some Western enterprise buyers — specifically those where Chinese-lab provenance is a procurement blocker.

For most developers, the choice between North Mini Code, Qwen 3.6 35B-A3B, and Devstral Small 2 is workload-specific and small. For Western enterprise procurement, North Mini Code is now the default pick at this tier.

Either way, the existence of a Cohere developer model materially raises the floor of the small-coding-model tier and makes routing patterns that bypass frontier models for the majority of tasks more credible than they were a month ago.