AI agents · OpenClaw · self-hosting · automation

Quick Answer

What Is North Mini Code? Cohere's First Developer Model Explained

Published:

What Is North Mini Code? Cohere’s First Developer Model Explained

Cohere released North Mini Code on June 9, 2026 — a 30-billion-parameter MoE model with 3 billion active parameters, the first model in Cohere’s new “North” family, designed for agentic software engineering. It is Apache 2.0 open-weight and ships with explicit OpenCode harness integration. Here is what it is, how it performs, and where it fits.

Last verified: June 19, 2026.

TL;DR

  • North Mini Code is Cohere’s first developer-purpose model — Mixture-of-Experts, 30B total / 3B active.
  • Released June 9, 2026, Apache 2.0 license.
  • 256K context length, 64K max generation, text-only.
  • Minimum inference hardware: 1x H100 at FP8.
  • 33.4 on AA Coding Index — top-3 in the small-coding-model tier.
  • Up to 2.8x higher throughput than Devstral Small 2 on Cohere’s testing.
  • First Cohere model designed harness-agnostic for OpenCode and similar agentic frameworks.

The headline specs

SpecNorth Mini Code
LabCohere
ReleaseJune 9, 2026
ArchitectureMoE (128 experts, 8 active per token)
Total parameters30 billion
Active parameters3 billion
AttentionInterleaved sliding-window (RoPE) + global (no positional) at 3:1 ratio
ActivationSwiGLU
RouterSigmoid + top-k
Context length256K
Max generation64K
Vision inputNo
LicenseApache 2.0
AA Coding Index33.4
AA Intelligence Index27.6
GDPval-AA14%
Tau2-Bench Telecom37%
Throughput (Cohere reported)~199 output tokens/sec
Throughput vs Devstral Small 2Up to 2.8x
Min inference hardware1x H100 @ FP8, 1x H100 @ FP4

What Cohere built it for

North Mini Code is the first model in Cohere’s announced “next generation” model family. The thesis is explicit in Cohere’s positioning:

  • Small enough to deploy anywhere. 3B active parameters fits cleanly on a single H100. This is the “sovereign developer ecosystem” tier — environments where multi-GPU inference is impractical, but you still need real coding capability.
  • Agent-harness-robust. Cohere trained across multiple scaffolds rather than over-fitting to one. The pitch is that production coding agents fail when models are over-optimized for a single harness; North Mini Code is designed to work well across OpenCode, custom MCP setups, terminal-based agents, and so on.
  • Speed-first. Output throughput targets matter as much as raw capability when the workload is many parallel agentic tasks on shared infrastructure.
  • Western sovereign provenance. Cohere is Canadian; Apache 2.0 license; no Chinese-lab procurement friction. This matters for some enterprise and government buyers.

Architecture in plain English

North Mini Code is a decoder-only Transformer with a sparse Mixture-of-Experts feed-forward block. Each layer has 128 expert sub-networks; for each token, the router selects 8 of them. This is the standard 2026 MoE recipe and gives 30B total parameter capacity with only 3B activated per token.

The attention block interleaves two patterns at a 3:1 ratio:

  • Sliding-window attention with RoPE (3 of every 4 layers): efficient over long contexts, captures local dependencies.
  • Global attention with no positional embeddings (1 of every 4 layers): captures long-range dependencies, no positional bias.

The single dense feed-forward layer before the sparse MoE layers is a deliberate choice for routing stability.

Where it fits in the routing stack

Production agentic coding stacks in mid-2026 typically use 3-4 model tiers:

TierWorkload shareBest models
Frontier closed10-20% (hardest tasks)Claude Fable 5, GPT-5.5, GLM-5.2 (open top)
Open-weight top50-70% (bulk agentic)DeepSeek V4 Pro, Kimi K2.7 Code, GLM-5.2
Small coding15-25% (routine)North Mini Code, Qwen 3.6 35B-A3B, Devstral Small 2
Edge / on-deviceVariableNorth Mini Code (FP4), Qwen 3.6, gpt-oss-20B

North Mini Code’s natural slot is the bottom two tiers. The 3B active parameter footprint means deployment cost is roughly 1/10 of a frontier model run, and the speed advantage matters at high concurrency.

How to try North Mini Code

PathBest forNotes
OpenCodeReal-world coding agent evaluationFree in your harness of choice
Cohere APIProduction deployment via Cohere’s hosted serviceDashboard at dashboard.cohere.com
Cohere Model VaultDedicated managed inference for enterpriseSingle-tenant, custom SLAs
OpenRouterDrop-in for OpenAI-compatible stacksAvailable
Hugging FaceSelf-host, fine-tunebf16, fp8, w4a16 variants
Single H100 self-hostSolo dev, isolated deploymentsFP8 fits 1x H100

For most evaluation work, OpenCode is the fastest path. For production at scale, the Cohere API or Cohere Model Vault are the managed options. For sovereign or air-gapped deployments, self-host the Hugging Face weights.

What North Mini Code is not

  • Not a general-purpose chatbot. Trained for coding; non-coding agentic scores are weaker (14% on GDPval-AA, 37% on Tau2-Bench Telecom).
  • Not the smartest coding model in its size class by every benchmark. Qwen 3.6 35B-A3B edges it on raw AA Coding Index (35.2 vs 33.4).
  • Not multimodal. No vision input — for screenshot-to-code workflows, route elsewhere.
  • Not a frontier-replacement. This is the routine-coding tier, not the hardest-tasks tier.

The strategic read

North Mini Code is Cohere’s strongest competitive move in 18 months. By targeting the small-coding-model tier with a fast, agent-harness-robust, Western-sovereign offering, Cohere claims a slot that DeepSeek, Qwen, GLM, and Kimi cannot easily reach for some Western enterprise buyers — specifically those where Chinese-lab provenance is a procurement blocker.

For most developers, the choice between North Mini Code, Qwen 3.6 35B-A3B, and Devstral Small 2 is workload-specific and small. For Western enterprise procurement, North Mini Code is now the default pick at this tier.

Either way, the existence of a Cohere developer model materially raises the floor of the small-coding-model tier and makes routing patterns that bypass frontier models for the majority of tasks more credible than they were a month ago.