Poolside Laguna XS.2 vs Qwen 3.6 vs DeepSeek V4 (May 2026)
On April 28, 2026, US AI startup Poolside released Laguna XS.2 — a 33B-A3B Mixture of Experts model under the Apache 2.0 license, designed for local agentic coding. It’s the most credible Western open-weight coding model in months, and it lands in a market dominated by Chinese open-weight models (Qwen 3.6, DeepSeek V4 Pro, GLM 5.1, Kimi K2.6). Here’s how the three compare for local and self-hosted coding workloads.
Last verified: May 7, 2026
The three at a glance
| Capability | Laguna XS.2 | Qwen 3.6 | DeepSeek V4 Pro |
|---|---|---|---|
| Vendor | Poolside (US) | Alibaba (China) | DeepSeek (China) |
| Released | April 28, 2026 | February 2026 (3.6 series) | March 2026 (V4 Pro) |
| License | Apache 2.0 | Apache 2.0 | DeepSeek License (permissive, field-of-use restrictions) |
| Architecture | 33B-A3B MoE | Dense + MoE variants | 685B-A37B MoE |
| Sizes available | 33B-A3B | 7B, 72B, 235B-A22B | 685B-A37B |
| Context | 256K | 256K | 128K (1M extended) |
| Quants | NVFP4, INT4 | GGUF, AWQ, INT4 | GGUF, AWQ, INT4 |
| Best raw coding | Mid-tier | High (72B+) | Frontier among open |
| Local machine fit | Excellent (single H100 / RTX 5090) | Mixed (7B easy, 72B hard) | Hard (needs 2-4× H100) |
| Best for | Local single-GPU dev | Production self-hosted | Highest-quality open coding |
What Laguna XS.2 is structurally
Poolside designed Laguna XS.2 specifically for local agentic coding:
- 33B total parameters, 3B activated — extremely cheap inference per token.
- MoE routing that the company tuned for “long-horizon agentic” tasks (multi-step coding work).
- Apache 2.0 licensing — no field-of-use restrictions.
- NVFP4 and INT4 quants available on day one, so it runs on an RTX 5090, a single H100, or NVIDIA DGX Spark / GB10 mini-systems.
- Hugging Face and direct distribution at huggingface.co/poolside.
The 33B-A3B design activates only ~3B parameters per token, so it is fast enough for tight agentic iteration loops on consumer hardware while retaining enough total capacity for broad knowledge of code.
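The arithmetic behind that claim can be sketched roughly. The 33B/3B parameter counts come from the article; the 0.5 bytes/param (4-bit) and ~1 TB/s memory-bandwidth figures are illustrative assumptions, not Poolside or NVIDIA numbers:

```python
# Back-of-envelope sketch: why a 33B-A3B MoE suits tight local loops.
# Assumptions (illustrative, not vendor figures): 4-bit weights
# (~0.5 bytes/param under NVFP4/INT4), ~1 TB/s effective bandwidth
# on an RTX 5090-class card.

def weight_gb(params_b: float, bytes_per_param: float = 0.5) -> float:
    """Approximate weight footprint in GB for a parameter count in billions."""
    return params_b * 1e9 * bytes_per_param / 1e9

total_gb = weight_gb(33)   # all experts must sit in VRAM
active_gb = weight_gb(3)   # but each token only reads the routed ~3B params

# Rough decode-speed ceiling: tokens/sec ~= bandwidth / bytes read per token.
bandwidth_gb_s = 1000
decode_ceiling = bandwidth_gb_s / active_gb

print(f"weights in VRAM: ~{total_gb:.1f} GB")
print(f"read per token:  ~{active_gb:.1f} GB")
print(f"decode ceiling:  ~{decode_ceiling:.0f} tok/s (ignores KV cache, overhead)")
```

The point of the sketch: VRAM scales with total parameters (so all 33B must fit), while decode speed scales with the ~3B activated per token, which is where the MoE speedup comes from.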
Qwen 3.6 in May 2026
Alibaba’s Qwen 3.6 family is the most flexible open-weight series:
- Sizes: 7B, 72B, 235B-A22B.
- Strong on coding — the Qwen 3.6 Coder variants stay competitive with DeepSeek V4 Pro despite their much smaller size.
- Apache 2.0 licensed.
- Wide ecosystem support — vLLM, SGLang, Ollama, llama.cpp all have first-class support.
- Multilingual — strong on Chinese and English code documentation.
Qwen 3.6 is the safe default for production self-hosted serving in May 2026.
DeepSeek V4 Pro in May 2026
DeepSeek V4 Pro remains the open-weight coding frontier:
- 685B total / 37B activated MoE — needs serious infrastructure.
- Best open-weight SWE-bench Verified score in May 2026.
- 128K context with 1M extended via YaRN.
- Permissive license with field-of-use restrictions (no military, no surveillance).
- Cheap inference at scale thanks to MoE sparsity.
DeepSeek V4 Pro is what you run when you have a real GPU cluster and want best-quality open coding.
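The 128K-to-1M extension mentioned above works by stretching RoPE position frequencies. A minimal sketch of the arithmetic and a YaRN-style configuration follows; the 128K/1M figures come from the article, while the `rope_scaling` keys mirror the convention used by common Hugging Face-style configs (an assumption here, not DeepSeek's published config):

```python
# Hedged sketch: YaRN-style context extension as a RoPE scaling factor.
# Figures from the article: 128K native context extended to 1M.

base_context = 128_000
extended_context = 1_000_000

# YaRN stretches RoPE frequencies by roughly the context-length ratio.
scaling_factor = extended_context / base_context

# Assumed config shape, mirroring HF-style `rope_scaling` entries.
rope_scaling = {
    "rope_type": "yarn",
    "factor": scaling_factor,
    "original_max_position_embeddings": base_context,
}
print(rope_scaling)
```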
Where each one wins
Laguna XS.2 wins for…
- Solo developers running coding agents on RTX 4090 / RTX 5090 / Apple M3 Max / M4 Max.
- Air-gapped or offline development environments.
- US government / defense contractors needing US-vendor + Apache 2.0.
- Quick local agent prototyping without API costs.
- Edge / appliance deployments (NVIDIA DGX Spark, Jetson Orin AGX).
Qwen 3.6 wins for…
- Production self-hosted inference at scale with vLLM or SGLang.
- Customers wanting model size flexibility (7B for completion, 72B for agents).
- Multilingual codebases (English + Chinese / Japanese / Korean docs).
- Existing Alibaba Cloud or Tongyi customers.
- Best balance of quality, speed, and ecosystem maturity.
DeepSeek V4 Pro wins for…
- Highest-quality open-weight coding benchmarks.
- Customers with 4-8× H100 / MI325X clusters available.
- Long-horizon agentic tasks where deeper reasoning matters.
- Multi-tenant inference where MoE sparsity reduces per-request cost.
- Workloads that need 1M-token extended context.
Hardware reality
| Model | Min single-GPU | Comfortable | Best at scale |
|---|---|---|---|
| Laguna XS.2 (NVFP4) | RTX 4090 (24GB) | RTX 5090 (32GB) / H100 (80GB) | 1-2× H100 |
| Qwen 3.6 7B | RTX 4070 (12GB) | RTX 4090 (24GB) | Single H100 |
| Qwen 3.6 72B | 2× RTX 4090 (AWQ INT4) | 2× H100 | 4× H100 |
| Qwen 3.6 235B-A22B | Not feasible | 2-4× H100 | 8× H100 / MI325X |
| DeepSeek V4 Pro | Not feasible | 4× H100 (INT4) | 8× H100 / MI325X |
Laguna XS.2’s appeal is real here — it’s the only model in this list that runs comfortably on a single consumer card.
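The table's "feasible" calls can be sanity-checked with the same 4-bit weight arithmetic. Parameter counts are the article's; the 0.5 bytes/param figure is an illustrative assumption that ignores KV cache, activations, and quantization overhead:

```python
# Sketch: approximate 4-bit weight footprints for the models in the table,
# versus a 24GB consumer card. Assumption: ~0.5 bytes/param at 4-bit,
# ignoring KV cache and runtime overhead.

CONSUMER_VRAM_GB = 24  # RTX 4090-class

models_b = {  # total parameters, in billions (from the article)
    "Laguna XS.2": 33,
    "Qwen 3.6 72B": 72,
    "Qwen 3.6 235B-A22B": 235,
    "DeepSeek V4 Pro": 685,
}

for name, params_b in models_b.items():
    gb = params_b * 0.5  # weight footprint at 0.5 bytes/param
    fits = gb <= CONSUMER_VRAM_GB
    print(f"{name:<20} ~{gb:>6.1f} GB weights  fits 24GB card: {fits}")
```

Only the 33B model's weights (~16.5 GB) leave headroom on a 24GB card, which is the whole basis of Laguna XS.2's single-GPU pitch.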
Coding capability ranking (May 2026 open-weight, rough)
Per llm-stats.com, the LMSYS Chatbot Arena, and several independent SWE-bench runs in May 2026:
- DeepSeek V4 Pro — SWE-bench Verified ~76% (frontier among open weights).
- Qwen 3.6 235B-A22B — ~73%.
- Kimi K2.6 — ~72%.
- GLM 5.1 — ~70%.
- Llama 5 405B — ~68%.
- Qwen 3.6 72B — ~66%.
- Laguna XS.2 33B-A3B — Poolside reports mid-tier-competitive results; independent SWE-bench numbers are TBD.
- DeepSeek V4 Flash, Minimax M2.7 — mid-60s.
Frontier closed models (Opus 4.6/4.7, GPT-5.5) sit at ~80-85%. Laguna XS.2 isn’t trying to beat those — it’s trying to be the best small model you can run locally.
Apache 2.0 vs DeepSeek License — does it matter?
For most developers, no. For enterprise procurement in regulated sectors:
- Apache 2.0 (Laguna XS.2, Qwen 3.6) — no field-of-use restrictions, full commercial use, modification, redistribution.
- DeepSeek License — permissive, but it explicitly prohibits military, surveillance, and certain other use cases. Chinese origin can also be a separate procurement issue for US government / defense contracts.
- Llama Community License (Llama 5) — permissive, but with restrictions for very large customers (>700M MAU) and an acceptable use policy.
For US public-sector or defense workloads, Apache 2.0 + US-vendor (Laguna XS.2) is the cleanest profile. That’s the procurement story Poolside is selling.
Bottom line
In May 2026, Laguna XS.2 is the best new option for local single-GPU agentic coding — it’s the only model in this comparison that runs comfortably on consumer hardware while being Apache 2.0 and backed by a US vendor. Qwen 3.6 is the right default for production self-hosted serving thanks to its size flexibility and ecosystem maturity. DeepSeek V4 Pro remains the frontier for open-weight coding quality when you have the GPU budget. The smart pattern: run Laguna XS.2 on developer laptops for sensitive local work, Qwen 3.6 for shared self-hosted inference, and reserve DeepSeek V4 Pro or frontier closed models for the hardest agentic tasks.
Sources: Poolside official announcement (April 28, 2026), Poolside models page (May 2026), Hugging Face poolside org (May 2026), AI CERTs News coverage (May 4, 2026), Softtechhub coverage (May 5, 2026), AI Chronicle (May 2026), NVIDIA Developer Forums (May 4, 2026), llm-stats.com benchmarks (May 2026).