Is Qwen 3.7 Max open source?

No. This is a major shift for Alibaba. Earlier Qwen models (Qwen 2.5, Qwen 3 base family) were open-weight, but Qwen 3.7 Max is proprietary and API-only. Alibaba is keeping smaller Qwen 3.6 and Qwen3-VL models open, but the 'Max' tier is now closed-weight like GPT-5.5 and Claude Opus 4.7. The strategy: be the cheap closed-weight frontier model that piggybacks on the Anthropic developer ecosystem via API compatibility.

How does Qwen 3.7 Max work with Claude Code?

Qwen 3.7 Max supports the Anthropic API protocol natively. That means existing Claude Code installations, Anthropic SDK clients, and any tool built against /v1/messages can switch to Qwen 3.7 Max by changing the base URL and API key. This is a deliberate go-to-market choice — Alibaba isn't trying to build a new developer ecosystem; they're plugging into the existing Anthropic one at one-sixth the price.

Can Qwen 3.7 Max really run for 35 hours autonomously?

That's Alibaba's internal demo claim — 35 hours of continuous execution, 1,000+ tool calls, on a previously-unseen Zhenwu M890 chip, producing a 10x faster AI kernel than the chip vendor's reference. As of May 27, 2026, no independent third-party reproduction exists. The benchmarks that ARE verified: Terminal-Bench 2.0 score 69.7, SWE-bench Verified 80.4%, KernelBench L3 1.98x median speedup over PyTorch. Treat 35 hours as a marketing ceiling, not a typical workload.

Quick Answer

What is Qwen 3.7 Max? Alibaba's 1M-Context AI (May 2026)

Q: What is Qwen 3.7 Max?

Qwen 3.7 Max is Alibaba's flagship large language model, released May 20, 2026 at Alibaba Cloud Summit. It's a closed-weight reasoning and agent model with a 1,048,576-token context window, designed for long-horizon autonomous tasks. Pricing is $2.50 per million input tokens and $7.50 per million output tokens via Alibaba Cloud Model Studio. It scores 56.6 on Artificial Analysis Intelligence Index, placing it fifth globally and the highest-ranked Chinese model — within 4 points of GPT-5.5 (60.2).

Published: May 27, 2026

What is Qwen 3.7 Max? (May 2026)

Qwen 3.7 Max is Alibaba’s new flagship large language model, launched May 20, 2026. It’s the most credible Chinese frontier model to date — and the first Qwen “Max” tier to ship closed-weight.

Last verified: May 27, 2026.

The 30-second summary

	Qwen 3.7 Max
Vendor	Alibaba
Released	May 20, 2026 (preview May 14, API May 19)
Weights	Closed (API only)
Context window	1,048,576 tokens
Pricing	$2.50 / $7.50 per 1M tokens (in / out)
API protocol	Anthropic-compatible
AA Intelligence Index	56.6 (5th globally)
SWE-bench Verified	80.4%
Apex Math	44.5 (beats Claude Opus 4.7’s 34.5)
Hosted at	Alibaba Cloud Model Studio
Available via	API, Qwen Studio (test), Anthropic SDK clients

What Alibaba changed

Three things make Qwen 3.7 Max different from previous Qwen flagships.

1. It’s closed-weight

This is the biggest shift. Qwen 2.5, Qwen 3, Qwen 3.5, Qwen 3.6 — all open under permissive licenses. Qwen 3.7 Max is proprietary, API-only, no model card download. Alibaba is keeping the smaller models open (Qwen 3.6, Qwen3-VL series), but they’ve split the lineup: open below, closed at the frontier.

The strategy: monetize the frontier tier the same way OpenAI and Anthropic do, while keeping the open-source loyalty alive lower down the stack.

2. It speaks Anthropic’s API protocol

Qwen 3.7 Max implements the Anthropic /v1/messages API natively. That means Claude Code works with Qwen 3.7 Max out of the box — change the base URL, change the API key, you’re done. Same for any tool built on Anthropic’s SDK.

This is the most consequential interoperability decision of 2026 so far. Alibaba is treating Anthropic’s API as the de facto standard for agent tooling, the way HTTP became the standard for application protocols.

3. The 1M-token context window

Quadrupled from Qwen 3.6 Max’s 256K. Output limit is 65,536 tokens. This puts it at parity with Gemini 3.5 Flash for raw context size and well above Claude Opus 4.7 (500K) and GPT-5.5 (400K).

Caveat: independent retrieval-quality benchmarks for the 1M window haven’t published yet. The headline number is real, but how well it actually uses the deep context tail isn’t independently confirmed.

Benchmark performance

Independent and Alibaba-reported scores as of May 27, 2026:

Benchmark	Qwen 3.7 Max	Claude Opus 4.7	GPT-5.5
AA Intelligence Index	56.6	57.3	60.2
SWE-bench Verified	80.4%	80.8%	~81%
Terminal-Bench 2.0	69.7	~75	~72
MCP-Atlas	76.4	–	–
MCP-Mark	60.8	–	–
Apex Math	44.5	34.5	~41
Humanity’s Last Exam	41.4	–	–
YC-Bench (1yr startup sim, virtual revenue)	$2.08M	–	–

The pattern: roughly tied with the closed frontier on coding and reasoning, ahead on math, behind on broad-instruction following.

What the 35-hour claim actually means

Alibaba ran Qwen 3.7 Max on an internal benchmark designed to stress long-horizon autonomy:

Target: write an AI computing kernel for a Zhenwu M890 chip (Alibaba’s own custom silicon, not publicly documented before this demo)
Duration: 35 continuous hours
Tool calls: 1,000+
Output: a kernel that was 10x faster than the chip vendor’s reference implementation

This is impressive — but it’s a controlled internal demo on a chip Alibaba designed. Verified third-party benchmarks (KernelBench L3) show a more modest 1.98x median speedup over PyTorch reference, which is still strong but ~5x less than the marketing headline.

What’s real: Qwen 3.7 Max can sustain long autonomous loops better than Qwen 3.6 Plus, and YC-Bench shows it generates ~2x the virtual revenue in a one-year startup simulation. What’s marketing: assume 35 hours of unattended autonomy without verification.

Pricing

Tier	Input ($/1M tokens)	Output ($/1M tokens)
Qwen 3.7 Max	$2.50	$7.50
Claude Opus 4.7	$15	$75
GPT-5.5	$10	$40
Gemini 3.5 Flash	$1.50	$9
DeepSeek V4 Pro	~$1.10	~$4.40

Qwen 3.7 Max is 6x cheaper than Claude Opus 4.7 on input, 10x cheaper on output — at near-frontier intelligence. The cheapest frontier-tier API on the market in May 2026.

Who should use it

Cost-sensitive agent shops already on Claude Code → migration is changing one env var, savings are ~80%.
Long-context workloads (entire-codebase reasoning, long-document RAG) — the 1M window helps, with caveat about retrieval quality.
Math-heavy applications — Apex Math advantage is real and significant.
Multi-hour agent fleets — the autonomy story (with caveats) is best-in-class on price/performance.

Who shouldn’t

US federal / defense procurement — Chinese-hosted inference is generally non-procurable.
EU regulated industries — Alibaba Cloud AI Act compliance is unclear in May 2026.
Any workload with strong English instruction-following requirements — GPT-5.5 still wins here.
Highly geopolitical content — Chinese alignment defaults differ from Western RLHF defaults.

Verdict

Qwen 3.7 Max is the first Chinese model where the right answer to “is this a serious frontier alternative?” is genuinely yes, for the right workloads. The 80% price gap and Anthropic API compatibility mean migration is trivial. The autonomy claims are exaggerated but the underlying capability is real.

For US/EU non-regulated workloads, it’s the new default for cheap frontier agents. For everyone else, it’s still GPT-5.5 or Claude Opus 4.7.

Sources: Alibaba Group official announcement, Qwen Studio, Marktechpost, VentureBeat, Artificial Analysis, CAISI evaluation.