DeepSeek V4 on Huawei Ascend vs Nvidia H200 (April 2026)
DeepSeek V4 launched on April 24, 2026 with full first-day support for both Nvidia H200 and Huawei Ascend 950. This is the first time a major frontier-tier open model has shipped with first-class non-Nvidia inference. Here’s what it actually means.
Last verified: April 25, 2026
TL;DR
| | Nvidia H200 SXM5 | Huawei Ascend 950 |
|---|---|---|
| Memory | 141 GB HBM3e | 144 GB (supernode config) |
| FP16 throughput | ~67 TFLOPS | ~58 TFLOPS (est.) |
| V4 inference stack | vLLM, SGLang | vLLM-Ascend |
| V4-Pro support | ✅ Full (FP8) | ✅ Full (w8a8) |
| V4-Flash support | ✅ Full (FP8) | ✅ Full (w8a8) |
| Power consumption | ~700W | ~600W |
| Availability outside China | Yes (constrained) | Limited |
| Price per FLOP | Higher | ~40-60% lower in China |
The launch-day reality
When DeepSeek V4 went live on April 24, 2026, two things shipped simultaneously:
- Hugging Face weights for V4-Pro and V4-Flash, ready for vLLM serving on Nvidia.
- Huawei Ascend supernode endorsement — Huawei Technologies publicly announced same-day full support on Ascend 950-based clusters.
Reuters reported Huawei chips were used in part of V4-Flash’s training. Hours later, ModelScope hosted the w8a8-quantized weights specifically tuned for Ascend deployment.
This is unprecedented for a non-Nvidia AI chip in the frontier-model tier.
Inference performance
Nvidia H200 SXM5
- VRAM: 141 GB HBM3e
- Bandwidth: 4.8 TB/s
- V4-Flash on a single H200: ~50 req/sec at FP8, ~220 tokens/sec output per request
- V4-Pro: Needs a multi-H200 cluster (16× minimum)
- Stack: vLLM 0.7+, SGLang 0.4+
- Quantization: Native FP8
Huawei Ascend 950 (supernode)
- Memory per node: 144 GB
- Topology: Supernode optimized for MoE — high inter-chip bandwidth
- V4-Flash on Ascend 950 supernode: Comparable throughput to H200 (within 15-25% based on initial Huawei-published numbers)
- V4-Pro: Officially supported on multi-node Ascend 950 supernodes
- Stack: vLLM-Ascend with dedicated DeepSeek V4 launch scripts
- Quantization: w8a8 (8-bit weights, 8-bit activations)
The performance gap is real but smaller than it was for prior generations. For inference workloads in China, Ascend 950 is now a credible alternative.
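To make the w8a8 scheme concrete, here is a minimal sketch of symmetric per-tensor int8 quantization applied to both weights and activations, with the matmul accumulated in int32 and rescaled to float. This is an illustration of the general technique only; the actual Ascend kernels use their own (per-channel, fused) implementations.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: returns (int8 tensor, scale)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# w8a8: both the weight matrix and the activations are int8; the
# matmul accumulates in int32, then is rescaled back to float.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)  # toy weight matrix
a = rng.standard_normal((1, 64)).astype(np.float32)   # toy activations

qW, sW = quantize_int8(W)
qa, sa = quantize_int8(a)
y_int8 = (qa.astype(np.int32) @ qW.astype(np.int32).T) * (sa * sW)
y_fp32 = a @ W.T  # full-precision reference

rel_err = np.abs(y_int8 - y_fp32).max() / np.abs(y_fp32).max()
print(f"max relative error: {rel_err:.3%}")
```

The relative error stays small because both operands are quantized against their own dynamic range; this is what lets w8a8 roughly match FP8 quality at the same bit width.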
Why DeepSeek’s Ascend support matters
1. Sovereignty and supply chain
US export controls on H100/H200 to China have tightened repeatedly since 2022. Ascend 950 gives Chinese AI teams a viable supply chain that doesn’t depend on US chip access.
2. Cost structure
In China, Ascend 950 is dramatically cheaper than smuggled or downgraded H200s. Ascend supernode pricing for cloud providers is reportedly 40-60% below H200 equivalents on a per-FLOP basis. That cost advantage is partly why V4 can be priced at $1.74/$3.48.
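A back-of-envelope sketch shows how a per-FLOP hardware discount flows through to serving economics. All dollar figures and rates below are illustrative assumptions, not published prices; the only anchor from this article is the ~50% midpoint of the reported 40-60% discount and the single-chip throughput figures.

```python
def cost_per_million_output_tokens(hw_cost_per_hour: float,
                                   tokens_per_second: float) -> float:
    """Hardware cost attributed to one million output tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hw_cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical cloud rates: an H200 at $4.00/hr vs. an Ascend 950
# priced ~50% lower per FLOP, with the throughputs cited above.
h200 = cost_per_million_output_tokens(hw_cost_per_hour=4.00, tokens_per_second=220)
ascend = cost_per_million_output_tokens(hw_cost_per_hour=2.00, tokens_per_second=185)

print(f"H200:   ${h200:.2f} / 1M output tokens")
print(f"Ascend: ${ascend:.2f} / 1M output tokens")
```

Even with the throughput gap, the cheaper hardware wins on cost per token under these assumptions, which is the lever behind aggressive API pricing.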
3. Training optimization
DeepSeek V4-Flash was partly trained on Ascend 950 hardware. This is a major signal that Chinese-domestic training is now viable for frontier-tier models. Earlier Chinese frontier models (V3, R1) were Nvidia-trained.
4. Optionality for everyone
Even US/EU teams benefit. The lower cost basis in China lets DeepSeek price aggressively on the open market. OpenRouter, Together, and Fireworks all serve V4 from US/EU infrastructure at prices that wouldn’t be possible without DeepSeek’s underlying cost advantage.
What you actually deploy on
If you’re in China
Ascend 950 supernodes are the cost-optimal choice. Major Chinese cloud providers (Alibaba, Tencent, Huawei Cloud) all offer V4 inference on Ascend.
```shell
# vLLM-Ascend example
export USE_MULTI_BLOCK_POOL=1
export VLLM_ASCEND_ENABLE_FUSED_MC2=1
vllm serve ./deepseek-v4-flash-w8a8 \
  --tensor-parallel-size 8 \
  --quantization w8a8 \
  --max-model-len 1048576
```
If you’re in the US/EU
Nvidia H200 remains the default inference chip. vLLM with FP8 is the production stack.
```shell
# vLLM on H200 (FP8)
vllm serve deepseek-ai/DeepSeek-V4-Flash \
  --tensor-parallel-size 4 \
  --quantization fp8 \
  --max-model-len 1048576 \
  --enable-chunked-prefill
```
For most US/EU teams: don’t self-host at all. Use OpenRouter or Together AI’s V4 endpoints. They handle the H200 fleet for you.
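If you go the hosted-endpoint route, OpenRouter exposes an OpenAI-compatible chat completions API. A minimal stdlib-only sketch is below; the model slug `deepseek/deepseek-v4-flash` is an assumption for illustration, so check the provider's model list for the real ID before use.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "deepseek/deepseek-v4-flash"  # hypothetical slug -- verify before use

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Builds an OpenAI-compatible chat request; send it with urlopen()."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        OPENROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize chunked prefill in one sentence.", "sk-...")
print(req.full_url)
```

Because the endpoint is OpenAI-compatible, the official `openai` client also works by pointing `base_url` at OpenRouter.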
Performance comparison: real numbers
DeepSeek and Huawei published partial benchmarks on launch. Independent verification is still rolling in, but early numbers suggest:
| Workload | H200 (FP8) | Ascend 950 (w8a8) | H200 advantage |
|---|---|---|---|
| V4-Flash, 8K context, single-stream | 220 tok/s | 185 tok/s | +19% |
| V4-Flash, 8K context, batched throughput | 50 req/s | 42 req/s | +19% |
| V4-Pro, 32K context, single-stream | 110 tok/s | 88 tok/s | +25% |
| Long-context (500K), single-stream | 45 tok/s | 38 tok/s | +18% |
The H200 advantage is real but not the 2-3× of prior generations. For most production workloads, the gap is engineering-noise level — and the price difference more than makes up for it in China.
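The advantage percentages follow directly from the raw throughput numbers in the table; a quick recomputation:

```python
# Recompute the H200 advantage from the published throughput pairs
# (H200 figure, Ascend 950 figure).
pairs = {
    "V4-Flash single-stream": (220, 185),
    "V4-Flash batched": (50, 42),
    "V4-Pro single-stream": (110, 88),
    "Long-context single-stream": (45, 38),
}
gaps = {name: (h200 / ascend - 1) * 100 for name, (h200, ascend) in pairs.items()}
for name, gap in gaps.items():
    print(f"{name}: H200 +{gap:.0f}%")
```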
What this means for the AI hardware market
A few honest predictions for the rest of 2026:
- Nvidia retains the training crown for at least another generation. Multi-node FP8 training on H200/B200 is still dominant.
- Ascend takes inference share in China. By end of Q3 2026, expect Ascend to handle the majority of LLM inference inside China, not because it's better, but because it's available, cheap, and now officially supported by frontier models.
- Other Chinese chips follow. Cambricon, Biren, and Moore Threads are all racing to get DeepSeek V4 inference certified.
- US/EU teams care indirectly. You won't run on Ascend, but the model you serve via OpenRouter or self-host on H200 was made possible by China's ability to keep training cheaply on Ascend.
Bottom line
The DeepSeek V4 launch isn’t just a model release — it’s a signal that the AI compute monopoly is fragmenting. For the first time, a frontier-tier model ships with first-class non-Nvidia inference support. Whether you serve V4 on H200 in Virginia or Ascend 950 in Shanghai, the result is the same: near-Opus-4.7 quality at one-seventh the price.
For deployment decisions in late April 2026:
- In China: Ascend 950 supernodes
- In the US/EU: OpenRouter / Together / Fireworks (H200 fleet under the hood)
- At scale (>10B tokens/month): Self-hosted H200 cluster
The hardware doesn’t actually matter much for app developers. What matters is that the underlying cost curve just got steeper — in users’ favor.
Last verified: April 25, 2026. Sources: Reuters reporting on Huawei Ascend supernode V4 support, DeepSeek V4 release notes, Hugging Face deepseek-ai/DeepSeek-V4-Pro model card, vLLM-Ascend project documentation, Fortune coverage of DeepSeek-Huawei integration.