DeepSeek V4 on Huawei Ascend vs Nvidia H200 (April 2026)

DeepSeek V4 launched on April 24, 2026 with full first-day support for both Nvidia H200 and Huawei Ascend 950. This is the first time a major frontier-tier open model has shipped with first-class non-Nvidia inference. Here’s what it actually means.

Last verified: April 25, 2026

TL;DR

                             Nvidia H200 SXM5      Huawei Ascend 950
Memory                       141 GB HBM3e          144 GB (supernode config)
FP16 throughput              ~67 TFLOPS            ~58 TFLOPS (est.)
V4 inference stack           vLLM, SGLang          vLLM-Ascend
V4-Pro support               ✅ Full (FP8)         ✅ Full (w8a8)
V4-Flash support             ✅ Full (FP8)         ✅ Full (w8a8)
Power consumption            ~700W                 ~600W
Availability outside China   Yes (constrained)     Limited
Price per FLOP               Higher                ~40-60% lower in China

The launch-day reality

When DeepSeek V4 went live on April 24, 2026, two things shipped simultaneously:

  1. Hugging Face weights for V4-Pro and V4-Flash, ready for vLLM serving on Nvidia.
  2. Huawei Ascend supernode endorsement — Huawei Technologies publicly announced same-day full support on Ascend 950-based clusters.

Reuters reported Huawei chips were used in part of V4-Flash’s training. Hours later, ModelScope hosted the w8a8-quantized weights specifically tuned for Ascend deployment.

This is unprecedented for a non-Nvidia AI chip in the frontier-model tier.

Inference performance

Nvidia H200 SXM5

  • VRAM: 141 GB HBM3e
  • Bandwidth: 4.8 TB/s
  • V4-Flash on a single H200: ~50 req/sec at FP8, ~220 tokens/sec output per request
  • V4-Pro: Needs a multi-H200 cluster (16× minimum)
  • Stack: vLLM 0.7+, SGLang 0.4+
  • Quantization: Native FP8
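The 16-GPU minimum for V4-Pro follows from simple memory arithmetic. A rough sketch, where the total parameter count and the 2× overhead factor (KV cache, activations, runtime buffers) are both assumptions for illustration, not published figures:

```python
import math

H200_VRAM_GB = 141  # HBM3e capacity per H200 SXM5


def min_gpus(params_billion: float, bytes_per_param: float = 1.0,
             overhead: float = 2.0) -> int:
    """Rough minimum GPU count: model weights at the given precision
    (1 byte/param at FP8), times a coarse overhead factor for KV cache
    and activations. Both defaults are ballpark assumptions."""
    weights_gb = params_billion * bytes_per_param  # 1B params ~= 1 GB at FP8
    return math.ceil(weights_gb * overhead / H200_VRAM_GB)


# Hypothetical: a model of roughly 1.1T total parameters at FP8
# lands right at the quoted 16-GPU minimum.
print(min_gpus(1100))  # → 16
```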

Huawei Ascend 950 (supernode)

  • Memory per node: 144 GB
  • Topology: Supernode optimized for MoE — high inter-chip bandwidth
  • V4-Flash on Ascend 950 supernode: Throughput comparable to H200 (within 15-25%, per initial Huawei-published numbers)
  • V4-Pro: Officially supported on multi-node Ascend 950 supernodes
  • Stack: vLLM-Ascend with dedicated DeepSeek V4 launch scripts
  • Quantization: w8a8 (8-bit weights, 8-bit activations)
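The idea behind w8a8 can be shown with symmetric per-tensor int8 quantization. This is a minimal sketch of the scheme, not the actual Ascend kernels; production stacks typically use per-channel scales and calibrated activation ranges:

```python
def quantize_int8(xs):
    """Symmetric per-tensor int8 quantization: one scale for the whole
    tensor, values rounded and clamped to [-127, 127]."""
    scale = max(abs(x) for x in xs) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


w = [0.8, -1.27, 0.003, 0.5]
q, s = quantize_int8(w)
# Round-trip error is bounded by half a quantization step (scale / 2).
err = max(abs(a - b) for a, b in zip(dequantize(q, s), w))
assert err <= s / 2 + 1e-9
```

In w8a8 both the weights and the activations are stored this way, so matmuls run in int8 with a per-tensor rescale at the end.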

The performance gap is real but smaller than it was for prior generations. For inference workloads in China, Ascend 950 is now a credible alternative.

Why DeepSeek’s Ascend support matters

1. Sovereignty and supply chain

US export controls on H100/H200 to China have tightened repeatedly since 2022. Ascend 950 gives Chinese AI teams a viable supply chain that doesn’t depend on US chip access.

2. Cost structure

In China, Ascend 950 is dramatically cheaper than smuggled or downgraded H200s. Ascend supernode pricing for cloud providers is reportedly 40-60% below H200 equivalents on a per-FLOP basis. That cost advantage is partly why V4 can be priced at $1.74/$3.48.
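To see what that pricing means for a workload, here is a back-of-envelope cost sketch. It assumes the quoted $1.74/$3.48 are per-million-token input/output prices (a common convention, though the pairing isn't spelled out above):

```python
def v4_monthly_cost(input_mtok: float, output_mtok: float,
                    in_price: float = 1.74, out_price: float = 3.48) -> float:
    """Monthly API spend in USD. Assumes $1.74 per million input tokens
    and $3.48 per million output tokens -- an assumption about how the
    quoted price pair is split."""
    return input_mtok * in_price + output_mtok * out_price


# e.g. a service pushing 100M input + 20M output tokens a month:
print(round(v4_monthly_cost(100, 20), 2))  # → 243.6
```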

3. Training optimization

DeepSeek-Flash was partly trained on Ascend 950 hardware. This is a major signal that Chinese-domestic training is now viable for frontier-tier models. Earlier Chinese frontier models (V3, R1) were Nvidia-trained.

4. Optionality for everyone

Even US/EU teams benefit. The lower cost basis in China lets DeepSeek price aggressively on the open market. OpenRouter, Together, and Fireworks all serve V4 from US/EU infrastructure at prices that wouldn’t be possible without DeepSeek’s underlying cost advantage.

What you actually deploy on

If you’re in China

Ascend 950 supernodes are the cost-optimal choice. Major Chinese cloud providers (Alibaba, Tencent, Huawei Cloud) all offer V4 inference on Ascend.

# vLLM-Ascend example; the two exports are Ascend-specific tuning
# flags used by the dedicated DeepSeek V4 launch scripts
export USE_MULTI_BLOCK_POOL=1
export VLLM_ASCEND_ENABLE_FUSED_MC2=1

vllm serve ./deepseek-v4-flash-w8a8 \
  --tensor-parallel-size 8 \
  --quantization w8a8 \
  --max-model-len 1048576
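Either way you serve it, vLLM exposes an OpenAI-compatible API (on port 8000 by default). A minimal client sketch; the URL assumes default host/port, and the model field must match whatever name you passed to `vllm serve`:

```python
import json
import urllib.request

# The "model" field must match the path/name given to `vllm serve`.
payload = {
    "model": "./deepseek-v4-flash-w8a8",
    "messages": [{"role": "user", "content": "Summarize this deployment guide."}],
    "max_tokens": 256,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```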

If you’re in the US/EU

Nvidia H200 remains the default inference chip. vLLM with FP8 is the production stack.

vllm serve deepseek-ai/DeepSeek-V4-Flash \
  --tensor-parallel-size 4 \
  --quantization fp8 \
  --max-model-len 1048576 \
  --enable-chunked-prefill

For most US/EU teams: don’t self-host at all. Use OpenRouter or Together AI’s V4 endpoints. They handle the H200 fleet for you.
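Since these providers all speak the same OpenAI-compatible protocol, a simple failover policy across them is cheap to add. A sketch; the model slugs are illustrative guesses, not taken from any provider's published catalog:

```python
# Hypothetical endpoint list -- slugs are illustrative, check each
# provider's catalog for the real model identifiers.
PROVIDERS = [
    ("https://openrouter.ai/api/v1", "deepseek/deepseek-v4-flash"),
    ("https://api.together.xyz/v1", "deepseek-ai/DeepSeek-V4-Flash"),
]


def pick_endpoint(healthy):
    """Return the first (base_url, model) pair whose base URL passes the
    injected health check; injecting `healthy` keeps the policy testable
    without network access."""
    for base, model in PROVIDERS:
        if healthy(base):
            return base, model
    raise RuntimeError("no V4 provider reachable")


# Failover: if the first provider is down, fall through to the next.
base, model = pick_endpoint(lambda url: "together" in url)
print(model)  # → deepseek-ai/DeepSeek-V4-Flash
```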

Performance comparison: real numbers

DeepSeek and Huawei published partial benchmarks on launch. Independent verification is still rolling in, but early numbers suggest:

Workload                                 H200 (FP8)   Ascend 950 (w8a8)   Gap
V4-Flash, 8K context, single-stream      220 tok/s    185 tok/s           +19% H200
V4-Flash, 8K context, batched            50 req/s     42 req/s            +19% H200
V4-Pro, 32K context, single-stream       110 tok/s    88 tok/s            +25% H200
Long-context (500K), single-stream       45 tok/s     38 tok/s            +18% H200

The H200 advantage is real, but it is not the 2-3× of prior generations. For most production workloads the gap is within engineering noise, and in China the price difference more than makes up for it.
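The Gap column is just the throughput ratio between the two chips, rounded to a whole percent. Quick arithmetic to reproduce it from the table's numbers:

```python
def h200_advantage(h200: float, ascend: float) -> int:
    """H200's relative throughput advantage over Ascend 950,
    rounded to the nearest whole percent."""
    return round((h200 / ascend - 1) * 100)


# Reproduces the Gap column from the benchmark table:
print(h200_advantage(220, 185))  # → 19
print(h200_advantage(110, 88))   # → 25
print(h200_advantage(45, 38))    # → 18
```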

What this means for the AI hardware market

A few honest predictions for the rest of 2026:

  1. Nvidia retains the training crown for at least another generation. Multi-node FP8 training on H200/B200 is still dominant.

  2. Ascend takes inference share in China. By end of Q3 2026, expect Ascend to handle the majority of LLM inference inside China — not because it’s better, but because it’s available, cheap, and now officially supported by frontier models.

  3. Other Chinese chips follow. Cambricon, Biren, Moore Threads — all racing to get DeepSeek V4 inference certified.

  4. US/EU teams care indirectly. You won’t run on Ascend, but the model you serve via OpenRouter or self-host on H200 was made possible by China’s ability to keep training cheaply on Ascend.

Bottom line

The DeepSeek V4 launch isn’t just a model release — it’s a signal that the AI compute monopoly is fragmenting. For the first time, a frontier-tier model ships with first-class non-Nvidia inference support. Whether you serve V4 on H200 in Virginia or Ascend 950 in Shanghai, the result is the same: near-Opus-4.7 quality at one-seventh the price.

For deployment decisions in late April 2026:

  • In China: Ascend 950 supernodes
  • In the US/EU: OpenRouter / Together / Fireworks (H200 fleet under the hood)
  • At scale (>10B tokens/month): Self-hosted H200 cluster

The hardware doesn’t actually matter much for app developers. What matters is that the underlying cost curve just got steeper — in users’ favor.


Last verified: April 25, 2026. Sources: Reuters reporting on Huawei Ascend supernode V4 support, DeepSeek V4 release notes, Hugging Face deepseek-ai/DeepSeek-V4-Pro model card, vLLM-Ascend project documentation, Fortune coverage of DeepSeek-Huawei integration.