Huawei's Ascend Challenge: New AI Chip 2.87x Faster Than H20, First Chinese FP4 Inference


2026-03-22

Huawei's new Ascend AI chip delivers 2.87x H20 performance with first domestic FP4 inference support, enabling larger LLM deployment at lower cost and reducing China's dependence on NVIDIA.


Huawei has released its next-generation Ascend AI processor, which delivers 2.87 times the computational performance of NVIDIA's H20, the chip NVIDIA designed specifically for the Chinese market to comply with US export controls. Most notably, it is the first Chinese-designed chip to support FP4 (4-bit floating point) inference, closing a critical capability gap with NVIDIA's latest Blackwell architecture.

Performance Breakdown

Metric	Huawei Ascend (New)	NVIDIA H20	Difference
FP16 Compute	~580 TFLOPS	~200 TFLOPS	2.87x
FP4 Inference	✅ Supported	❌ Not supported	New capability
Memory Bandwidth	Undisclosed	4.0 TB/s	TBD
TDP	Undisclosed	400W	TBD
Process	SMIC 7nm (est.)	TSMC 4N	-
Available in China	✅ Domestic	✅ (restricted)	Supply chain secure

The FP4 Breakthrough

FP4 inference is the key innovation that makes this chip strategically important:

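4x memory reduction: The weights of a 70B-parameter model such as Llama 70B take about 140GB in FP16 (2 bytes per parameter) but only about 35GB in FP4 (0.5 bytes per parameter), before KV cache and activations are counted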
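Practical implication: AI inference that previously required 8 H20 cards could potentially run on 2 Ascend chips, although the real ratio depends on the new chip's undisclosed memory capacity, and the H20 already supports FP8, which on its own halves FP16 memory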
NVIDIA parity: Matches a headline feature of NVIDIA's Blackwell architecture, which also ships with FP4 support
Ecosystem enablement: Huawei's CANN framework now supports FP4 quantization pipelines
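The memory arithmetic behind the first two points is easy to check. The Python sketch below estimates weight memory for a 70B-parameter model at FP16, FP8, and FP4, plus a lower bound on how many accelerators are needed to hold those weights. It counts weights only (no KV cache, activations, or per-block scale factors), and the 64GB per-card figure in the usage lines is a hypothetical placeholder, since Huawei has not disclosed the new chip's memory.

```python
import math

# Storage bits per weight. Real FP4 formats also store a small shared scale
# per block of weights; that overhead is ignored in this sketch.
BITS_PER_WEIGHT = {"fp16": 16, "fp8": 8, "fp4": 4}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


def cards_needed(num_params: float, fmt: str, card_memory_gb: float) -> int:
    """Lower bound on accelerators whose combined memory can hold the weights.

    KV cache, activations, and framework overhead are not counted, so a real
    deployment needs more headroom than this.
    """
    return math.ceil(weight_memory_gb(num_params, fmt) / card_memory_gb)


if __name__ == "__main__":
    params = 70e9  # a 70B-parameter model, e.g. Llama 70B
    for fmt in ("fp16", "fp8", "fp4"):
        print(f"{fmt}: {weight_memory_gb(params, fmt):.0f} GB of weights")
    # prints 140, 70, and 35 GB respectively

    # Hypothetical 64 GB per card: the new Ascend's capacity is undisclosed.
    print(cards_needed(params, "fp4", card_memory_gb=64))   # 1
    print(cards_needed(params, "fp16", card_memory_gb=64))  # 3
```

Because the card count covers weights alone, real deployments need extra headroom, which is one reason published card-count comparisons vary so widely.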

Market Impact

The release has immediate implications for the Chinese AI landscape:

Cloud providers: Alibaba Cloud, Huawei Cloud, and Baidu AI Cloud can offer more competitive inference pricing
Model developers: Access to FP4 inference enables deployment of larger models at lower cost
Enterprise AI: Companies building private LLM deployments can use domestic chips instead of navigating export restrictions
Academic research: Universities get access to modern inference capabilities without NVIDIA dependency

The Bigger Picture: China's Semiconductor Ambition

This chip release is part of a broader strategy:

SMIC advancement: Despite being cut off from EUV lithography, SMIC has refined DUV multi-patterning to manufacture 7nm-class chips with improved yields
Ecosystem building: Huawei's CANN + MindSpore stack provides a complete development environment
Market timing: The chip arrives as Chinese AI companies face increasing pressure to reduce NVIDIA dependency
International signal: Demonstrates that export controls can have unintended consequences, such as accelerating domestic innovation

Limitations to Consider

Despite the impressive headline numbers:

Software maturity: CANN lags significantly behind CUDA in library support and developer community
Manufacturing volume: SMIC's production capacity is a fraction of TSMC's
AI training: FP4 is used mainly for inference; training still generally requires higher precision
International availability: Export control restrictions may apply to this chip in some markets
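The article does not say which 4-bit encoding Huawei uses. Assuming the common E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) used by the OCP Microscaling (MX) FP4 format, the Python sketch below quantizes a block of values onto that grid with one shared scale. It uses plain round-to-nearest and a float scale, both simplifications of production pipelines, and the function names are illustrative, not CANN APIs. The coarse grid, just eight magnitudes from 0 to 6, shows why FP4 is mostly reserved for inference.

```python
# Hypothetical illustration: FP4 E2M1 block quantization (not Huawei's CANN API).

# Non-negative magnitudes representable in E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = E2M1_GRID[-1]


def quantize_block(values: list[float]) -> tuple[float, list[float]]:
    """Map a block of floats onto signed E2M1 values sharing one scale.

    The scale maps the block's largest magnitude to 6.0, the largest E2M1 value.
    Rounding is round-to-nearest; because min() keeps the first minimum and the
    grid is ascending, exact ties go to the smaller magnitude (hardware usually
    rounds ties to even instead).
    """
    amax = max(abs(v) for v in values)
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for v in values:
        mag = abs(v) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(nearest if v >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale: float, codes: list[float]) -> list[float]:
    """Recover approximate original values by multiplying codes by the scale."""
    return [scale * c for c in codes]


if __name__ == "__main__":
    weights = [6.0, -3.2, 0.7, 2.4, -0.2]
    scale, codes = quantize_block(weights)
    print(scale, codes)  # 1.0 [6.0, -3.0, 0.5, 2.0, -0.0]
    restored = dequantize_block(scale, codes)
    errors = [abs(w - r) for w, r in zip(weights, restored)]
    print(f"max abs error: {max(errors):.2f}")  # max abs error: 0.40
```

Values like 3.2 and 2.4 snap to 3.0 and 2.0, and small values such as -0.2 collapse to zero. Trained weights tolerate that rounding reasonably well at inference time, but the small gradient updates applied during training would largely be lost on so coarse a grid.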

Source: Wall Street Journal Hot Topics
