Huawei's Ascend Challenge: New AI Chip 2.87x Faster Than H20, First Chinese FP4 Inference

2026-03-22T11:54:17.000Z
Huawei's new Ascend AI chip delivers 2.87x H20 performance with first domestic FP4 inference support, enabling larger LLM deployment at lower cost and reducing China's dependence on NVIDIA.


Huawei has released its next-generation Ascend AI processor, delivering 2.87 times the computational performance of NVIDIA's H20, the most powerful NVIDIA chip available under US export controls. Most notably, it is the first Chinese-designed chip to support FP4 (4-bit floating point) inference, closing a critical capability gap with NVIDIA's latest Blackwell architecture.
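To make "4-bit floating point" concrete, the sketch below rounds a value to the nearest point on an FP4 grid. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) used by formats such as OCP MXFP4; the article does not say which FP4 encoding the Ascend chip implements, so the layout, the function name `quantize_e2m1`, and the tie-breaking rule are illustrative assumptions.

```python
# Magnitudes representable in E2M1 (assumed layout, not confirmed for Ascend).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_e2m1(x: float) -> float:
    """Round x to the nearest E2M1 value, saturating at +/-6.

    Ties resolve to the smaller magnitude here for simplicity; real
    hardware typically uses round-to-nearest-even instead.
    """
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E2M1_MAGNITUDES[-1])  # clamp to the largest magnitude
    nearest = min(E2M1_MAGNITUDES, key=lambda v: abs(v - mag))
    return sign * nearest


if __name__ == "__main__":
    for value in (0.3, -0.7, 2.6, 100.0):
        print(value, "->", quantize_e2m1(value))
```

With only eight magnitudes per sign, practical FP4 schemes pair these values with a shared scale factor per block of weights, which this sketch leaves out.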

Performance Breakdown

| Metric | Huawei Ascend (New) | NVIDIA H20 | Improvement |
| --- | --- | --- | --- |
| FP16 Compute | ~580 TFLOPS | ~200 TFLOPS | 2.87x |
| FP4 Inference | ✅ Supported | ❌ Not supported | New capability |
| Memory Bandwidth | Undisclosed | 4.0 TB/s | TBD |
| TDP | Undisclosed | 700W | TBD |
| Process | SMIC 7nm (est.) | TSMC 4N | - |
| Available in China | ✅ Domestic | ✅ (restricted) | Supply chain secure |
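As a quick sanity check on the headline figure, the ratio can be recomputed from the approximate FP16 numbers in the table. The variable names are illustrative; the inputs are the rounded "~" values quoted above, so a small difference from the stated 2.87x is expected.

```python
# Approximate FP16 throughput figures as quoted in the table (TFLOPS).
ascend_fp16_tflops = 580.0
h20_fp16_tflops = 200.0

ratio = ascend_fp16_tflops / h20_fp16_tflops
print(f"Speedup from rounded figures: {ratio:.2f}x")
```

The rounded inputs give roughly 2.9x, consistent with the 2.87x headline once rounding of the underlying figures is taken into account.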

The FP4 Breakthrough

FP4 inference is the key innovation that makes this chip strategically important: it enables larger LLM deployments at lower cost.
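A rough way to see why lower precision matters for deployment is to compare weight storage at different bit widths. The sketch below is back-of-the-envelope arithmetic only: the 70B-parameter model size is a hypothetical example, and the calculation ignores quantization scale factors, the KV cache, and activations.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 70e9  # hypothetical 70B-parameter model, chosen for illustration

for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: {weight_memory_gb(PARAMS, bits):.0f} GB")
```

Under these assumptions the weights shrink from 140 GB at FP16 to 35 GB at FP4, which is the sense in which FP4 lets a given amount of accelerator memory hold a larger model.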

Market Impact

The release has immediate implications for the Chinese AI landscape, most directly by reducing dependence on NVIDIA.

The Bigger Picture: China's Semiconductor Ambition

This chip release is part of a broader strategy:

  1. SMIC advancement: Despite EUV lithography restrictions, SMIC has improved its DUV multi-patterning to achieve 7nm-class yields
  2. Ecosystem building: Huawei's CANN + MindSpore stack provides a complete development environment
  3. Market timing: The chip arrives as Chinese AI companies face increasing pressure to reduce NVIDIA dependency
  4. International signal: Demonstrates that export controls have unintended consequences, since they accelerate domestic innovation

Limitations to Consider

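Despite the impressive headline numbers, key details remain open: memory bandwidth and TDP are undisclosed, and the SMIC 7nm process figure is an estimate.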

Source: Wall Street Journal Hot Topics
