Huawei's Ascend Challenge: New AI Chip 2.87x Faster Than H20, First Chinese FP4 Inference
Huawei's new Ascend AI chip delivers 2.87x H20 performance with first domestic FP4 inference support, enabling larger LLM deployment at lower cost and reducing China's dependence on NVIDIA.
Huawei has released its next-generation Ascend AI processor, delivering 2.87 times the computational performance of NVIDIA's H20 — the most powerful NVIDIA chip legally available in China under US export controls. Most notably, it is the first Chinese-designed chip to support FP4 (4-bit floating point) inference, closing a critical capability gap with NVIDIA's latest Blackwell architecture.
Performance Breakdown
| Metric | Huawei Ascend (New) | NVIDIA H20 | Improvement |
|---|---|---|---|
| FP16 Compute | ~580 TFLOPS | ~200 TFLOPS | 2.87x |
| FP4 Inference | ✅ Supported | ❌ Not supported | New capability |
| Memory Bandwidth | Undisclosed | 4.0 TB/s | TBD |
| TDP | Undisclosed | 700W | TBD |
| Process | SMIC 7nm (est.) | TSMC 4N | - |
| Available in China | ✅ Domestic | ✅ (restricted) | Supply chain secure |
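To make "4-bit floating point" concrete: the article does not name the exact encoding, but the common FP4 layout (used by the OCP Microscaling spec and NVIDIA's Blackwell) is E2M1 — one sign bit, two exponent bits, one mantissa bit. A minimal sketch, assuming E2M1, that enumerates every representable value:

```python
# Decode all 16 codes of a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
# Assumption: E2M1 layout as in the OCP Microscaling (MX) spec; the article
# itself does not specify which FP4 format the Ascend chip implements.

def decode_fp4_e2m1(code: int) -> float:
    """Map a 4-bit code (0..15) to its E2M1 value."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11        # 2 exponent bits, bias 1
    man = code & 0b1                # 1 mantissa bit
    if exp == 0:                    # subnormal: no implicit leading 1
        return sign * man * 0.5
    return sign * (1.0 + man * 0.5) * 2.0 ** (exp - 1)

values = sorted({decode_fp4_e2m1(c) for c in range(16)})
print(values)
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Only 15 distinct values survive (+0 and −0 collapse), which is why FP4 weights are typically stored alongside per-block scale factors that map this tiny range onto the actual weight distribution.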
The FP4 Breakthrough
FP4 inference is the key innovation that makes this chip strategically important:
- 4x memory reduction: Models like LLaMA 70B that require 140GB in FP16 can run in ~35GB with FP4
- Practical implication: AI inference that previously required 8 H20 cards can potentially run on 2 Ascend chips
- NVIDIA parity: NVIDIA's Blackwell is the only other architecture shipping with FP4 support
- Ecosystem enablement: Huawei's CANN framework now supports FP4 quantization pipelines
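The 4x memory claim above is simple arithmetic on bits per parameter. A back-of-the-envelope sketch reproducing the article's ~140 GB (FP16) vs ~35 GB (FP4) figures for a 70B-parameter model; it ignores KV cache, activations, and quantization scale overhead, so real deployments need somewhat more:

```python
# Weight-memory estimate for an LLM at different precisions.
# params * bits / 8 gives bytes; divide by 1e9 for decimal gigabytes.

def weight_gb(params: float, bits_per_param: int) -> float:
    return params * bits_per_param / 8 / 1e9

params = 70e9  # LLaMA-70B-class model
for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: {weight_gb(params, bits):.0f} GB")
# FP16: 140 GB
# FP8: 70 GB
# FP4: 35 GB
```

At 35 GB of weights, a model that needed a multi-card FP16 setup fits comfortably within the HBM of one or two accelerators, which is the basis of the "8 H20 cards → 2 Ascend chips" claim.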
Market Impact
The release has immediate implications for the Chinese AI landscape:
- Cloud providers: Alibaba Cloud, Huawei Cloud, and Baidu AI Cloud can offer more competitive inference pricing
- Model developers: Access to FP4 inference enables deployment of larger models at lower cost
- Enterprise AI: Companies building private LLM deployments can use domestic chips instead of navigating export restrictions
- Academic research: Universities get access to modern inference capabilities without NVIDIA dependency
The Bigger Picture: China's Semiconductor Ambition
This chip release is part of a broader strategy:
- SMIC advancement: Despite EUV lithography restrictions, SMIC has improved its DUV multi-patterning to achieve 7nm-class yields
- Ecosystem building: Huawei's CANN + MindSpore stack provides a complete development environment
- Market timing: The chip arrives as Chinese AI companies face increasing pressure to reduce NVIDIA dependency
- International signal: Demonstrates that export controls have unintended consequences — they accelerate domestic innovation
Limitations to Consider
Despite the impressive headline numbers:
- Software maturity: CANN lags significantly behind CUDA in library support and community
- Manufacturing volume: SMIC's production capacity is a fraction of TSMC's
- Training limitation: FP4 is inference-only; training still requires higher-precision formats such as FP16 or BF16
- International availability: Export control restrictions may apply to this chip in some markets
Source: Wall Street Journal Hot Topics