Huawei unveils Atlas 950 SuperCluster – promises 1 ZettaFLOPS FP4 performance

(Image credit: Huawei)

At its Huawei Connect 2025 conference on Thursday, Huawei unveiled its next-generation data-center-scale AI solution, which promises 1 FP4 ZettaFLOPS of performance for AI inference and 524 FP8 ExaFLOPS for AI training. The new Atlas 950 SuperCluster runs hundreds of thousands of the company's Ascend 950DT neural processing units (NPUs) and is set to be one of the most powerful supercomputers for artificial intelligence on the planet. Huawei expects its SuperCluster to compete with Nvidia's Rubin-based systems in late 2026.

Massive performance

The supercomputer purportedly offers up to 524 FP8 ExaFLOPS for AI training and up to 1 FP4 ZettaFLOPS (MXFP4, to be more specific) for AI inference. That puts it just behind leading-edge AI supercomputers such as Oracle's OCI Supercluster, introduced last year, which runs 131,072 B200 GPUs and offers peak inference performance of up to 2.4 FP4 ZettaFLOPS. Keep in mind that these are peak performance numbers, so it remains to be seen whether they can be achieved in real life.
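For a sense of what these peak figures imply per accelerator, here is a quick back-of-the-envelope check in Python. It uses only the numbers quoted above and assumes perfect linear scaling across GPUs, which real workloads never achieve:

```python
# Sanity check: divide Oracle's quoted cluster peak by its GPU count
# (assumes cluster peak = per-GPU peak x GPU count, i.e. linear scaling).
OCI_GPUS = 131_072        # B200 GPUs in Oracle's OCI Supercluster
OCI_PEAK_FP4 = 2.4e21     # quoted peak: 2.4 FP4 ZettaFLOPS, in FLOPS

per_gpu = OCI_PEAK_FP4 / OCI_GPUS
print(f"Implied FP4 peak per B200: {per_gpu / 1e15:.1f} PFLOPS")
# ~18.3 PFLOPS, in line with the ~18 PFLOPS sparse-FP4 peak commonly
# quoted for B200, suggesting these are sparse theoretical peaks.
```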

This SuperCluster is designed to support both RoCE (Remote Direct Memory Access over Converged Ethernet) and Huawei's proprietary UBoE (UnifiedBus over Ethernet) protocols, though it remains to be seen how quickly the latter will be adopted. According to Huawei, UBoE offers lower idle-state latency and higher hardware reliability, and requires fewer switches and optical modules than traditional RoCE setups.

Huawei positions its Atlas 950 SuperCluster to support training and inference workloads for AI models with hundreds of billions to tens of trillions of parameters. The company believes this platform is well suited for the next wave of large-scale dense and sparse models, thanks to its combination of compute throughput, interconnect bandwidth, and system stability. Given its size, though, it is unclear how many companies will be able to accommodate the system.

Huawei admits that it cannot build processors that would challenge Nvidia's GPUs in terms of per-chip performance. Therefore, to achieve 1 ZettaFLOPS with the Atlas 950 SuperCluster, it intends to use a brute-force approach, deploying hundreds of thousands of AI accelerators to compete against Nvidia Rubin-based clusters in 2026–2027.

The basic building block of Huawei's Atlas 950 SuperCluster is the Atlas 950 SuperPoD, which integrates 8,192 Ascend 950DT chips. That represents a more than 20-fold increase in processing units compared to the Atlas 900 A3 SuperPoD (also known as the CloudMatrix 384), along with a massive increase in compute performance: 8 FP8 ExaFLOPS and 16 FP4 ExaFLOPS.
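Dividing the SuperPoD's quoted peaks by its chip count gives a rough per-NPU figure; the sketch below assumes the pod's throughput is simply the sum of its chips:

```python
# Rough per-chip peak implied by the Atlas 950 SuperPoD specs
# (assumes pod peak = per-chip peak x chip count; peak, not sustained).
CHIPS_PER_POD = 8_192     # Ascend 950DT NPUs per Atlas 950 SuperPoD
POD_FP8 = 8e18            # 8 FP8 ExaFLOPS
POD_FP4 = 16e18           # 16 FP4 ExaFLOPS

print(f"FP8 per NPU: {POD_FP8 / CHIPS_PER_POD / 1e12:.0f} TFLOPS")  # ~977
print(f"FP4 per NPU: {POD_FP4 / CHIPS_PER_POD / 1e12:.0f} TFLOPS")  # ~1953
# Roughly 1 PFLOPS FP8 and 2 PFLOPS FP4 per Ascend 950DT, at peak.
```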


Performance of the Atlas 950 SuperPoD is truly impressive on paper: it is said to be massively higher than that of Nvidia's Vera Rubin NVL144 (1.2 FP8 ExaFLOPS, 3.6 NVFP4 ExaFLOPS), the product Huawei compares it to. However, that performance comes at a price, namely size. Each Atlas 950 SuperPoD comprises 160 cabinets in total, 128 for compute and 32 for communications, spread across 1,000 square meters, which is about the size of two basketball courts. By contrast, Nvidia's Vera Rubin NVL144 is a rack-scale solution consisting of one compute rack and one cable-and-switch rack that requires just a few square meters of space.

As for the full Atlas 950 SuperCluster, which consists of 64 Atlas 950 SuperPoDs and should measure around 64,000 m², its size is comparable to 150 basketball courts, or nine regulation soccer fields. Keep in mind, though, that a real campus would likely require additional space for power rooms, chillers/cooling towers, battery/UPS systems, and support offices, so the total site footprint could be significantly larger than 64,000 m².
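Those cluster-level figures follow from straightforward linear-scaling arithmetic, and the footprint comparisons check out as well. In the sketch below, the court and pitch areas are assumptions (roughly 420 m² for a FIBA basketball court and about 7,140 m² for a typical soccer pitch):

```python
# Scaling the SuperPoD up to the full SuperCluster, plus footprint math.
PODS = 64                 # Atlas 950 SuperPoDs per Atlas 950 SuperCluster
CHIPS_PER_POD = 8_192
POD_FP8, POD_FP4 = 8e18, 16e18
POD_AREA_M2 = 1_000       # ~1,000 m2 per 160-cabinet SuperPoD

COURT_M2 = 28 * 15        # FIBA basketball court, 420 m2 (assumption)
PITCH_M2 = 105 * 68       # typical soccer pitch, 7,140 m2 (assumption)

total_area = PODS * POD_AREA_M2
print(f"NPUs: {PODS * CHIPS_PER_POD:,}")                    # 524,288
print(f"FP8 peak: {PODS * POD_FP8 / 1e18:.0f} ExaFLOPS")    # 512
print(f"FP4 peak: {PODS * POD_FP4 / 1e21:.2f} ZettaFLOPS")  # ~1.02
print(f"Area: {total_area:,} m2, "
      f"~{total_area / COURT_M2:.0f} courts, "
      f"~{total_area / PITCH_M2:.0f} pitches")              # ~152, ~9
# Note the slight mismatch: 64 x 8 EF = 512 FP8 EF, while Huawei quotes
# 524 EF, which would imply a per-chip peak of exactly 1 PFLOPS.
```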

The road ahead

Customers buying server hardware always want to know what comes next, so in addition to a good product, it is vital to have a roadmap. To that end, the company used Huawei Connect to disclose plans to launch the Atlas 960 SuperCluster alongside the Atlas 960 SuperPoD in the fourth quarter of 2027.

This next-generation system will scale up to more than 1 million Ascend 960 NPUs and will provide 2 FP8 ZettaFLOPS and 4 MXFP4 ZettaFLOPS of performance. It will also support both UBoE and RoCE, with the former expected to deliver improved latency and uptime metrics while continuing to rely on Ethernet.
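Applying the same arithmetic to those figures suggests roughly a doubling of per-chip peak throughput over the Ascend 950DT; the sketch below takes the "more than 1 million" NPU count at face value:

```python
# Implied per-NPU peak for the Atlas 960 generation (same linear-scaling
# assumption; NPU count is the quoted lower bound of "more than 1 million").
NPUS = 1_000_000
CLUSTER_FP8 = 2e21        # 2 FP8 ZettaFLOPS
CLUSTER_FP4 = 4e21        # 4 MXFP4 ZettaFLOPS

print(f"FP8 per NPU: ~{CLUSTER_FP8 / NPUS / 1e15:.0f} PFLOPS")  # ~2
print(f"FP4 per NPU: ~{CLUSTER_FP4 / NPUS / 1e15:.0f} PFLOPS")  # ~4
# About double the ~1 PFLOPS FP8 / ~2 PFLOPS FP4 implied for the 950DT.
```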


Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
