AI Supercomputers


What is an AI supercomputer?

A supercomputer is a system of many processors (such as CPUs and GPUs) that can efficiently work together and achieve a high level of performance. When supercomputers use specialized AI chips and support large-scale AI training and deployment workloads, we refer to them as AI supercomputers. They are sometimes also referred to as “GPU clusters” or “AI datacenters”.

AI supercomputers are used for training or serving neural network models. Therefore, they typically support number formats favorable for AI training and inference, such as FP16 or INT8, contain compute units optimized for matrix multiplication, have high-bandwidth memory, and rely on AI accelerators rather than CPUs for most of their calculations. A more detailed definition can be found in our documentation and in section 2 of our paper.

How do you measure performance for AI supercomputers?

We provide the total computational rate of the hardware in the computing cluster, which is the performance of each ML hardware chip times the number of chips.
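This multiplication can be illustrated with a short sketch. The figures below are hypothetical: roughly 1e15 FLOP/s per chip is on the order of an H100-class accelerator in FP16, not a value from the dataset.

```python
# Hypothetical illustration: the aggregate rate of a cluster is simply
# per-chip throughput multiplied by the number of chips.

def cluster_performance(flops_per_chip: float, num_chips: int) -> float:
    """Total computational rate of the cluster, in FLOP/s."""
    return flops_per_chip * num_chips

# Example: 10,000 chips at ~1e15 FLOP/s each (roughly an H100-class
# accelerator in FP16) gives 1e19 FLOP/s, i.e. 10 exaFLOP/s.
total = cluster_performance(1e15, 10_000)
print(f"{total:.1e} FLOP/s")  # → 1.0e+19 FLOP/s
```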

Non-AI supercomputers are often evaluated using the LINPACK benchmark, which measures the speed of solving dense systems of linear equations in FP64 arithmetic. However, AI workloads rely on massively parallel matrix operations in lower-precision formats such as FP16, so benchmarks like LINPACK are not an ideal measure of their performance. Furthermore, most organizations do not measure or report their performance on LINPACK or other benchmarks such as MLPerf, so we do not record these.

Which types of organizations own AI supercomputers?

Historically, government and academic research organizations, such as Oak Ridge National Laboratory and the National Supercomputing Center in Wuxi (home of the Sunway TaihuLight), have owned many of the top supercomputers. In recent years, most AI supercomputers have been owned by cloud computing providers such as AWS, Google, and Microsoft Azure, or by AI laboratories such as Meta and xAI.

How was the AI supercomputers dataset created?

The data was primarily collected from machine learning papers, publicly available news articles, press releases, and existing lists of supercomputers.

We created a list of potential supercomputers by using the Google Search API to search key terms like “AI supercomputer” and “GPU cluster” from 2019 to 2025, then used GPT-4o to extract any supercomputers mentioned in the resulting articles. We also added supercomputers from publicly available lists such as Top500 and MLPerf, and GPU rental marketplaces. For each potential supercomputer, we manually searched for public information such as number and type of chips used, when it was first operational, reported performance, owner, and location. A detailed description of our methods can be found in the documentation and Appendix A of our paper.
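The collection loop described above might be sketched as follows. Note that `search_articles` and `extract_cluster_names` are hypothetical stand-ins for the Google Search API queries and the GPT-4o extraction step; this is not the project's actual code.

```python
# Hypothetical sketch of the candidate-collection pipeline described above.
# search_articles() and extract_cluster_names() are placeholder stand-ins
# for the Google Search API and the GPT-4o extraction step, respectively.

def search_articles(term: str, years: range) -> list[str]:
    # Placeholder: the real pipeline queries a search API per term and year.
    return [f"article about {term} ({year})" for year in years]

def extract_cluster_names(article: str) -> list[str]:
    # Placeholder: the real pipeline asks an LLM to extract any
    # supercomputers mentioned in the article text.
    return ["Example Supercomputer"] if "supercomputer" in article.lower() else []

def collect_candidates(terms: list[str], years: range) -> set[str]:
    """Deduplicated set of potential supercomputers for manual verification."""
    candidates: set[str] = set()
    for term in terms:
        for article in search_articles(term, years):
            candidates.update(extract_cluster_names(article))
    return candidates

candidates = collect_candidates(["AI supercomputer", "GPU cluster"], range(2019, 2026))
```

Each candidate would then be researched manually for chip count, hardware type, operational date, owner, and location, as described above.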

How do you estimate details like performance, cost, or power usage?

Performance is sometimes reported by the owner of the cluster, or in news reports. Otherwise, it is calculated based on the performance per chip of the hardware used in the cluster, times the number of chips.

Costs are sometimes reported by the owner or sponsor of the cluster. Otherwise, costs are estimated from the cost per chip of the hardware, times the number of chips, multiplied by adjustment factors for intra- and inter-server network hardware.
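A minimal sketch of this cost estimate follows. The adjustment-factor values are made-up placeholders for illustration, not the dataset's actual parameters.

```python
# Hypothetical sketch of the cost estimate: chip cost times chip count,
# scaled by adjustment factors for intra- and inter-server networking
# hardware. The default factor values are illustrative placeholders.

def estimate_cluster_cost(cost_per_chip: float, num_chips: int,
                          intra_server_factor: float = 1.1,
                          inter_server_factor: float = 1.2) -> float:
    """Estimated hardware cost of the cluster, in the chip price's currency."""
    return cost_per_chip * num_chips * intra_server_factor * inter_server_factor

# e.g. 10,000 chips at $30,000 each, with the placeholder factors above
cost = estimate_cluster_cost(30_000, 10_000)
```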

Power draw is sometimes reported by the owner or operator of the cluster. Otherwise, it is estimated from the power draw per chip, times the number of chips, multiplied by adjustment factors for other hardware and the power usage efficiency of the datacenter.
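The power estimate can be sketched the same way. Again, the factor values here are illustrative assumptions; 700 W is roughly an H100 SXM's rated TDP, not a figure from the dataset.

```python
# Hypothetical sketch of the power estimate: per-chip draw times chip
# count, scaled for other server hardware and the datacenter's power
# usage effectiveness (PUE). Factor values are illustrative placeholders.

def estimate_cluster_power(watts_per_chip: float, num_chips: int,
                           other_hardware_factor: float = 1.4,
                           pue: float = 1.2) -> float:
    """Estimated total power draw of the cluster, in watts."""
    return watts_per_chip * num_chips * other_hardware_factor * pue

# e.g. 10,000 chips at 700 W each (roughly an H100 SXM's rated TDP)
power_w = estimate_cluster_power(700, 10_000)
print(f"{power_w / 1e6:.2f} MW")  # → 11.76 MW
```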

Detailed methodology definitions can be found in the paper and documentation.

How accurate is the information about each supercomputer?

We strive to accurately convey the reported specifications of each supercomputer. The Status field indicates our assessment of whether the supercomputer is currently operational, not yet operational, or decommissioned. The Certainty field indicates our assessment of the likelihood that the cluster exists in roughly the form specified in the dataset and the linked sources. If you find mistakes or additional information regarding any supercomputers in the dataset, please email [email protected].

How is the dataset licensed?

We have released a public dataset under a CC-BY (Creative Commons Attribution) license. This public dataset includes all of our data on supercomputers outside of China and Hong Kong, along with anonymized data on supercomputers within China, with values rounded to one significant figure and names and links removed. The dataset is free to use, distribute, and reproduce, provided the source and authors are credited.
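Rounding to one significant figure, as applied to the anonymized entries, works as in this sketch (illustrative code, not the project's actual anonymization script):

```python
import math

def round_sig(x: float, sig: int = 1) -> float:
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    return round(x, -int(math.floor(math.log10(abs(x)))) + (sig - 1))

# e.g. a hypothetical chip count of 4,832 would be published as 5,000
print(round_sig(4832.0))  # → 5000.0
```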

Data on Chinese supercomputers is stored privately to protect the data sources. For inquiries about this data, please contact Robi Rahman at [email protected].

How up-to-date is the data?

Although we strive to maintain an up-to-date database, new supercomputers are constantly under construction, so there will inevitably be some that have not yet been added. Generally, major supercomputers should be added within one month of their announcement, and others are added periodically during reviews. If you notice a missing supercomputer, you can notify us at [email protected].

Who can I contact with questions or comments about the dataset?

For general inquiries about the project and the paper, please reach out to Konstantin at [email protected]. For inquiries about the dataset, please contact Robi at [email protected].
