The AMD-powered Frontier supercomputer is now the first officially recognized exascale supercomputer in the world, topping 1.102 ExaFlop/s during a sustained Linpack run. That ranks first on the newly-released Top500 list of the world’s fastest supercomputers, and the number of AMD-powered systems on the list has expanded significantly this year. Frontier not only overtakes the previous leader, Japan’s Fugaku, but blows it out of the water: Frontier is faster than the next seven supercomputers on the list combined. Notably, while Frontier hit 1.1 ExaFlops during a sustained Linpack FP64 benchmark, the system delivers up to 1.69 ExaFlops in peak performance and has headroom to hit 2 ExaFlops after more tuning. For reference, one ExaFlop equals one quintillion floating point operations per second.
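As a rough illustration of how the sustained and peak figures relate, the short Python sketch below computes what fraction of its quoted peak Frontier reached during the Linpack run. The function and variable names are illustrative only, and the inputs are simply the rounded figures quoted above.

```python
# Figures quoted in the article, converted to FLOP/s (1 ExaFlop = 1e18 FLOP/s).
EXA = 1e18
rmax = 1.102 * EXA   # sustained Linpack (FP64) result
rpeak = 1.69 * EXA   # quoted peak performance


def hpl_efficiency(sustained: float, peak: float) -> float:
    """Return the sustained result as a fraction of the peak figure."""
    return sustained / peak


efficiency = hpl_efficiency(rmax, rpeak)
print(f"Linpack efficiency: {efficiency:.1%}")
```

With these inputs the sustained run comes out at roughly 65 percent of the quoted peak, which is the gap that further tuning would have to close.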
Frontier also now ranks as the fastest AI system on the planet, dishing out 6.88 ExaFlops of mixed-precision performance in the HPL-AI benchmark. That equates to 68 million instructions per second for each of the 86 billion neurons in the brain, highlighting the sheer computational horsepower. It appears this system will compete for the AI leadership position with newly-announced AI-focused supercomputers powered by Nvidia’s Arm-based Grace CPU Superchips.
Additionally, the Frontier Test and Development (Crusher) system also placed first on the Green500, denoting that Frontier’s architecture is now also the most power-efficient supercomputing architecture in the world (the primary Frontier system ranks second on the Green500). The full system delivered 52.23 GFlops per watt while consuming 21.1 MW (megawatts) of power during the qualifying benchmark run. At peak utilization, Frontier consumes 29 MW.
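The efficiency figure for the full system follows directly from the sustained Linpack result and the power draw during the run. The sketch below reproduces that arithmetic; the helper name `gflops_per_watt` is hypothetical, and the inputs are the rounded values quoted in this article.

```python
def gflops_per_watt(flops_per_second: float, watts: float) -> float:
    """Energy efficiency expressed in GFlops per watt."""
    return flops_per_second / watts / 1e9


rmax = 1.102e18   # sustained Linpack result in FLOP/s (1.102 ExaFlops)
power = 21.1e6    # power draw during the benchmark run in watts (21.1 MW)

efficiency = gflops_per_watt(rmax, power)
print(f"{efficiency:.2f} GFlops per watt")
```

Dividing 1.102 ExaFlops by 21.1 MW gives approximately 52.23 GFlops per watt, matching the figure reported for the full system.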
The Frontier supercomputer’s sheer scale is breathtaking, but it is just one of many accomplishments for AMD in this year’s Top500 list: AMD EPYC-powered systems now account for five of the top ten supercomputers in the world, and ten of the top twenty. In fact, AMD’s EPYC is now in 94 of the Top500 supercomputers in the world, marking a steady increase over the 73 systems listed in November 2021, and the 49 listed in June 2021. AMD also appears in more than half of the new systems on the list this year. Intel CPUs still populate most systems on the Top500, while Nvidia GPUs also continue as the dominant accelerator.
However, in terms of power efficiency, AMD reigns supreme in the latest Green500 list — the company powers the four most efficient systems in the world, and also has eight of the top ten and 17 of the top 20 spots.
The Frontier supercomputer is built by HPE and is installed at the Department of Energy’s (DOE) Oak Ridge National Laboratory (ORNL) in Tennessee. The system features 9,408 compute nodes, each with one 64-core AMD “Trento” CPU paired with 512 GB of DDR4 memory and four AMD Radeon Instinct MI250X GPUs. Those nodes are spread out among 74 HPE Cray EX cabinets, each weighing 8,000 pounds. All told, the system has 602,112 CPU cores tied to 4.6 petabytes of DDR4 memory.
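The system-wide totals can be checked from the per-node figures. The sketch below is a minimal example with illustrative constant names; it assumes the memory total is expressed with binary prefixes (1 PB taken as 1024 × 1024 GB), which is an assumption on our part. With decimal prefixes the same node count would come out closer to 4.8 PB.

```python
NODES = 9_408            # compute nodes
CORES_PER_CPU = 64       # one 64-core CPU per node
DDR4_PER_NODE_GB = 512   # DDR4 per node

total_cores = NODES * CORES_PER_CPU
total_ddr4_gb = NODES * DDR4_PER_NODE_GB

# Assumption: binary prefixes, i.e. 1 PB = 1024 * 1024 GB.
total_ddr4_pb = total_ddr4_gb / 1024**2

print(f"CPU cores: {total_cores:,}")
print(f"DDR4: {total_ddr4_pb:.1f} PB")
```

Under that assumption, the per-node numbers yield 602,112 CPU cores and about 4.6 PB of DDR4.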
Additionally, the 37,888 AMD MI250X GPUs feature 8,138,240 cores and have 4.6 petabytes of HBM memory (128GB per GPU). The CPUs and GPUs are tied together using the Ethernet-based HPE Cray Slingshot-11 networking fabric. The entire system uses direct water cooling to rein in heat, with 6,000 gallons of water moved through the system by 350-horsepower pumps; these pumps could fill an Olympic-sized swimming pool in 30 minutes. The water in the system runs at a balmy 85 degrees Fahrenheit, which helps power efficiency as the system doesn’t use chillers to reduce the water temperature.
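The HBM total can be derived the same way from the quoted GPU count and per-GPU capacity. This is a small sketch with illustrative names, again assuming binary prefixes (1 PB taken as 1024 × 1024 GB).

```python
GPU_COUNT = 37_888     # GPU count quoted in the article
HBM_PER_GPU_GB = 128   # HBM per GPU

total_hbm_gb = GPU_COUNT * HBM_PER_GPU_GB

# Assumption: binary prefixes, i.e. 1 PB = 1024 * 1024 GB.
total_hbm_pb = total_hbm_gb / 1024**2

print(f"HBM: {total_hbm_pb:.1f} PB")
```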
The entire system is connected to an insanely performant storage subsystem with 700 petabytes of capacity, 75 TB/s of throughput, and 15 billion IOPS of performance. A metadata tier is spread out over 480 NVMe SSDs that provide 10PB of the overall capacity, while 5,400 NVMe SSDs provide 11.5PB of capacity for the primary high-speed storage tier. Meanwhile, 47,700 PMR hard drives provide 679PB of capacity.
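To see how the three storage tiers add up to the headline capacity, the sketch below sums the quoted figures and shows each tier's share of the total. The tier labels used as dictionary keys are our own shorthand, not official names.

```python
# Capacity per tier in petabytes, as quoted in the article.
# The keys are informal labels chosen for this example.
tiers_pb = {
    "metadata (480 NVMe SSDs)": 10.0,
    "high-speed (5,400 NVMe SSDs)": 11.5,
    "bulk (47,700 PMR hard drives)": 679.0,
}

total_pb = sum(tiers_pb.values())

for name, capacity in tiers_pb.items():
    share = capacity / total_pb
    print(f"{name}: {capacity} PB ({share:.1%} of total)")

print(f"Total: {total_pb} PB")
```

The tiers sum to 700.5 PB, in line with the roughly 700 PB headline figure, with the hard drive tier making up the overwhelming majority of the capacity.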
Assembling Frontier was a challenge unto itself, as ORNL had to source 60 million parts with 685 different part numbers to build the system. The chip shortage hit during construction, impacting 167 of those part numbers, so ORNL found itself short two million parts. AMD also ran into issues as 15 part numbers for its MI200 GPUs encountered shortages. To help circumvent the shortages, ORNL worked with the DOE’s Advanced Scientific Computing Research (ASCR) program to get Defense Priorities and Allocation System (DPAS) ratings for those parts, meaning the US government invoked the Defense Production Act to procure the parts due to Frontier’s importance to national defense.
Even though the system currently peaks at 29 MW of power, Frontier’s mechanical plant can cool up to 40 MW of computational power, or the equivalent of 30,000 US homes. The plant can be expanded up to 70 MW, leaving room for future growth.
While Frontier gets the nod for the first officially-recognized exascale supercomputer in the world, China is widely thought to have two exascale supercomputers, the Tianhe-3 and OceanLight, that broke the barrier a year ago. Unfortunately, those systems haven’t been submitted to the Top500 committee due to political tensions between the US and China. However, the lack of official submissions to the Top500 (a Gordon Bell submission was tendered as a proxy) has led to some doubt that these are true exascale systems, at least as measured with an FP64 workload.
For now, Frontier is officially the fastest supercomputer in the world and the first to officially break the exascale barrier. The nearly-mythical, oft-delayed Intel-powered Aurora is expected to come online later this year, or early next year, with up to 2 ExaFlops of performance, rivaling Frontier for the top spot in the supercomputing rankings.
Up next for AMD? El Capitan, a 2+ ExaFlop machine last known to be coming online in 2023. Upon completion, this Zen 4-powered supercomputer will vie with the Intel-powered Aurora for the title of the fastest supercomputer in the Top500.