In 2008, the US Defense Advanced Research Projects Agency (DARPA) set an ambitious goal: an exascale supercomputer in a 20 megawatt envelope. That goal, once viewed with skepticism by many (research at the time predicted that exascale systems would require hundreds of megawatts), has now officially been achieved by the Frontier supercomputer at Oak Ridge National Laboratory (ORNL). At ISC 2022, the organizers of the Green500 list, which ranks supercomputers by their flops-per-watt efficiency, discussed this development and more.
A new frontier for efficiency
The “June” (late May, actually) Green500 list was led by ORNL. First up: Frontier’s testing and development system, Frontier TDS, though we prefer its less official name, ‘Borg’. Borg (essentially a single cabinet with the same design as Frontier’s 74 main cabinets) delivered 62.68 gigaflops per watt for a total of 19.20 Linpack petaflops. “If you naively extrapolate this to an exaflop, it comes out to about 16 megawatts,” said Wu Feng, custodian of the Green500 list and an associate professor at Virginia Tech, during his virtual appearance at ISC 2022. That result marked a major leap in computing efficiency, beating the previous Green500 champion, MN-3 from Preferred Networks, by nearly 60 percent.
But perhaps more impressively, Frontier itself came in second with 52.23 gigaflops per watt. “Frontier is the highest-ranking Top500 supercomputer ever on the Green500 list,” said Feng. According to the Green500 listing, Frontier delivered 1.102 Linpack exaflops in a 21.1 megawatt envelope, which extrapolates to an exaflop at 19.15 megawatts. However, Al Geist, CTO of the Oak Ridge Leadership Computing Facility (OLCF), revealed during the session that this was a “very conservative number” and that the average power consumption Oak Ridge submitted to the Green500 was, in fact, 20.2 megawatts. That works out to 54.5 gigaflops per watt and extrapolates to an exaflop at 18.33 megawatts. This measurement makes Frontier more than 3.5× as efficient as the previous Top500 topper, Riken’s Fugaku system (15.42 gigaflops per watt).
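For readers who want to check the arithmetic, the efficiency figures and exaflop extrapolations above follow from two simple unit conversions. A minimal sketch (all figures are taken from the list entries quoted in this article):

```python
# Sketch of the Green500 arithmetic: efficiency from Linpack score and
# power draw, and the naive extrapolation to a one-exaflop machine.

def gflops_per_watt(linpack_petaflops: float, megawatts: float) -> float:
    """Efficiency: petaflops -> gigaflops, divided by megawatts -> watts."""
    return (linpack_petaflops * 1e6) / (megawatts * 1e6)

def megawatts_per_exaflop(gf_per_watt: float) -> float:
    """Power to sustain one exaflop (1e9 gigaflops) at a given efficiency."""
    return 1e9 / gf_per_watt / 1e6  # watts -> megawatts

# Frontier, using the conservative 21.1 MW submission:
eff_listed = gflops_per_watt(1102.0, 21.1)          # ~52.2 GF/W
mw_listed = megawatts_per_exaflop(eff_listed)       # ~19.1 MW per exaflop

# ...and using the 20.2 MW average power Geist cited:
eff_avg = gflops_per_watt(1102.0, 20.2)             # ~54.6 GF/W
mw_avg = megawatts_per_exaflop(eff_avg)             # ~18.3 MW per exaflop

# Borg's 62.68 GF/W likewise extrapolates to roughly 16 MW per exaflop.
print(round(eff_listed, 2), round(mw_listed, 2), round(eff_avg, 2), round(mw_avg, 2))
```

The same arithmetic explains the gap discussed below: at the top-ten average efficiency, an exaflop extrapolates to roughly 40 megawatts, about twice Frontier’s figure.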
This efficiency, Geist explained, speaks to a long legacy at ORNL. “Oak Ridge has been working on energy-efficient computing for about a decade,” he said, charting how that decade-long effort had paid off, from the introduction of GPUs in the Titan system in 2012 through Frontier today. “Exascale has really been made possible by this kind of 200× improvement in energy-efficient computing.” Geist also credited AMD’s work to make its CPUs and GPUs more efficient, for example by allowing the chips to power down unused resources at a very granular level. He credited the list itself, too: “I think the Green500 has done a remarkable job of raising awareness of energy efficiency and its importance throughout the community.”
Frontier’s shadow extends well beyond the top two systems on the Green500. Frontier – and by extension Borg – are HPE Cray EX systems with AMD Milan-derived “Trento” Epyc CPUs, AMD Instinct MI250X GPUs, and HPE Slingshot-11 networking. The same architecture also features in the third-place system, the 151.90 Linpack petaflops LUMI supercomputer in Finland (51.63 gigaflops per watt, third on the Top500), and in fourth place, the 46.10 Linpack petaflops Adastra system in France (50.03 gigaflops per watt, tenth on the Top500). “All four of these systems use the same technology that was actually developed for Frontier,” Geist said. Both LUMI and Adastra also extrapolate to an exaflop below 20 megawatts.
All of the top ten systems are accelerated: four with the aforementioned AMD MI250X GPUs, five with Nvidia’s A100 GPUs, and one, in fifth place, with the Preferred Networks MN-Core accelerator. Further, Feng said, it was the first time that all of the top ten machines from the previous list remained on the list — and not just on the list, but in the top 20. Those four Frontier-type systems, however, stand well apart from the rest of the pack: the average power efficiency of the top ten systems extrapolates to an exaflop at about 40 megawatts, showing the gap between the Frontier architecture and the competition. As can be seen in the box-and-whisker plot below, the remaining systems on the Green500 list showed modest improvements in efficiency compared to the November list.
There was another encouraging trend on the new list. The Green500 uses three levels of efficiency reporting, with a level three measure representing the entire system over a full run, a level one measure representing a smaller fraction of the system over just the core phase of a run, and a level two measure somewhere in between. “The total number of Level 2 and Level 3 submissions continues to grow relative to Level 1, so that’s really great,” said Natalie Bates, chair of the Energy Efficient HPC Working Group (EEHPCWG), during the Green500 session. This Green500 list contained 102 measured entries: 57 at level one, 31 at level two, and 14 at level three.
Higher stakes, new strategies
The Green500 list, established 16 years ago, aims to “raise awareness (and encourage reporting) of the energy efficiency of supercomputers” and to “promote energy efficiency as a first-order design constraint (similar to performance).” But when the Green500 list was created, supercomputers were measured in single-digit megawatts; now, systems like Frontier are pulling down megawatts in the double digits. ORNL director Thomas Zacharia said in a press conference that “when you run [Linpack on Frontier], the machine starts drawing an additional 15 megawatts of power in less than ten seconds… that’s a small town in the US; that’s about how much power the city of Oak Ridge uses.”
The sheer scale of systems like Frontier has increased the urgency not only of the power consumption of the systems themselves, but also of the efficiency of their supporting infrastructure and the sourcing of the power itself. Indeed, DARPA’s 20 megawatt target for exascale was based on cost, as Geist explained during ORNL’s Advanced Technologies Section webinar last year: “The answer that came back from the head of [the] Office of Science at the time said they weren’t willing to pay more than $100 million over the five years, so it’s simple math [based on an average cost of $1 million per megawatt per year]. The 20 megawatts had nothing to do with what could be possible; it was just the stake we drove into the ground.”
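The “simple math” Geist describes can be written out explicitly; a back-of-envelope sketch using only the figures quoted above:

```python
# Reconstruction of DARPA's 20 MW exascale target from the numbers
# Geist cites: a $100M power budget spread over five years, at an
# assumed average electricity cost of $1M per megawatt per year.
budget_dollars = 100e6
years = 5
cost_per_mw_year = 1e6

power_cap_mw = budget_dollars / (years * cost_per_mw_year)
print(power_cap_mw)  # 20.0
```

As Geist notes, the target fell out of the budget, not out of any projection of what the technology could achieve.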
In last week’s Green500 session, Geist explained that Oak Ridge was committed to “not only reducing the amount of energy it takes to run the computer, but also reducing the amount of energy it takes for the data center to cool it.” As a result, the Frontier data center achieves a power usage effectiveness (PUE) of just 1.03. “A lot of work has gone into making this machine and the data center itself as efficient as possible,” said Geist.
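PUE is simply the ratio of total facility power to the power delivered to the computing equipment, so a PUE of 1.03 means cooling and power delivery add only about 3 percent of overhead. A minimal sketch (the 20.2 MW figure is Frontier’s average draw from above; the facility total is implied by the PUE, not a separately reported number):

```python
# PUE (power usage effectiveness) = total facility power / IT equipment power.
def pue(total_facility_mw: float, it_equipment_mw: float) -> float:
    return total_facility_mw / it_equipment_mw

# Illustration: at Frontier's ~20.2 MW average draw, a PUE of 1.03
# implies roughly 20.2 * 1.03 ≈ 20.8 MW at the facility level
# (an inferred figure, not one reported in the session).
implied_total_mw = 20.2 * 1.03
print(round(pue(implied_total_mw, 20.2), 2))  # 1.03
```

An ideal facility would have a PUE of exactly 1.0, with every watt going to computation; typical data centers historically ran well above 1.5.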
EuroHPC’s aforementioned LUMI system, meanwhile, is housed in a new data center designed with energy efficiency and sustainability in mind (pictured above). Located in an old paper mill in Kajaani, Finland, LUMI – which currently requires less than 10 megawatts to operate – is powered by 100 percent renewable energy (local hydropower) and is designed to sell its waste heat back to the city of Kajaani, further reducing energy costs and resulting in a net negative carbon footprint. Thanks to its location in northern Finland, there is also, of course, less need for artificial cooling. During a session on EuroHPC at ISC 2022, Anders Jensen, executive director of the EuroHPC JU, emphasized the importance of these holistic energy narratives for European supercomputers. “[The] Green500 is great,” he said, “but it doesn’t take into account where the energy came from.”