AMD CEO: The Next Challenge Is Energy Efficiency

“Over the next decade, we must think of energy efficiency as the most important challenge,” Lisa Su, CEO of AMD, told engineers at the 2023 IEEE International Solid-State Circuits Conference (ISSCC) in San Francisco.

Despite the slowdown of Moore’s Law, other factors have pushed mainstream computing capabilities to double about every two-and-a-half years. For supercomputers, the doubling is happening even faster. However, Su pointed out, the energy efficiency of computing has not been keeping pace; at current rates, she said, a top supercomputer a decade from now could require as much as 500 megawatts.
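To see how a lagging efficiency curve turns into an outsized power budget, consider a back-of-the-envelope calculation. The 2.5-year performance-doubling time comes from the figures above; the 4-year efficiency-doubling time in the sketch below is a hypothetical placeholder for illustration, not a figure from the talk.

```python
# Back-of-the-envelope: why lagging efficiency blows up the power budget.
# The 2.5-year performance doubling is from the article; the 4-year
# efficiency-doubling time is a hypothetical placeholder, not a source figure.
years = 10
perf_gain = 2 ** (years / 2.5)     # performance multiplier over a decade (~16x)
eff_gain = 2 ** (years / 4.0)      # assumed perf-per-watt multiplier (~5.7x)
power_gain = perf_gain / eff_gain  # power scales as performance / efficiency
print(f"performance x{perf_gain:.0f}, efficiency x{eff_gain:.1f}, "
      f"power x{power_gain:.1f} over {years} years")
```

Even with these modest placeholder rates, power roughly triples every decade; the faster doubling of supercomputer performance widens the gap further.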

“That’s probably too much,” she deadpanned. “It’s on the order of what a nuclear power plant would be.” Nobody really knows how to achieve the next thousandfold increase in supercomputer capability, to zettascale machines, Su said. But it will surely require improvements in system-level efficiency, meaning not just energy-efficient computing on chips but also efficient interchip communication and low-power memory access.

On the compute side, Su pointed to improvements in processor architecture, advanced packaging, and, despite the well-known slowdown, better silicon technology. That combination could more than double the industry’s historical rate of improvement in performance per watt.

As an example, Su compared the MI250X accelerator GPU, which is behind four of the five most efficient supercomputers, with its predecessor, the MI100. The newer chip offers 4.2 times the performance at 2.2 times the energy efficiency. Of those gains, chiplet design and integration accounted for nearly half the performance increase and about 30 percent of the efficiency improvement.

“Probably the largest lever we’ve had recently has been the use of advanced packaging and chiplets,” she said. “It allows us to bring the components of compute together much more closely than ever before.” According to AMD, 3D interconnects in chiplet-based systems can move about 50 times as many bits per joule of energy as copper connections on a motherboard. And using a technology called 3D V-Cache, AMD compute chiplets can now have additional SRAM stacked atop them to expand the size of their cache.

Another energy-saving factor Su pointed to was domain-specific computation, which she described as using “the right math for the right operations.” Because 8-bit floating-point operations are about 30 times as energy efficient as 64-bit ones, makers of GPUs and other AI accelerator chips have been seeking ways to use such lower-precision operations wherever they can. Domain-specific architecture accounted for about 40 percent of the MI250X’s performance and efficiency improvements.
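As a rough illustration of that trade-off (a sketch, not AMD’s implementation), the NumPy snippet below does the bulk of a matrix multiplication in a low-precision format and measures how far the result drifts from a 64-bit reference. NumPy has no 8-bit float type, so float16 stands in for FP8 here.

```python
# Minimal sketch of the low-precision trade-off: run the heavy arithmetic
# in a cheap format, then compare against a 64-bit reference result.
# float16 is used as a stand-in for FP8, which NumPy does not provide.
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512))
b = rng.standard_normal((512, 512))

ref = a @ b                                                      # float64 reference
low = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float64)

rel_err = np.linalg.norm(low - ref) / np.linalg.norm(ref)
print(f"relative error of low-precision product: {rel_err:.2e}")
```

For workloads such as neural-network inference that tolerate errors of this magnitude, the cheaper arithmetic translates directly into energy savings.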

AMD is hoping to get another 8-fold performance improvement and a 5-fold efficiency gain from its next-generation accelerator, the MI300.

But processor innovation by itself won’t be enough to get to zettascale supercomputing, Su said. Because AI performance and efficiency improvements are outstripping gains in the kind of high-precision math that has dominated supercomputer physics work, the field should turn to hybrid algorithms that can leverage AI’s efficiency. For example, AI algorithms could get close to a solution quickly and efficiently, and the gap between the AI answer and the true solution could then be closed by high-precision computing.
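The closest classical analogue to that hybrid scheme is mixed-precision iterative refinement, sketched below with a float32 solve standing in for the AI surrogate; this illustrates the idea rather than any method from the talk. A cheap low-precision solver gets near the answer, and a few high-precision residual corrections close the remaining gap.

```python
# Sketch of the hybrid idea via mixed-precision iterative refinement:
# a cheap float32 solve stands in for the fast-but-approximate "AI answer",
# and float64 residual corrections close the gap to the true solution.
import numpy as np

rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test system
b = rng.standard_normal(n)

# Step 1: fast approximate solve in low precision.
x = np.linalg.solve(A.astype(np.float32), b.astype(np.float32)).astype(np.float64)

# Step 2: a few high-precision residual corrections.
for _ in range(5):
    r = b - A @ x                                  # residual computed in float64
    x += np.linalg.solve(A.astype(np.float32),
                         r.astype(np.float32)).astype(np.float64)

print("final residual norm:", np.linalg.norm(b - A @ x))
```

Each refinement step costs only a cheap low-precision solve plus one high-precision matrix-vector product, which is the same economics Su described: spend the expensive arithmetic only on closing the gap.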

Source: IEEE Semiconductors