Crisis Ahead: Power Consumption in AI Data Centers: Four Areas Chips Can Help

AI data centers are consuming energy at roughly four times the rate at which new electricity is being added to grids, setting the stage for fundamental shifts in where power is generated, where AI data centers are built, and how much more efficient system, chip, and software architectures will need to become.

The numbers are particularly striking for the United States and China, which are in a race to ramp up AI data centers. A 2024 report commissioned by the U.S. Department of Energy showed that last year, U.S. data centers consumed about 4.4% of the total electricity generated, or roughly 176 terawatt hours (TWh). That number is expected to increase to between 325 and 580 TWh by 2028, which is 6.7% to 12% of all the electricity generated in the U.S., respectively.


Fig. 1: Total electricity generation vs. consumption by servers, storage, network equipment, and infrastructure from 2014 through 2028 (estimated). Source: Lawrence Berkeley National Laboratory report[1]
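Those projections also embed an assumption about the grid itself, which a little arithmetic makes visible. Below is a minimal sketch in Python; the implied generation totals are back-calculated from the cited percentages, not quoted from the report.

```python
# Back-of-the-envelope arithmetic from the figures above. The implied generation
# totals are back-calculated from the cited percentages, not taken from the report.

dc_now_twh, dc_now_share = 176, 0.044   # last year: 176 TWh was about 4.4% of U.S. generation
print(f"Implied current U.S. generation: ~{dc_now_twh / dc_now_share:,.0f} TWh")   # ~4,000 TWh

# 2028 projections: 325 TWh at 6.7% of generation, 580 TWh at 12%
for dc_2028_twh, share in ((325, 0.067), (580, 0.12)):
    implied_total = dc_2028_twh / share
    print(f"{dc_2028_twh} TWh at {share:.1%} implies ~{implied_total:,.0f} TWh of total generation")
```

Read that way, even the report's own percentages assume total U.S. generation grows by roughly a fifth by 2028, while data center demand roughly doubles to triples.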

China, meanwhile, is expected to reach 400 TWh by next year, and while those numbers may look comparable to consumption in the U.S., the International Energy Agency noted[2] that China’s citizens consume significantly less energy than their U.S. counterparts. On a global scale, the rate of consumption is increasing 30% per year, due primarily to AI, with the United States and China accounting for about 80% of that increase.


Fig. 2: Where the electricity is being consumed. Source: IEA

“Power is not a joke anymore,” said Jean-Marie Brunet, vice president and general manager for hardware-assisted verification at Siemens EDA. “Imagine if data center power consumption in 2028 is 12% of the entire power consumption of the United States. That’s insane. We’d have to redo the entire grid.”

Others agree. “Power generation is going to be a big deal,” noted Jensen Huang, president and CEO of NVIDIA, during a recent CDNLive discussion with Cadence CEO Anirudh Devgan. “The reason for that is power grids aren’t going to be enough to sustain the growth of this industry. We’d like to build this industry on-shore, and if you want to do that, then we’re going to see a lot of diesel power generators and all kinds of stuff.”

So what can be done about it? There are four main target areas, each of which directly involves the semiconductor industry:

  • Reducing transmission distances and the number of voltage step-downs;
  • Limiting data movement whenever possible;
  • More efficient processing; and
  • Better cooling, closer to the processing elements or inside the package.

Distance and step-down losses
As with data, there is a cost for moving electricity. An average of 5% of electricity is lost during transmission and distribution, according to the U.S. Energy Information Administration. What’s counterintuitive is that high-voltage lines running up to hundreds of miles have a lower loss (about 2%) than lower-voltage lines running over shorter distances (about 4%). Those numbers are compounded by the source, as well, because different generation sources have different conversion rates (see Fig. 3, below).


Fig. 3: U.S. electricity flow in quadrillion BTU. Source: U.S. Energy Information Administration, April 2025

“Ideally you keep your voltage as high as possible, which means the current as low as possible,” said Eelco Bergman, chief business officer at Saras Micro Devices. “The losses are the square of the current times the resistance. So you’re losing power the whole way. Whatever the high-tension wires are, you keep stepping that down. That may be 400 volts coming into the data center, and that gets converted to 48 volts for the rack, and then eventually stepped down to 12 volts at the point of load. But at every step of the way, you want to generate your power next to the data center to reduce the distance and keep the voltage as high as possible, and bring the voltage close to your endpoint.”
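A toy calculation makes Bergman’s point concrete. The load and path resistance below are arbitrary illustrations rather than figures from any real data center, but they show why the final step-down to 12 V has to happen as close to the load as possible.

```python
# Why distribution stays at high voltage for as long as possible: for a fixed
# delivered power, resistive loss is I^2 * R, and current scales inversely with
# voltage. The load and resistance values are illustrative only.

P_load = 120_000          # 120 kW of delivered power (illustrative)
R_path = 0.01             # 10 milliohms of distribution resistance (illustrative)

for volts in (400, 48, 12):
    current = P_load / volts                     # I = P / V
    loss_kw = (current ** 2) * R_path / 1000     # P_loss = I^2 * R
    print(f"{volts:>4} V: {current:>8,.0f} A, ~{loss_kw:>8,.1f} kW lost in the same 10 mΩ")
```

The same 10 mΩ of copper dissipates over a thousand times more power at 12 V than at 400 V, which is the arithmetic behind stepping down in stages and pushing the last conversion toward the package.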

The tradeoff here is voltage vs. current. The higher the current, the greater the heat. And nothing is 100% efficient, so as power is moved closer to the package, some heat is generated. That, in turn, is compounded by everything happening inside that package, including processing data, moving it back and forth to memory, and the resistance/capacitance in the interconnects. On top of that, AI data centers have more data to process, so those workloads require higher utilization rates, which make it more difficult to keep up with the amount of heat that needs to be dissipated.

So from the high-voltage lines to the lower-voltage lines, then to the PCB, the package, and finally the individual die, power is lost at each step along the way, Bergman said. “How do you reduce the distances? How much voltage can I get as close as possible? What’s the efficiency? Am I able to dissipate the heat? These are things the industry is looking at.”

The chip industry has a big role to play here. “We have too many steps due to existing infrastructure, where we had so many intermediate voltage levels,” said Andy Heinig, head of the Department for Efficient Electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “For sure, we can save a lot of energy here. What we also see is a need for the processors and power regulators to work together. Currently, power regulators are not intelligent. They only follow the current coming from the processor. But the processor knows what they have to do in the next cycle, and they can inform the power converters that a huge jump is coming or that something is switched off. So there are certain points where we can co-optimize the processor plus the voltage regulator, and reducing the number of intermediate voltage levels will help.”
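What that processor-to-regulator hand-off could look like is sketched below. The class and method names are invented for illustration, assuming a scheduler that knows its next work item and a regulator that can pre-position for a load step; this is not any vendor’s actual interface.

```python
class VoltageRegulator:
    """Invented illustration: a regulator that can pre-position for a load step."""
    def __init__(self, nominal_v: float):
        self.setpoint_v = nominal_v

    def prepare_for_load_step(self, expected_amps: float, cycles_ahead: int) -> None:
        # A real design might raise the setpoint to absorb droop, or enable extra
        # phases before the transient arrives; here we just nudge the setpoint.
        droop_margin_v = 0.001 * expected_amps     # illustrative droop estimate
        self.setpoint_v += droop_margin_v
        print(f"pre-boost {droop_margin_v * 1000:.0f} mV, {cycles_ahead} cycles ahead")


class Processor:
    """The scheduler knows its next work item, so it warns the regulator first."""
    def __init__(self, vr: VoltageRegulator):
        self.vr = vr

    def dispatch(self, expected_load_amps: float) -> None:
        self.vr.prepare_for_load_step(expected_load_amps, cycles_ahead=8)
        # ... then issue the work that causes the current jump ...


Processor(VoltageRegulator(nominal_v=0.75)).dispatch(expected_load_amps=40.0)
```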

Moving data
Another challenge is to architect systems so that data is processed closer to the source, which reduces the amount of data that needs to be moved. But alongside that, the distances data must travel need to be reduced, as well. This is one of the key drivers behind 3D-IC packaging. Instead of running wires across an SoC, components can be strategically placed vertically to reduce those distances. That improves performance and reduces the amount of power needed to drive signals.
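A toy comparison shows why shortening routes matters so much for power. The wire capacitance, supply voltage, and route lengths below are made-up but plausible values, not numbers from any particular design; switching energy scales with the capacitance being driven, which in turn scales with wire length.

```python
# Toy comparison of interconnect energy for a long planar route versus a short
# stacked (3D-IC) route. Parameters are illustrative only.

CAP_PER_MM_FF = 200.0        # illustrative wire capacitance, fF per mm
VDD = 0.75                   # supply voltage, volts

def switching_energy_fj(length_mm: float) -> float:
    # E = C * V^2 per full-swing transition, with C proportional to wire length
    return CAP_PER_MM_FF * length_mm * VDD**2

planar_route_mm = 8.0        # a long route across a large monolithic SoC
stacked_route_mm = 0.5       # short lateral hop plus a vertical connection in a 3D stack

for name, length in (("planar", planar_route_mm), ("stacked", stacked_route_mm)):
    print(f"{name:>8}: {length:4.1f} mm -> ~{switching_energy_fj(length):6.0f} fJ per transition")
```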

“One of the biggest challenges for our customers at the moment is wire length in the designs,” said Andy Nightingale, vice president of product management and marketing at Arteris. “Multi-die is a separate challenge, but on each monolithic die before it goes through to multi-die, getting the wire length down is critical to power. One aspect we look at is congestion. We’ve got heat maps in our design analysis, as well, that look at congestion, because that’s a key point where a number of wires meet together at a switch. We work within the floor plan, as well, where we visualize the physical design so we can move switches out of a congestion point and still work within the floor plan to reduce heat dissipation in one area, and power congestion, as well.”

This also will require a mindset shift, because power still takes a backseat to performance in AI data centers. But if no more electricity is available, or if the price of electricity skyrockets, then AI companies will have no choice but to get serious about power.

“The emphasis on AI design these days is still very much performance, which means that while power is really, really important, it is still a secondary concern to getting the best speed and performance out of these chips,” said Marc Swinnen, director of product marketing at Ansys, now part of Synopsys. “There’s always a power/performance tradeoff, and that’s fundamental. So if you really want to lower the power, you’d have to lower the performance. Going down Moore’s Law helps. That lowers the power. The other thing is that most of the power is in the communication between the GPUs and the different elements, but even the backplane in the data center. NVIDIA came out with co-packaged optics networking just to reduce the power in communication within the rack and between racks.”

Solving these issues requires changes across the entire chip industry. “It starts on the chip, and if the chip is very power-hungry and you want to build an LLM, then you’ve got to train this thing,” said Siemens’ Brunet. “You train it by adding multiple functions and scaling. But if you add things up, starting with a single element that is very power-hungry, then the entire system is extremely power-hungry. You’ve also got a digital twin, and you need a tremendous amount of power to compute that digital twin, as well. But this isn’t just a problem for the EDA industry. It’s a problem for the whole world.”

More efficient processing
The good news is there is some obvious low-hanging fruit. “There is a 20% power tax simply because of a lack of visibility,” said Mo Faisal, president and CEO of Movellus. “For example, say you design a chip for 500 watts at 2 GHz. By the time you’re done with a system-level test and you’re ready for deployment, you find that all of those power systems were constructed with different goals. So now, if you want to stay within 500 watts, you have to crank the frequency down by 10% to 20%. It’s basically a throttle. And this only gets worse with chiplets, because now you have the same thing, but you multiply it by the number of chiplets that you’re dealing with. Each chiplet could have a different process corner, or it could be in a different process.”

That’s part of the picture. “There’s an additional 20% to 30% gain to be had by installing more visibility from the chip to the system and all the way to the data center,” Faisal said. “And both of these compound, so it’s not one or the other. The reason is that the chip designer is concerned about risk. ‘Hey, I don’t want the chip to ever fail.’ So they’re going to over-margin with redundancy. But in deployment, when you’re designing the data center, you’re not designing it for the maximum workload. You’re designing for the peak workload. And the reason for that is the workloads and software are changing much faster than the chips. It’s impossible to test the chip with all the combinations of workloads that you will see in the field, because the workloads and the models and the transformers and agents are all changing so fast. So you have to margin that in. The data center capacity is 30% over-provisioned compared to what you will see in the max load.”
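Faisal’s 500-watt example translates directly into arithmetic. A minimal sketch, assuming dynamic power scales roughly linearly with frequency at a fixed voltage (the measured wattages are illustrative):

```python
# Rough illustration of the "power tax": if the assembled system draws more than
# its 500 W budget at the design frequency, the remaining knob at deployment is
# frequency. Assumes power ~ frequency at fixed voltage; wattages are illustrative.

budget_w = 500.0
design_ghz = 2.0

for measured_w in (550.0, 600.0, 625.0):     # what system-level test might actually find
    scale = budget_w / measured_w
    throttled_ghz = design_ghz * scale
    print(f"measured {measured_w:.0f} W -> run at {throttled_ghz:.2f} GHz "
          f"({(1 - scale):.0%} frequency loss)")
```

Under that assumption, the 10% to 20% frequency loss he describes corresponds to silicon coming in roughly 10% to 25% over its power budget at the design frequency.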

Understanding how semiconductors will be used is critical to this equation, as well. Just adding redundancy and guard-banding everything for worst-case corner cases increases the power needed to drive signals through extra circuitry and the amount of heat that needs to be dissipated due to resistance/capacitance in the wires.

“Normally, you plan for worst-case scenarios,” said Noam Brousard, vice president of solutions engineering at proteanTecs. “So over time, you plan for the worst thing that can happen at a specific temperature or variable. And if you can meet that spec, then everything is okay. But now we’re running higher workloads that continue to increase over time, and we’re expecting chips to last longer. With artificial general intelligence (AGI) algorithms, chips are processing a tremendous amount of compute. For a chip in a data center that now is expected to last eight years, instead of the historical four to six years, it will have to be able to handle that increase in workload. But planning for what will happen eight years from now is just not feasible.”

That planning becomes even more difficult in the context of accelerated aging and added stresses — thermal, mechanical, huge workloads running at full bore for longer periods of time.

“It’s very hard to use one model that factors all of these variables, and if you do, you’re going to be highly over-provisioning,” Brousard said. “The only way to keep power consumption in check, extend product lifetime, and maintain reliability is to dynamically adapt to the ever-changing individual power needs of the device over time.”
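A minimal sketch of what such per-device adaptation could look like is below. The function, thresholds, and step sizes are invented for illustration and are not proteanTecs’ implementation; the idea is simply that margin is trimmed or restored based on what each device measures, rather than fixed at a fleet-wide worst case.

```python
# Hypothetical per-device margin adaptation: trim voltage when measured timing
# slack is plentiful, restore it as slack erodes with aging or heavier workloads.
# Names, thresholds, and step sizes are invented for illustration.

def adapt_voltage(vdd_mv: int, measured_slack_ps: float) -> int:
    TARGET_SLACK_PS = 30.0        # keep this much timing margin in hand
    STEP_MV = 5                   # adjust in small, reversible steps
    if measured_slack_ps > TARGET_SLACK_PS * 2:
        return vdd_mv - STEP_MV   # plenty of slack: shave margin, save power
    if measured_slack_ps < TARGET_SLACK_PS:
        return vdd_mv + STEP_MV   # slack eroding: add margin back
    return vdd_mv

vdd = 750
for slack in (90.0, 75.0, 40.0, 25.0):        # slack shrinking as the part ages
    vdd = adapt_voltage(vdd, slack)
    print(f"slack {slack:4.0f} ps -> Vdd {vdd} mV")
```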

Moore’s Law plays a role here, too. While the improvements in processor performance at each new process node are diminishing, the improvements in power consumption are projected to be substantial — as much as 30%, depending on the process and the foundry. As with any new process, those numbers can vary significantly by architecture and workload.

Cooling
The rule of thumb is that data centers pay for power twice. The first time is to power the racks of servers and storage. The second time is to cool them so they don’t overheat, and that’s becoming a bigger problem because the dynamic current density is increasing along with the utilization of AI servers. It takes more processing to train large (and even small) language models, and to power generative and agentic AI searches. That, in turn, increases the utilization of various compute elements, so they are running at full bore for longer periods of time.

“The power overhead for cooling is about 30% to 40%, and just going to liquid cooling without chillers you can cut that in half,” said Saras Micro Devices’ Bergman. “But if you add chillers, it goes right back up. There’s an optimization game that needs to be played here.”
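The arithmetic behind that optimization game is straightforward. A minimal sketch, taking the midpoint of the overhead range Bergman cites, treating it as a fraction of the IT load, and assuming an arbitrary 10 MW of IT equipment:

```python
# Rough arithmetic behind "paying for power twice." The 35% overhead is the midpoint
# of the range quoted above, treated as a fraction of IT load; halving it approximates
# liquid cooling without chillers. The 10 MW IT load is an arbitrary example.

it_load_mw = 10.0
for name, overhead in (("air cooling with chillers", 0.35),
                       ("liquid cooling, no chillers", 0.175)):
    total_mw = it_load_mw * (1 + overhead)
    print(f"{name}: {total_mw:.1f} MW at the meter (PUE ≈ {1 + overhead:.2f})")
```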

The pivot point in this equation is the availability of water. Running water in a closed system requires cooling. Using the local water supply does not. But according to the Environmental and Energy Study Institute, a large data center can consume up to 5 million gallons per day, about as much as a town of 10,000 to 50,000 people.

Two alternatives are direct cooling of individual die and immersion cooling. Direct cooling can involve microfluidic channels, an idea first proposed by IBM back in the 1980s and later abandoned because it was too difficult. But as thermal density increases, chipmakers may have no choice but to utilize some type of microfluidics. Still, implementing this approach adds structural and manufacturing challenges. The idea is well understood, because water cooling has been in use for more than half a century. But implementing it inside a package or chip, closer to the transistors, remains a challenge.

“If you’re looking at a cooling technology, there’s heat transfer efficiency, which tends to be looked at from a thermal resistance perspective, and junction-to-fluid temperature in a general sense,” explained Rajiv Mongia, senior principal engineer at Intel and leader of the company’s thermal core competency group. “But if you want to look at it from a thermodynamic perspective, it’s not junction-to-fluid inlet temperature. It’s junction-to-fluid exit temperature. Basically, the higher you can make the fluid temperature as you exit the package or the package area, the easier it is for everything downstream to be managed from a heat transfer perspective. That affects the overall efficiency of your cooling plants, your chillers, and all that stuff.”
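Mongia’s point follows from the basic energy balance Q = ṁ·c_p·ΔT: for a fixed heat load, a hotter allowed exit stream means less coolant flow and heat that is easier to reject downstream. A minimal worked example with illustrative values:

```python
# Why exit temperature matters: for a fixed heat load Q, required coolant flow
# follows Q = m_dot * c_p * (T_exit - T_inlet). Allowing a hotter exit stream cuts
# the flow rate and eases downstream heat rejection. Values are illustrative.

Q_W = 1500.0            # heat pulled from one package, watts (illustrative)
CP_WATER = 4186.0       # specific heat of water, J/(kg*K)
T_INLET_C = 35.0

for t_exit_c in (45.0, 55.0, 65.0):
    m_dot = Q_W / (CP_WATER * (t_exit_c - T_INLET_C))     # kg/s of coolant required
    print(f"exit {t_exit_c:.0f} °C: {m_dot*1000:.1f} g/s of water, "
          f"rejecting heat at {t_exit_c:.0f} °C instead of {T_INLET_C:.0f} °C")
```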

This is a key consideration in stacking dies. “When we get to 3D-ICs, you might need to get fluid within the structure itself, like silicon micro-channels on the backside of the die,” Mongia said. “It’s a benefit-to-complexity ratio. You could cool it with this type of plate sitting on the backside. But once you have enough volumetric heating within a 3D stack — imagine a cube of some sort — you no longer can conduct the heat out through one side of the silicon. You have to pull the heat from inside the silicon in some way. At the end of the day, some exotic mechanism is going to be required because you’re generating so much power within that volume of silicon as opposed to just on one surface.”

Immersion cooling, in contrast, involves putting an entire server into an inert liquid. The challenge here is the same as with microfluidics. The heat needs to be drawn from the inside of the rack out, and dissipating the thermal load inside a package to an external cooling bath is more complicated than it might sound. It requires an understanding of where to place components in a package, potentially thermal interface materials, and heat channels from digital logic to the outside of the package.

It’s also possible that both of these approaches can be used together to sharply reduce heat, which could allow for even greater transistor density and an even larger demand for electricity.

Money and resources
None of this is lost on the chip industry. For it to move forward and continue growing at least as fast as it is today, two related issues need to be addressed: sustainability and cost. Those ultimately will determine the speed of deployment of AI data centers, the amount of processing they can handle, and what changes are needed from a transmission standpoint and from a chip/system/package design perspective.

“Sustainability comes into the equation all the time because there’s been decades worth of pressure on corporations to use our natural resources better,” said Mike Ellow, CEO of Siemens Digital Industries Software. “That’s where we’re headed, with semiconductors as the backbone that’s going to help a lot of industries. If you look at power consumption from data centers, where we’re headed is unsustainable. For us, the challenge is how to put four, five, or six times the compute power into the same power profile that already exists for that data center.”

Business basics figure into this picture, as well. “At the end of the day, it’s total cost of ownership,” said Intel’s Mongia. “Whether it’s a large language model that you’re creating or an inference you’re trying to generate, there’s a capital cost and an operating cost to do that. Thermal comes into the capital cost as well as the operating cost. So what is the balance? What is that ROI? What does it cost to upgrade to a liquid cooling solution, because those historically are more expensive than air cooling. All these AI data centers or AI solutions are predominantly liquid-cooled. For us to build it, you need to get more out of your package, meaning more inferences or more performance on generating your language model, and therefore reduce the operating cost over time.”

Conclusion
To put this in perspective, consider that the Hoover Dam in Nevada generates about 4 TWh per year; the Palo Verde nuclear power plant in Arizona generates 32 TWh per year; and the Three Gorges Dam in China is expected to generate 90 TWh per year. But between 2028 and 2030, given the current rate of growth, AI data center power demands will increase by 350 TWh, which is nearly three times as much energy as all of those generating facilities combined.[2]

No single change will shrink that gap. For the semiconductor industry to continue growing at the current pace, it will require changes from the grid on down, and from the chip up. And even then, it’s not clear if that will really close the gap, or whether it will simply enable AI data centers to grow even larger.

References

  1. Lawrence Berkeley National Laboratory, 2024 United States Data Center Energy Usage Report, https://doi.org/10.71468/P1WC7Q.
  2. IEA, “Energy Demand From AI,” https://iea.blob.core.windows.net/assets/601eaec9-ba91-4623-819b-4ded331ec9e8/EnergyandAI.pdf.
  3. Presentation from Intel Foundry.
