HPE gets trove of U.S. sensor data to aid exascale computing

HPE will be working with the National Renewable Energy Lab to develop AIOps aimed at improving exascale computing efficiency. The lab has a trove of data that may help.

As its name implies, the National Renewable Energy Laboratory in Golden, Colo., is concerned about energy efficiency. It runs two supercomputers in data centers that are far more efficient than the average corporate data center. And it has collected years of sensor data that Hewlett Packard Enterprise believes will advance exascale computing efficiency.

HPE has reached an agreement with the U.S. lab to share supercomputing data and ideas. The goal is to learn how to build highly efficient data center environments for exascale computing.

In pursuit of energy efficiency, the National Renewable Energy Laboratory (NREL) put sensors throughout its supercomputers and data centers. This data provides intelligence on energy usage. The lab operates two supercomputers, including an eight-petaflop HPE system named Eagle -- that's eight million billion calculations per second.

"They've got more sensors in that data center than you would imagine -- and we thought, this is perfect," said Mike Vildibill, vice president of the advanced technologies group at HPE.

The lab 'instrumented everything'

NREL has collected more than five years of sensor data from two supercomputers, totaling about 16 terabytes. "In order to achieve efficiency, you have to measure efficiency, and so they've instrumented everything," Vildibill said.

"That is exactly the type of data we want to feed into a machine learning framework to train an AI," Vildibill said.

HPE recently completed its acquisition of Cray Inc. for $1.3 billion. Cray is working on several planned exascale computers for the government, including a 1.5-exaflop system, Frontier, for the U.S. Department of Energy. An exaflop is one million trillion, or one quintillion, calculations per second.

A problem with exascale computing is its power needs. Frontier's electricity requirement is about 30 MW. This is roughly double the power consumption of the most powerful supercomputer in the U.S., the Summit system, a 200-petaflop IBM system at Oak Ridge National Laboratory in Oak Ridge, Tenn. One thousand petaflops equals one exaflop. The power demands of exascale computing systems have made energy efficiency an important priority.
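The figures above can be turned into a rough performance-per-watt comparison. The following is a minimal sketch: Summit's power draw is not stated here, so the code assumes "roughly double" means exactly half of Frontier's 30 MW, and the function name `gflops_per_watt` is illustrative only.

```python
# Rough performance-per-watt arithmetic from the figures quoted in the article.
PETAFLOP = 1e15  # calculations per second
EXAFLOP = 1e18   # 1,000 petaflops


def gflops_per_watt(flops: float, watts: float) -> float:
    """Return billions of calculations per second delivered per watt."""
    return flops / watts / 1e9


frontier_flops = 1.5 * EXAFLOP
frontier_watts = 30e6  # about 30 MW

summit_flops = 200 * PETAFLOP
# Assumption: "roughly double" is treated as exactly 2x, giving about 15 MW.
summit_watts = frontier_watts / 2

frontier_eff = gflops_per_watt(frontier_flops, frontier_watts)
summit_eff = gflops_per_watt(summit_flops, summit_watts)

print(f"Frontier: {frontier_eff:.1f} gigaflops per watt")
print(f"Summit:   {summit_eff:.1f} gigaflops per watt")
print(f"Speedup:  {frontier_flops / summit_flops:.1f}x for 2x the power")
```

Under these assumptions, Frontier would deliver 7.5 times Summit's performance for about twice the electricity, which is why efficiency per watt, and not only raw speed, matters at exascale.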

Corporate data centers don't measure up

NREL's data center energy efficiency is better than that of the average corporate data center.

NREL reports its annualized Power Usage Effectiveness (PUE) rating at about 1.04. (Its dashboard reported 1.03 Thursday.) PUE measures how much of a data center's power actually reaches its computing equipment. It divides the total power the data center draws, including chillers and other supporting infrastructure, by the power used to run the IT equipment alone, such as servers, storage and networks. A PUE of 1.0 would mean no power goes to overhead.

NREL's PUE is in sharp contrast to data centers generally, which have an average annual PUE of 1.67, according to a survey this year by advisory organization Uptime Institute LLC.
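To make the gap concrete, PUE is commonly defined as total facility power divided by IT equipment power, so the overhead spent on cooling and other infrastructure is (PUE - 1) times the IT load. The sketch below applies that to the two PUE figures cited here. The 1,000 kW IT load is a hypothetical value chosen only for illustration, and the function names are not from any NREL or Uptime Institute tool.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(pue_value: float, it_equipment_kw: float) -> float:
    """Power spent on non-IT overhead such as cooling and power distribution."""
    return (pue_value - 1.0) * it_equipment_kw


# Hypothetical IT load, for illustration only (not a figure from the article).
IT_LOAD_KW = 1000.0

nrel_overhead = overhead_kw(1.04, IT_LOAD_KW)     # NREL's annualized PUE
average_overhead = overhead_kw(1.67, IT_LOAD_KW)  # survey average

print(f"Overhead at PUE 1.04: {nrel_overhead:.0f} kW")
print(f"Overhead at PUE 1.67: {average_overhead:.0f} kW")
```

For the same IT load, a facility at the survey average would spend roughly 17 times as much power on overhead as one running at NREL's reported PUE.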

HPE's goal is to develop AI algorithms specific to IT operations, also known as AIOps, that can detect and prevent anomalies and faults and make systems more resilient. But "the true vision is to apply these AI algorithms to run a system more efficiently," Vildibill said. That includes finding ways to use less electricity, which can involve optimally laying out system workloads, he said.
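HPE has not described its algorithms, so the following is not its method. It is only a minimal sketch of what flagging an anomaly in a sensor stream can look like, using a simple trailing-window z-score in place of a trained model. The function name `find_anomalies`, the window size, the threshold and the temperature readings are all made up for illustration.

```python
from collections import deque
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` readings."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip the check when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Made-up inlet temperatures in degrees Celsius, with one spike.
temps = [21.0, 21.2, 20.9, 21.1, 21.0, 21.1, 27.5, 21.0, 21.2]
print(find_anomalies(temps))  # the spike at index 6 is flagged
```

A statistical check like this only reports that a reading is unusual. Tracing the cause, or acting on it automatically, is the harder problem the analysts quoted below describe.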

HPE isn't making projections on what could be possible in terms of efficiency for AIOps in exascale computing.

Padraig Byrne, a Gartner analyst, said AIOps is used today primarily to monitor data center operations and identify the root cause of a problem. In terms of maturity, the technology is in its "relatively early days."

The goal is to get to fully automated systems, where autonomous action -- auto remediation -- will fix systems, Byrne said. But at the moment, the goal is to identify the source of the problem, he said.

To effectively monitor systems, Byrne said, the sensors or instrumentation have to be built into the systems, not added on later. Vendors are now doing that in a wide range of products. "This is a trend that we are seeing beyond the world of IT," he said.
