Researchers recently found that training a single AI model can produce nearly five times the lifetime carbon emissions of the average American car. The findings, published this month by computer scientists at the University of Massachusetts, highlight an issue that gets short shrift in all the hype over AI: The process of training new deep learning models consumes considerable energy.
Although most enterprises are not currently using as much power for AI projects as companies like Google or Facebook, the new data on AI's carbon footprint should prompt IT leaders to consider the limits of deep learning in general for their organizations.
The UMass researchers looked specifically at the energy consumed in training neural network models for natural language processing. One of the bigger energy challenges described in the paper relates to testing different combinations of a model's settings -- hundreds or even millions of them. The authors found that the process of training an AI model to work across different domains often uses far more power than training the original model. Moreover, the same explosion in compute and power requirements seen in AI also shows up in other business exercises, such as logistics, modeling and simulation.
Another implication of the research on AI's carbon footprint is that many AI applications would be better off built from scratch, because tuning an existing model to work on other applications consumes considerably more power. While a new class of tools for parameter tuning could help reduce energy consumption, some experts question whether deep learning makes sense for a lot of common use cases. Down the road, quantum computing could play an important role in reducing the power burden for these types of problems.
The limits of deep learning
The folks on AI's cutting edge are the ones using the majority of energy. "We basically piggyback off of their energy use and algorithm creation," said Jed Dougherty, lead data scientist at Dataiku, an AI and machine learning platform.
Dougherty noted that the underlying inefficiency of AI model development stems from its reliance on GPUs. "It's our responsibility to avoid deep learning when we can. Not just because we fundamentally don't know what it's doing, but because it requires massive GPU consumption."
Although deep learning can deliver good results for natural language processing and computer vision, developers working on other kinds of applications should ask why they are involved in deep learning at all, he said. Indeed, many researchers are discovering better approaches using competent, human-designed business rules and algorithms.
Dougherty added that concerns about the significant CO2 output for new deep learning models may be overblown -- and mitigated by the upside of these models. After all, better AI-based models could lead to energy savings in other ways, not to mention more revenue for enterprises. In any case, AI's carbon footprint is not an issue that is raised much, he said. Most of Dataiku's customers are more concerned with financial cost than environmental impact, even though the two are somewhat linked.
The need for carbon monitoring
Stuart Dobbie, product lead at Callsign Ltd., an identity fraud, authorization and authentication company, said AI workloads essentially exacerbate existing bad practices in IT.
"The highlighted costs and energy concerns associated with AI are a product of traditionally inefficient concepts and outlooks in the IT industry," Dobbie said.
He said most companies over-provision infrastructure and underutilize compute. These skewed IT environments are most often driven by IT risk governance policies or contractual requirements that call for 50% over-provisioning for disaster recovery and resiliency.
Some of the energy and environmental impacts of these longstanding bad practices could be mitigated both by cloud providers and by the development of new monitoring frameworks, Dobbie said. Cloud computing provisioners, for example, should adopt best-practice data center architectural designs to ensure the efficient running of technological infrastructure. This includes areas like data center airflow, efficient cooling, use of renewable energy sources and monitoring. Dobbie also believes that cloud providers should be required to publish their energy consumption and carbon footprints, so that those metrics become factors in choosing a provider.
For noncritical AI training processes, spot instances could help by borrowing underutilized compute -- at a fraction of the cost -- from other cloud tenants who, for example, over-provision by policy and end up with unused capacity.
Parameter tuning and AI's carbon footprint
One of the biggest challenges in improving the performance of AI models lies in parameter tuning -- adjusting the weights of the artificial neurons in a neural network and, at a higher level, the settings that control how training proceeds. And this aspect of training is not limited to AI: The same type of tuning is required for a lot of business problems, including optimizing business models and simulations, operations research, logistics and programming by example.
Programming by example, or PBE, is a technique for training an AI through examples, such as providing input/output pairs that illustrate how to structure data. All these problems can lead to combinatorial explosion, in which each additional parameter multiplies the number of possible combinations that data scientists must test.
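The multiplication at work here is easy to see: the size of a search grid is the product of each parameter's option count, so every added parameter multiplies the total number of full training runs. A minimal sketch (all parameter names and candidate values below are hypothetical):

```python
# Illustrative only: how a hyperparameter grid explodes as parameters
# are added. The parameter names and value lists are made up.
from itertools import product

grid = {
    "learning_rate": [0.1, 0.01, 0.001, 0.0001],
    "batch_size": [16, 32, 64, 128],
    "num_layers": [2, 4, 8],
    "dropout": [0.0, 0.2, 0.5],
    "optimizer": ["sgd", "adam", "rmsprop"],
}

# Every element of this product is one complete training run.
combos = list(product(*grid.values()))
print(len(combos))  # 4 * 4 * 3 * 3 * 3 = 432 runs

# Adding a sixth parameter with ten candidate values would multiply
# this to 4,320 runs -- each one consuming real compute and power.
```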
Parameters can have a big impact on the performance of a model, and, as a result, that model's impact on a business, said Scott Clark, co-founder and CEO of SigOpt Inc., which makes software for tuning deep learning parameters. Because parameter tuning often requires evaluating a wide variety of configurations, it can be computationally expensive. To the extent data centers are not run with sustainable energy, AI's carbon footprint will get bigger.
But not all parameter tuning methods are equal. The more exhaustive or naive they are, the more computationally intensive: An exhaustive grid search tries every combination, while a naive random search samples configurations blindly, and either can take significant time to land on a good solution. The more intelligent and adaptive the method, the less compute it needs.
Researchers and vendors are designing a new category of AI tools for parameter optimization that rely on Bayesian optimization algorithms instead of naive random search or exhaustive grid search. Bayesian optimization uses a combination of statistics and successive approximations to home in on the best combination of parameters efficiently. SigOpt's research claims the approach can slash compute time and power consumption by 95% compared with the standard practice of randomly searching configurations.
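The core idea behind such adaptive methods -- let a cheap model of the results seen so far decide where to spend the next expensive evaluation -- can be sketched without a full Gaussian-process surrogate. The toy below (not SigOpt's method; the objective function is a hypothetical stand-in for a costly training run) fits a parabola through the three best points observed and evaluates the objective at the parabola's minimum:

```python
# Surrogate-guided search, in the spirit of Bayesian optimization.
# Real Bayesian optimizers use a Gaussian-process surrogate plus an
# acquisition function; this sketch uses the simplest possible
# surrogate: a parabola through the three best points seen so far.

def objective(x):
    """Stand-in for an expensive training run that returns a loss."""
    return (x - 2.7) ** 2

def parabola_min(p1, p2, p3):
    """x-coordinate of the vertex of the parabola through three points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0:
        return x2
    return x2 - 0.5 * num / den

# Seed the surrogate with three cheap evaluations across the range.
points = [(x, objective(x)) for x in (0.0, 2.5, 5.0)]

for _ in range(10):
    points.sort(key=lambda p: p[1])          # three lowest losses first
    proposal = parabola_min(*points[:3])     # surrogate's best guess
    if any(abs(proposal - x) < 1e-9 for x, _ in points):
        break                                # nothing new to try: converged
    points.append((proposal, objective(proposal)))

best_x, _ = min(points, key=lambda p: p[1])
print(best_x, len(points))  # finds x near 2.7 in only a handful of runs
```

A grid search over the same range at 0.01 resolution would cost 500 evaluations; the model-guided loop above needs a handful, which is the intuition behind the large compute and power savings claimed for adaptive tuning.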
Quantum could help
Down the road, quantum computers will likely be able to solve many of these combinatorial explosion problems in a single instruction, said Peter Chapman, president and CEO of IonQ Inc., a quantum computing startup. This will enable a new class of machine learning that learns with a much-reduced set of input data and minimal training required.
Programmers are often limited in the tools they can use to solve certain problems. For a certain subset of these problems, a quantum computer will give programmers access to methods that can't be used today. Quantum computers will impose restrictions of their own -- they are not applicable to every classical computing problem -- but for the problems they are good at, they will unlock approaches that are simply too expensive to run on a classical computer today. Chapman expects quantum computers to enter the commercial marketplace and work on real-world business problems soon.
But not all IT executives are convinced that quantum will reduce AI's carbon footprint. Dataiku's Dougherty quipped, "Hoping quantum computing will come along and save the day in this context is like hoping fusion reactors will be ready in time to mitigate climate change. Maybe they could, but the hardware is so far from being ready that it's best to focus on improving what we already know and use."