Partners cite reinforcement learning use cases, gradual uptake

Consultants view reinforcement learning as a tool for decision-making at scale and a way to solve optimization problems, but its capabilities remain a mystery for some customers.

Reinforcement learning may not enjoy top-of-mind status among customers, but consultants contend the technology might prove just the tool they need to address some vexing problems.

A subfield of machine learning (ML), reinforcement learning (RL) works on the principle of rewarding desired behaviors. An RL agent assesses a problem that needs solving, takes actions and adapts to feedback. A favorable response reinforces the agent's current course, while an unfavorable one nudges it in a different direction, a trial-and-error process that improves the agent's performance over time. In business, reinforcement learning use cases gravitate toward optimization problems such as dynamic pricing and investment portfolio management -- in both cases the objective is to maximize profits.
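To make the trial-and-error loop concrete, here is a minimal sketch of an agent choosing among price points. It uses an epsilon-greedy strategy, one common textbook approach and not necessarily what any consultant quoted here deploys. The price points, purchase probabilities and the `run_agent` helper are all invented for illustration, and the "environment" is a simple simulation.

```python
import random

# Hypothetical price points and purchase probabilities, invented for illustration.
# Expected revenue per offer works out to 4.5, 6.0 and 3.0 respectively.
PRICES = [5.0, 10.0, 15.0]
BUY_PROBABILITY = [0.9, 0.6, 0.2]


def run_agent(steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known price, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(PRICES)        # how often each price was tried
    avg_reward = [0.0] * len(PRICES)  # running average revenue per price

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random price.
            action = rng.randrange(len(PRICES))
        else:
            # Exploit: use the price with the best average so far.
            action = max(range(len(PRICES)), key=lambda i: avg_reward[i])

        # Simulated feedback from the environment: did the customer buy?
        sold = rng.random() < BUY_PROBABILITY[action]
        reward = PRICES[action] if sold else 0.0

        # Adapt: move the running average toward the observed reward.
        counts[action] += 1
        avg_reward[action] += (reward - avg_reward[action]) / counts[action]

    return counts, avg_reward


counts, avg_reward = run_agent()
best_price = PRICES[max(range(len(PRICES)), key=lambda i: avg_reward[i])]
print("times each price was tried:", counts)
print("price the agent currently favors:", best_price)
```

The agent starts with no knowledge of demand and learns which price earns the most only from the feedback it receives, which is the behavior the paragraph above describes.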

Enterprises are adopting RL, but uptake hasn't been as spectacular as that of other AI and ML approaches. The technology isn't well known among corporate decision-makers, and its complexity poses another hurdle to wider deployment.

That said, RL is poised to play a bigger role on the ML stage, although IT services executives differ on the magnitude of its influence.

Openings for reinforcement learning use cases

"More and more clients are bringing reinforcement learning to production," said Jarno Kartela, global head of AI advisory at Thoughtworks, a technology consulting firm based in Chicago.

That's because the technology can help organizations overcome some difficult circumstances. RL can manage situations that change rapidly and help decision-making when clients don't have much data about the problem at hand, Kartela said. In some cases, a client may need to address its problem without any data available, he added. An organization may face data restrictions due to GDPR and other privacy measures, for example.

RL provides a workaround: While ML algorithms often depend on loads of historical data, RL learns from its immediate environment rather than data stores. A dynamic pricing algorithm, for example, would respond to environmental conditions such as unit cost and competitors' pricing.

RL's dynamic nature also makes it practical for use cases in fast-changing fields such as IT security.

"We can use reinforcement learning in cybersecurity where we can take advantage of the fact that we can monitor a real-time signal," said Carm Taglienti, engineer at Insight Enterprises, a solutions integrator based in Tempe, Ariz. "It's the kind of model where you can adapt to change quickly, where you don't have to predefine the exact outcome and can learn as you go and change the parameters."

Another use case is online learning. An RL model can monitor mouse input from users to measure their reaction to training materials. The system might detect that users struggle with certain portions of the learning regimen and adjust the curriculum accordingly. In this application, RL creates different learning pathways that boost training productivity, Taglienti said.

In addition, RL can support a range of use cases where causal inference can help with problem solving. A client, for instance, may need to understand the causal effects between actions and customers, Kartela noted.

Barriers to acceptance

While RL can potentially address a range of business challenges, most customers aren't asking for the technology by name.

"Customers are interested in the problems [RL models] can solve, but they don't know what reinforcement learning actually is," Taglienti said. "Demystifying them is a way to increase adoption."

"Most companies are barely using advanced techniques, never mind reinforcement learning," said Fernando Lucini, global data science and ML engineering lead at Accenture. He cited neural networks as an example of an advanced method.

The complex science behind RL also presents an adoption barrier. "It's complicated to get the rewards function working correctly," Lucini said. The rewards function provides the means for encouraging certain agent behaviors and discouraging others.
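A small sketch shows why the choice of rewards function matters. The two functions below, `revenue_reward` and `profit_reward`, are hypothetical, as are the prices, unit cost and sales figures. The point is only that two plausible-looking reward definitions can steer an agent toward different actions.

```python
def revenue_reward(price, unit_cost, units_sold):
    """Naive reward: total revenue, ignoring what each unit costs."""
    return price * units_sold


def profit_reward(price, unit_cost, units_sold):
    """Reward aligned with a profit objective: margin times volume."""
    return (price - unit_cost) * units_sold


# Made-up outcomes for illustration: price -> units sold at that price.
outcomes = {8.0: 100, 12.0: 60}
unit_cost = 7.0

best_by_revenue = max(
    outcomes, key=lambda p: revenue_reward(p, unit_cost, outcomes[p])
)
best_by_profit = max(
    outcomes, key=lambda p: profit_reward(p, unit_cost, outcomes[p])
)

print("price favored by revenue reward:", best_by_revenue)
print("price favored by profit reward:", best_by_profit)
```

With these numbers, an agent rewarded on revenue would settle on the lower price, while one rewarded on profit would prefer the higher price. A rewards function that is slightly off from the business goal can therefore train an agent to optimize the wrong thing.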

Balancing the various components of RL is another consideration. "RL has many moving parts and so much incremented complexity," Lucini said. "Each of those parts are modeled by an ML algorithm -- and the whole end-to-end process is modeled by combining all those models in an RL setting. You have to combine all those for the greater good."

Technology adopters also face the challenge of setting an optimization goal to determine the success of an RL project. The objective, and its associated metrics, should be sharply focused.

"Without the measurement of very specific, quantifiable elements, it is very difficult to know if you are doing better or not," Taglienti said.

Finally, technologists may struggle to explain the workings of RL and neural networks to a corporate risk officer, Lucini noted. Can technology adopters humanize an algorithm? The issues are semantic as well as technical. If the argument for using RL is that it makes better decisions, how is "better" defined?

Prospects for wider use

Enterprise customers may ramp up their use of reinforcement learning, despite the technology's deployment challenges. Thoughtworks' Kartela said he sees "decision factories" equipped with RL technology as a trend going forward. A decision factory provides a platform for performing RL at scale across thousands of decision-making points, he noted.

"This is what is going to be the next big thing," Kartela said.

A 2022 Thoughtworks report focused on customer intimacy -- the strategy of catering to the specific needs of customers to build loyalty -- as a suitable problem for a decision factory. A business introducing multiple techniques for achieving customer intimacy on its digital platform would initially have no data to guide decision-making. A decision factory addresses that limited-data use case. The factory's RL-based agents create a self-learning system that can "explore under uncertainty" and receive real-time feedback, the report noted.
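One way to picture an agent that starts with no data and "explores under uncertainty" is Thompson sampling, sketched below. This is a standard technique chosen here for illustration; the report is not quoted as naming it. The three engagement rates and the `thompson_sampling` helper are hypothetical, and customer responses are simulated.

```python
import random

# Hypothetical engagement rates for three customer-intimacy tactics.
# The agent never sees these; it only observes simulated responses.
TRUE_RATES = [0.10, 0.15, 0.30]


def thompson_sampling(rounds=3000, seed=1):
    """Choose tactics by sampling from a Beta posterior over each tactic's rate."""
    rng = random.Random(seed)
    n = len(TRUE_RATES)
    successes = [0] * n
    failures = [0] * n
    chosen = [0] * n

    for _ in range(rounds):
        # Draw a plausible rate for each tactic from its current belief.
        # With no data the belief is Beta(1, 1), i.e. uniform, so early
        # choices are spread across all tactics.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(n)
        ]
        action = max(range(n), key=lambda i: samples[i])
        chosen[action] += 1

        # Real-time feedback (simulated): did the customer engage?
        if rng.random() < TRUE_RATES[action]:
            successes[action] += 1
        else:
            failures[action] += 1

    return chosen


chosen = thompson_sampling()
print("times each tactic was chosen:", chosen)
```

As feedback accumulates, the agent's beliefs narrow and it spends most of its rounds on the tactic that performs best, without ever having needed historical data to get started.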

Accenture's Lucini, on the other hand, said he doesn't see massive interest among clients in RL. Transformer neural networks, in contrast, experience far greater uptake among customers. Adoption has been "nonstop" for the last three years, he noted. Customers use transformers in areas such as speech and text translation.

RL, however, is making incremental progress. "It's earning its place, little by little," Lucini said. "It has been quietly adopted in problem solving."