Why continuous training is essential in MLOps

Organizations with machine learning strategies must determine when evolving data requires continuous training of their ML models.

AI-based applications learn from experience and self-modify, but their accuracy can degrade over time. Machine learning applications benefit from continuous training to keep making the right decisions.

ML applications can systematically analyze data about their prediction accuracy or other qualities and, based on that analysis, update the ML models embedded within them.

When input data for these applications is constantly changing in the real world, continuous training practices are needed. Businesses must fit these practices into their broader MLOps strategies.

What continuous training is

Continuous training is a somewhat misleading term. It implies that training is always happening, which is not the case; it is more accurately characterized as regular ML model retraining. An enterprise monitors the performance of its ML application in production. As long as the model continues to deliver the desired results for given inputs, it needs no further training. When performance falls below an acceptable level, that drop triggers a new round of training for the model. Retraining triggers can be automated or based on expert analysis.

An ML application performing sentiment analysis on a stream of social media posts to flag them as favorable, neutral or unfavorable to a company's product must contend with changes in consumer behavior and product information. Periodic spot checks by analysts assess its performance. If the rate of mischaracterizations rises above some arbitrary threshold, such as 5%, that triggers a round of retraining.
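The trigger described above can be sketched in a few lines. This is a minimal illustration, not production monitoring code; the function name, the 5% threshold and the spot-check format are assumptions for the example.

```python
# Sketch of a threshold-based retraining trigger. Spot checks are
# (predicted_label, analyst_label) pairs gathered from production.
RETRAIN_THRESHOLD = 0.05  # retrain if more than 5% are mischaracterized


def needs_retraining(spot_checks: list[tuple[str, str]]) -> bool:
    """Return True when the spot-check error rate exceeds the threshold."""
    if not spot_checks:
        return False
    errors = sum(1 for predicted, actual in spot_checks if predicted != actual)
    return errors / len(spot_checks) > RETRAIN_THRESHOLD


# Example: 3 mischaracterizations out of 40 checks is a 7.5% error rate.
sample = [("favorable", "favorable")] * 37 + [("neutral", "unfavorable")] * 3
print(needs_retraining(sample))  # True
```

In practice this check would run on a schedule or inside a monitoring pipeline, and a True result would kick off the retraining workflow rather than just print a flag.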

Continuous training isn't always required. For example, an application scanning photographs to find appearances of U.S. state flags may never deviate much from its baseline performance. It might even improve as the overall quality of photos fed into it improves. However, continuous training is usually required at some point.

The benefits of continuous training

ML applications are often deemed useful because they replace or reduce the need for human attention and judgment. Like people, ML apps make decisions based on what their models have learned from the training data and the concepts reflected by that data. If the nature of the data changes, known as data drift, or if the underlying concepts no longer apply, known as concept drift, the model must be retrained with new data or a new underlying conceptual framework.
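Data drift can often be surfaced by comparing a feature's recent distribution against its training baseline. The sketch below uses a simple two-standard-error heuristic on the mean; this is an illustrative check, not a formal drift test, and all names and values are assumptions for the example.

```python
# Minimal data-drift check: flag drift when a numeric feature's recent
# mean moves more than two standard errors from the training baseline.
import statistics


def mean_shift_detected(baseline: list[float], recent: list[float]) -> bool:
    """Return True when the recent mean drifts beyond two standard errors."""
    base_mean = statistics.mean(baseline)
    base_se = statistics.stdev(baseline) / len(baseline) ** 0.5
    return abs(statistics.mean(recent) - base_mean) > 2 * base_se


baseline = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.1, 10.4]
recent = [12.0, 12.5, 11.8, 12.2, 12.1, 11.9, 12.4, 12.3]
print(mean_shift_detected(baseline, recent))  # True
```

Concept drift is harder to detect automatically, because the input distribution may look unchanged while the meaning of the labels shifts; that is why the spot checks by human analysts described earlier still matter.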

The aforementioned sentiment analysis program could need regular retraining due to concept drift. The words and syntactical structures people use to indicate positive or negative sentiment evolve, especially among younger demographics. The application might also have to deal with data drift due to a sudden change in the volume of bot-generated messages or messages written by paid promotional services.

Where continuous training fits in the MLOps model

MLOps consists of four interlocking cycles of development: the data cycle, the model cycle, the development cycle and the operations cycle. Each cycle feeds forward into the next, meaning data for model training feeds into models, those models are then built into apps and those apps are ultimately put into production. The cycles can also feed backward, because performance data can trigger redesigns in the development cycle.

Normal operations, in the operations cycle, generate model performance data. When retraining a model, that operational data enters the data cycle. The first order of business is to identify the new training data, as well as how to assemble and prepare it for model training.
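That hand-off from the operations cycle back to the data cycle can be sketched as a merge step: newly labeled production records are folded into the existing training set before the next run. Everything here, the function, field names and sample records, is hypothetical, and real pipelines would also handle validation, sampling and data preparation.

```python
# Illustrative sketch of feeding operational data back into the data
# cycle: combine the original training rows with labeled production
# rows, letting the newer production label win for any duplicate input.
def build_retraining_set(original: list[dict], production: list[dict]) -> list[dict]:
    """Merge training and production rows, keeping the most recent label."""
    merged = {}
    for row in original + production:  # later (production) rows overwrite
        merged[row["text"]] = row["label"]
    return [{"text": t, "label": l} for t, l in merged.items()]


original = [{"text": "great product", "label": "favorable"}]
production = [{"text": "mid tbh", "label": "unfavorable"},
              {"text": "great product", "label": "neutral"}]  # relabeled
print(len(build_retraining_set(original, production)))  # 2
```

The deduplication rule shown, newest label wins, is one reasonable policy; others keep both records or weight recent data more heavily.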

Organizations have improved ML-based applications with retraining. Domino Data Lab cites the success of Topdanmark, a European insurance company that automated the detection of performance drift across many of its ML models. Topdanmark was able to automate the handling of different types of insurance claims using different models. With drift detection automated, its data scientists could focus on identifying changes in their data and adjusting the models to accommodate those changes. Because performance drift now triggers model retraining, a previously manual process takes much less of their time.

In another example, Wildlife, an online gaming company that uses AI to make game recommendations to players, uses Anyscale's AI development technology to help triple the speed with which it can put a revised ML model into production.

If the MLOps lifecycle is integral to an organization's workflow for AI app development, that organization should carefully assess whether its ML models require continuous training. It should watch for the scenarios that trigger retraining, such as the need to keep up with new data or to address deteriorating model performance.
