Active learning, in an AI context, is the capacity of a machine learning (ML) algorithm to query a human source for additional information. Essentially, active learning allows a minimally trained AI program to identify what data it needs to become better. The algorithm identifies the subset of unlabeled data it expects to improve its performance the most and requests that someone label the data in that subset.
Active learning algorithms require relatively little labeled training data, which makes them especially helpful when labeled data is scarce. As a result, this type of algorithm is useful for information retrieval and text analysis, as well as image and speech recognition.
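The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes the model already outputs class probabilities for each unlabeled item, and it uses least-confidence scoring, which is only one of several common query strategies. The names `least_confidence`, `select_queries`, and `pool_probs` are hypothetical and chosen for this example.

```python
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_queries(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k unlabeled items the model is least sure about."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: least_confidence(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


# Hypothetical predicted class probabilities for four unlabeled items.
pool_probs = [
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],
    [0.51, 0.49],  # most uncertain
]

queries = select_queries(pool_probs, k=2)
print(queries)  # prints [3, 1]
```

Only the two items at the returned indices would be sent to a human for labeling; the confident predictions are left alone.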
Active learning vs. supervised learning vs. unsupervised learning
Supervised ML, which uses historical data to make predictions about new data, requires a human to create input and desired-output data for training. Because this approach requires a lot of human overhead, it can be expensive. In contrast, AI systems that use unsupervised learning require very little human overhead because the algorithms simply look for patterns in unlabeled datasets. While this type of ML can be cost-effective because it does not require as much human input, it can also be difficult to verify that the results are meaningful.
Active learning can use both structured and unstructured data in a cost-efficient manner by prioritizing the data the model is most uncertain about and requesting labels for just that data. The model will use a relatively small amount of labeled data for training and request more labels later on if needed. This iterative approach to machine learning not only helps the model learn faster but also keeps costs down by letting humans skip labeling data that isn't helpful to the model.
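The iterative approach described above can be illustrated with a deliberately simple toy problem. The sketch below is an assumption-laden example, not a production method: the "model" is a one-dimensional threshold classifier, the human labeler is simulated by a hypothetical `oracle` function, and the pool is assumed to contain at least one point of each class at its extremes. All function and variable names are invented for this illustration.

```python
from typing import Callable, Dict, List, Tuple


def fit_threshold(labeled: Dict[float, int]) -> float:
    """Fit a 1-D threshold classifier: the midpoint between the largest
    point labeled 0 and the smallest point labeled 1."""
    highest_zero = max(x for x, y in labeled.items() if y == 0)
    lowest_one = min(x for x, y in labeled.items() if y == 1)
    return (highest_zero + lowest_one) / 2.0


def active_learning_loop(
    pool: List[float],
    oracle: Callable[[float], int],
    budget: int,
) -> Tuple[float, Dict[float, int]]:
    """Pool-based active learning with uncertainty sampling.

    Assumes the smallest pool point is class 0 and the largest is class 1.
    """
    # Seed set: ask for labels on just the two extreme points.
    labeled = {min(pool): oracle(min(pool)), max(pool): oracle(max(pool))}
    unlabeled = [x for x in pool if x not in labeled]
    threshold = fit_threshold(labeled)

    for _ in range(budget):
        if not unlabeled:
            break
        # The point closest to the decision boundary is the one
        # the model is least certain about.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        labeled[query] = oracle(query)  # request a label from the human
        unlabeled.remove(query)
        threshold = fit_threshold(labeled)  # retrain on the larger set

    return threshold, labeled


def oracle(x: float) -> int:
    """Simulated human labeler with a hidden boundary at 37 (hypothetical)."""
    return 1 if x >= 37 else 0


pool = [float(i) for i in range(101)]
threshold, labeled = active_learning_loop(pool, oracle, budget=8)
print(f"learned threshold: {threshold}, labels requested: {len(labeled)}")
```

In this toy setting the loop requests labels for only 10 of the 101 points (two seeds plus eight queries) and still separates every point in the pool correctly, because each query targets the region where the current model is least certain. Real models and real data will not behave this cleanly.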