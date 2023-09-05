Supervised learning tends to get the most publicity in discussions of artificial intelligence techniques since it's often the last step used to create the AI models for things like image recognition, better predictions, product recommendation and lead scoring.

In contrast, unsupervised learning tends to work behind the scenes earlier in the AI development lifecycle: It is often used to set the stage for supervised learning's magic to unfold, much like the grunt work that enables a manager to shine. Both modes of machine learning are usefully applied to business problems, as explained later.

On a technical level, the difference between supervised and unsupervised learning centers on whether the raw data used to train models has been pre-labeled (supervised learning) or not (unsupervised learning).

Let's dive in.

What is unsupervised learning?

In unsupervised learning, an algorithm suited to this approach -- K-means clustering is an example -- is trained on unlabeled data. It scans through data sets looking for any meaningful connection. In other words, unsupervised learning determines the patterns and similarities within the data, as opposed to relating it to some external measurement. This approach is useful when you don't know what you're looking for and less useful when you do. If you showed an unsupervised algorithm many thousands or millions of pictures, it might come to categorize a subset of the pictures as images of what humans would recognize as felines.

In contrast, a supervised algorithm trained on labeled images of cats versus dogs is able to identify images of cats with a high degree of confidence. But there is a tradeoff with this approach: If the supervised learning project takes millions of labeled images to develop the model, the machine-generated prediction requires a lot of human effort.

There is a middle ground: semisupervised learning.
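To make the clustering idea concrete, here is a minimal sketch of K-means written with only the Python standard library. The data points, the function names and the choice of two clusters are assumptions made for illustration; a production project would more likely rely on an established library. Notice that no labels appear anywhere: the algorithm groups points purely by how close they are to one another.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, iterations=20, seed=0):
    """Group unlabeled points into k clusters; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = mean_point(members)
    return centroids, assignments

# Hypothetical unlabeled feature vectors forming two loose groups.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (8.1, 7.9), (7.8, 8.0)]
centroids, assignments = k_means(points, k=2)
print(assignments)
```

The cluster numbers themselves carry no meaning. Whether a cluster corresponds to "cats" is something a human, or a later supervised step, still has to decide.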

What is semisupervised learning?

Semisupervised learning is a sort of shortcut that combines both approaches. It describes a specific workflow in which unsupervised learning algorithms are used to automatically generate labels, which can then be fed into supervised learning algorithms. In this approach, humans manually label some images, unsupervised learning guesses the labels for the others, and then all these labels and images are fed to supervised learning algorithms to create an AI model.

Semisupervised learning can lower the cost of labeling the large data sets used in machine learning. "If you can get humans to label 0.01% of your millions of samples, then the computer can leverage those labels to significantly increase its predictive accuracy," said Aaron Kalb, co-founder and chief innovation officer of Alation, an enterprise data catalog platform.

Figure 1. These machine learning models support a variety of business applications.
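The semisupervised workflow described above can be sketched in a few lines of standard-library Python. This is one possible reading of it, often called cluster-then-label, and not a description of any vendor's system: a small clustering routine groups all points without looking at labels, each cluster inherits the majority label of the few human-labeled points inside it, and a simple nearest-centroid classifier is then trained on the full, machine-labeled set. The data, the label names and every function name here are hypothetical.

```python
from collections import Counter

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))

def cluster(points, k, iterations=10):
    """Unsupervised step: a tiny k-means that never looks at labels."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]  # deterministic start
    for _ in range(iterations):
        assignments = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = mean_point(members)
    return assignments

def propagate_labels(assignments, human_labels):
    """Give every point the majority human label found in its cluster."""
    votes = {}
    for index, label in human_labels.items():
        votes.setdefault(assignments[index], Counter())[label] += 1
    winners = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [winners.get(a) for a in assignments]  # None if a cluster has no label

def train_classifier(points, labels):
    """Supervised step: nearest-centroid classifier fit on the labeled points."""
    grouped = {}
    for p, label in zip(points, labels):
        if label is not None:
            grouped.setdefault(label, []).append(p)
    centroids = {label: mean_point(ps) for label, ps in grouped.items()}
    def predict(point):
        return min(centroids, key=lambda label: squared_distance(point, centroids[label]))
    return predict

# Six hypothetical feature vectors; a human labeled only two of them.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
human_labels = {0: "cat", 3: "dog"}

assignments = cluster(points, k=2)
pseudo_labels = propagate_labels(assignments, human_labels)
predict = train_classifier(points, pseudo_labels)
print(pseudo_labels)
print(predict((1.0, 1.0)), predict((5.1, 5.0)))
```

Two human labels end up covering all six points, which is the cost saving in miniature. The approach only works as well as the clusters line up with the real categories, so the machine-generated labels are usually worth spot-checking.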

What is reinforcement learning?

Another machine learning approach is reinforcement learning. Typically used to teach a machine to complete a sequence of steps, reinforcement learning is different from both supervised and unsupervised learning. Data scientists program an algorithm to perform a task, giving it positive or negative cues, or reinforcement, as it works out how to do the task. The programmer sets the rules for the rewards but leaves it to the algorithm to decide on its own what steps it needs to take to maximize the reward -- and, therefore, complete the task.
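As an illustration of rewards set by the programmer and steps chosen by the algorithm, the sketch below uses tabular Q-learning, one common reinforcement learning method, on a made-up five-cell corridor. The environment, the reward values and the hyperparameters are all assumptions chosen to keep the example small. The code never tells the agent to walk right; it only pays a reward at the last cell and charges a small penalty for every other move.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, 1)   # step left or step right

def step(state, action):
    """Rules set by the programmer: small penalty per move, reward at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of each episode
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward reward plus
            # the discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = train()
# Greedy policy for the non-goal cells: the preferred action in each cell.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should prefer stepping right in every cell, a sequence of steps the agent worked out from the reward signal alone.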

When should supervised learning vs. unsupervised learning be used?

Shivani Rao, manager of machine learning at LinkedIn, said the best practices for adopting a supervised or unsupervised machine learning approach are often dictated by the circumstances, the assumptions you can make about the data, and the application.

The choice of using supervised learning versus unsupervised machine learning algorithms can also change over time, Rao said. In the early stages of the model building process, data is commonly unlabeled, while labeled data can be expected in the later stages of modeling.

For example, for a problem that predicts if a LinkedIn member will watch a course video, the first model is based on an unsupervised technique. Once these recommendations are served, a metric recording whether someone clicks on the recommendation provides new data to generate a label.

LinkedIn has also used this technique for tagging online courses with skills that a student might want to acquire. Human labelers, such as an author, publisher or student, can provide a precise and accurate list of skills that the course teaches, but it is not possible for them to provide an exhaustive list of such skills. Hence, this data can be thought of as incompletely tagged. These types of problems can use semisupervised techniques to help build a more exhaustive set of tags.

Data science and advanced analytics expert Bharath Thota, partner at consulting firm Kearney, said that practical considerations also tend to govern his team's choice of using supervised or unsupervised learning.

"We choose supervised learning for applications when labeled data is available and the goal is to predict or classify future observations," Thota said. "We use unsupervised learning when labeled data is not available and the goal is to build strategies by identifying patterns or segments from the data."

Alation data scientists use unsupervised learning internally for a variety of applications, Kalb said. For example, they have developed a human-computer collaboration process for translating arcane data object names into human language, e.g., "na_gr_rvnu_ps" into "North American Gross Revenue from Professional Services." In this case, the machines guess, humans confirm and machines learn. "You could think of it as semisupervised learning in an iterative loop, creating a virtuous cycle of increased accuracy," Kalb said.
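The guess, confirm and learn loop can be pictured with a deliberately simple toy. This is not Alation's system, whose internals the article does not describe: a plain lookup table stands in for the learned model, and the function names, the starting vocabulary and the reviewer's corrections are all hypothetical. The point is only the shape of the loop, in which each human confirmation improves the next machine guess.

```python
def guess_name(column_name, expansions):
    """Machine guess: expand each underscore-separated token it already knows."""
    words = []
    for token in column_name.split("_"):
        # Unknown tokens are flagged so a reviewer can see what needs attention.
        words.append(expansions.get(token, "[" + token + "?]"))
    return " ".join(words)

def learn_from_review(expansions, corrections):
    """Machine learns: store the expansions a human confirmed or supplied."""
    expansions.update(corrections)

# Hypothetical starting knowledge (the lookup table stands in for a model).
expansions = {"na": "North American", "rvnu": "Revenue"}

first_guess = guess_name("na_gr_rvnu_ps", expansions)

# A human reviewer fills in the flagged tokens (hypothetical input).
learn_from_review(expansions, {"gr": "Gross", "ps": "from Professional Services"})

second_guess = guess_name("na_gr_rvnu_ps", expansions)
print(first_guess)
print(second_guess)
```

Once "gr" and "ps" have been confirmed, any later name that reuses those tokens is expanded without further human effort, which is the virtuous cycle in its simplest form.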