What is a decision tree in machine learning?
A decision tree is a flow chart created by a computer algorithm to make decisions or numeric predictions based on information in a digital data set. When algorithms learn to make decisions based on past known outcomes, it's known as supervised learning. The data set containing past known outcomes and other related variables that is used by a decision tree algorithm to learn is known as training data.
In a credit line application, for example, the training, or fitting, stage is when the decision tree algorithm learns a flow chart from a data set. The target, or y, variable is the decision to accept or deny past credit applicants. The decision is based on other data such as credit score, number of late payments, current debt burden, length of credit history, account balances and similar information, all contained in a training data set. The debt-to-income (DTI) ratio measures an applicant's debt burden. If an applicant has an income of $100,000 and outstanding debt of $500,000, for example, then the DTI ratio would be 5.
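The DTI arithmetic above can be sketched in a few lines of Python; the function name is illustrative, not a standard API:

```python
def dti_ratio(debt: float, income: float) -> float:
    """Debt-to-income ratio as defined in this article: total debt / income."""
    return debt / income

# The applicant from the example: $500,000 of debt on $100,000 of income.
print(dti_ratio(500_000, 100_000))  # 5.0
```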
Decision tree use case
The decision tree structure shown in the diagram has three levels based on three different variables in the example training data set: credit score, late payments and DTI ratio. The top of the tree, the internal decision points and the outcome decisions are often referred to as nodes. This tree has one root node labeled "Start," four leaf nodes labeled as "Accept" or "Deny" and six internal nodes that contain the variables. Leaf nodes, sometimes known as terminal nodes, contain the decisions.
The credit score, late payments and DTI ratio selected by the decision tree algorithm are some of the more important variables in this data set. They interact in meaningful ways to yield the decision to accept or deny a credit application. The decision tree algorithm also lists the important split points that indicate creditworthiness: A credit score of 700, three or fewer late payments and a DTI ratio of 5.
Using the example shown in the diagram, the process of running a data point -- a single applicant's data -- through the tree to arrive at a decision is called scoring or inference. Available applicant information -- like credit score, late payments and DTI ratio -- is run through the decision tree to arrive at a uniform and systematic lending decision.
Data scientists can use the decision tree for single applications, as shown in the bottom of the diagram, or to score entire portfolios of consumers quickly and automatically.
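Scoring a single applicant is just a walk down the flow chart. Below is a minimal pure-Python sketch: the split points (a 700 credit score, three late payments, a DTI ratio of 5) come from the article, but the exact nesting of the branches is an assumption, since the full diagram isn't reproduced here.

```python
def score_applicant(credit_score: int, late_payments: int, dti: float) -> str:
    """Walk one applicant's data down a hand-coded decision tree.

    The thresholds mirror the article's example; the tree shape itself
    is illustrative, not the exact diagram.
    """
    if credit_score >= 700:        # root split on credit score
        if late_payments <= 3:     # internal node: payment history
            return "Accept"
        return "Deny"
    if dti < 5:                    # internal node: debt burden
        return "Accept" if late_payments <= 3 else "Deny"
    return "Deny"

print(score_applicant(credit_score=720, late_payments=1, dti=2.0))  # Accept
print(score_applicant(credit_score=640, late_payments=6, dti=6.0))  # Deny
```

Scoring a whole portfolio is then just a loop over applicants, which is why trees scale easily to automated decisioning.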
Types of decision trees
Decision trees fall into two main types: classification and regression. Each type has customizable settings, making decision trees a flexible tool for most supervised learning and decision-making applications. The distinction comes down to whether the prediction is a categorical outcome (for example, lend: yes or no; letter grade: A, B, C, D or F) or a numeric outcome (for example, how much to lend; a numeric grade). Decision trees that predict numeric outcomes are sometimes known as regression trees.
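The difference shows up at the leaves: a classification tree's leaf predicts the most common class among the training rows that land there, while a regression tree's leaf predicts their average. A minimal sketch, with made-up leaf contents:

```python
from statistics import mean, mode

# Training rows that ended up in one leaf of a classification tree...
classification_leaf = ["accept", "accept", "deny", "accept"]
# ...and in one leaf of a regression tree (loan amounts in dollars).
regression_leaf = [8_000, 10_000, 12_000]

print(mode(classification_leaf))  # categorical prediction: 'accept'
print(mean(regression_leaf))      # numeric prediction: 10000
```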
Decision tree algorithms such as CART and C4.5 distinguish themselves from one another in how they find the most important variables, locate split points and select a final tree structure. There are also many free and commercial software packages that offer various settings and types of algorithms.
While there might be a specific reason to use a certain type of decision tree, it is sometimes necessary to try different algorithms and settings to determine what works best for a specific business problem and data set. Some decision tree software even lets data scientists add their own variables and split points by collaborating interactively with the decision tree algorithm. Other popular machine learning algorithms are based on combinations of many decision trees.
A random forest algorithm takes the mode or average decision from many individual decision trees to achieve a final outcome. Combining many simpler models this way tends to cancel out their individual errors, producing more stable and often more accurate predictions. Due to these mathematical properties and the overall flexibility of decision trees, random forests are one of the easiest machine learning algorithms to use.
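The aggregation step can be sketched as follows; each "tree" here is just a stand-in list of predictions, since the point is only the vote and the average:

```python
from statistics import mean, mode

# Hypothetical predictions from five individual trees in a forest.
tree_votes = ["Accept", "Deny", "Accept", "Accept", "Deny"]   # classification
tree_estimates = [9_500, 11_000, 10_200, 9_800, 10_500]       # regression

forest_decision = mode(tree_votes)      # classification forest: majority vote
forest_estimate = mean(tree_estimates)  # regression forest: average

print(forest_decision)  # Accept
print(forest_estimate)  # 10200
```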
A gradient boosting machine (GBM) starts with a regular decision tree and then uses a series of additional trees to improve upon the outcome of the single tree. Though much more temperamental than a single decision tree or random forest, GBMs tend to be one of the more accurate machine learning algorithms for business data mining applications.
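The "series of additional trees" idea can be reduced to a toy numeric sketch: start with a constant prediction, then repeatedly add a scaled correction fit to the remaining errors. Here the "tree" at each round is just a single split on x < 2 with one output per side, which keeps the arithmetic checkable; real GBMs fit full trees to the residuals.

```python
from statistics import mean

x = [0, 1, 2, 3]
y = [3.0, 5.0, 9.0, 11.0]
pred = [mean(y)] * len(y)   # round 0: a constant model, predicting 7.0
learning_rate = 0.5

for _ in range(10):
    residuals = [t - p for t, p in zip(y, pred)]
    # A one-split "tree" fit to the residuals: one value per side of x < 2.
    left = mean(r for xi, r in zip(x, residuals) if xi < 2)
    right = mean(r for xi, r in zip(x, residuals) if xi >= 2)
    pred = [p + learning_rate * (left if xi < 2 else right)
            for xi, p in zip(x, pred)]

print([round(p, 2) for p in pred])  # approaches [4.0, 4.0, 10.0, 10.0]
```

Each round shrinks the remaining error by the learning rate, so the predictions converge toward the per-side means (4 and 10) instead of jumping there in one step, which is what makes boosting both accurate and temperamental to tune.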
Advantages of using a decision tree
Decision trees are not restricted to financial applications. They work well for many traditional business applications based on structured data, which are columns and rows of data stored in database tables or spreadsheets. Decision trees can provide many advantages, including the following:
- They are flexible and come in many forms for most business decision-making applications.
- They can handle heterogeneous data, such as a mix of numeric and text variables.
- They can handle dirty data that hasn't been extensively cleaned or standardized.
- They can handle missing values during training.
- They don't take much time to train or to score new data, compared with many other types of machine learning models.
- They are considered explainable, or glassbox, models that are structured for direct interpretability. Explainable models are more easily verified and debugged by data scientists and more likely to be approved and adopted for business applications than some of their super-complex machine learning cousins.
- There are many free and commercial software packages that offer different settings and types of algorithms.
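The "glassbox" point is easiest to see in code: a fitted tree can be printed as plain if/else rules that a reviewer can audit line by line. A minimal sketch, with the tree stored as nested dictionaries; the structure and thresholds are illustrative, echoing the article's credit example:

```python
def tree_to_rules(node, depth=0):
    """Render a nested-dict decision tree as indented, human-readable rules."""
    pad = "  " * depth
    if "decision" in node:  # leaf node: emit the outcome
        return [f"{pad}=> {node['decision']}"]
    lines = [f"{pad}if {node['feature']} {node['op']} {node['threshold']}:"]
    lines += tree_to_rules(node["yes"], depth + 1)
    lines += [f"{pad}else:"]
    lines += tree_to_rules(node["no"], depth + 1)
    return lines

tree = {
    "feature": "credit_score", "op": ">=", "threshold": 700,
    "yes": {"feature": "late_payments", "op": "<=", "threshold": 3,
            "yes": {"decision": "Accept"}, "no": {"decision": "Deny"}},
    "no": {"decision": "Deny"},
}

print("\n".join(tree_to_rules(tree)))
```

This kind of rule dump is what makes trees straightforward to verify, debug and defend to model validators, in contrast to opaque models.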
Disadvantages of using a decision tree
Decision trees have their share of flaws, including the following:
- Decision tree training processes tend to be "greedy" because the tree learns from the top down -- one decision point at a time -- without consideration for the entire tree structure.
- Greedy training processes can lead to instability, as adding just a few more rows of training data and refitting the tree can lead to an entirely different structure.
- Decision trees tend to perform poorly on pattern recognition tasks in unstructured data, such as images, videos, raw text and sound. Those applications need deep neural networks.
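The greediness in the first point can be made concrete: at each node, the algorithm picks the single split that most reduces impurity right now, with no lookahead to later splits. A minimal Gini-impurity split search over one variable, on made-up applicant data:

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Greedily pick the threshold that most reduces weighted Gini impurity."""
    best = (None, gini(labels))  # no split beats the unsplit node so far
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v < t]
        right = [lab for v, lab in zip(values, labels) if v >= t]
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if w < best[1]:
            best = (t, w)
    return best

# Credit scores and past decisions for six hypothetical applicants.
scores = [580, 620, 660, 700, 740, 780]
labels = ["Deny", "Deny", "Deny", "Accept", "Accept", "Accept"]
print(best_split(scores, labels))  # (700, 0.0): a perfectly pure first split
```

The search locks in the best split for this node alone; a slightly different training sample can shift that first choice, and every decision below it, which is where the instability comes from.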
Optimal decision trees can fix greediness and instability issues, but they tend to be available in academic software that is more difficult to use. Ensemble models, like random forests and GBMs, improve the stability of single decision trees, but they are also much less explainable than single trees.