TechTarget.com/searchenterpriseai

https://www.techtarget.com/searchenterpriseai/definition/linear-regression

What is linear regression?

By Stephen J. Bigelow

Linear regression is a statistical technique that identifies the relationship between the mean value of one variable and the corresponding values of one or more other variables. By understanding the relationship between variables, the linear regression technique can help data scientists model and predict how one variable will change in relation to another. This can involve analyses such as estimating sales based on product prices or predicting crop yield based on rainfall. At a basic level, the term regression means to return to a former or less developed state.

Linear regression in machine learning (ML) builds on this fundamental concept to model the relationship between variables using various ML techniques to generate a regression line between variables such as sales rate and marketing spend. In practice, ML tends to be more useful when working with multiple variables, called multivariate regression, where the relationships between them require more complex regression coefficients.

Linear regression is a basic component in supervised learning. At its core, it can help determine if one explanatory variable can provide value in predicting the outcome of the other. For example, does ad spending on one medium or another have any meaningful impact on sales?

In the most basic case, linear regression tries to predict the value of one variable, called the dependent variable, given another variable, called the independent variable. For example, if an organization was trying to predict sales rate based on advertising spend, sales would be the dependent variable, while ad spend would be the independent variable.

Linear regression is linear in that it guides the development of a function or model that fits a straight line -- called a linear regression line -- to a graph of the data. This line also minimizes the difference between a predicted value for the dependent variable given the corresponding independent variable.

In the case of estimating sales rate, each dollar in sales might climb regularly within a certain range for every dollar spent on ads and then slow down once the ad market reaches a saturation point. In these cases, more complex functions need to be constructed using statistics or ML techniques to fit the data onto a straight line.

Why is linear regression important?

Linear regression is important for the following reasons:

Types of linear regression

There are three main types of linear regression:

  1. Simple linear regression. Simple linear regression finds a function that maps data points to a straight line onto a graph of two variables.
  2. Multiple linear regression. Multiple linear regression finds a function that maps data points to a straight line between one dependent variable, like ice cream sales, and a function of two or more independent variables, such as temperature and advertising spend.
  3. Nonlinear regression. Nonlinear regression finds a function that fits two or more variables onto a curve rather than a straight line.

Beyond these three fundamental categories, however, there are numerous specific linear regression methods, which include the following:

Each specific approach can be applied to different tasks or data analysis objectives. For example, HLM -- also called multilevel modeling -- is a type of linear model intended to handle nested or hierarchical data structures, while ridge regression can be used when there's a high correlation between independent variables, which might otherwise lead to unintended bias using other methods.

Examples of linear regression

The following highlights three common ways linear regression is used:

  1. It can be used to identify the magnitude of the effect an independent variable, like temperature, might have on a dependent variable like ice cream sales.
  2. It can be used to forecast the impact of changes driven by the independent variable -- for example, how much more ice cream might be sold with different levels of advertising.
  3. It can be used to predict trends and future values -- like how much ice cream should be stocked to meet demand if the temperature is predicted to reach 90 degrees Fahrenheit.

Linear regression use cases

Typical use cases for linear regression in business include the following:

Advantages and disadvantages of linear regression

Advantages of linear regression include the following:

Disadvantages of linear regression include the following:

Key assumptions of linear regression

Linear regression requires the data set to support the following properties:

Linear regression tools

Linear regression methods can be performed on paper, but the process can be cumbersome and involve repetitive sums and squares. Higher education and professional applications of linear regression equations typically employ software applications to ingest data and perform the required mathematical processes quickly. Linear regression is supported by numerous software tools and programming libraries, including the following:

Linear regression vs. logistic regression

Linear regression is just one class of regression techniques for fitting numbers onto a graph.

Multivariate regression might fit data to a curve or a plane in a multidimensional graph representing the effects of multiple variables.

Although logistic regression and linear regression both use linear equations for predictions, logistic regression predicts whether a given data point belongs to one class or another, such as spam/not spam for an email filter or fraud/not fraud for a credit card authorizer.

Learn what supervised, unsupervised and semisupervised learning are and how machine learning algorithms are used in various business applications.

23 Sep 2024

All Rights Reserved, Copyright 2018 - 2025, TechTarget | Read our Privacy Statement