https://www.techtarget.com/searchenterpriseai/definition/linear-regression
Linear regression is a statistical technique that identifies the relationship between the mean value of one variable and the corresponding values of one or more other variables. By understanding the relationship between variables, the linear regression technique can help data scientists model and predict how one variable will change in relation to another. This can involve analyses such as estimating sales based on product prices or predicting crop yield based on rainfall. At a basic level, the term regression means to return to a former or less developed state.
Linear regression in machine learning (ML) builds on this fundamental concept to model the relationship between variables using various ML techniques to generate a regression line between variables such as sales rate and marketing spend. In practice, ML tends to be more useful when working with multiple variables, called multivariate regression, where the relationships between them require more complex regression coefficients.
Linear regression is a basic component in supervised learning. At its core, it can help determine if one explanatory variable can provide value in predicting the outcome of the other. For example, does ad spending on one medium or another have any meaningful impact on sales?
In the most basic case, linear regression tries to predict the value of one variable, called the dependent variable, given another variable, called the independent variable. For example, if an organization was trying to predict sales rate based on advertising spend, sales would be the dependent variable, while ad spend would be the independent variable.
Linear regression is linear in that it guides the development of a function or model that fits a straight line -- called a linear regression line -- to a graph of the data. This line also minimizes the difference between a predicted value for the dependent variable given the corresponding independent variable.
In the case of estimating sales rate, each dollar in sales might climb regularly within a certain range for every dollar spent on ads and then slow down once the ad market reaches a saturation point. In these cases, more complex functions need to be constructed using statistics or ML techniques to fit the data onto a straight line.
Linear regression is important for the following reasons:
There are three main types of linear regression:
Beyond these three fundamental categories, however, there are numerous specific linear regression methods, which include the following:
Each specific approach can be applied to different tasks or data analysis objectives. For example, HLM -- also called multilevel modeling -- is a type of linear model intended to handle nested or hierarchical data structures, while ridge regression can be used when there's a high correlation between independent variables, which might otherwise lead to unintended bias using other methods.
The following highlights three common ways linear regression is used:
Typical use cases for linear regression in business include the following:
Advantages of linear regression include the following:
Disadvantages of linear regression include the following:
Linear regression requires the data set to support the following properties:
Linear regression methods can be performed on paper, but the process can be cumbersome and involve repetitive sums and squares. Higher education and professional applications of linear regression equations typically employ software applications to ingest data and perform the required mathematical processes quickly. Linear regression is supported by numerous software tools and programming libraries, including the following:
Linear regression is just one class of regression techniques for fitting numbers onto a graph.
Multivariate regression might fit data to a curve or a plane in a multidimensional graph representing the effects of multiple variables.
Although logistic regression and linear regression both use linear equations for predictions, logistic regression predicts whether a given data point belongs to one class or another, such as spam/not spam for an email filter or fraud/not fraud for a credit card authorizer.
Learn what supervised, unsupervised and semisupervised learning are and how machine learning algorithms are used in various business applications.
23 Sep 2024