(Image from Pattern Recognition and Machine Learning by Bishop)
Today I’m going to build on the ridge regression post from last time by comparing it with the LASSO (Least Absolute Shrinkage and Selection Operator) algorithm. It’s part of a larger conversation we’ve been having off and on in this blog about learning the basics of machine learning without a lot of math or coding. Since machine learning is such a big part of data science, it’s important to understand some of these core algorithms.
Let’s return to the rainfall prediction problem where we want to know if our city will have more than 5 mm of rain next Monday. Ridge regression is a form of linear regression that estimates how strongly each predictor variable influences the target variable while shrinking those estimates toward zero, and it is one of the most commonly used regularized regression techniques in data science. Ridge regression and LASSO are especially useful when there are more predictor variables than observations.
Perhaps the data set we can access only has fifty observations, but there are hundreds of predictor variables that could affect rainfall amounts. In this problem, the rainfall amount would be the target variable, y. Among the predictors we might have time of day, season and cloud cover percentage, represented as x1, x2 and x3. LASSO does not need us to decide ahead of time which variables matter: it shrinks the coefficients of unhelpful variables all the way to zero, which drops them from the model. If x1, x2 and x3 turn out to be the variables relevant to predicting rainfall amount, LASSO keeps them, and the hundreds of other variables that add little are excluded.
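For readers who do want to see the selection step in action, here is a minimal sketch in plain Python. It is not how a production library does it, just one standard way (cyclic coordinate descent) to fit a LASSO model. The data is made up: 50 observations and 200 predictors, where only the first three actually drive the target. The names `lasso_fit`, `soft_threshold`, `columns` and the penalty strength `lam=0.5` are all assumptions chosen for illustration, and the sketch skips the intercept and the standardization step a real analysis would usually include.

```python
import random


def soft_threshold(value, lam):
    """Move value toward zero by lam; anything within lam of zero becomes exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(columns, y, lam, n_sweeps=50):
    """Cyclic coordinate descent for (1/(2n)) * sum((y - prediction)^2) + lam * sum(|coef|).

    columns[j] holds all n observations of predictor j. No intercept is fitted.
    """
    n = len(y)
    coefs = [0.0] * len(columns)
    residual = list(y)  # equals y - prediction, since every coefficient starts at 0
    col_sq = [sum(x * x for x in col) / n for col in columns]
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Average product of predictor j with the residual,
            # with j's current contribution added back in.
            rho = sum(x * r for x, r in zip(col, residual)) / n + col_sq[j] * coefs[j]
            new_coef = soft_threshold(rho, lam) / col_sq[j]
            delta = new_coef - coefs[j]
            if delta != 0.0:
                residual = [r - delta * x for x, r in zip(col, residual)]
                coefs[j] = new_coef
    return coefs


# Hypothetical data: 50 observations, 200 predictors, only the first three matter.
random.seed(0)
n_obs, n_predictors = 50, 200
columns = [[random.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_predictors)]
y = [
    3.0 * columns[0][i] - 2.0 * columns[1][i] + 1.5 * columns[2][i] + random.gauss(0.0, 0.5)
    for i in range(n_obs)
]

coefs = lasso_fit(columns, y, lam=0.5)
selected = [j for j, c in enumerate(coefs) if c != 0.0]
print(f"kept {len(selected)} of {n_predictors} predictors: {selected}")
```

The thing to look at is the `selected` list: every predictor whose coefficient ended up at exactly zero has been dropped from the model. Raising `lam` drops more predictors, and lowering it keeps more.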
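Ridge regression and LASSO are both regularized linear regression techniques, meaning they add a penalty that keeps the weights (coefficients) on the input variables (x1, x2, x3) from getting too large, and they differ in how that penalty is calculated. Ridge penalizes the squared size of the coefficients, which shrinks them but rarely makes any of them exactly zero; LASSO penalizes their absolute size, which can push some of them to exactly zero. LASSO was first described in the geophysics literature in 1986 and was popularized by Robert Tibshirani in 1996 as a way to improve prediction accuracy and make the results from the model easier to interpret. In our problem, since we have hundreds of variables and few observations, an ordinary regression would probably “overfit”: it would match the data we have very closely but make unreliable, sometimes extreme, predictions about new rainfall. LASSO corrects for this so we can more accurately predict the rainfall amount next Monday. If the variables are highly correlated with each other, the LASSO model will not do as well, because it tends to keep one variable from a correlated group and drop the rest somewhat arbitrarily. Many have suggested using Elastic Net regression in this case instead.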
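To make the difference between the two penalties concrete, here is a small sketch. It assumes the simplest textbook setting, where the predictors are uncorrelated and on the same scale (an orthonormal design), because in that special case each penalty reduces to a one-line rule applied to the ordinary regression coefficients. The starting coefficients, the extra `noise` variable and the penalty strength `lam` are made-up numbers for illustration only.

```python
def ridge_shrink(coef, lam):
    """Ridge (squared penalty): scale every coefficient toward zero by the same factor."""
    return coef / (1.0 + lam)


def lasso_shrink(coef, lam):
    """LASSO (absolute-value penalty): subtract lam from the size, stopping at zero."""
    magnitude = max(abs(coef) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if coef > 0 else -magnitude


# Hypothetical unregularized coefficients (assumed values, not real rainfall data).
ols_coefs = {"time_of_day": 0.25, "season": -1.5, "cloud_cover": 3.0, "noise": 0.1}
lam = 0.5

ridge_coefs = {name: ridge_shrink(b, lam) for name, b in ols_coefs.items()}
lasso_coefs = {name: lasso_shrink(b, lam) for name, b in ols_coefs.items()}

for name, b in ols_coefs.items():
    print(f"{name:12s} ols={b:+.2f}  ridge={ridge_coefs[name]:+.3f}  lasso={lasso_coefs[name]:+.3f}")
```

Under these assumptions ridge makes every coefficient smaller but keeps all four variables in the model, while LASSO sets the two small ones to exactly zero and keeps only `season` and `cloud_cover`. With correlated predictors the rules are no longer this simple, but the overall behavior is similar.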