Linear regression, feature scaling, and regression coefficients

In summary: There is no one "correct" way to do this. You can choose to scale all variables before running the analysis, or leave them unscaled when the method does not require it; for OLS the choice mainly affects interpretability and numerics, not the correctness of the fit.
  • #1
fog37
Hello,

In studying linear regression more deeply, I learned that scaling plays an important role in multiple ways:

a) the range of the independent variables ##X## affects the values of the regression coefficients. For example, a predictor variable ##X## with a large numeric range typically gets assigned a small regression coefficient (the coefficient is "per unit" of ##X##), so comparing the relative importance of predictors solely on coefficient magnitude is misleading. The more appropriate way to compare coefficients for relative importance is to standardize the independent variables (standardization is a form of scaling) before building the model.

Another benefit of scaling the predictor variables (standardization, normalization, or any other scaling technique) is that it makes the coefficients more interpretable: sometimes a regression coefficient is extremely small simply because of the particular scale of the data. By properly rescaling the predictor variable you can get a larger coefficient and extract more understanding about the relationship between ##Y## and ##X##.
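A minimal sketch of this effect, using synthetic placeholder data and assuming scikit-learn's LinearRegression and StandardScaler:

Python:
# A sketch: compare raw vs. standardized coefficients on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
salary = rng.uniform(20_000, 120_000, n)   # large-range predictor (dollars)
years = rng.uniform(0, 40, n)              # small-range predictor (years)
y = 1e-4 * salary + 0.5 * years + rng.normal(0, 1, n)
X = np.column_stack([salary, years])

raw = LinearRegression().fit(X, y)
print("raw coefficients:", raw.coef_)          # salary's looks negligible

Xs = StandardScaler().fit_transform(X)
std = LinearRegression().fit(Xs, y)
print("standardized coefficients:", std.coef_) # now directly comparable

On the raw data the salary coefficient looks negligible only because salary is measured in dollars; after standardization both coefficients are in "per standard deviation" units and can be compared directly.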

I also read that certain statistical and ML algorithms really require scaling while others (rule-based ones, such as decision trees) don't.

So, in essence, scaling is useful but not always required; for some methods, however, it is a necessary pre-processing step...

Finally, my question: without any scaling of the independent variables, does linear regression (multiple or simple) perform properly, i.e. are the regression coefficients computed correctly? Aside from interpretability issues, does OLS systematically produce smaller (or larger) coefficients for variables with a larger range?

Thank you for any input on this!
 
  • #2
Whether or not to scale is primarily determined just by concerns about computational overflows and round-off errors. You should always look at the statistical significance of the coefficients (how many standard deviations they are away from zero) rather than just their magnitude. Any reasonable statistics package will have a regression algorithm that includes the information you need.
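For example, a sketch of how one might read off significance rather than magnitude, assuming the statsmodels package and synthetic placeholder data:

Python:
# A sketch: judge coefficients by significance, not raw magnitude
# (statsmodels assumed; data are synthetic placeholders).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 100)
y = 2.0 * x + rng.normal(0, 1, 100)

X = sm.add_constant(x)      # adds the intercept column
res = sm.OLS(y, X).fit()
print(res.summary())        # std errors, t-values, p-values per coefficient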
 
  • #3
True. I have an example where the coefficient is practically zero and the p-value is very small (< 0.05).
Linearly rescaling the predictor leads to a larger regression coefficient while keeping the same p-value.

I guess my dilemma is that certain algorithms "require" feature scaling to perform correctly, and I am wondering if linear regression is one of them...
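A quick numerical check of the point above, with synthetic placeholder data and statsmodels assumed:

Python:
# A quick check (synthetic data): rescaling X rescales the coefficient
# but leaves the p-value unchanged, since OLS t-statistics are
# invariant under linear rescaling of a predictor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 100_000, 200)          # e.g. salary in dollars
y = 1e-4 * x + rng.normal(0, 1, 200)

for scale, label in [(1.0, "dollars"), (10_000.0, "tens of thousands")]:
    res = sm.OLS(y, sm.add_constant(x / scale)).fit()
    print(label, "coef:", res.params[1], "p-value:", res.pvalues[1])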
 
  • #4
"Finally, my question: without any scaling of the independent variables, does linear regression (multiple or simple) perform properly, i.e. are the regression coefficients computed correctly?"
Always, as long as there are no data entry errors. The problem here is that your question is not well phrased: if you take any set of data, correctly entered, and apply least squares, then, assuming the program carries out LS correctly, the coefficients are computed correctly -- you get the answers you should get based on the inputs.
What I think you mean by "computed correctly" is this: are they the ones appropriate for the context of the problem? IMO the answer there is more subtle. Note that we never know the true form of any model: whenever you specify the form of a linear regression model you are making an assumption that it is correct. This means that, by default, assuming no errors in data entry, recording, or in the calculations, the coefficients are computed correctly for the assumed model form.
If you're asking about scaling, there are [at least] two things to think about.
First: suppose, as an extreme example, you're trying to predict a person's age in years from their salary in dollars. Typically salaries will be in the thousands, while age will be at most about 100 [and most likely under 70, since we're talking about salaries]. In order for an equation that looks like
##\text{Age} = \text{intercept} + \text{slope} \times \text{Salary}##
to work, the slope will need to be very small to bring the values on the right down to the scale of Age. However, if Salary is measured in tens of thousands of dollars, the slope won't be tiny, since the recorded values for salary are already roughly on the scale of age.
In short, in linear regression scaling is most often a matter of choice.
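A small illustration of that salary/age example, with made-up numbers and NumPy's polyfit:

Python:
# Made-up numbers for the Age-vs-Salary example: the fitted slope
# changes exactly with the units chosen for Salary.
import numpy as np

rng = np.random.default_rng(3)
salary_dollars = rng.uniform(20_000, 120_000, 300)
age = 25 + 4e-4 * salary_dollars + rng.normal(0, 3, 300)

slope_d, _ = np.polyfit(salary_dollars, age, 1)
slope_t, _ = np.polyfit(salary_dollars / 10_000, age, 1)
print("slope (dollars):", slope_d)             # tiny, around 4e-4
print("slope (tens of thousands):", slope_t)   # 10,000x larger, around 4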
Second: there are some more sophisticated methods [K-nearest neighbors, for one] where the essential calculations are based on distances between values, and if one or more of the variables are of significantly greater magnitude than the others, those variables will dominate the calculations. Here good practice is to scale all variables to have the same magnitude and variability prior to performing the analysis.
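A toy illustration of the distance problem, with made-up salary/age values:

Python:
# Toy example: with unscaled features, Euclidean distance is driven
# almost entirely by the large-magnitude variable (salary).
import numpy as np

a = np.array([50_000.0, 30.0])   # [salary in dollars, age in years]
b = np.array([60_000.0, 30.0])   # salary differs by 10,000
c = np.array([50_000.0, 70.0])   # age differs by 40 years

print(np.linalg.norm(a - b))     # 10000.0 -- salary gap dominates
print(np.linalg.norm(a - c))     # 40.0    -- a big age gap barely registers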
 
  • #5
statdad said:
Second: there are some more sophisticated methods [K-nearest neighbors, for one] where the essential calculations are based on distances between values, and if one or more of the variables are of significantly greater magnitude than the others, those variables will dominate the calculations. Here good practice is to scale all variables to have the same magnitude and variability prior to performing the analysis.
Would that be a standard step in the tool's algorithm, or at least an option that the user can select?
 
  • #6
To some extent, whether the scaling is done automatically or left as an on/off option for the user depends on the software. Regardless, for the types of processes I mentioned, scaling should be done.
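For instance, in scikit-learn (one common tool) scaling is not applied automatically; the user opts in by adding a scaling step to a pipeline. A sketch with placeholder data:

Python:
# In scikit-learn, scaling is an explicit opt-in step the user adds
# to a pipeline (placeholder data; names are illustrative).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = np.column_stack([rng.uniform(0, 1e5, 100), rng.uniform(0, 100, 100)])
y = (X[:, 1] > 50).astype(int)   # label depends on the small-range feature

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
print(model.predict(X[:5]))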
 
  • #7
statdad said:
To some extent, whether the scaling is done automatically or left as an on/off option for the user depends on the software. Regardless, for the types of processes I mentioned, scaling should be done.
I can't think of any case where scaling was bad to do, and there are certainly cases where it should be done.
 
  • #8
I can't think of any where it would be bad, and [I believe] your comment also implies that there are situations where it isn't required.
 
  • #9
statdad said:
I can't think of any where it would be bad, and [I believe] your comment also implies that there are situations where it isn't required.
It's not a required part of the algorithm. It is just safer in some cases to avoid numerical problems with the calculations.
 

FAQ: Linear regression, feature scaling, and regression coefficients

What is linear regression?

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It assumes that the relationship can be represented by a linear equation, which can be used for prediction and analysis of trends within the data.
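In symbols, with ##p## predictors, the assumed model is ##Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon##, where the ##\beta_i## are the regression coefficients and ##\varepsilon## is a random error term.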

Why is feature scaling important in linear regression?

Feature scaling matters in linear regression mainly for interpretability and numerics. With closed-form OLS the fitted model is equivalent whether or not you scale, but when the model is fit iteratively (e.g., by gradient descent), features with much larger ranges can dominate the cost-function geometry and slow convergence. Common methods of feature scaling include normalization and standardization.
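Written out directly (a sketch; x here is a placeholder array of values for one feature):

Python:
# The two methods named above, written out directly
# (x is a placeholder 1-D array of values for one feature).
import numpy as np

x = np.array([1.0, 5.0, 10.0, 20.0])

normalized = (x - x.min()) / (x.max() - x.min())   # min-max: maps to [0, 1]
standardized = (x - x.mean()) / x.std()            # z-score: mean 0, sd 1

print(normalized)
print(standardized)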

What are regression coefficients?

Regression coefficients are the parameters in a linear regression model that represent the relationship between each independent variable and the dependent variable. Each coefficient indicates the expected change in the dependent variable for a one-unit change in the corresponding independent variable, assuming all other variables remain constant.

How do you interpret the regression coefficients?

To interpret the regression coefficients, you look at the value and sign of each coefficient. A positive coefficient indicates a direct relationship, meaning that as the independent variable increases, the dependent variable also increases. Conversely, a negative coefficient indicates an inverse relationship. The magnitude of the coefficient indicates the strength of the relationship per unit of the predictor; as discussed in the thread above, magnitudes are directly comparable across predictors only when the variables are on a common scale.

What are some common assumptions of linear regression?

Common assumptions of linear regression include linearity, independence of errors, homoscedasticity (constant variance of errors), normality of error terms, and no multicollinearity among independent variables. Violating these assumptions can lead to inaccurate predictions and unreliable statistical inferences.
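A sketch of probing two of these assumptions, assuming statsmodels and synthetic placeholder data:

Python:
# A sketch of checking two assumptions with statsmodels
# (synthetic placeholder data; many other diagnostics exist).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1, 100)
res = sm.OLS(y, X).fit()

# Homoscedasticity: a small Breusch-Pagan p-value flags non-constant variance.
print("Breusch-Pagan p-value:", het_breuschpagan(res.resid, X)[1])

# Multicollinearity: large VIFs (say > 10) flag strongly correlated predictors.
print([variance_inflation_factor(X, i) for i in range(1, X.shape[1])])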
