- #1
fog37
- TL;DR Summary
- scaling and standardization in statistical analysis
Hello everyone,
When working with variables in a data set to find the appropriate statistical model (linear regression, nonlinear regression, etc.), the variables can have different ranges, standard deviations, means, etc.
Should all the input variables always be standardized and scaled before the analysis is applied, so that they have the same mean and range?
For example, when determining the price of a house (the target output variable) using a multivariate linear regression model, the input variables (square footage, year it was built, number of rooms, etc.) have very different ranges. It could happen that a certain variable gets a larger weight just because of the range of its values.
What to do?
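To make the concern concrete, here is a minimal sketch using NumPy and synthetic data. All numbers, the "true" price relationship, and the helper name `fit_ols` are made up for illustration; none of this is real housing data. It fits ordinary least squares once on the raw inputs and once on standardized inputs (each column shifted to mean 0 and scaled to standard deviation 1), then compares the two fits.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors on very different scales (assumed values, not real data)
sqft = rng.uniform(500, 4000, n)             # square footage
year = rng.uniform(1950, 2020, n)            # year built
rooms = rng.integers(1, 8, n).astype(float)  # number of rooms
X = np.column_stack([sqft, year, rooms])

# Hypothetical price relationship plus noise
price = 150 * sqft + 800 * year + 10000 * rooms + rng.normal(0, 20000, n)

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

# Fit on the raw inputs
b0_raw, b_raw = fit_ols(X, price)

# Standardize: every column gets mean 0 and standard deviation 1
# (the ranges of the columns can still differ after this step)
mu, sd = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sd
b0_std, b_std = fit_ols(Z, price)

# Predictions from both fits
pred_raw = b0_raw + X @ b_raw
pred_std = b0_std + Z @ b_std

print("raw coefficients:         ", b_raw)
print("standardized coefficients:", b_std)
print("same predictions:", np.allclose(pred_raw, pred_std))
print("b_std equals b_raw * sd: ", np.allclose(b_std, b_raw * sd))
```

In this sketch the two fits give the same predictions, and each standardized coefficient is just the raw coefficient multiplied by that column's standard deviation. So for plain least squares, rescaling changes how the weights read (per unit versus per standard deviation) rather than the fit itself. That may not carry over to regularized or distance-based methods, where the scale of the inputs can affect the result.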