Normal assumption with least squares regression

In summary, the conversation is about the normality assumption when using least squares estimation. The original poster asks why normality must be assumed rather than some other distribution. The replies explain that normality is not needed for least squares estimation itself, only for testing hypotheses based on the parameter estimates, and that maximum likelihood estimation is the appropriate tool if a non-normal distribution is to be assumed.
  • #1
Guy Incognito
My Google search just turns up results telling me that one of the assumptions I have to make is that each Y is normal. My question is: why do I have to assume it's normal? Why does it follow that it has to be normal as opposed to some other distribution? Hope that makes sense.

Edit: I thought about this some more. Is it just as simple as this: the standard errors for the parameters are computed assuming each Y is normal? If you write it out, you can easily see that B1, for example, is a linear function of Y1, ..., Yn and thus will be normal.
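For reference, the standard simple-regression algebra behind that remark (my notation, not the poster's) is

$$
\hat\beta_1 \;=\; \frac{\sum_{i}(x_i-\bar x)(Y_i-\bar Y)}{\sum_{i}(x_i-\bar x)^2}
\;=\; \sum_{i} k_i\,Y_i,
\qquad
k_i = \frac{x_i-\bar x}{\sum_{j}(x_j-\bar x)^2},
$$

so the slope estimate is a fixed linear combination of Y1, ..., Yn; if each Y_i is normal, that linear combination is normal as well.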
 
  • #2
You do not need normality for least squares estimation. That includes the estimation of standard errors. The LS parameter estimates and the standard deviations are sample-based statistics; they do not require making assumptions about a distribution.
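A minimal sketch of that point (my own illustration with made-up data, not something from the thread): the estimates and standard errors below are computed purely from sample quantities, and the errors used to generate the data are deliberately non-normal.

```python
# Hypothetical illustration: ordinary least squares estimates and their
# standard errors computed directly from the data. Nothing below looks up
# or assumes a particular distribution for Y.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
errors = rng.exponential(scale=1.0, size=50) - 1.0   # skewed, mean-zero errors
y = 2.0 + 0.5 * x + errors

X = np.column_stack([np.ones_like(x), x])            # design matrix [1, x]
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                         # LS estimates (b0, b1)

resid = y - X @ beta_hat
s2 = resid @ resid / (len(y) - X.shape[1])           # residual variance estimate
se = np.sqrt(np.diag(s2 * XtX_inv))                  # standard errors of b0, b1

print(beta_hat, se)
```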

You do need normality when you are testing hypotheses based on the parameter estimates and the standard deviations. Since hypothesis testing means looking up probability values from a "probability table," you need to know which table to look at, and that means you have to make an assumption about the distribution.
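Continuing the hypothetical sketch above, the distributional assumption enters only at this last step, where the ratio of estimate to standard error is compared against a reference distribution (the t "table" that follows from assuming normal errors):

```python
# Hypothetical continuation: a test of H0: b1 = 0.
# The estimate and standard error needed no distributional assumption,
# but turning their ratio into a p-value requires choosing which
# "probability table" to look at -- here the t distribution.
from scipy import stats

t_stat = beta_hat[1] / se[1]
df = len(y) - X.shape[1]
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(t_stat, p_value)
```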
 
  • #3
OK, so I guess my question is: why do I have to assume that it's normal? Why can't I assume it's gamma or anything else? I was under the impression that if I wanted to use anything other than normal, I had to use GLMs (which I'll admit I know nothing about).
 
  • #4
My advice is not to make distributional assumptions whenever you don't have to.

However, that would imply you cannot use ordinary LS results to test hypotheses.

If you wish to assume a non-normal distribution, then my advice is to use maximum likelihood estimation: http://en.wikipedia.org/wiki/Maximum_likelihood
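As a rough illustration of that advice (again my own sketch, not part of the thread): if you were willing to assume, say, Laplace-distributed errors, you could write down that likelihood and maximize it numerically instead of using the LS formulas.

```python
# Hypothetical sketch: maximum likelihood fit of y = b0 + b1*x + e under an
# assumed Laplace error distribution, by minimizing the negative
# log-likelihood. Swap in whatever distribution you are actually willing
# to assume.
import numpy as np
from scipy import optimize, stats

# Made-up data purely for illustration.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.laplace(scale=1.0, size=50)

def neg_log_lik(params, x, y):
    """Negative log-likelihood for y = b0 + b1*x + e with Laplace errors."""
    b0, b1, log_scale = params
    resid = y - (b0 + b1 * x)
    return -np.sum(stats.laplace.logpdf(resid, scale=np.exp(log_scale)))

result = optimize.minimize(neg_log_lik, x0=[0.0, 0.0, 0.0], args=(x, y))
b0_mle, b1_mle = result.x[:2]
print(b0_mle, b1_mle, np.exp(result.x[2]))
```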
 

Related to Normal assumption with least squares regression

1. What is the normal assumption in least squares regression?

The normal assumption in least squares regression states that the errors (residuals) of the regression model follow a normal distribution. This means that most errors are close to zero, with large errors becoming progressively rarer.

2. Why is the normal assumption important in least squares regression?

The normal assumption is important because it allows us to make statistical inferences about the regression model. When the errors are normally distributed, we can use various statistical tests and confidence intervals to assess the significance and accuracy of the regression coefficients.

3. How can I check if the normal assumption holds in my regression model?

There are several graphical and statistical methods for checking the normal assumption in least squares regression, including residual plots, histograms, Q-Q plots, and formal tests such as the Shapiro-Wilk test. These methods can help identify deviations from normality and guide potential remedies.
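A brief sketch of those checks, assuming you already have the residuals from a fitted model (the placeholder data and names below are illustrative only):

```python
# Hypothetical sketch of normality checks on regression residuals.
# 'resid' stands in for the residuals of your fitted model.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

resid = np.random.default_rng(2).standard_normal(50)   # placeholder residuals

# Q-Q plot: points should fall roughly on a straight line under normality.
stats.probplot(resid, dist="norm", plot=plt)
plt.show()

# Shapiro-Wilk test: a small p-value suggests a departure from normality.
stat, p = stats.shapiro(resid)
print(stat, p)
```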

4. What happens if the normal assumption is violated in least squares regression?

If the normal assumption is violated, the statistical inferences made from the regression model may not be reliable. This can lead to incorrect conclusions about the significance and accuracy of the regression coefficients. In such cases, alternative regression methods, such as robust regression, may be more appropriate.
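For instance, one common robust option is a Huber M-estimator; the sketch below uses statsmodels, which is my choice for illustration and is not prescribed by the answer above:

```python
# Hypothetical sketch: robust regression (Huber M-estimator) as an
# alternative when normality of the errors is doubtful.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.standard_t(df=2, size=50)      # heavy-tailed errors

X = sm.add_constant(x)                                  # intercept column
robust_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print(robust_fit.params, robust_fit.bse)
```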

5. Is it always necessary to meet the normal assumption in least squares regression?

While the normal assumption is desirable, it is not always necessary to meet it in least squares regression. In some cases, the central limit theorem can help to approximate the normality of the errors, especially when the sample size is large. Additionally, there are robust regression techniques that can be used to handle deviations from normality in the errors.
