- #1
meanrev
I cannot find a consistent definition of the studentized residual or of the RMSEP. Various websites, lecture notes and software packages each change one or two of the underlying definitions along the way, so the resulting "compound" definition can end up very different from one reference source to another!
So I'm going to write all of my definitions from the ground up. Would someone be so kind as to confirm to me if my definitions 4, 5, 7 and 8 are correct?
> Regarding (4) and (5), should I divide my PRESS by the sample size [itex]n[/itex] or should I divide it by the degrees of freedom, as I would calculate the RMSE?
> Regarding (7) and (8), am I correct to use the jackknifed residual in the numerator and the RMSEP (instead of the RMSE) in the denominator? Is there an intuitive explanation as to why I should prefer the jackknifed residual over the internally studentized residual?
DEFINITION 1. My raw residuals are [itex]\hat{e}_{i}=Y_{i}-\hat{Y}_{i}[/itex], where the [itex]Y_{i}[/itex] are the actual values and the [itex]\hat{Y}_{i}[/itex] are the values predicted by the regression equation.
DEFINITION 2. The hat matrix is defined as [itex]H[/itex] such that the vector of values predicted by the regression equation [itex]\hat{Y}=HY[/itex], where [itex]Y[/itex] is the vector of actual values.
DEFINITION 3. The jackknifed residuals are defined as [itex]\hat{e}_{i,-i}=Y_{i}-\hat{Y}_{i,-i}[/itex], where [itex]\hat{Y}_{i,-i}[/itex] is the value predicted for observation [itex]i[/itex] by the regression equation estimated while excluding [itex]Y_{i}[/itex].
DEFINITION 4. Given a sample size of [itex]n[/itex] data points and [itex]k[/itex] predictor variables, the RMSE is the square root of the SSE divided by the degrees of freedom, [itex]RMSE=\sqrt{\dfrac{SSE}{n-k-1}}[/itex], where [itex]SSE=\sum_{i=1}^{n}\hat{e}_{i}^{2}[/itex].
DEFINITION 5. Given a sample size of [itex]n[/itex] data points, the predicted residual sum of squares (PRESS) is [itex]PRESS=\sum_{i=1}^{n}\hat{e}_{i,-i}^{2}=\sum_{i=1}^{n}\left(Y_{i}-\hat{Y}_{i,-i}\right)^{2}[/itex], so the root mean squared error of prediction (RMSEP) is [itex]RMSEP=\sqrt{\dfrac{PRESS}{n}}[/itex].
DEFINITION 6. The standardized residual is the raw residual divided by the RMSE, i.e. [itex]\dfrac{\hat{e}_{i}}{RMSE}[/itex].
DEFINITION 7. The internally studentized residual is [itex]\dfrac{\hat{e}_{i}}{RMSE\sqrt{1-h_{ii}}}[/itex], where the leverage [itex]h_{ii}\in\left[0,1\right][/itex] is the [itex]i[/itex]th diagonal entry of the hat matrix.
DEFINITION 8. The studentized deleted residual is calculated using the jackknifed residuals, so it is computed as [itex]\dfrac{\hat{e}_{i,-i}}{RMSEP\sqrt{1-h_{ii}}}[/itex].
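To make the definitions concrete, here is a minimal sketch of Definitions 1 to 7 in plain Python. It assumes a single predictor (k = 1), so that the leverage has the closed form 1/n + (x_i - x̄)²/S_xx instead of needing the full hat matrix. The function names `fit_line` and `diagnostics` and the toy data are made up for illustration and do not come from any package. Definition 8 is deliberately left out, since it is the one I am asking about.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def diagnostics(x, y):
    """Definitions 1 to 7 for a one-predictor regression (x, y are lists)."""
    n, k = len(x), 1
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)

    # Definition 1: raw residuals
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Definition 2 (diagonal only): leverages h_ii, closed form for k = 1
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # Definition 3: jackknifed residuals, refitting without observation i
    e_jack = []
    for i in range(n):
        b0_i, b1_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        e_jack.append(y[i] - (b0_i + b1_i * x[i]))

    # Definition 4: RMSE uses the degrees of freedom n - k - 1
    sse = sum(ei ** 2 for ei in e)
    rmse = math.sqrt(sse / (n - k - 1))
    # Definition 5: PRESS and RMSEP (divided by n, as written above)
    press = sum(ej ** 2 for ej in e_jack)
    rmsep = math.sqrt(press / n)
    # Definitions 6 and 7
    standardized = [ei / rmse for ei in e]
    internal = [ei / (rmse * math.sqrt(1.0 - hi)) for ei, hi in zip(e, h)]

    return {
        "e": e, "h": h, "e_jack": e_jack, "sse": sse, "rmse": rmse,
        "press": press, "rmsep": rmsep,
        "standardized": standardized, "internal": internal,
    }


# Toy data, invented purely to exercise the functions
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
d = diagnostics(x, y)
print(d["rmse"], d["rmsep"])
```

For ordinary least squares the jackknifed residual also satisfies [itex]\hat{e}_{i,-i}=\dfrac{\hat{e}_{i}}{1-h_{ii}}[/itex], so the refitting loop is only there to mirror Definition 3 literally; in practice PRESS can be computed from a single fit.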