What Is PRESS & Why is e_{i,-i} Its Notation?

  • Thread starter logarithmic
In summary, PRESS residuals measure the influence of individual data points on a linear regression model. Each data point is removed in turn, the model is fitted without it, and the resulting prediction for the removed point is compared with its actual value. The difference between the two is the PRESS residual, and a large value indicates that the point has a strong influence on the regression. These residuals can be obtained from quantities computed during the original fit, so the model does not have to be refitted once per point. Related concepts, such as internally and externally standardized residuals, fall under the category of regression diagnostics.
  • #1
logarithmic
So after studying PRESS residuals I'm curious to know what PRESS stands for, and why it is denoted [tex]e_{i,-i}[/tex]. What is the significance of this particular subscript in the notation? (Not very mathematical questions, I know.)
 
  • #2
The results of linear regression, the estimates of the coefficients as well as anything else, are easily influenced by outliers. Even a single outlier, in either [tex] y [/tex] or [tex] \mathbf{x} [/tex] space, can have a drastic influence on the fit.
The residuals you are discussing provide one way of measuring just how much influence the individual observations have on the overall regression. The idea is to remove individual data points from your data set one at a time, fit the model without that data value, then see how well this new regression describes the eliminated value.

I'll concentrate on the data value labeled [tex] (\mathbf{x}_1, y_1) [/tex] - except for notation, the idea is the same for all. The philosophy is

  • Eliminate [tex] (\mathbf{x}_1, y_1) [/tex] from the data
  • Fit the regression using the remaining data
  • Use the new model to estimate [tex] y_1 [/tex]

The PRESS residual is simply the difference between the estimate of [tex] y_1 [/tex], obtained with the reduced data set, and the actual value of [tex] y_1 [/tex]. A large value of this residual indicates that the pair [tex] (\mathbf{x}_1, y_1) [/tex] has a large influence on the fit of the original regression.
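
The three steps above can be sketched directly in code. This is only a minimal illustration, assuming ordinary least squares with an intercept and the sign convention "actual minus estimate"; the function name `loo_press_residuals` and the toy data are made up for the example.

```python
import numpy as np

def loo_press_residuals(X, y):
    """Return y_i - yhat_{i,-i} for every i by refitting without row i."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept column
    residuals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # drop observation i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        residuals[i] = y[i] - A[i] @ beta             # predict the dropped point
    return residuals

# Hypothetical toy data: points on the line y = 2x + 1,
# with the last response shifted upward by 10.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0
y[-1] += 10.0

press_resid = loo_press_residuals(x, y)
```

In this toy data the first five points lie exactly on a line, so the fit without the last point predicts it from that line and the shifted point gets the largest PRESS residual in absolute value.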

The same idea holds for the other data values.
It is not necessary to actually refit the regression several times, once for each of the original data values. There are rather simple ways to obtain these values from items calculated during the original fit.
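
One such shortcut, standard for ordinary least squares though not spelled out in this post, divides each ordinary residual by one minus its leverage: the PRESS residual equals e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix. The sketch below assumes that setting and simulated data, and checks the shortcut against explicit refits; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)
y = 1.5 * x - 0.5 + rng.normal(scale=0.3, size=n)  # simulated data

A = np.column_stack([np.ones(n), x])           # design matrix with intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # single fit on all the data
e = y - A @ beta                               # ordinary residuals

# Leverages h_ii: diagonal of H = A (A^T A)^{-1} A^T, computed via thin QR.
Q, _ = np.linalg.qr(A)
h = np.sum(Q**2, axis=1)

press_shortcut = e / (1.0 - h)                 # no refitting needed

# Reference values from explicit leave-one-out refits.
press_refit = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
    press_refit[i] = y[i] - A[i] @ b_i

max_gap = np.max(np.abs(press_shortcut - press_refit))
```

The two vectors agree up to floating-point error, so `max_gap` is tiny. The QR route is one way to get the leverages without forming the full n-by-n hat matrix.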

As you read more on this topic you will also see discussions of internally versus externally standardized residuals. The terminology is extensive, but all of these ideas relate to the same goal: examining a large, complicated set of data to see which points exert unreasonable influence on a regression. These ideas, and others, fall into the category of regression diagnostics.

Finally, one short discussion of the ideas in your post can be found here:

http://www.sph.umich.edu/class/bio650/2001/LN_Nov05.pdf

Good luck.
 

FAQ: What Is PRESS & Why is e_{i,-i} Its Notation?

What is PRESS?

PRESS stands for "Predicted Residual Sum of Squares" (the name also appears as "prediction sum of squares" and "predicted residual error sum of squares"). It is a statistic used in regression analysis to evaluate the predictive power of a model. It measures how well the model predicts observations that were left out of the fit, and the individual PRESS residuals help to identify influential data points.

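Why is [tex]e_{i,-i}[/tex] used in PRESS notation?

[tex]e_{i,-i}[/tex] denotes the PRESS residual for observation [tex]i[/tex]: the difference between its actual value and the value predicted by a model fitted without it. The first subscript identifies the observation, and the second, [tex]-i[/tex], indicates that observation [tex]i[/tex] was left out of the fit. Because the prediction comes from a model that has not seen that observation, the residual reflects how well the model predicts new data rather than how well it reproduces the data used to fit it.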

How is PRESS calculated?

PRESS is calculated by removing each data point from the data set in turn, re-estimating the model's parameters, and then using the new model to predict the excluded data point. The differences between the predicted and actual values are then squared and summed over all data points to give the PRESS value.
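
Written as a formula, this reads as follows. The symbols [tex]n[/tex] (number of data points) and [tex]\hat{y}_{i,-i}[/tex] (prediction for point [tex]i[/tex] from the fit without point [tex]i[/tex]) are introduced here for illustration, and the sign convention "actual minus predicted" is an assumption; it does not affect the sum because each term is squared.

[tex] \mathrm{PRESS} = \sum_{i=1}^{n} e_{i,-i}^{2} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{i,-i} \right)^{2} [/tex]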

What is the significance of PRESS in regression analysis?

PRESS is an important tool in regression analysis because it supports both the identification of influential data points, through the individual PRESS residuals, and the assessment of a model's predictive power, through their sum of squares. When candidate models are compared on the same data, a smaller PRESS indicates better prediction of left-out observations.

How is PRESS different from other evaluation metrics?

PRESS differs from in-sample evaluation metrics, such as R-squared or the mean squared error of the fitted residuals, in that each prediction is made for a data point the model was not fitted on. This gives a more honest assessment of the model's predictive power and can help to reveal overfitting or influential data points.
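
A small sketch of that contrast, under stated assumptions: ordinary least squares, simulated data, and a PRESS-based analogue of R-squared, 1 - PRESS/SST, often called "predicted R-squared". The helper `press_and_sse` is hypothetical and uses the leverage formula e_i / (1 - h_ii) for the leave-one-out residuals.

```python
import numpy as np

def press_and_sse(A, y):
    """Return (PRESS, SSE) for an OLS fit with design matrix A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ beta                      # ordinary (in-sample) residuals
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q**2, axis=1)              # leverages h_ii
    press = np.sum((e / (1.0 - h)) ** 2)  # sum of squared PRESS residuals
    return press, np.sum(e**2)

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 15)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)  # truly linear signal
sst = np.sum((y - y.mean()) ** 2)

results = {}
for degree in (1, 8):
    A = np.vander(x, degree + 1)          # polynomial terms incl. constant
    press, sse = press_and_sse(A, y)
    results[degree] = {
        "press": press,
        "sse": sse,
        "r2": 1.0 - sse / sst,            # in-sample R-squared
        "r2_pred": 1.0 - press / sst,     # PRESS-based analogue
    }
```

In-sample R-squared cannot decrease when polynomial terms are added, whereas the PRESS-based value can, which is why it is the more useful of the two for spotting an overfitted model.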
