Testing a Linear Stepwise Regression Model - Need Advice!

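In summary: To test the model on the held-back 10%, check that the residuals (actual minus predicted) have zero mean, for example with a one-sample Student's t-test, and that they are homoscedastic, for example by plotting them against the input variables or with a formal test such as Levene's. A normality check on the residuals is also suggested. Spearman's rho only measures the rank correlation between predicted and actual values, so on its own it does not show that the two agree.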
  • #1
davemk
Hi folks.

Just looking for some input please.

I have a dataset containing interval data (one dependent and 6 independent variables) and have taken a random 90% sample (approx. 300 observations). I've performed a linear stepwise regression on the 90% in order to obtain a model that predicts the dependent variable from a number of input variables. I'm confident that I've done this OK.

The issue comes with testing the model. I'm sure that this is probably a simple step but, for some reason, I'm really struggling with it and would be grateful for some advice.

In order to test the model, I'm using the 10% of the dataset that were not used in the linear regression. I've input the predictor variables into the model, which has given me an expected value. I now want to compare this to the actual value. I was originally going to use Chi Square but that seems to be probability based and I'm not sure it's appropriate.

I've been told Spearman's rho would probably be most appropriate, although I'm still not 100% sure that's right. Essentially, I would only be testing whether my predicted values = actual values.

All help appreciated. Thanks in advance.
 
  • #2
To some extent this depends on how clever you want to be. What you want to do is test that the residuals (actual minus predicted) for the held-back sample have zero mean and that they are homoscedastic, that is, that their variance is constant. With about 30 points you may have difficulty doing much more.

For the first of these I would just test for zero mean using the usual methods (for example, a one-sample t-test on the residuals).

For the latter I would plot the residuals against the input variables and eyeball the data (at least to start with), but there are tests, see http://en.wikipedia.org/wiki/Homoscedasticity for a pointer.

You might also want to test the residuals for normality.

CB
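As a rough sketch of the checks suggested above (not the poster's own code), the following Python computes the hold-out residuals, a one-sample t statistic for the hypothesis that their mean is zero, and an informal indicator of heteroscedasticity: the correlation between the absolute residuals and the predicted values. The function names and the numbers are made up for illustration. Only the standard library is used, so the t statistic has to be compared with a tabulated critical value (about 2.05 for a two-sided 5% test with roughly 30 points).

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residual_checks(actual, predicted):
    """Basic diagnostics for hold-out residuals (actual - predicted)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean_res = statistics.mean(residuals)
    sd_res = statistics.stdev(residuals)  # sample standard deviation
    # One-sample t statistic for H0: mean residual = 0, with n - 1 degrees of freedom.
    t_stat = mean_res / (sd_res / math.sqrt(n))
    # Informal check only: a value far from 0 hints that the residual spread
    # changes with the predicted level. It does not replace a formal test.
    spread_corr = pearson([abs(r) for r in residuals], predicted)
    return {"n": n, "df": n - 1, "mean": mean_res, "sd": sd_res,
            "t": t_stat, "abs_resid_corr": spread_corr}

# Made-up hold-out values, purely for illustration.
actual = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
predicted = [9.5, 12.5, 9.0, 14.0, 11.5, 13.5]
result = residual_checks(actual, predicted)
print(result)
```

If the absolute value of `t` is below the critical value for `df` degrees of freedom, the hold-out data give no evidence that the predictions are biased on average. That is weaker than showing that they are accurate, so a plot of the residuals is still worth making.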
 
  • #3
That's a great help, thank you very much.

I've already plotted the residuals (observed vs. expected) and histograms for normality, so I'll look into the tests in the link you posted (I must admit I've never heard of them, so I'll read up on those).

Thanks again. I'll update the thread with my progress asap.
 
  • #4
CaptainBlack said:
To some extent this depends on how clever you want to be.

With about 30 points you may have difficulty doing much more.

Hello again. If I were to get more data (say 70 observations) in order to test the model, is there a specific test that I could use? At the moment, I've performed a residual analysis and I'm now looking at performing a Wilcoxon test or a Spearman test.

Any thoughts on this process, or alternatives? The procedures in the link above don't appear to be available in SPSS.
 
  • #5


Hi there,

Thank you for reaching out for advice on testing your linear stepwise regression model. It sounds like you have done a good job so far in performing the regression and now you just need to test the model.

Firstly, it is important to note that there is no one "correct" way to test a regression model. Different tests may be appropriate depending on the specific goals and assumptions of your study.

One option for testing your model is to use the 10% of the dataset that was not used in the regression as a holdout sample. This means that you can use this data to evaluate the accuracy of your model's predictions. You can compare the predicted values from your model to the actual values in the holdout sample and calculate metrics such as mean squared error or R-squared to assess the performance of your model.
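A minimal sketch of those holdout metrics in plain Python is shown here. The function name and the numbers are illustrative assumptions, not values from the thread.

```python
def holdout_metrics(actual, predicted):
    """Mean squared error, its square root, and R-squared on a holdout sample."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual sum of squares: how far the predictions are from the actual values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: how far the actual values are from their own mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    mse = ss_res / n
    return {"mse": mse, "rmse": mse ** 0.5, "r2": 1 - ss_res / ss_tot}

# Made-up holdout values, purely for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]
metrics = holdout_metrics(actual, predicted)
print(metrics)
```

Note that R-squared computed this way on unseen data can be negative, which means the model predicts worse than simply using the mean of the holdout values.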

Another option is to use cross-validation, which involves dividing your data into several subsets (folds), holding out each fold in turn as the test set, and fitting the model on the remaining folds. Averaging the error over the folds can give a more stable estimate of your model's performance than a single small holdout sample.
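The idea can be sketched as follows. For brevity this assumes a single predictor and a straight-line fit rather than the six-variable stepwise model in the question, and the helper names and synthetic data are made up for illustration.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (one predictor, for brevity)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def k_fold_mse(x, y, k=5, seed=0):
    """Average test-fold mean squared error over k folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    fold_mse = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        # Fit only on the training folds, then score on the held-out fold.
        a, b = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        errors = [(y[i] - (a + b * x[i])) ** 2 for i in test_idx]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k

# Synthetic data: y = 2 + 3x plus a little Gaussian noise.
rng = random.Random(1)
x = [i / 10 for i in range(50)]
y = [2 + 3 * xi + rng.gauss(0, 0.1) for xi in x]
cv_mse = k_fold_mse(x, y)
print(cv_mse)
```

With a stepwise model, the variable selection should be repeated inside each fold. Selecting the variables once on all the data and cross-validating only the final fit tends to make the model look better than it is.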

As for the specific test to use, Spearman's rho measures the rank correlation between two variables, so it tells you how strongly the predicted and actual values increase together. However, it does not test whether the predicted values equal the actual values: predictions that are consistently too high or too low can still give a rho close to 1, so it is best used alongside a check on the residuals.
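To illustrate why a rank correlation alone cannot confirm that predictions match the actual values, here is a small sketch with made-up numbers. The "predictions" are badly biased, yet Spearman's rho is still 1 because the ordering is preserved. The helper functions are written for this example and use the simple formula that assumes no tied values.

```python
def ranks(values):
    """Ranks 1..n of the values (assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Made-up actual values and deliberately biased "predictions".
actual = [10.0, 12.0, 9.0, 15.0, 11.0]
biased = [2 * a + 5 for a in actual]

rho = spearman_rho(actual, biased)
mean_error = sum(p - a for a, p in zip(actual, biased)) / len(actual)
print(rho, mean_error)
```

Here `rho` is exactly 1 while the mean prediction error is far from zero, which is why a residual-based check is the more direct way to ask whether predicted values equal actual values.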

I would also recommend consulting with a statistician or conducting further research on appropriate tests for your specific study. It is important to carefully consider the goals and assumptions of your study before selecting a test to use.

I hope this helps and good luck with your analysis.
 

FAQ: Testing a Linear Stepwise Regression Model - Need Advice!

What is a linear stepwise regression model?

A linear stepwise regression model is a statistical method used to analyze the relationship between a dependent variable and one or more independent variables. It involves a step-by-step process of selecting the most significant variables to be included in the final model, based on their contribution to the overall prediction of the dependent variable.

How is a linear stepwise regression model tested?

Fitting comes first: the dependent variable and the candidate independent variables are specified, the model is fitted using a regression technique such as ordinary least squares, and variables are added or removed step by step according to a significance criterion. To test the resulting model, it is applied to data that were not used in the fitting (a holdout sample or cross-validation folds), and the residuals between the actual and predicted values are examined, for example for zero mean, constant variance and normality.

What are the advantages of using a linear stepwise regression model?

One advantage of using a linear stepwise regression model is that it helps to identify the most significant variables for predicting the dependent variable, which can lead to a more accurate and efficient model. It also allows for the assessment of the individual contributions of each variable to the overall prediction, making it easier to interpret the results.

What are the limitations of a linear stepwise regression model?

A major limitation of a linear stepwise regression model is that it relies on a predetermined significance level to determine which variables should be included, which can lead to the omission of important variables. It also assumes a linear relationship between the dependent and independent variables, which may not always be the case in real-world data.

How can the results of a linear stepwise regression model be interpreted?

The results of a linear stepwise regression model can be interpreted by looking at the coefficients of each variable included in the final model. Each coefficient represents the expected change in the dependent variable for a one-unit change in that independent variable, holding the other variables in the model constant. Additionally, the overall significance of the model can be assessed through measures such as the F-statistic and R-squared value.
