Simple least squares regression problem. Am I doing anything wrongly?

In summary, least squares regression models the relationship between a dependent variable and either one independent variable (simple regression) or several (multiple regression, as in this thread). When selecting variables for the model, consider their relevance and potential impact, and use statistical tests to assess their significance. Common mistakes include assuming causation when there is only correlation, ignoring multicollinearity, and failing to check the residuals for approximate normality and outliers. Model performance is commonly evaluated with the coefficient of determination (R-squared), along with metrics such as RMSE and MAE. If the model does not meet its assumptions, alternative techniques may be needed, along with a critical look at the data and advice from a statistician.
  • #1
bobthebanana
Least squares regression of Y on A-D based on a sample of size 506. Reported results with standard errors are:


Y = 11.08 - 0.954*A - 0.134*B + 0.255*C - 0.052*D
s.errs (0.32) (0.117) (0.043) (0.019) (0.006)

R^2 = 0.581


problem A. Test null that coefficient on D is equal to 0
d = coefficient on D
null: D ~ N(0, 0.006)
Pr(d >= 0.052) = 1 - normalcdf(0.052 / 0.006) = 0
reject


problem B. Construct 95% confidence interval for coefficient on D
0.052 +/- 1.96*(0.006 / sqrt(506))


problem C. What is the probability that this interval contains the true population regression coefficient on D?
? just 95%?



Is anything wrong with any of my answers?
 
  • #2


Your overall approach is reasonable, but each of the three answers contains a mistake worth fixing.

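For problem A, note that the estimated coefficient on D is -0.052, not 0.052, and that 0.006 is its standard error, i.e. the standard deviation of the estimator's sampling distribution. Under the null, the estimate is therefore approximately N(0, 0.006^2), not N(0, 0.006). Since the alternative is simply that the coefficient differs from 0, use a two-tailed test: the test statistic is -0.052 / 0.006 ≈ -8.67, and the two-sided p-value, 2*(1 - normalcdf(8.67)), is essentially 0, far below 0.05. So you reject the null hypothesis that the coefficient on D equals 0 at any conventional significance level. Strictly, the statistic follows a t-distribution with 506 - 5 = 501 degrees of freedom (five estimated coefficients, including the intercept), but with that many degrees of freedom it is practically identical to the standard normal.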
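A minimal sketch of the problem A calculation, using only the estimate and standard error from the posted output. It assumes the normal approximation discussed above; the variable names are just for illustration.

```python
import math

# Reported estimate and standard error for the coefficient on D.
coef_d = -0.052
se_d = 0.006

# Test statistic for H0: beta_D = 0. The standard error is already the standard
# deviation of the estimator, so it is used as-is (no extra division by sqrt(n)).
z = coef_d / se_d

# Two-sided p-value under the normal approximation: 2 * (1 - Phi(|z|)).
# math.erfc avoids the round-off that makes 1 - cdf(8.67) collapse to exactly 0.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.1e}")
print("reject H0 at the 5% level" if p_two_sided < 0.05 else "do not reject H0")
```

The identity 2*(1 - Phi(x)) = erfc(x / sqrt(2)) is used because, this far in the tail, computing 1 - Phi(x) directly loses all precision in floating point.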

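For problem B, do not divide by sqrt(506). The reported standard error already reflects the sample size; dividing it again makes the interval far too narrow. The interval should also be centered on the estimate -0.052: -0.052 +/- 1.96*0.006 = -0.052 +/- 0.0118, or roughly (-0.064, -0.040). This is a confidence interval for the population regression coefficient on D. Using the t critical value with 501 degrees of freedom (about 1.965) instead of 1.96 moves the endpoints by less than 0.0001.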
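A short sketch of the problem B interval from the reported numbers. It assumes 4 slope coefficients plus an intercept (so 501 residual degrees of freedom) and uses the normal critical value from Python's standard library.

```python
from statistics import NormalDist

coef_d = -0.052
se_d = 0.006
n, k = 506, 4  # sample size and number of slope coefficients (A, B, C, D)

df = n - k - 1  # residual degrees of freedom = 501
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96; the t value at 501 df is about 1.965

# The standard error is used directly; it is NOT divided by sqrt(n) again.
lower = coef_d - z_crit * se_d
upper = coef_d + z_crit * se_d
print(f"df = {df}, 95% CI = ({lower:.4f}, {upper:.4f})")
```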

For problem C, the answer is not simply 95%. Once the interval has been computed from this sample, it either contains the true population coefficient on D or it does not, so the probability is 0 or 1; we just don't know which. The 95% describes the procedure: if you drew many samples of size 506 and built an interval the same way each time, about 95% of those intervals would contain the true coefficient. Even that coverage holds only if the model is correctly specified and the least squares assumptions are met; if they are violated, the actual coverage can differ from 95%.
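The repeated-sampling meaning of "95%" can be checked by simulation. This is a sketch under an assumed, hypothetical model (y = 1 - 0.05*x + standard normal noise, simple regression, n = 200), not the thread's data: each simulated interval either contains the true slope or not, and the long-run fraction that do should be close to 0.95.

```python
import math
import random

def coverage_rate(true_slope=-0.05, n=200, reps=2000, seed=0):
    """Fraction of nominal 95% intervals that contain the true slope
    across repeated samples from a known (hypothetical) linear model."""
    rng = random.Random(seed)
    z = 1.96
    hits = 0
    for _ in range(reps):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [1.0 + true_slope * x + rng.gauss(0, 1) for x in xs]
        x_bar, y_bar = sum(xs) / n, sum(ys) / n
        sxx = sum((x - x_bar) ** 2 for x in xs)
        slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
        intercept = y_bar - slope * x_bar
        # Residual variance uses n - 2 degrees of freedom (intercept + slope).
        ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        se = math.sqrt(ssr / (n - 2) / sxx)
        if slope - z * se <= true_slope <= slope + z * se:
            hits += 1
    return hits / reps

print(coverage_rate())
```

Any single run of the loop produces one interval that is simply right or wrong; only the proportion over many runs approaches 95%.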
 
  • #3


I cannot give a definitive answer without the data and the full details of the regression, but the posted work can be checked against the reported output. In problem A, the idea of comparing the estimate with its standard error is right, but the estimate is -0.052 and the null distribution has standard deviation 0.006 (variance 0.006^2), so the two-sided test statistic is about -8.67 and the null is rejected. In problem B, the standard error should be used as reported, without dividing it by sqrt(506), giving roughly -0.052 +/- 0.0118. In problem C, 95% is the long-run coverage of the interval procedure, assuming the least squares assumptions are met; a particular computed interval either contains the true coefficient on D or it does not. It would also help to report the p-value from the test in problem A alongside the interval, since together they describe both the significance and the plausible range of the coefficient on D. Overall, your responses show the right general approach to interpreting regression output.
 

Related to Simple least squares regression problem. Am I doing anything wrongly?

1. What is the purpose of a simple least squares regression problem?

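The purpose of least squares regression is to model the relationship between a dependent variable and one independent variable (simple regression) or several independent variables (multiple regression, as in the problem above). The coefficients are chosen to minimize the sum of squared residuals, the squared differences between the observed and fitted values. The fitted model is then used to understand and predict how the dependent variable changes with the independent variables.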
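For the one-variable case, the least squares coefficients have a closed form. A minimal sketch on made-up data (the numbers are hypothetical), which also confirms that nudging either coefficient increases the sum of squared residuals:

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, roughly y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_simple_ols(xs, ys)
print(f"intercept = {a:.2f}, slope = {b:.2f}")  # intercept = 0.05, slope = 1.99
```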

2. How do I know if I am using the correct variables in my regression model?

In a least squares regression problem, the independent variables should be chosen based on their relevance and potential impact on the dependent variable. Consider the research question and existing theory when selecting variables for the model. Statistical tests can then assess their significance: a t-test for each individual coefficient and an F-test for the joint significance of all of them.
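As an illustration of the overall F-test, the statistic for "all slope coefficients are zero" can be computed from R-squared alone when the model includes an intercept. This sketch plugs in the thread's reported values (R^2 = 0.581, n = 506, 4 regressors); the p-value is not computed because the F distribution is not in Python's standard library, but the 5% critical value of F(4, 501) is roughly 2.4.

```python
def overall_f_stat(r2, n, k):
    """Overall F statistic for H0: all k slope coefficients are zero,
    computed from R-squared (model with an intercept)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Values reported in the thread: R^2 = 0.581, n = 506, k = 4 regressors (A-D).
f = overall_f_stat(0.581, 506, 4)
print(f"F(4, 501) = {f:.1f}")  # F(4, 501) = 173.7
```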

3. What are some common mistakes to avoid in a simple least squares regression problem?

One common mistake is assuming a causal relationship between variables when there is only a correlation. It is also important to check for multicollinearity, which occurs when independent variables are highly correlated with each other and inflates the standard errors of their coefficients. Finally, assess the residuals of the model to check that they are approximately normally distributed and show no patterns or outliers.
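One simple multicollinearity check is the variance inflation factor (VIF). With exactly two predictors it reduces to 1 / (1 - r^2), where r is their correlation. A minimal sketch on hypothetical data; a common rule of thumb treats VIF above 10 as a warning sign.

```python
import math

def pearson_r(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    cov = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    var_u = sum((a - u_bar) ** 2 for a in u)
    var_v = sum((b - v_bar) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2),
    which is the same for both predictors."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical regressors: x2 is nearly a multiple of x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]
print(f"VIF = {vif_two_predictors(x1, x2):.1f}")
```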

4. How do I evaluate the performance of my simple least squares regression model?

The most common way to evaluate the performance of a simple least squares regression model is to look at the coefficient of determination (R-squared). This measures the proportion of the variation in the dependent variable that is explained by the independent variables in the model. Other metrics such as the root mean squared error (RMSE) and mean absolute error (MAE) can also be used to assess the accuracy of the model.
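A sketch of the three metrics, computed from observed values and model predictions. The inputs below are hypothetical; R-squared is 1 - SS_res / SS_tot, RMSE is the square root of the mean squared residual, and MAE is the mean absolute residual.

```python
import math

def fit_metrics(ys, preds):
    """Return R-squared, RMSE and MAE for observed values and predictions."""
    n = len(ys)
    y_bar = sum(ys) / n
    resid = [y - p for y, p in zip(ys, preds)]
    ss_res = sum(r * r for r in resid)          # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in ys)  # total variation around the mean
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in resid) / n,
    }

print(fit_metrics([1, 2, 3, 4], [1, 2, 3, 5]))  # {'r2': 0.8, 'rmse': 0.5, 'mae': 0.25}
```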

5. What should I do if my simple least squares regression model does not meet the assumptions?

If your model does not meet the assumptions of a simple least squares regression, it may be necessary to use alternative modeling techniques such as non-linear regression or generalized linear models. It is also important to critically evaluate the data and consider if there are any outliers or influential points that may be affecting the results. Seeking the advice of a statistician or consulting with colleagues can also be helpful in finding a suitable solution.
