Linear vs. Cubic Regression: Training and Test RSS Comparison

  • Thread starter brojesus111
  • Tags
    Regression
In summary: Wait, I think I got it.
y-hat_i = x_i [sum{f=1,n} x_f y_f]/[sum{g=1,n} x_g ^2]
y-hat_i = [sum{f=1,n} x_i x_f y_f]/[sum{g=1,n} x_g ^2]
y-hat_i = [sum{f=1,n} f(x_i,x_f) y_f]
  • #1
brojesus111

Homework Statement



I collect a set of data (n = 100 observations) containing a single predictor and a quantitative response. I then fit a linear regression model to the data, as well as a separate cubic regression.

1) Suppose that the true relationship between X and Y is linear. Consider the training residual sum of squares (RSS) for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

2) Answer the above using test rather than training RSS.

3) Suppose that the true relationship between X and Y is not linear, but we don’t know how far it is from linear. Consider the training RSS for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

4) Answer the above with test rather than training RSS.

5) [Image: UDuO7r6.png]


Homework Equations



The Attempt at a Solution



Attempt at 1: Not enough information. The training data could be wobbly, in which case the cubic might fit better despite the true linear relationship. But the training data could also be fairly linear, in which case the linear fit would be better and the cubic too wobbly.

Attempt at 2: In this case the linear will be better. We are using the test RSS, and if the true relationship is linear, the linear regression should fit new data better than the cubic and so give the lower RSS.

Attempt at 3: Chances are that the cubic regression will give the lower RSS. The linear regression will not fit the non-linear relationship well, and however non-linear the training data turn out to be, the cubic has more coefficients than the linear regression and should therefore fit better.

Attempt at 4: The cubic regression should give the lower RSS, since it is not linear and neither is the true relationship.

Attempt at 5: I'm assuming I have to solve for y_i from the beta-hat equation and then figure out what a_i means given that, but I'm stuck on how to solve for y_i.

Any tips, help, corrections, etc. would be great.
 
  • #2
Try again. Hint: A linear relation ax + b is a special case of a cubic, represented as a cubic as 0x^3 + 0x^2 + ax + b.
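
The hint can be checked numerically. Below is a minimal sketch, assuming made-up data (n = 100, a truly linear signal with Gaussian noise, arbitrary seed) and a hypothetical helper `training_rss`; none of these come from the problem statement. Because every line is also a cubic with zero higher-order coefficients, the least-squares cubic cannot have a larger training RSS than the least-squares line on the same data.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Made-up data: n = 100 observations, truly linear signal plus Gaussian noise.
n = 100
x = rng.uniform(-2.0, 2.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)

def training_rss(x, y, degree):
    """Fit a polynomial of the given degree by least squares; return training RSS."""
    X = np.vander(x, degree + 1)  # columns: x^degree, ..., x, 1
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

rss_linear = training_rss(x, y, 1)
rss_cubic = training_rss(x, y, 3)

# The cubic's column space contains the line's, so its minimum RSS is no larger.
print("training RSS, linear:", rss_linear)
print("training RSS, cubic: ", rss_cubic)
```

The same inequality holds for any data set, not just this simulated one, since it follows from the two models being nested rather than from the shape of the data.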
 
  • #3
D H said:
Try again. Hint: A linear relation ax + b is a special case of a cubic, represented as a cubic as 0x^3 + 0x^2 + ax + b.

Ok, so this is my revised answer:

1) The cubic will give the lower RSS, since adding more terms to a least squares fit must allow us to fit the training data at least as well.

2) I think my answer to this one is right. The linear should still give the lower test RSS since the cubic would be too wobbly.

3) Same reasoning as #1. Since we have more variables, we should be able to fit the training observations better.

4) I'm not completely sure on this one, but I think the answer is cubic since the true relationship is non-linear.

5) Still stuck.
 
  • #4
Ok, so I have this for 5 now.

I replaced the beta in the original y-hat equation to get:

y-hat_i = x_i [sum{i=1,n} x_i y_i]/[sum{i'=1,n} x_i' ^2]
y-hat_i = sum{i'=1,n} f(x_i,x_i') y_i

So this means that a_i' is f(x_i, x_i').

Is this correct? The only thing that looks wrong is that I end up with y_i instead of y_i' in the end.
 
  • #5
Realized my answer for #4 is wrong. We don't have enough information for that one.
 
  • #6
With your last post, everything looks good except for your work on #5.

Regarding #2, I'm not thrilled with the term "wobbly". It's a bit too, well, wobbly. If the relationship truly is linear, the linear part of the cubic will fit the signal. The quadratic and cubic terms will fit whatever is left after you remove the signal.
Hint: What's left after you remove the signal?

Another issue (again on #2): Suppose some of your test data lies outside the domain of the training data (i.e., you're extrapolating rather than interpolating). What happens to that cubic expression when applied outside the domain of the training data?
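
Both points can be illustrated with a small simulation. This is only a sketch under assumed settings (true line 1 + 2x, noise standard deviation 1, training x drawn from [-2, 2], 500 repetitions, arbitrary seed); the helper names `true_line` and `mean_test_rss` are made up for the example. With a truly linear signal, the quadratic and cubic terms can only fit noise, so on average the cubic's test RSS comes out somewhat higher inside the training range and much higher when extrapolating.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

def true_line(x):
    # Assumed signal for this sketch: a truly linear relationship.
    return 1.0 + 2.0 * x

def mean_test_rss(test_low, test_high, reps=500, n=100, sigma=1.0):
    """Average test RSS of the linear and cubic fits over many simulated data sets.

    Both fits see the same training and test sets in each repetition.
    """
    totals = {1: 0.0, 3: 0.0}  # keyed by polynomial degree
    for _ in range(reps):
        x_train = rng.uniform(-2.0, 2.0, size=n)
        y_train = true_line(x_train) + rng.normal(scale=sigma, size=n)
        x_test = rng.uniform(test_low, test_high, size=n)
        y_test = true_line(x_test) + rng.normal(scale=sigma, size=n)
        for degree in totals:
            coef = np.polyfit(x_train, y_train, degree)
            resid = y_test - np.polyval(coef, x_test)
            totals[degree] += float(resid @ resid)
    return totals[1] / reps, totals[3] / reps

lin_in, cub_in = mean_test_rss(-2.0, 2.0)    # test x inside the training range
lin_out, cub_out = mean_test_rss(2.0, 4.0)   # test x outside it (extrapolation)

print("interpolating: linear", lin_in, "cubic", cub_in)
print("extrapolating: linear", lin_out, "cubic", cub_out)
```

On any single test set the cubic can still happen to win, which is why the comparison is made on averages over many repetitions.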
 
  • #7
D H said:
With your last post, everything looks good except for your work on #5.

Regarding #2, I'm not thrilled with the term "wobbly". It's a bit too, well, wobbly. If the relationship truly is linear, the linear part of the cubic will fit the signal. The quadratic and cubic terms will fit whatever is left after you remove the signal.
Hint: What's left after you remove the signal?

Another issue (again on #2): Suppose some of your test data lies outside the domain of the training data (i.e., you're extrapolating rather than interpolating). What happens to that cubic expression when applied outside the domain of the training data?

For #2, can I just say that since we explicitly know the true relationship is linear, the linear model should minimize the test MSE?

For #5, can I do this?

I will give the summations their own variables.

y-hat_i = x_i beta
beta = [sum{f=1,n} x_f y_f ]/[sum{g=1,n} x_g ^2]
y-hat_i = x_i [sum{f=1,n} x_f y_f]/[sum{g=1,n} x_g ^2]

Since the sum of x_g ^2 over g is just a constant (it does not depend on f), we can move x_i and that constant inside the sum over f and rewrite it as:

y-hat_i = sum{f=1,n} f(x_i,x_f) y_f
Then substitute f=i' and we get our answer.
 
  • #8
What is your f(x_i,x_f)? You haven't defined it.
 
  • #9
D H said:
What is your f(x_i,x_f)? You haven't defined it.

It's x_i x_f / [sum{g=1,n} x_g ^2]
 
  • #10
OK! That looks good.
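
For anyone who wants to verify #5 numerically, here is a minimal sketch on made-up data (the seed, the slope 3.0, and the matrix name `A` are assumptions for illustration). It takes the weight on y_f in y-hat_i to be x_i x_f / [sum{g=1,n} x_g ^2], with the sum over f sitting outside the weight, and checks that the weighted sum of the responses reproduces the fitted values of the no-intercept regression.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

# Made-up data for a regression through the origin.
n = 100
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)

# Least-squares slope without an intercept, and the usual fitted values.
beta_hat = np.sum(x * y) / np.sum(x ** 2)
y_hat = x * beta_hat

# Weights: A[i, f] = x_i * x_f / sum_g x_g^2, so y_hat_i = sum_f A[i, f] * y_f.
A = np.outer(x, x) / np.sum(x ** 2)
y_hat_from_weights = A @ y

print(np.allclose(y_hat, y_hat_from_weights))
```

Each fitted value is therefore a linear combination of all the observed responses, with weights that depend only on the x values.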
 
  • #11
D H said:
OK! That looks good.

Awesome, thanks for the help!
 

Related to Linear vs. Cubic Regression: Training and Test RSS Comparison

1. What is linear/cubic regression?

Linear/cubic regression is a statistical method used to model the relationship between a quantitative response and a predictor. It fits a straight line (linear) or a third-degree polynomial (cubic) to a set of data points, usually by least squares, so that the response can be predicted from the predictor.

2. How is linear/cubic regression different from other types of regression?

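Linear regression assumes that the relationship between the variables is linear, meaning that the expected response is a straight-line function of the predictor (the data points themselves scatter around that line). Cubic regression allows for a curved relationship by adding x^2 and x^3 terms; it is still linear in its coefficients and contains the straight line as a special case. This can provide a better fit for some datasets.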

3. What is the purpose of performing a linear/cubic regression?

The purpose of linear/cubic regression is to identify and quantify the relationship between two variables. This can help in making predictions about future data points, identifying trends or patterns in the data, and determining the strength of the relationship between the variables.

4. What is the difference between simple and multiple linear/cubic regression?

In simple linear/cubic regression, there is only one independent variable, while in multiple regression, there are two or more independent variables. This allows for a more complex analysis of the relationship between the variables, but more coefficients have to be estimated, which generally calls for more data.

5. How is the accuracy of a linear/cubic regression model measured?

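The fit of a linear/cubic regression model is often summarized by the coefficient of determination (R-squared), computed as 1 - RSS/TSS. For a least squares fit with an intercept, evaluated on the training data, this is a value between 0 and 1 that represents the proportion of the variation in the dependent variable that can be explained by the independent variable(s). A higher training R-squared means a closer fit to the training data, but it never decreases when terms are added, so on its own it cannot show that the cubic model predicts new data better than the linear one. Test RSS or test MSE is used for that.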
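
As a small illustration of how R-squared is computed, the sketch below evaluates 1 - RSS/TSS on the training data for a linear and a cubic fit. The data, the seed, and the helper name `r_squared` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

# Made-up data: truly linear signal plus Gaussian noise.
n = 100
x = rng.uniform(-2.0, 2.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

def r_squared(x, y, degree):
    """Training R-squared (1 - RSS/TSS) of a least-squares polynomial fit."""
    coef = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coef, x)) ** 2)  # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)             # total sum of squares
    return float(1.0 - rss / tss)

r2_linear = r_squared(x, y, 1)
r2_cubic = r_squared(x, y, 3)

print("training R-squared, linear:", r2_linear)
print("training R-squared, cubic: ", r2_cubic)
```

Since both fits share the same TSS, comparing their training R-squared values is equivalent to comparing their training RSS.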
