Would including the true/known y-intercept in my dataset be "overfitting"?

  • Thread starter fahraynk
In summary, if you know the true intercept for your data but still add it to the dataset as a point to be fitted, it can lead to overfitting. To avoid this, you can subtract the known intercept from all Y values, fit a regression model with no constant term, and then add the intercept back into the model. Alternatively, you can replicate the data point, or directly sample at that point so that it becomes a real measurement instead of a fake one. However, forcing the intercept is questionable and has statistical problems of its own. It may be better to use a method such as restricted regression, which is designed to handle constraints on coefficients.
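The subtract, fit, add-back procedure can be sketched in plain Python as follows. This is only an illustration: the data are made up, the function names are hypothetical, and 7.0 stands in for whatever intercept you actually know.

```python
def fit_slope_known_intercept(xs, ys, b_known):
    """Least-squares slope for y = m*x + b_known, with b_known held fixed."""
    # Step 1: subtract the known intercept from every y value.
    shifted = [y - b_known for y in ys]
    # Step 2: regression through the origin (no constant term).
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("need at least one nonzero x value")
    return sum(x * s for x, s in zip(xs, shifted)) / sxx


def predict(x, m, b_known):
    # Step 3: add the known intercept back when using the model.
    return m * x + b_known


# Made-up data, roughly following y = 2x + 7.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [9.1, 10.9, 13.2, 14.8]
b_known = 7.0  # assumed known intercept

m = fit_slope_known_intercept(xs, ys, b_known)
print("slope:", m)
print("prediction at x = 0:", predict(0.0, m, b_known))
```

By construction the fitted line passes through (0, 7) exactly, so the intercept no longer tells you anything about the quality of the data or the model.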
  • #36
Dale said:
That can be done, but it has to be a very explicit and convincing argument. Personally, when I see a statistical model done without an intercept I am instantly highly suspicious. The burden of proof is on the scientist much more stringently, and frankly in some infamous papers where I have seen this done it was done poorly and rendered the conclusion completely unbelievable.

So overall, my opinion is opposite yours. I would rather use a standard and reasonable statistical process and look at the fitted intercept as a check on the quality of the data and the model.
I think that I have to concede that your approach is the wiser approach. Forcing the Y-intercept to a particular value would only be appropriate if there is indisputable consensus based on science.
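As a rough illustration of using the fitted intercept as a check, here is a minimal sketch in plain Python. It fits an ordinary line with a free intercept and reports how far the fitted intercept lies from the expected value, in units of its standard error. The data, the function name, and the expected value of 7.0 are all made up for the example, and the standard error uses the usual textbook formula for simple linear regression, which assumes independent errors with constant variance.

```python
import math


def ols_with_intercept(xs, ys):
    """Ordinary least squares for y = m*x + b.

    Returns (m, b, se_b), where se_b is the standard error of b.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate a standard error")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    se_b = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return m, b, se_b


# Made-up data, roughly following y = 2x + 7.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [9.1, 10.9, 13.2, 14.8]
b_expected = 7.0  # the intercept you believe to be true

m, b, se_b = ols_with_intercept(xs, ys)
distance = (b - b_expected) / se_b
print("fitted intercept:", b, "standard error:", se_b)
print("distance from expected value, in standard errors:", distance)
```

A fitted intercept many standard errors away from the expected value would point to a problem with the data or the model. With so few points the standard error is itself poorly determined, so treat the number as a sanity check rather than a formal test.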
 
  • #37
The problem is stated mathematically as:
$$\varepsilon^2 = \sum_i (m x_i + b - y_i)^2, \qquad b = 7$$
Minimizing the mean square error (ε²) with respect to m:
$$\frac{d\varepsilon^2}{dm} = 0 = \sum_i (m x_i + b - y_i)\, x_i$$
Solve the equation above for m and you have your fitting equation, y = mx + 7. The slope m is expressed entirely in terms of known quantities, and the intercept is guaranteed to be 7.
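Carrying out that last step explicitly, as a worked sketch with the same symbols and with the sums running over all data points i:

$$\sum_i (m x_i + b - y_i)\, x_i = 0 \;\Longrightarrow\; m \sum_i x_i^2 = \sum_i x_i (y_i - b) \;\Longrightarrow\; m = \frac{\sum_i x_i (y_i - 7)}{\sum_i x_i^2}$$

This is the same slope you get by subtracting 7 from every y value and fitting a line through the origin. It requires at least one nonzero x value so that the denominator is not zero.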
 
