Which method should I use for a linear fit of data with error on both z and y?

In summary, a standard linear fit should be fine. Leaving out the intercept and lower-order terms can introduce bias.
  • #1
BillKet
Hello! I have some data of the form (x, y, z) which I know is described by a function of the form ##z=y(a+bx)##, where a and b are the parameters to be fitted. z and y have errors associated with them while x doesn't (x is actually an integer going from 0 to 3 for each value of y).

I tried to do the fit in two different ways. First, for each value of x, I made a linear fit of the form ##z=yA## for A (I used this package, which accounts for the errors on both z and y: https://docs.scipy.org/doc/scipy/reference/odr.html), and then I made a fit of the form ##A=a+bx## for a and b, with the error on A taken from the first fit. In the end I get a value and an error for both a and b. The second method was to fit ##z=y(a+bx)## directly to the whole data at once (it is not really a linear fit anymore, but it can easily be done in Python with the same package as above). This again gives a set of values and errors for a and b. The values obtained with the two methods are consistent with each other (within the errors on a and b), but the first method gives a smaller error than the second. Is there anything I am missing? Shouldn't I get exactly the same result both ways? And if the answer is no, which method should I use and why? Thank you!
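In case it helps, this is roughly what my two approaches look like in code. This is only a minimal sketch: the helper functions and variable names are mine, and the inputs are assumed to be flat NumPy arrays with one entry per measurement.
Code:
import numpy as np
from scipy import odr

# Method 1, step 1: for a fixed x, fit z = A*y with errors on both z and y.
def fit_slope(y, z, y_err, z_err):
    model = odr.Model(lambda beta, yy: beta[0] * yy)
    data = odr.RealData(y, z, sx=y_err, sy=z_err)
    out = odr.ODR(data, model, beta0=[1.0]).run()
    return out.beta[0], out.sd_beta[0]  # A and its standard error

# Method 1, step 2: weighted fit of A = a + b*x. x is exact, so plain
# weighted least squares is enough (np.polyfit wants weights of 1/sigma).
def fit_line(x, A, A_err):
    coeffs, cov = np.polyfit(x, A, 1, w=1.0 / np.asarray(A_err), cov=True)
    b, a = coeffs  # np.polyfit returns the highest power first
    b_err, a_err = np.sqrt(np.diag(cov))
    return a, b, a_err, b_err

# Method 2: fit z = y*(a + b*x) to all the data at once. scipy.odr accepts
# a (2, N) explanatory array; x carries no error, so its row is held fixed
# via ifixx (0 = fixed, 1 = free) and its sx entries are only dummies.
def fit_joint(x, y, z, y_err, z_err):
    def model(beta, xy):
        a, b = beta
        return xy[1] * (a + b * xy[0])
    xy = np.vstack([x, y])
    sx = np.vstack([np.ones_like(y), y_err])
    fix = np.vstack([np.zeros(len(x), int), np.ones(len(x), int)])
    data = odr.RealData(xy, z, sx=sx, sy=z_err)
    out = odr.ODR(data, odr.Model(model), beta0=[1.0, 0.0], ifixx=fix).run()
    return out.beta, out.sd_beta  # (a, b) and their standard errors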
 
  • #2
jedishrfu
Can you provide some context here? What is the data you’re fitting? Where does it come from?

Knowing that, we might find that certain fields of research prefer certain methods over others.
 
  • #3
Dale
BillKet said:
Shouldn't I get exactly the same result both ways?
No, definitely not. If different methods always gave exactly the same result then there would be no point in having different methods at all.

BillKet said:
And in case the answer is no, which method should I use and why?
The errors in y, are they large or can they be neglected?
 
  • #4
BillKet
jedishrfu said:
Can you provide some context here? What is the data you’re fitting? Where does it come from?

Knowing that, we might find that certain fields of research prefer certain methods over others.
The data is from a molecular spectroscopy experiment. For people working in the field, this is similar to a King plot fit, but for molecular terms (when the field shift is important). z corresponds to a frequency shift between different molecules, y is the change in the radius of one of the atoms between the different molecules, and x is the frequency level being tested.
 
  • #5
BillKet
Dale said:
No, definitely not. If different methods always gave exactly the same result then there would be no point in having different methods at all.

The errors in y, are they large or can they be neglected?
Thank you for your reply! To be honest, I wasn't even sure they count as different methods; I assumed they are the same method, just done in two steps in one case and in a single step in the other.

The errors on y are a lot smaller than the errors on z. From what I've seen, ignoring them doesn't make a big difference. The errors on z also contain systematic uncertainties, and the statistics for z are a lot lower, so its error is quite big.
 
  • #6
Dale
BillKet said:
The errors on y are a lot smaller than the errors on z.
Then doing a standard least squares fit should be fine. Stepwise fits are always a little sketchy, so I would avoid them. The smaller error most likely comes at the cost of a larger bias.

I would probably fit to the following model ##z= ay + bx + cxy + d## with a standard linear model. In R this model would be written
Code:
z~x*y
where the inclusion of the other terms is so standard that they are simply assumed. Leaving out the intercept and lower-order terms can introduce bias. This model will give you the best linear unbiased estimator.
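If you would rather stay in Python, statsmodels accepts the same formula syntax. This is a sketch on my part, with placeholder numbers that are illustration only (substitute your real arrays):
Code:
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder data for illustration only; substitute the real (x, y, z).
df = pd.DataFrame({
    "x": [0, 1, 2, 3, 0, 1, 2, 3],
    "y": [0.10, 0.10, 0.10, 0.10, 0.25, 0.25, 0.25, 0.25],
    "z": [0.11, 0.13, 0.16, 0.18, 0.26, 0.31, 0.37, 0.44],
})

# 'z ~ x * y' expands to intercept + x + y + x:y,
# i.e. z = d + b*x + a*y + c*x*y, just like the R formula.
fit = smf.ols("z ~ x * y", data=df).fit()
print(fit.params)  # Intercept (d), x (b), y (a), x:y (c)
print(fit.bse)     # standard errors
If you want the errors on z to enter the fit, smf.wls with weights=1/z_err**2 is the analogous weighted version.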
 
  • #7
BillKet
Dale said:
Then doing a standard least squares fit should be fine. Stepwise fits are always a little sketchy, so I would avoid them. The smaller error most likely comes at the cost of a larger bias.

I would probably fit to the following model ##z= ay + bx + cxy + d## with a standard linear model. In R this model would be written
Code:
z~x*y
where the inclusion of the other terms is so standard that they are simply assumed. Leaving out the intercept and lower-order terms can introduce bias. This model will give you the best linear unbiased estimator.
Oh I see! So if the fit is good, b and d should be consistent with zero, right? Thanks a lot! Could you please explain a bit more why doing it in two steps gives me a different error (it is actually ~3 times smaller)?
 
  • #8
Dale
BillKet said:
Oh I see! So if the fit is good, b and d should be consistent with zero, right? Thanks a lot! Could you please explain a bit more why doing it in two steps gives me a different error (it is actually ~3 times smaller)?
I am surprised that it is that different. Without the data I can't really tell. There might be some substantial covariance or multicollinearity that is constrained away in the stepwise approach.
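If you are using scipy.odr, one quick way to look for that is to inspect the parameter correlation matrix. A sketch; it assumes output is the Output object returned by ODR(...).run():
Code:
import numpy as np

def parameter_correlations(output):
    """Correlation matrix of the fitted parameters.

    `output` is a scipy.odr Output instance. cov_beta is rescaled by the
    residual variance, which is how sd_beta itself is computed, so the
    diagonal of `cov` below matches output.sd_beta**2.
    """
    cov = output.cov_beta * output.res_var
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)
Off-diagonal entries close to ±1 would indicate strongly correlated parameters.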
 
  • #9
BillKet
Dale said:
I am surprised that it is that different. Without the data I can't really tell. There might be some substantial covariance or multicollinearity that is constrained away in the stepwise approach.
Please find the data I am using below. The errors are combined statistical and systematic; they come from different experiments (hence the different ranges of errors). To give a bit more detail, the function I actually need to fit is ##z=y(a+b(x+0.5)/4.186)## (just a redefinition of a and b, for completeness). Each sub-array of z corresponds to a value of x. For example, the second entry of the first sub-array of z satisfies ##0.176=-0.216\,(a+b(0+0.5)/4.186)##. Please let me know if I can provide further details.

$$y = [-0.312, -0.216, -0.080, 0.000, 0.210]$$
$$y_{err} = [0.015, 0.010, 0.004, 0.00001, 0.01]$$
$$x = [0, 1, 2, 3]$$
$$z = [[0.268, 0.176, 0.117, -0.000, -0.184],
[0.277, 0.177, 0.100, -0.000, -0.179],
[0.274, 0.178, 0.121, -0.000, -0.250],
[0.298, 0.063, 0.001, -0.000, -0.374]]$$
$$z_{err} = [[0.008, 0.015, 0.028, 0.008, 0.021],
[0.005, 0.013, 0.018, 0.004, 0.012],
[0.014, 0.016, 0.053, 0.016, 0.042],
[0.059, 0.088, 0.163, 0.055, 0.151]]$$
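And here is the one-step fit as I set it up on this data, in case it matters. A sketch: the flattening of the grid into (x, y) pairs and the starting values are my own choices.
Code:
import numpy as np
from scipy import odr

y = np.array([-0.312, -0.216, -0.080, 0.0, 0.210])
y_err = np.array([0.015, 0.010, 0.004, 0.00001, 0.01])
x = np.array([0.0, 1.0, 2.0, 3.0])
z = np.array([[0.268, 0.176, 0.117, -0.0, -0.184],
              [0.277, 0.177, 0.100, -0.0, -0.179],
              [0.274, 0.178, 0.121, -0.0, -0.250],
              [0.298, 0.063, 0.001, -0.0, -0.374]])
z_err = np.array([[0.008, 0.015, 0.028, 0.008, 0.021],
                  [0.005, 0.013, 0.018, 0.004, 0.012],
                  [0.014, 0.016, 0.053, 0.016, 0.042],
                  [0.059, 0.088, 0.163, 0.055, 0.151]])

# Row i of z corresponds to x[i], column j to y[j]; flatten to (x, y) pairs.
X, Y = np.meshgrid(x, y, indexing="ij")
Y_err = np.broadcast_to(y_err, X.shape)

def model(beta, xy):
    a, b = beta
    return xy[1] * (a + b * (xy[0] + 0.5) / 4.186)

xy = np.vstack([X.ravel(), Y.ravel()])
sx = np.vstack([np.ones(X.size), Y_err.ravel()])  # x row is a dummy error...
fix = np.vstack([np.zeros(X.size, int), np.ones(X.size, int)])  # ...held fixed

data = odr.RealData(xy, z.ravel(), sx=sx, sy=z_err.ravel())
out = odr.ODR(data, odr.Model(model), beta0=[-0.9, 0.0], ifixx=fix).run()
print("a, b   =", out.beta)
print("errors =", out.sd_beta)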
 

FAQ: Which method should I use for a linear fit of data with error on both z and y?

What is a linear fit?

A linear fit is a statistical method used to analyze the relationship between two variables. It involves finding the line of best fit that represents the relationship between the variables in a linear fashion.

Why is it important to do a linear fit?

A linear fit can help us understand the relationship between two variables and make predictions based on this relationship. It is also used to identify trends and patterns in data, which can be useful in making informed decisions.

How do you determine the "right" way to do a linear fit?

The right way to do a linear fit depends on the specific data and the goals of the analysis. Generally, it involves selecting the appropriate type of linear model, checking for assumptions, and interpreting the results accurately.

What are some common mistakes when doing a linear fit?

Some common mistakes when doing a linear fit include using the wrong type of linear model, not checking for assumptions, and misinterpreting the results. It is also important to avoid overfitting the data, which can lead to inaccurate predictions.

Can a linear fit be used for non-linear relationships?

No, a linear fit is only appropriate for analyzing linear relationships between two variables. For non-linear relationships, other types of regression models, such as polynomial or exponential regression, may be more suitable.
