Error on non-linearity in a linear fit

In summary, before fitting a function to data, you should determine if the data is normally distributed and determine what type of function to fit. If the data is not normally distributed, then the error estimates are not very accurate and the least-squares method is not useful.
  • #1
Malamala
Hello! I have some data points with error bars on both x and y, and I would like to fit them with a function ##f(x)##. How can I write the chi-squared in this case? With errors only on y, I would have ##\chi^2 = \sum_i\left(\frac{f(x_i)-y_i}{\sigma_{y_i}}\right)^2##, but I am not sure how to include ##\sigma_x##. Thank you!
 
  • #2
Hello again,

I see you haven't replied to help given earlier, so after a year of fruitless waiting, I'm not sure I want to spend any serious time on this.
The same questions apply now as well. Funny, isn't it ?

BvU said:
Did you make a plot ?
Tell us how the data were obtained and what they represent. Especially the ##y_{err}##.
And how can you obtain such unbelievably accurate estimates of ##y_{err}##. Billions of observations, or just mindless copying calculation results ?
Are you aware of the role of systematic errors ?

I'd like to know if you understood the help given in earlier threads, or just lost interest and never reacted any more. I don't mind repeating things, but it should have a reasonable purpose.

Show your data.

There is no simple mechanism for what you want. You could fold the ##\sigma_x## into the ##\sigma_y## using ##f'(x)##.
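
For reference, this folding is commonly written as an "effective variance" chi-squared. The form below is a sketch that assumes uncorrelated Gaussian errors on x and y, and that ##f## is close to linear over a distance ##\sigma_{x_i}## around each point:

$$\chi^2 = \sum_i \frac{\bigl(y_i - f(x_i)\bigr)^2}{\sigma_{y_i}^2 + \bigl(f'(x_i)\,\sigma_{x_i}\bigr)^2}$$

Because ##f'## depends on the fit parameters, the denominator changes as the parameters change, so the minimisation is usually repeated with updated weights until the parameters stop moving.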

  • #3
I apologize for the previous post. Honestly I got confused a bit about the answers, but I realized that the problem I was trying to solve was easier than I thought, so I didn't need all that. Now I am pretty sure I do. Here is a paper similar to what I need. Figure 2 is what I want to fit to (for some reason they show the error bars diagonal, but you can assume they are both on x and y directions). What I want to do is fit this with something of the form ##y=ax+b+f(x)##, where for ##f(x)## I can try different things depending on the physics model I test. In the end I am interested in what is the error on this ##f(x)##.
 
  • #4
Malamala said:
I apologize for the previous post.
Thank you. I'm sure it wasn't intentional, and it often happens that a difficult question gets a difficult answer before we discover that in fact a much simpler question was meant all along.

Malamala said:
Figure 2 is what I want to fit to
[Image: 1623585971239.png]

That's not your data, that's their data. They have the orthogonal distance available in something that looks like a parity plot. Are you convinced that situation is the same in your data ? Can you show ?

I don't have access to the paper at https://epubs.siam.org/doi/pdf/10.1137/0908085, but it seems a bit more general.

Before embarking on an expedition, I would convince myself that the ordinary least squares (OLS) approach, where all errors are attributed to the dependent variable, is absolutely unusable:
  • systematic errors do not belong in the error bars -- all errors must be uncorrelated
  • compare $${\sum (y_i - \langle y \rangle)^2 \over \sum {\sigma_{y_i}}^2 }\qquad \text {and} \qquad {\sum (x_i - \langle x \rangle)^2 \over \sum {\sigma_{x_i}}^2}$$ Are they really approximately the same ?
  • outliers and/or observations with really small errors quickly ruin results
  • Does the OLS result really look nonsensical ?
  • If so, does it help to fold in the errors in the independent variable as I mentioned in #2 ? I.e. use ##{\sigma'_{y_i}}^2 = {\sigma_{y_i}}^2 + \Bigl (f'(x_i) \,\sigma_{x_i}\Bigr ) ^2 \ ##
[edit] depending on the magnitude of ##f'## relative to the magnitude of ##a##, use ##a##, ##f'## or even ##a+f'##
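
The comparison in the second bullet is quick to compute. A minimal sketch in plain Python, using the four points the OP posts further down the thread; the function name `spread_to_error_ratio` is made up for this illustration:

```python
# Compare sum((v - <v>)^2) / sum(sigma_v^2) for y and for x.
# Data: the four points posted later in this thread.
x = [-954.76, 1000.28, 2286.75, 3655.38]
y = [-508.89, 531.11, 1190.36, 1888.80]
sx = [0.11, 0.12, 0.39, 0.20]
sy = [0.09, 0.09, 0.49, 0.11]


def spread_to_error_ratio(values, sigmas):
    """Return sum((v - mean)^2) / sum(sigma^2) (hypothetical helper)."""
    mean = sum(values) / len(values)
    spread = sum((v - mean) ** 2 for v in values)
    return spread / sum(s ** 2 for s in sigmas)


ratio_y = spread_to_error_ratio(y, sy)
ratio_x = spread_to_error_ratio(x, sx)
print(f"y: {ratio_y:.3g}   x: {ratio_x:.3g}   x/y: {ratio_x / ratio_y:.2f}")
```

If the two ratios are of the same order, the x errors matter about as much as the y errors and cannot simply be ignored.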
Malamala said:
What I want to do is fit this with something of the form ##y=ax+b+f(x)##, where for ##f(x)## I can try different things depending on the physics model I test. In the end I am interested in what is the error on this ##f(x)##.

In simple LSQ your ##f(x)## in ##y=ax+b+f(x)## is a Gaussian with average zero and a variance related to the estimated errors, so what you are basically trying to do is extract higher orders of ##f## from the noise :rolleyes: -- correct me if I am wrong. (The 0th and 1st terms of a Taylor series are in a and b)

Unless of course your data is completely different (and y is far from linear), as when we try to subtract background (linear or quadratic) from an observed peak in a spectrum. Then the signal/noise ratio determines the accuracy of the background estimate. Different game.

If ##f## has a few parameters too, you will need a whole lot of accurate data to do sensible statistics ...

If your data aren't really normally distributed the error estimates aren't worth much, nor is the least-squares method ...

If this is serious, I recommend running Monte Carlo simulations on simulated data to establish the effects of the various analysis methods.
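
A minimal sketch of such a Monte Carlo study, in plain Python. Everything in it is an assumption chosen for illustration: the "true" line (`a_true`, `b_true`), the x positions, the error sizes, and the two methods being compared (an unweighted fit and a fit weighted by ##\sigma_y^2 + (a\,\sigma_x)^2##). It is not the analysis from any of the papers mentioned.

```python
import random


def fit_line(x, y, w):
    """Weighted least-squares fit of y = a*x + b; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    a = sxy / sxx
    return a, ym - a * xm


def simulate(n_trials=2000, seed=1):
    """Generate noisy copies of an exactly linear data set and refit each one."""
    rng = random.Random(seed)
    a_true, b_true = 0.52, 0.0                   # assumed truth, illustration only
    x_true = [-1000.0, 1000.0, 2300.0, 3700.0]   # assumed x positions
    sx = [0.1, 0.1, 0.4, 0.2]                    # assumed x errors
    sy = [0.1, 0.1, 0.5, 0.1]                    # assumed y errors
    slopes = {"unweighted": [], "effective variance": []}
    for _ in range(n_trials):
        x = [rng.gauss(xt, s) for xt, s in zip(x_true, sx)]
        y = [rng.gauss(a_true * xt + b_true, s) for xt, s in zip(x_true, sy)]
        a_u, _ = fit_line(x, y, [1.0] * len(x))
        slopes["unweighted"].append(a_u)
        # Weights from sigma_y^2 + (a*sigma_x)^2, with the unweighted slope as a.
        w = [1.0 / (s_y ** 2 + (a_u * s_x) ** 2) for s_x, s_y in zip(sx, sy)]
        a_w, _ = fit_line(x, y, w)
        slopes["effective variance"].append(a_w)
    return a_true, slopes


def mean_std(values):
    """Sample mean and standard deviation."""
    m = sum(values) / len(values)
    return m, (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5


a_true, slopes = simulate()
for name, vals in slopes.items():
    m, s = mean_std(vals)
    print(f"{name:>18}: mean slope = {m:.6f}, scatter = {s:.2e} (true {a_true})")
```

The same loop can be extended to add a small non-linear term to the simulated data and count how often each method detects it, which is the kind of question an exclusion limit needs answered.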

  • #5
Here are some of my data points: y = [-508.89, 531.11, 1190.36, 1888.80], error_y = [0.09, 0.09, 0.49, 0.11], x = [-954.76, 1000.28, 2286.75, 3655.38], error_x = [0.11, 0.12, 0.39, 0.20]. The errors are only statistical. As I said, I want to fit this with something of the form ##y=ax+b+f(x)##. Physically, ##f(x)## contains new physics. Given the big errors on the values I have, ##f(x)## will (most probably) be consistent with zero. What I need to do is find a 95% exclusion interval for ##f(x)##, something like ##f(x)<10^{-10}## at 95% confidence level. This is what they do in Figure 3 in that paper. Basically this is what is done in the literature (I can send you several other papers if it is useful). We measure these two values on x and y, and try to set limits on ##f(x)##. The hope is that by reducing the errors on x and y at each point we would be able to actually see a deviation from linearity and set an actual value, not just a bound, on ##f(x)##. Here is a paper which actually claims a ##3\sigma## deviation from linearity in exactly the same type of plot.
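
As a starting point, the purely linear part (##f(x)=0##) can be fitted to these four points with the x errors folded into the y errors, as suggested in #2. The sketch below is plain Python and only an illustration: the helper names are made up, the errors are assumed independent and Gaussian, and it does not produce the 95% exclusion limit itself. The "pulls" show how far each point lies from the line in units of its combined error.

```python
import math

# Data as posted above (statistical errors only).
x = [-954.76, 1000.28, 2286.75, 3655.38]
y = [-508.89, 531.11, 1190.36, 1888.80]
sx = [0.11, 0.12, 0.39, 0.20]
sy = [0.09, 0.09, 0.49, 0.11]


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a*x + b; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    a = sxy / sxx
    return a, ym - a * xm


def effective_sigma(a):
    """Combined error per point: sqrt(sigma_y^2 + (a * sigma_x)^2)."""
    return [math.sqrt(s_y ** 2 + (a * s_x) ** 2) for s_x, s_y in zip(sx, sy)]


# Start from an unweighted fit, then iterate because the weights depend on a.
a, b = weighted_line(x, y, [1.0] * len(x))
for _ in range(20):
    w = [1.0 / s ** 2 for s in effective_sigma(a)]
    a, b = weighted_line(x, y, w)

pulls = [(yi - (a * xi + b)) / s for xi, yi, s in zip(x, y, effective_sigma(a))]
chi2 = sum(p ** 2 for p in pulls)
print(f"a = {a:.5f}, b = {b:.3f}, chi2 = {chi2:.1f} for {len(x) - 2} degrees of freedom")
print("pulls:", [round(p, 1) for p in pulls])
```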
 
  • #6
BvU said:
That's not your data, that's their data. They have the orthogonal distance available in something that looks like a parity plot. Are you convinced that situation is the same in your data ? Can you show ?
Malamala said:
Here are some of my data points: y = [-508.89,531.11,1190.36,1888.80], error_y = [0.09,0.09,0.49,0.11], x = [-954.76, 1000.28, 2286.75, 3655.38], error_x = [0.11,0.12,0.39,0.20].
OP, we appreciate your transparency and this will certainly help us help you, but I just want to say you always have the right to say "no" when sharing data. If this data is unpublished, I encourage you to be a little more protective as it represents years of work for your entire group. No judgement here, just friendly advice! If this data was published and you are just doing a re-analysis, then please disregard this comment. Just trying to look out for you! :smile:

There was actually a thread recently about this very subject. The OP was looking for an expression for King non-linearity (and the associated propagated error) on a King plot with 4 or more points. The same paper by Solaro et al was cited. Not exactly the same thing, but I encourage you to skim through starting on page 2 (the first page was a bunch of misunderstandings about what was being asked).

I'm not an expert on linear regression with measurement error (that's the fancy name for error on the x-axis), but it sounds like @BvU can help you. I also suspect @Dale is someone who could help you. Once you have a method for doing linear regression with measurement error, you can apply this to a non-linear method like Levenberg-Marquardt regression, which uses linear regression in its algorithm.

Of course, the fool-proof method is to contact Ian Counts or Cyrille Solaro and ask them directly how they handled the error propagation. The paper writing process for precision measurement can be grueling, and they have probably spent ~100 hours thinking about this. If you are doing research in this field, getting to know them won't hurt!
 
  • #7
Thank you for pointing me towards that thread (and all the other info), I will take a look into it. About data, well they kept insisting on the ACTUAL data that I use, even if it is exactly the same as in the paper I referenced, for the purpose of my question. But it's fine, it is out of context (and not all the measured points) so I assume it's not usable in this form. But thank you for the advice!
 
  • #8
My apologies for insisting... but for me it did help. Certainly in combination with the context of the links in #3 and later ones in #6 and the thread. @Twigg is too kind in

Twigg said:
I'm not an expert on linear regression with measurement error (that's the fancy name for error on the x-axis), but it sounds like @BvU can help you. I also suspect @Dale is someone who could help you. Once you have a method for doing linear regression with measurement error, you can apply this to a non-linear method like Levenberg-Marquardt regression, which uses linear regression in its algorithm.
but I've gradually come to an "impression" :smile: that this has little to do with the usual LSQ and LSQ error handling: you have a large number of observations and group results to end up with four points, each with a ##\sigma_x## and a ##\sigma_y##. And here's me in #4, blabbing on about OLS and ODR and folding in independent-variable errors etcetera.

BvU said:
In simple LSQ your ##f(x)## in ##y=ax+b+f(x)## is a Gaussian with average zero and a variance related to the estimated errors, so what you are basically trying to do is extract higher orders of ##f## from the noise :rolleyes: -- correct me if I am wrong. (The 0th and 1st terms of a Taylor series are in a and b)

I can put in a plea that the drive was to help and answer the questions as well and as quickly as possible. Something of the golden hammer phenomenon is seeping through...

Where in fact the best advice may well have been in the loose comments near the end:

BvU said:
Unless of course your data is completely different (and y is far from linear), as when we try to subtract background (linear or quadratic) from an observed peak in a spectrum. Then the signal/noise ratio determines the accuracy of the background estimate. Different game.

If ##f## has a few parameters too, you will need a whole lot of accurate data to do sensible statistics ...

If your data aren't really normally distributed the error estimates aren't worth much, nor is the least-squares method ...

If this is serious, I recommend running Monte Carlo simulations on simulated data to establish the effects of the various analysis methods.

So, after a good night's sleep and fresh coffee I propose to consider myself completely unqualified to help with error analysis in the context of King plots. I never even knew they existed or what they represent. @Twigg , @Dale and others are light-years ahead of me.

But as a curious physicist I can't stop myself from commenting and asking further questions :wink:

You have a large set of N observations and a model ##y_i=ax_i+b+f(x_i)## with ##N \gg 10##. You use two degrees of freedom to extract and subtract ##a## and ##b## (Kirchner (3) and (9) -- with ##a## and ##b## switched o0)) and are left with something that looks like

[Image: 1623663539697.png]
but with a small cloud of points at each of the four locations (instead of four single points with puny vertical and horizontal error bars, the latter of which I can't even draw with my old excel -- and if I could they would be invisibly small anyway).

I did unweighted regression on four points ##y-\langle y \rangle## vs ##x-\langle x \rangle## and got 0.520 ##\pm## 0.004 as slope and 0 ##\pm## 7 as intercept -- you can do slightly better with N points. Thanks to the subtraction those errors are now uncorrelated.
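
For anyone who wants to repeat that unweighted regression, here is a minimal sketch in plain Python using the four points from #5. It uses the textbook OLS formulas for the slope, intercept and their standard errors, and should come out close to the numbers quoted above.

```python
import math

# The four points posted in #5.
x = [-954.76, 1000.28, 2286.75, 3655.38]
y = [-508.89, 531.11, 1190.36, 1888.80]

n = len(x)
xm = sum(x) / n
ym = sum(y) / n
dx = [xi - xm for xi in x]   # x - <x>
dy = [yi - ym for yi in y]   # y - <y>

sxx = sum(d ** 2 for d in dx)
sxy = sum(a * b for a, b in zip(dx, dy))
slope = sxy / sxx
# After centering, the mean of dx is zero, so the intercept is just the mean of dy.
intercept = sum(dy) / n - slope * sum(dx) / n

residuals = [b - (slope * a + intercept) for a, b in zip(dx, dy)]
s2 = sum(r ** 2 for r in residuals) / (n - 2)   # residual variance, n - 2 d.o.f.
se_slope = math.sqrt(s2 / sxx)
se_intercept = math.sqrt(s2 / n)                # valid because mean(dx) = 0

print(f"slope     = {slope:.3f} +/- {se_slope:.3f}")
print(f"intercept = {intercept:.1f} +/- {se_intercept:.1f}")
```

With centered x the off-diagonal element of the parameter covariance matrix is proportional to the mean of dx, which vanishes, and that is why the slope and intercept errors decouple.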
You have a model ##f(x)## which I suspect is discrete in both ##x## and ##y##. (I can barely read the isotope shift papers, let alone understand what's going on :cry: ). And you have N-2 degrees of freedom to fit features of ##f##.

At this point I'm stuck for I don't know if you can simulate the measurements by calculating a ##y## for every ##x_i## or not. If you can, the thing to minimize is ##\sum (y_i-y_{i,\text{calc}})^2## (if necessary weighted). But Murphy makes it likely you can't -- and then I don't know what to do with ## x_i-x_{i,\text{calc}} ##.

Monte Carlo is what comes to mind -- but I'd better shut up :nb) until I know what I'm talking about !

Impressed by your progress. Best of luck, and let us know !

  • #9
I was thinking about this more overnight, and it dawned on me what the folks in the paper at https://epubs.siam.org/doi/pdf/10.1137/0908085 are doing (I think!). I also don't have access, so this is a guess based on the abstract.

In the Solaro paper, there's a quote that stuck out at me (and they say this very emphatically):
We emphasize that ##\delta \nu_{732}^{A,40}## is deduced from measurements of ##\delta \nu_{729}^{A,40}## and ##\delta \nu_{DSIS}^{A,40}##, and that ##\delta \nu_{729}^{A,40} \gg \delta \nu_{DSIS}^{A,40}##. Consequently, the measurement uncertainties on ##\delta \nu_{729}^{A,40}## and ##\delta \nu_{DSIS}^{A,40}## translate into errors bars essentially parallel and perpendicular to the fitted line, illustrating that the analysis is limited nearly exclusively by the achieved accuracy on ##\delta \nu_{DSIS}^{A,40}##.
This makes their error analysis unusual. Not only do you have measurement errors on both axes, but the error on the y-axis is partially correlated with the error on the x-axis. This means you can take all the cookbook rules for linear regression and chuck them right out the window.

What I believe Solaro et al. do (and this is a guess) is they have a non-linear fitting algorithm. I couldn't tell you exactly what the algorithm is, but I can tell you one algorithm that will do the job (perhaps not efficiently, but it gets the job done). This algorithm would be identical to Levenberg-Marquardt but with the linear regression replaced by Deming regression with ##\delta = 1## (aka orthogonal distance regression or ODR). They probably entered only the uncertainty ##\delta \nu_{DSIS}^{A,40}## into the algorithm, leaving out the uncertainty on ##\delta \nu_{729}^{A,40}## entirely. This is because, as discussed above, ##\delta \nu_{DSIS}^{A,40}## represents the error orthogonal to the line. I'm guessing the "weighting" that Solaro et al mentioned was inverse variance of ##\delta \nu_{DSIS}^{A,40}##.
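
To make the Deming step concrete, here is a minimal sketch in plain Python of the closed-form straight-line Deming fit. It is only the linear building block, not the full Levenberg-Marquardt hybrid described above, and it assumes a single error-variance ratio ##\delta## for all points, which is a simplification for data with per-point error bars. The function name `deming` is made up for this illustration.

```python
import math


def deming(x, y, delta=1.0):
    """Straight-line Deming regression, returns (slope, intercept).

    delta is the ratio var(y errors) / var(x errors); delta = 1 gives
    orthogonal regression. Assumes the x-y covariance is non-zero.
    """
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - ym) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    return slope, ym - slope * xm


# Sanity check on points lying exactly on y = 2x + 1.
a_exact, b_exact = deming([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a_exact, b_exact)

# The four points from #5, treated with delta = 1.
a_odr, b_odr = deming([-954.76, 1000.28, 2286.75, 3655.38],
                      [-508.89, 531.11, 1190.36, 1888.80])
print(f"orthogonal-regression slope = {a_odr:.4f}, intercept = {b_odr:.2f}")
```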

The problem is, looking at your data's error bars, sometimes ##\sigma_x > \sigma_y## and sometimes ##\sigma_x < \sigma_y##, so clearly you measured these isotope shifts independently. If you had measured the shift between excited states like in the Solaro paper, we would expect to see something of the form ##\sigma_y = \sqrt{\sigma_x^2 + \sigma_{exc}^2}## where ##\sigma_{exc}## is the uncertainty on the isotope shift between excited states (##\delta \nu_{DSIS}^{A,40}## in Solaro et al). Since we see ##\sigma_y < \sigma_x## for some of your data, this cannot be the case.
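
This consistency check is easy to run on the posted error bars. A minimal sketch in plain Python: for each point it asks whether a real ##\sigma_{exc}## exists such that ##\sigma_y^2 = \sigma_x^2 + \sigma_{exc}^2##.

```python
import math

# Error bars from #5.
sx = [0.11, 0.12, 0.39, 0.20]
sy = [0.09, 0.09, 0.49, 0.11]

inconsistent = []
for i, (s_x, s_y) in enumerate(zip(sx, sy)):
    if s_y >= s_x:
        # sigma_exc that would reproduce sigma_y under the assumed relation
        s_exc = math.sqrt(s_y ** 2 - s_x ** 2)
        print(f"point {i}: sigma_exc = {s_exc:.2f} would be needed")
    else:
        inconsistent.append(i)
        print(f"point {i}: sigma_y < sigma_x, so no real sigma_exc exists")

print("points incompatible with sigma_y^2 = sigma_x^2 + sigma_exc^2:", inconsistent)
```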

However, I notice that for your data, ##\sigma_x \approx \sigma_y## at least within a factor of 2 or 3. So I believe you can still use nonlinear ODR but you will need to calculate the error perpendicular to the fitting line differently. I'm still pondering the right way to do that.
 

FAQ: Error on non-linearity in a linear fit

What is non-linearity in a linear fit?

Non-linearity in a linear fit refers to a situation where the relationship between the independent and dependent variables is not linear. In other words, the data does not follow a straight line when plotted on a graph.

Why is non-linearity a problem in linear regression?

Non-linearity can be a problem in linear regression because it violates one of the key assumptions of the model, which is that the relationship between the variables is linear. This can lead to inaccurate predictions and biased estimates of the regression coefficients.

How can I detect non-linearity in my data?

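There are a few ways to detect non-linearity in your data. One way is to visually inspect the scatter plot of the data, or a plot of the residuals of the linear fit against x, and look for systematic trends. Another way is to use a statistical test for functional-form misspecification, such as the Ramsey RESET test. (The Breusch-Pagan test is sometimes mentioned here, but it checks for heteroscedasticity, not non-linearity.)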

How can I address non-linearity in my linear regression model?

There are a few ways to address non-linearity in a linear regression model. One option is to transform the data using a mathematical function, such as a logarithmic or exponential transformation, to make the relationship more linear. Another option is to use a model that is non-linear in x, such as polynomial regression, to better fit the data.

Is it always necessary to address non-linearity in a linear regression model?

It is not always necessary to address non-linearity in a linear regression model. If the non-linearity is minor and does not significantly affect the results, it may be acceptable to leave it as is. However, if the non-linearity is significant, it is important to address it in order to obtain accurate and reliable results.
