Uncertainty Propagation for the Slope of a Line of Best Fit

In summary, the conversation discusses how to calculate the uncertainty in the gradient of a line of best fit for data points with varying error bars. The first suggestion is the standard-error formula for regression parameter estimates, which assumes uniform errors; for varying error bars the fit should be weighted by the precision of each data point. The method may need further adjustment for horizontal error bars or for different error distributions. The conversation concludes with the suggestion of using Monte Carlo or Bayesian analysis to find the distribution of the gradient.
  • #1
Kyle91
Hi guys,

So I'm writing up a physics lab and I have a bunch of data points. All of these data points have both x and y error bars. The relationship between x and y is linear, so I've used Python to fit a line of best fit through the data.

Now the slope of that line of best fit has physical significance and I need to know its value. Of course I know the gradient of this line but what I don't have is the uncertainty in this gradient - which I need.

How can I go about calculating it?

Thanks!
 
  • #2
This section should help if you can follow it:

http://en.wikipedia.org/wiki/Regression_analysis#Linear_regression

At the bottom it gives you a pair of formulas for the standard error of the parameter estimates:

[Two formula images from the linked article, not preserved here: the standard errors of the intercept and slope estimates.]


This works if the errors in the data are uniform. For your work it might be more complicated than that.
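For reference, the usual textbook formulas for an unweighted straight-line fit y = a + m x are sketched below. The symbols s, n, and x-bar are introduced here for illustration and are not read off the formula images, so treat the notation as an assumption:

```latex
s^2 = \frac{1}{n-2}\sum_{i=1}^{n}\bigl(y_i - a - m x_i\bigr)^2, \qquad
\sigma_a = s\,\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i-\bar{x})^2}}, \qquad
\sigma_m = \frac{s}{\sqrt{\sum_i (x_i-\bar{x})^2}}
```

Here s estimates the common scatter of the points about the line, which is why these formulas only apply when every point has the same error.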
 
  • #3
Hey thanks for the link! I had a read through but I'm still not fully understanding it. Could you please run me through an example? Let's say I've three data points -

(0 ± 0.1, 8 ± 0.5), (1 ± 0.2, 10 ± 0.4), (2 ± 0.3, 12 ± 0.6)

With my line of best fit being y = 2x + 8.

Edit: Oh so I only need the second equation you've listed because I only care about the gradient right? Even so I'm still not 100% on how to use it.
 
  • #4
So I thought I'd just figured it out. I used the above data points, but nowhere in that formula did it ask for the error in my points. I think that formula is solely used to find the error assuming the data is 'almost linear'. For example, if my data points were (0, 8.1), (1, 10.1), (2, 12), they don't lie exactly on a line, and this formula would tell you the 'error' in the slope from that scatter.

This isn't the same as my scenario, as I want to factor my error bars into the formula. Maybe I should break each of my points up into (x-max, y-max), (x-min, y-max), (x-min, y-min), (x-max, y-min) and then sub them into that formula?
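To make the point about the residual-only formula concrete, here is a minimal sketch in plain Python. The function name `unweighted_fit` is made up for this illustration, and it assumes every point has the same (unknown) error, which is then estimated from the scatter about the line:

```python
import math

def unweighted_fit(xs, ys):
    """Ordinary least-squares line y = a + m*x with standard errors.

    The scatter is estimated from the residuals alone, so error bars
    on the individual points play no role here.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    a = y_mean - m * x_mean
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    ssr = sum((y - (a + m * x)) ** 2 for x, y in zip(xs, ys))
    s2 = ssr / (n - 2)
    m_err = math.sqrt(s2 / sxx)
    a_err = math.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))
    return m, a, m_err, a_err

# The nearly collinear points quoted in the post above.
m, a, m_err, a_err = unweighted_fit([0, 1, 2], [8.1, 10.1, 12.0])
print(f"m = {m:.3f} +/- {m_err:.3f}, a = {a:.3f} +/- {a_err:.3f}")
```

With exactly collinear points such as (0, 8), (1, 10), (2, 12) the residuals vanish and this formula returns essentially zero uncertainty, which is why it cannot stand in for the error bars.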
 
  • #5
You may need to change your variables so that a straight line relationship 'should' apply. Plot x against y squared or log y or something (the theory of your particular experiment would tell you what to do), and then I think you could do the analysis on the resulting set of data OK.
This is the same trick as you would use if you wanted to do it with a ruler on graph paper.
 
  • #6
I've already got the relationship such that a straight line will apply.

I've made a quick mock up of my actual data - see here http://imgur.com/fbwIb.

So what I need to do is find the error in that light blue line. Of course the obvious and inaccurate way to do it would just be to alter the formula by hand in my code until it leaves the error bars, but I don't want to do this as I feel I should have a numerical method.

Furthermore, if I were to find the uncertainty via this method I'd be overlooking the proper way to do uncertainty analysis. For example, if I have a line segment that is 1 m ± 0.2 m and another that is 2 m ± 0.1 m and I want to add them together, the result is not simply 3 m ± 0.3 m. You can't just add the errors; you have to follow this formula: http://imgur.com/2EidJ. So the error would be sqrt(0.2^2 + 0.1^2), or ± 0.22 m. I worry I'd be overlooking this if I just vary the gradient by hand until it no longer fits the error bars.
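The quadrature rule in that example can be checked in a couple of lines. This is only a sketch; `add_in_quadrature` is a hypothetical helper name, and it assumes the two errors are independent:

```python
import math

def add_in_quadrature(*sigmas):
    """Combined standard uncertainty of a sum or difference of independent quantities."""
    return math.sqrt(sum(s ** 2 for s in sigmas))

total_length = 1.0 + 2.0                      # metres
total_sigma = add_in_quadrature(0.2, 0.1)     # metres
print(f"{total_length:.1f} m +/- {total_sigma:.2f} m")
```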
 
  • #7
So you are suggesting the data has varying error bars. I don't know the answer, but I imagine it's somewhere in that long article. My guess is that the optimisation condition (minimise the sum of squared residuals) must be adjusted to include weighting factors depending on the precision of each data point.

Furthermore, I'm not sure if this caters for horizontal error bars. If there's an error in your independent variable then you might also need to change the optimisation condition, e.g. rather than using residuals in the vertical direction, calculate the minimum distance from each point to the line, square it, and sum over the points. Alternatively, calculate the distance from the point to the line along some bearing which depends on the ratio of the horizontal and vertical error bars.

The work has probably been done but I've never studied statistics in this depth. I'd be very keen to know the answer as well.
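One common way to handle horizontal error bars, swapped in here as an illustration rather than the exact distance-based scheme guessed at above, is the "effective variance" approach: fold each x error into the y error using the current slope, sigma_eff^2 = sigma_y^2 + m^2 * sigma_x^2, then repeat a weighted fit until the slope settles. A minimal sketch, assuming Gaussian, independent x and y errors; the function names are hypothetical:

```python
import math

def weighted_fit(xs, ys, sigmas):
    """Weighted least-squares line y = a + m*x; returns m, a, sigma_m, sigma_a."""
    ws = [1.0 / s ** 2 for s in sigmas]
    S = sum(ws)
    Sx = sum(w * x for w, x in zip(ws, xs))
    Sy = sum(w * y for w, y in zip(ws, ys))
    Sxx = sum(w * x * x for w, x in zip(ws, xs))
    Sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    delta = S * Sxx - Sx ** 2
    m = (S * Sxy - Sx * Sy) / delta
    a = (Sxx * Sy - Sx * Sxy) / delta
    return m, a, math.sqrt(S / delta), math.sqrt(Sxx / delta)

def effective_variance_fit(xs, ys, x_errs, y_errs, n_iter=20):
    """Iterate a weighted fit with sigma_eff^2 = sigma_y^2 + m^2 * sigma_x^2."""
    # Start from a fit that uses the y error bars only.
    m, a, m_err, a_err = weighted_fit(xs, ys, y_errs)
    for _ in range(n_iter):
        eff = [math.sqrt(sy ** 2 + (m * sx) ** 2) for sx, sy in zip(x_errs, y_errs)]
        m, a, m_err, a_err = weighted_fit(xs, ys, eff)
    return m, a, m_err, a_err

# The three example points from earlier in the thread.
m, a, m_err, a_err = effective_variance_fit(
    [0, 1, 2], [8, 10, 12], [0.1, 0.2, 0.3], [0.5, 0.4, 0.6]
)
print(f"m = {m:.3f} +/- {m_err:.3f}, a = {a:.3f} +/- {a_err:.3f}")
```

The quoted slope error is approximate, since it treats the effective variances as fixed once the slope has converged.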
 
  • #8
The standard formulae for regression analysis assume the error distribution is the same for all datapoints (i.e. as random variables the error terms are i.i.d.).
If you have error bars that vary with the data values (heteroscedasticity, since you ask) then the normal straight line fit formula is not right. You need to bias in favour of matching the more reliable datapoints.
The general method is MLE - maximum likelihood estimation.

First, you need to convert the error bars to actual distributions. You could simply take them at face value and say the distribution is uniform over the range of the bar, but that is not likely to give you a good result. Instead, I would suggest taking each datapoint to have a Gaussian error, with the error bar indicating the standard deviation in each case (1 sigma, 2 sigma, whatever you choose).

Then you need to pick the slope which maximises the overall "likelihood":

Datapoints: (Xi, Yi)
Model: Y = mX+a
Error: Ei = Yi - (mXi+a)
Std devs: Di
Find m, a to minimise: Σ(Ei/Di)^2
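For a straight line this minimisation has a closed-form solution, which also gives the uncertainty in m directly when only the y errors matter. A minimal sketch in plain Python, using the three example points from earlier in the thread with their y error bars taken as 1-sigma values (that reading of the bars, and the name `weighted_fit`, are assumptions of this example):

```python
import math

def weighted_fit(xs, ys, sigmas):
    """Minimise sum(((y - (m*x + a)) / D)**2); returns m, a, sigma_m, sigma_a."""
    ws = [1.0 / d ** 2 for d in sigmas]          # weights w_i = 1 / D_i^2
    S = sum(ws)
    Sx = sum(w * x for w, x in zip(ws, xs))
    Sy = sum(w * y for w, y in zip(ws, ys))
    Sxx = sum(w * x * x for w, x in zip(ws, xs))
    Sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    delta = S * Sxx - Sx ** 2
    m = (S * Sxy - Sx * Sy) / delta
    a = (Sxx * Sy - Sx * Sxy) / delta
    # Propagating the D_i through these expressions gives the parameter errors.
    m_err = math.sqrt(S / delta)
    a_err = math.sqrt(Sxx / delta)
    return m, a, m_err, a_err

m, a, m_err, a_err = weighted_fit([0, 1, 2], [8, 10, 12], [0.5, 0.4, 0.6])
print(f"m = {m:.3f} +/- {m_err:.3f}, a = {a:.3f} +/- {a_err:.3f}")
```

Unlike the unweighted formula, this gives a nonzero slope error even when the points lie exactly on the line, because the error comes from the bars rather than the residuals.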

Turning to your original question, you want to find the distribution of m.
This is a bit tricky because there could also be an error in a.
One approach is Monte Carlo: generate random datasets consistent with the fitted m and a and the standard deviations Di, apply the above procedure to each one, and see what distributions you get for the computed m and a values.
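A rough sketch of that Monte Carlo idea, assuming Gaussian y errors only and using Python's standard library; the function names, the number of trials, and the seed are arbitrary choices for illustration:

```python
import random
import statistics

def fit_line(xs, ys, sigmas):
    """Weighted least-squares slope and intercept for y = m*x + a."""
    ws = [1.0 / d ** 2 for d in sigmas]
    S = sum(ws)
    Sx = sum(w * x for w, x in zip(ws, xs))
    Sy = sum(w * y for w, y in zip(ws, ys))
    Sxx = sum(w * x * x for w, x in zip(ws, xs))
    Sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    delta = S * Sxx - Sx ** 2
    return (S * Sxy - Sx * Sy) / delta, (Sxx * Sy - Sx * Sxy) / delta

def monte_carlo_slope(xs, ys, sigmas, n_trials=5000, seed=0):
    """Refit many synthetic datasets and summarise the spread of m and a."""
    rng = random.Random(seed)
    m0, a0 = fit_line(xs, ys, sigmas)            # best fit to the real data
    slopes, intercepts = [], []
    for _ in range(n_trials):
        # Draw each y from a Gaussian centred on the fitted line with width D_i.
        fake_ys = [rng.gauss(m0 * x + a0, d) for x, d in zip(xs, sigmas)]
        m, a = fit_line(xs, fake_ys, sigmas)
        slopes.append(m)
        intercepts.append(a)
    return (statistics.mean(slopes), statistics.stdev(slopes),
            statistics.mean(intercepts), statistics.stdev(intercepts))

m_mean, m_std, a_mean, a_std = monte_carlo_slope(
    [0, 1, 2], [8, 10, 12], [0.5, 0.4, 0.6]
)
print(f"m = {m_mean:.2f} +/- {m_std:.2f}, a = {a_mean:.2f} +/- {a_std:.2f}")
```

To include x errors as well, each trial could also jitter the x values by their own error bars before refitting; that extension is not shown here.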

To do it properly might require Bayesian analysis: pick a priori distributions for m and a, then adjust them based on the data.
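As a very simple illustration of the Bayesian route, the posterior for m can be evaluated on a grid. This sketch assumes flat priors on m and a over hand-picked ranges and Gaussian y errors, none of which is specified in the thread; `grid_posterior_slope` and its parameters are hypothetical:

```python
import math

def grid_posterior_slope(xs, ys, sigmas, m_range, a_range, n_grid=201):
    """Posterior mean and standard deviation of m, marginalised over a."""
    m_lo, m_hi = m_range
    a_lo, a_hi = a_range
    m_vals = [m_lo + (m_hi - m_lo) * i / (n_grid - 1) for i in range(n_grid)]
    a_vals = [a_lo + (a_hi - a_lo) * j / (n_grid - 1) for j in range(n_grid)]
    marginal = []
    for m in m_vals:
        total = 0.0
        for a in a_vals:
            chi2 = sum(((y - (m * x + a)) / d) ** 2
                       for x, y, d in zip(xs, ys, sigmas))
            # Gaussian likelihood; the flat prior only contributes a constant.
            total += math.exp(-0.5 * chi2)
        marginal.append(total)                    # sum over a = marginalisation
    norm = sum(marginal)
    mean = sum(m * p for m, p in zip(m_vals, marginal)) / norm
    var = sum((m - mean) ** 2 * p for m, p in zip(m_vals, marginal)) / norm
    return mean, math.sqrt(var)

m_mean, m_std = grid_posterior_slope(
    [0, 1, 2], [8, 10, 12], [0.5, 0.4, 0.6],
    m_range=(0.0, 4.0), a_range=(5.0, 11.0),
)
print(f"posterior m = {m_mean:.2f} +/- {m_std:.2f}")
```

With flat priors and Gaussian errors this should essentially reproduce the weighted least-squares result; informative priors would change the numbers.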
 

FAQ: Uncertainty Propagation for the Slope of a Line of Best Fit

What is uncertainty propagation for the slope of a line of best fit?

Uncertainty propagation for the slope of a line of best fit is a statistical analysis method used to determine the uncertainty or error associated with the slope of a line of best fit on a scatter plot. It takes into account the uncertainty in the data points and calculates the uncertainty in the slope.

Why is uncertainty propagation important for the slope of a line of best fit?

Uncertainty propagation is important because it tells us how reliable the slope of a line of best fit is. It takes into account the variability in the data and provides a measure of how precisely the slope is determined.

How is uncertainty propagation for the slope of a line of best fit calculated?

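The uncertainty in the slope is calculated by propagating the uncertainty of each data point through the fit. For a weighted least-squares fit with errors in y only, this has a closed form based on the derivatives of the slope with respect to each measured value. For more complicated cases, such as errors in both variables or non-Gaussian errors, it is usually done numerically with statistical software or Monte Carlo simulation.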
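As a sketch of what that derivative-based formula looks like for a weighted straight-line fit with independent Gaussian errors in y only (D_i is the standard deviation of point i as in the thread above; w_i and Δ are shorthand introduced here):

```latex
\sigma_m^2 = \sum_i \left(\frac{\partial m}{\partial y_i}\right)^2 D_i^2
           = \frac{\sum_i w_i}{\Delta}, \qquad
w_i = \frac{1}{D_i^2}, \qquad
\Delta = \sum_i w_i \sum_i w_i x_i^2 - \Bigl(\sum_i w_i x_i\Bigr)^2
```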

Can uncertainty propagation be applied to any type of data set?

It can be applied to any data set for which a straight-line model is appropriate, whether the fitted slope is positive, negative, or close to zero. It is commonly used in fields such as physics, engineering, and economics.

How can uncertainty propagation for the slope of a line of best fit help in decision making?

Uncertainty propagation can help in decision making by providing a range of possible values for the slope of a line of best fit. This range takes into account the uncertainty in the data, giving decision makers a better understanding of the potential variability in their results. It can also help identify any outliers or influential data points that may be affecting the slope estimate.
