Error propagation leads to different uncertainties, which do I choose?

TheMercury79 · Jan 1, 2023

I understand how to compute and propagate errors but have trouble with conceptualizing all things put together.

I have performed an experiment to determine a value for some quantity. This quantity depends on two variables. The first one depends in turn on some other quantities as well, but I think I have managed to propagate errors correctly.

For the experiment I made measurements at ten or so points. The fractional error in the first variable is the same at every point. The second variable is the magnetic flux density B, which has a constant absolute error of, say, 5 mT at each point. As B increases, the fractional error in B decreases, so the error in the final result also decreases from point to point. The quantity I want to measure therefore has errors that range from 5.5 % to 3.5 %. How do I choose an uncertainty from this? I'm thinking 5 %, because the value with this error is closest to the average value, but I don't know. The thing is, 5 % is kind of a good outcome, because then the true value is just within the error range. But 3.5 % is a little disappointing, so is it cheating if I choose among the highest errors?
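To make the setup concrete, here is a minimal Python sketch of how the combined fractional error falls as B rises. It assumes the quantity is a simple product or quotient of the two variables, so that their fractional errors add in quadrature. The 3 % fractional error for the first variable and the list of B values are made-up numbers for illustration, not the actual data.

```python
import math

def combined_fractional_error(frac_a, b_mT, db_mT=5.0):
    """Fractional error of a product/quotient of a and B (errors added in quadrature).

    frac_a : fractional error of the first variable (assumed constant)
    b_mT   : magnetic flux density in mT
    db_mT  : constant absolute error in B, in mT
    """
    frac_b = db_mT / b_mT            # fractional error in B shrinks as B grows
    return math.hypot(frac_a, frac_b)

frac_a = 0.03                        # hypothetical value, not from the experiment
b_values_mT = [100, 200, 300, 400, 500]   # hypothetical B settings
errors = [combined_fractional_error(frac_a, b) for b in b_values_mT]

for b, e in zip(b_values_mT, errors):
    print(f"B = {b:4d} mT  ->  fractional error = {100 * e:.2f} %")
```

Each point ends up with its own uncertainty, which is why a single number cannot simply be read off the list.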

I narrowed it down to 2 questions:

1. How do I choose an uncertainty here? Do I pick one from the higher end, like 5 %, or should I average the errors? That would make it about 4 %. It feels like I'm nitpicking, but I'm just thinking conceptually.

2. I calculated the standard deviation as well, and that narrowed the error down to about 1 %, but I feel that is too low. Do we use standard deviation only when we have values without uncertainties?

Just some pointers would be appreciated.

Orodruin · Jan 1, 2023

I think your question would be easier to answer if you told us exactly what you have done rather than speak in abstract terms. Or at least be a bit more specific.

BvU · Jan 1, 2023

Hi,

Please use double # or $ as delimiters for ##\LaTeX##...
And preview before posting ...

Did you mean
$$\bigl (\Delta Q(x,y,z,...)\bigr )^2 = \left (\frac {\partial Q}{\partial x} \Delta x\right )^2+
\left (\frac {\partial Q}{\partial y} \Delta y\right )^2 +\left (\frac {\partial Q}{\partial z} \Delta z\right )^2+ ...\ ?$$

From your rather vague story I gather you have varied ##B## and measured something. What ?
There is some dependence (linear ?) so you must have made a plot. Can you post it ?

TheMercury79 said:

The quantity I want to measure

Did you mean 'determine' as opposed to measure ?

There is too much to guess and assume, so I refrain from answering until your description is clearer.

##\ ##

haruspex · Jan 2, 2023

TheMercury79 said:

Homework Statement: Propagation of uncertainties, standard deviation
Relevant Equations: $$\frac{\delta Q(x,y,z,...)}{Q(x,y,z,...)} = \sqrt{(\frac{\partial Q}{\partial x}\delta x)^2+(\frac{\partial Q}{\partial y}\delta y)^2+(\frac{\partial Q}{\partial z}\delta z)^2+...}$$

1. How do I choose an uncertainty here? Do I pick one from the higher end, like 5 %, or should I average the errors? That would make it about 4 %. It feels like I'm nitpicking, but I'm just thinking conceptually.

2. I calculated the standard deviation as well, and that narrowed the error down to about 1 %, but I feel that is too low. Do we use standard deviation only when we have values without uncertainties?

You need to distinguish two things:

- the uncertainty in each individual value calculated
- the overall uncertainty when they are combined.

You could ignore the first and just compute the second from the standard deviation. But the uncertainty in the combined value is not the standard deviation itself: the standard deviation does not shrink as you take more datapoints, whereas the uncertainty should. Instead, use the "standard error of the mean": https://en.wikipedia.org/wiki/Standard_error.
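As a rough illustration, the standard error of the mean is the sample standard deviation divided by the square root of the number of points. The sketch below uses only the Python standard library; the numbers in `values` are invented placeholders, not the OP's results.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Sample standard deviation (n - 1 in the denominator) divided by sqrt(n)."""
    n = len(values)
    return statistics.stdev(values) / math.sqrt(n)

# Hypothetical determinations of the same quantity (illustrative only).
values = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]

mean = statistics.mean(values)
sd = statistics.stdev(values)
sem = standard_error_of_mean(values)

print(f"mean = {mean:.3f}")
print(f"standard deviation = {sd:.3f}")
print(f"standard error of the mean = {sem:.3f}")
```

Note that this treats every point as equally reliable, so it makes no use of the individual propagated errors.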

Clearly it would be better to use both sources of information, but I am unaware of any standard procedure for doing so. @Orodruin may well know one.

TheMercury79 · Jan 2, 2023

BvU said:

Hi,

Please use double # or $ as delimiters for ##\LaTeX##...
And preview before posting ...

Did you mean
$$\bigl (\Delta Q(x,y,z,...)\bigr )^2 = \left (\frac {\partial Q}{\partial x} \Delta x\right )^2+
\left (\frac {\partial Q}{\partial y} \Delta y\right )^2 +\left (\frac {\partial Q}{\partial z} \Delta z\right )^2+ ...\ ?$$

From your rather vague story I gather you have varied ##B## and measured something. What ?
There is some dependence (linear ?) so you must have made a plot. Can you post it ?

Did you mean 'determine' as opposed to measure ?

There is too much to guess and assume, so I refrain from answering until your description is clearer.

##\ ##

What is up with your answer? It is obvious I just forgot an extra '$', so it is not necessary to ask about my equation. Yes, I said 'measure'; you're right, I should have said 'determine', but you still know what I meant. It depends on two quantities, and their fractional errors add in quadrature during propagation; what is unclear about this? Should I average the final errors or choose the highest one? (oh, I probably should have said 'greatest one' here, right?)

BvU · Jan 2, 2023

TheMercury79 said:

it is not necessary to ask about my equation

But it is: the equation is not correct. You can check that by looking at the dimensions.

I did not mean to be rude. Do you ?

##\ ##

haruspex · Jan 2, 2023

haruspex said:

Clearly it would be better to use both sources of information, but I am unaware of any standard procedure for doing so.

Here's a start. You have values ##(y_i)## with a priori standard deviations ##\sigma_i##.
Taking each to be normally distributed about some unknown ideal value ##\hat y##, the joint probability density of the values is ##p((y_i))=(2\pi)^{-n/2}\frac 1{\Pi\sigma_i}e^{-\Sigma{\frac{(y_i-\hat y)^2}{2\sigma_i^2}}}##.
To maximise that we would want ##\hat y## to be ##y'=S\Sigma\frac{y_i}{\sigma_i^2}## where ##1/S=\Sigma\frac 1{\sigma_i^2}##. So that is our best estimate for ##\hat y##.
To find the mean error in that, we need to estimate ##E[(y'-\hat y)^2]##.
I got as far as the expression ##S^2\frac{(2\pi)^{-n/2}}{\Pi\sigma_i}\int..\int(\Sigma\frac{y_i-\hat y}{\sigma_i^2})^2e^{-\Sigma{\frac{(y_i-\hat y)^2}{2\sigma_i^2}}}d(y_i)##.
I see hints that this should simplify greatly, but it could take me a long time to do it without error - and that's if I've not already made some.
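The best estimate ##y'## above is the inverse-variance weighted mean, and it is easy to compute numerically. Here is a minimal Python sketch, assuming independent, normally distributed values; the function name and the sample numbers are hypothetical and only for illustration.

```python
import math

def inverse_variance_mean(values, sigmas):
    """Return (y_best, S) with y_best = S * sum(y_i / sigma_i^2), 1/S = sum(1 / sigma_i^2).

    For independent Gaussian errors, S is also the variance of y_best.
    """
    weights = [1.0 / s**2 for s in sigmas]
    S = 1.0 / sum(weights)
    y_best = S * sum(w * y for w, y in zip(weights, values))
    return y_best, S

# Hypothetical values and their a priori standard deviations (illustrative only).
values = [100.5, 99.2, 101.0, 99.8, 100.3]
sigmas = [5.5, 5.0, 4.5, 4.0, 3.5]

y_best, S = inverse_variance_mean(values, sigmas)
print(f"best estimate = {y_best:.2f}")
print(f"sqrt(S) = {math.sqrt(S):.2f}")
```

Points with smaller ##\sigma_i## pull the estimate harder, which is why a plain average is not the best estimate when the errors differ from point to point.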

hutchphd · Jan 2, 2023

TheMercury79 said:

Should I average the final errors or choose the highest one? (oh, I probably should have said 'greatest one' here, right?)

You should use the formula as written. Why do otherwise?

haruspex · Jan 3, 2023

hutchphd said:

You should use the formula as written. Why do otherwise?

See post #4.

Orodruin · Jan 3, 2023

haruspex said:

Here's a start. You have values ##(y_i)## with a priori standard deviations ##\sigma_i##.
Taking each to be normally distributed about some unknown ideal value ##\hat y##, the joint probability density of the values is ##p((y_i))=(2\pi)^{-n/2}\frac 1{\Pi\sigma_i}e^{-\Sigma{\frac{(y_i-\hat y)^2}{2\sigma_i^2}}}##.
To maximise that we would want ##\hat y## to be ##y'=S\Sigma\frac{y_i}{\sigma_i^2}## where ##1/S=\Sigma\frac 1{\sigma_i^2}##. So that is our best estimate for ##\hat y##.
To find the mean error in that, we need to estimate ##E[(y'-\hat y)^2]##.
I got as far as the expression ##S^2\frac{(2\pi)^{-n/2}}{\Pi\sigma_i}\int..\int(\Sigma\frac{y_i-\hat y}{\sigma_i^2})^2e^{-\Sigma{\frac{(y_i-\hat y)^2}{2\sigma_i^2}}}d(y_i)##.
I see hints that this should simplify greatly, but it could take me a long time to do it without error - and that's if I've not already made some.

Generally this works for independent measurements. However, it is not clear to me from the OP’s description that there are no correlations between measurements.

hutchphd · Jan 3, 2023

haruspex said:

See post #4.

I am not seeing the complication here. One assumes the errors to be independent (absent information to the contrary) and the result for the two methods, if correctly applied, should be the same. If they are not, then the errors are likely correlated. In this case either a root correlation or bias needs to be identified. Absent that one should choose the larger (more conservative) error estimate.
The Central Limit Theorem is your friend here.

haruspex · Jan 3, 2023

hutchphd said:

the result for the two methods, if correctly applied, should be the same

No, because of the heteroscedasticity.
Indeed, without taking the a priori distributions into account one would use the wrong value for the best estimate.

haruspex · Jan 3, 2023

Orodruin said:

Generally this works for independent measurements. However, it is not clear to me from the OP’s description that there are no correlations between measurements.

I see no hint that they are not independent, but I am rather unclear as to how the a priori errors vary. I just assume @TheMercury79 can work out the likely variance of each datapoint.

@TheMercury79, you have not clarified how you used the standard deviation of the measurements to get an estimate of the error in your answer. Did you use the standard error of the mean formula? If that produces a much smaller error estimate than you expected then I can think of three explanations:

- the a priori errors you allowed were excessive
- the a priori errors are largely systematic
- the very low s.d. observed was a fluke

I think we can rule out the last. The first gives you a big problem since it does not encompass the true value. So I would look for systematic errors.

hutchphd · Jan 3, 2023

haruspex said:

No, because of the heteroscedasticity.
Indeed, without taking the a priori distributions into account one would use the wrong value for the best estimate.

I have never seen the term before (good thing my clients never knew that). But isn't that what the partial derivatives in the expansion are there to address? (If the recovery curve Q is correct)
I do not pretend to understand all the nuances, but it got me through many FDA submissions.

haruspex · Jan 3, 2023

hutchphd said:

isn't that what the partial derivatives in the expansion are there to address?

I should have started by asking you which formula you meant in post #8.

The formula quoted in post #1 won't do it alone because it only estimates the error for a single datapoint. Taking the largest of these would be crazy since taking more readings would tend to increase that instead of reducing it. Taking the average is hardly better.

Bullet 2 in post #1 implies @TheMercury79 tried using the formula for standard deviation (or maybe for standard error of the mean) instead. This ignores the estimates of the individual datapoint errors, so cannot account for heteroscedasticity.

haruspex · Jan 6, 2023

haruspex said:

You have values ##(y_i)## with a priori standard deviations ##\sigma_i##.
Taking each to be normally distributed about some unknown ideal value ##\hat y##, the joint probability density of the values is ##p((y_i))=(2\pi)^{-n/2}\frac 1{\Pi\sigma_i}e^{-\Sigma{\frac{(y_i-\hat y)^2}{2\sigma_i^2}}}##.
To maximise that we would want ##\hat y## to be ##y'=S\Sigma\frac{y_i}{\sigma_i^2}## where ##1/S=\Sigma\frac 1{\sigma_i^2}##. So that is our best estimate for ##\hat y##.
To find the mean error in that, we need to estimate ##E[(y'-\hat y)^2]##.
I got as far as the expression ##S^2\frac{(2\pi)^{-n/2}}{\Pi\sigma_i}\int..\int(\Sigma\frac{y_i-\hat y}{\sigma_i^2})^2e^{-\Sigma{\frac{(y_i-\hat y)^2}{2\sigma_i^2}}}d(y_i)##.
I see hints that this should simplify greatly,

It simplified alright: ##E[(y'-\hat y)^2]=S=\frac 1{\Sigma\frac 1{\sigma_i^2}}##
Unfortunately, that's not what I set out to achieve. The scatter of the actual ##y_i## should play a part.
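For reference, the simplification can be done without the multiple integral, assuming the ##y_i## are independent with ##E[(y_i-\hat y)^2]=\sigma_i^2##. Since ##S\Sigma\frac 1{\sigma_i^2}=1##,

$$y'-\hat y = S\sum_i\frac{y_i-\hat y}{\sigma_i^2}$$

and independence removes the cross terms, ##E[(y_i-\hat y)(y_j-\hat y)]=0## for ##i\neq j##, so

$$E[(y'-\hat y)^2] = S^2\sum_i\sum_j\frac{E[(y_i-\hat y)(y_j-\hat y)]}{\sigma_i^2\sigma_j^2} = S^2\sum_i\frac{\sigma_i^2}{\sigma_i^4} = S^2\cdot\frac 1S = S$$

The expectation is taken over the assumed distributions, so the result depends only on the ##\sigma_i##. That is why the observed scatter of the ##y_i## cannot appear in it.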
