Distribution function approach to error propagation

In summary, the conversation discusses the use of calculus and probability density functions to evaluate error propagation in calculations with random variables. The classic formula for error propagation in a quotient of random variables, in which the squared fractional errors add, is traced back to a linearization of the quotient. The conversation also asks whether the errors can formally be replaced by the variances of the corresponding Gaussian pdfs, or whether brute-force Monte Carlo is needed. The link provided by EnumaElish leads the original poster to a more solid book on statistics. The conversation then shifts to a new model of the experiment, where the number of dots is in large excess compared to the number of beads. The second hurdle is realizing that the number of draws for each bead is not fixed and fluctuates, which brings
  • #1
Jezabel
Hello,

I'm familiar with the common calculus approach with partial derivatives to evaluate error propagation in calculations with random variables. However, I'm looking for a way to derive the classic formula with the sum of fractional errors squared:
[tex]{\left(\frac{\Delta Z}{Z}\right)}^2 = {\left(\frac{\Delta X}{X}\right)}^2 + {\left(\frac{\Delta Y}{Y}\right)}^2 [/tex]

for the error propagation in a quotient of random variables X & Y:
[tex]Z = X/Y[/tex]

using only the probability density functions (pdfs), given that in my specific situation X & Y are Gaussian distributed. I've played around with transformations of pdfs and multivariate joint pdfs (though in my case X & Y are independent), but haven't reached my goal so far.

Is it possible to formally replace the [tex]\Delta X, \Delta Y, \Delta Z[/tex] of the above equation by the variance of the corresponding Gaussian pdf? Or am I obliged to resort to Monte Carlo brute force to work with the pdfs from the start?
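
As a rough illustration of the Monte Carlo route mentioned above, here is a minimal Python sketch, assuming NumPy is available. The function names and the numerical values (means 100 and 50, standard deviations 2 and 1.5) are made up for illustration and do not come from the thread. It treats [tex]\Delta X[/tex] and [tex]\Delta Y[/tex] as the standard deviations of the Gaussian pdfs and compares the sampled relative spread of Z with the sum-of-squares formula.

```python
import numpy as np


def ratio_spread_mc(mu_x, sigma_x, mu_y, sigma_y, n_samples=200_000, seed=0):
    """Sample Z = X/Y for independent Gaussian X and Y.

    Returns the relative spread std(Z) / mean(Z) of the samples.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_x, sigma_x, n_samples)
    y = rng.normal(mu_y, sigma_y, n_samples)
    z = x / y
    return z.std() / z.mean()


def ratio_spread_formula(mu_x, sigma_x, mu_y, sigma_y):
    """Sum-of-squares formula: sqrt((sigma_x/mu_x)^2 + (sigma_y/mu_y)^2)."""
    return np.hypot(sigma_x / mu_x, sigma_y / mu_y)


# Illustrative values only: relative errors of 2 % on X and 3 % on Y.
mc = ratio_spread_mc(100.0, 2.0, 50.0, 1.5)
lin = ratio_spread_formula(100.0, 2.0, 50.0, 1.5)
print(f"Monte Carlo: {mc:.4f}   formula: {lin:.4f}")
```

The two numbers should only be expected to agree when the relative errors are small and Y stays well away from zero; the ratio of two Gaussians has heavy tails, so the sample standard deviation becomes unreliable once Y can come close to zero.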
 
  • #2
If Z = X/Y, isn't [itex]\Delta Z/Z = \Delta X/X - \Delta Y/Y[/itex]? How do you go from this to the squared errors? (Are you assuming [itex]\Delta X\Delta Y = 0[/itex]?) Regardless, you may find the following useful: http://www.stata.com/support/faqs/stat/deltam.html
 
  • #3
In the derivation of the mean square error, you are assuming that the cross-terms are zero, hence your formula given in the first post.
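
A sketch of the step being discussed, under the assumption that [tex]\Delta X[/tex] and [tex]\Delta Y[/tex] are small deviations from the mean values and that the angle brackets (notation introduced here) denote an average over many measurements. To first order, for [tex]Z = X/Y[/tex]:

[tex]\frac{\Delta Z}{Z} \approx \frac{\Delta X}{X} - \frac{\Delta Y}{Y}[/tex]

Squaring and averaging gives

[tex]\left\langle \left(\frac{\Delta Z}{Z}\right)^2 \right\rangle \approx \left\langle \left(\frac{\Delta X}{X}\right)^2 \right\rangle + \left\langle \left(\frac{\Delta Y}{Y}\right)^2 \right\rangle - \frac{2 \langle \Delta X \Delta Y \rangle}{X Y}[/tex]

If X and Y are independent, [tex]\langle \Delta X \Delta Y \rangle = 0[/tex] and only the two squared terms remain, which is the formula of the first post. The averaged squares are variances divided by squared means, so the [tex]\Delta[/tex]'s in that formula play the role of standard deviations.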
 
  • #4
Thank you very much for your help, and I apologize for not having answered earlier. I'm facing a deadline to submit a paper, and it's keeping me rather busy...

The link provided by EnumaElish guided me to the resource I very much needed: a more solid book on statistics than what I already had. It provided the clue I was looking for: the square error method comes from a linearization of the problem, thus passing into the realm of calculus through a Taylor series expansion. And yes, as was pointed out, I was considering [tex]\Delta X \Delta Y = 0[/tex] at that point in my analysis.

However, I took a new approach to model my experiment, redefining X and Y so that they are very much correlated; I'll therefore include the cross-term in the end. Unfortunately, this new model has me stuck at a new place, so I thought I'd share the whole model with you instead of only the tip of the iceberg as I did previously. It goes as follows:

I have red dots and green dots that are drawn by bigger beads. The total number of dots is in large excess compared to the number of beads, so I consider what happens at each bead independently. I know the relative proportion of red dots versus green dots in my experiment, hence I can set the probability of drawing a red dot [tex]p_r[/tex] versus the probability of drawing a green dot [tex]p_g = 1 - p_r[/tex]. I should end up with a binomial distribution for the number of red dots [tex]n_r[/tex] and for the number of green dots [tex]n_g[/tex] that are drawn when I observe a large sample of beads. So far, so good. My first hurdle was the question I initially posted. Experimentally, I don't have access to the absolute numbers of red and green dots, but instead their relative value [tex]n_r/n_g[/tex]. How to calculate the resulting variance of this ratio of random variables? But given the experimental conditions this was measured in, I'm totally happy with the square error approximation method.

The second hurdle I just realized I had was that the number of draws (hence the total number of dots [tex]n_t = n_r + n_g[/tex] on a bead) is not a fixed number, it fluctuates from one bead to the next. *sigh* The number of draws for each bead is limited by the number of dots I can put to fill its surface and the beads have roughly, but not exactly, the same size. I have a few measurements that allow me to estimate the mean value of [tex]n_t[/tex] and have a very rough idea of its variance across the beads. As I write this post, I see better the link between [tex]n_t[/tex] and the ratio I measure for each bead [tex]n_r/n_g[/tex], but it's far from obvious to me how the fluctuations of [tex]n_t[/tex] will affect the distribution of my ratio measured for many beads...

Thanks again for any insight!

P.S. I feel any advice given beyond this point should be acknowledged in my paper. If you help me and do not mind sharing your real name with me (via private message), I'll gladly include your name in my acknowledgment section and give you enough info about the paper so you can track its publication.
 
  • #5
Progress

After a day of thinking about it (and reading on statistics!), here is how I'm currently treating the problem:

I'm considering the drawing of dots by the bigger beads as a random experiment as well (neglecting quite a few experimental factors in the process, but I'll see about improvements to include them later). I'm taking the binomial approach where a dot has a probability [tex]1/B[/tex] (where [tex]B[/tex] is the total number of beads) of being drawn by a given bead and a probability [tex]1-1/B[/tex] of being drawn by any of the other beads. Since [tex]B[/tex] is large in my experiment and the number of draws made (i.e. [tex]D[/tex], the total number of dots going on all beads) is even larger, I can approximate the distribution of the number of dots on a given bead [tex]n_t[/tex] as a Poisson one with expected value and variance [tex]\lambda=D/B[/tex].

Then back to the red dot versus green dot part of the experiment, which I described in my previous post. Another binomial distribution, unless I'm very much mistaken! Looking at joint distributions and conditional distributions, I found an example (on p.86 of J.A. Rice, Mathematical Statistics and Data Analysis, Duxbury Press (1995)) that seemed similar to my situation. It showed how to sum over a Poisson distribution (of [tex]n_t[/tex] in my case) multiplied by the binomial distribution conditional on it ([tex]n_t[/tex] fixing the number of trials of the red dot versus green dot part of the experiment). By the law of total probability, this summation should give me the probability distribution of drawing [tex]n_r[/tex] red dots on a bead, which the book gives as:

[tex]p(n_r) = \frac{ (\lambda p_r)^{n_r} }{ n_r! } e^{ -\lambda p_r }[/tex]

where [tex]\lambda=D/B[/tex] as above and [tex]p_r[/tex] is known as mentioned in the previous post. I now have a Poisson distribution for [tex]n_r[/tex] as well, with expectation value and variance:

[tex]\lambda_{n_r} = \frac{D \cdot p_r}{B}[/tex]

where [tex]D/B[/tex] is the expectation value of [tex]n_t[/tex] dots per bead for which I have an experimental estimate. I can take the same reasoning for [tex]n_g[/tex] and I know the proportion of red dots versus green dots, hence [tex]p_r[/tex] and [tex]p_g = 1-p_r[/tex]. Thus, I should have all the information required to use the square error method (adding the covariance term of [tex]n_r[/tex] and [tex]n_g[/tex]) to approximate the variance of the ratio [tex]Z = n_r/n_g[/tex] (see first post of this thread). Voilà.
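
The summation over [tex]n_t[/tex] described in this post can be written out explicitly. The following is a sketch of that calculation, assuming [tex]n_t[/tex] is Poisson with mean [tex]\lambda[/tex] and that, given [tex]n_t[/tex], the number of red dots is binomial with success probability [tex]p_r[/tex]. The index [tex]k = n_t - n_r[/tex] is introduced here for convenience; it is the number of green dots on the bead.

[tex]p(n_r) = \sum_{n_t = n_r}^{\infty} \frac{\lambda^{n_t} e^{-\lambda}}{n_t!} \binom{n_t}{n_r} p_r^{n_r} (1-p_r)^{n_t - n_r}[/tex]

[tex]= \frac{(\lambda p_r)^{n_r} e^{-\lambda}}{n_r!} \sum_{k=0}^{\infty} \frac{\left[\lambda (1-p_r)\right]^{k}}{k!}[/tex]

[tex]= \frac{(\lambda p_r)^{n_r}}{n_r!} e^{-\lambda} e^{\lambda (1-p_r)} = \frac{(\lambda p_r)^{n_r}}{n_r!} e^{-\lambda p_r}[/tex]

The same steps with [tex]p_g[/tex] in place of [tex]p_r[/tex] give a Poisson distribution with mean [tex]\lambda p_g[/tex] for [tex]n_g[/tex].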

Hoping this is somewhat clear; any comment is welcome :smile: I'm somewhat disturbed by the absence of a contribution of the fluctuations of [tex]n_t[/tex] to the variance of [tex]n_r[/tex] and [tex]n_g[/tex]: only its expected value [tex]D/B[/tex] is included. I guess this is due to my choice of a Poisson distribution as the model for [tex]n_t[/tex], which fixes the variance to the expectation value...
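
One way to probe that last worry is a small simulation. The sketch below assumes NumPy; the function names, the bead count, [tex]\lambda = 400[/tex], [tex]p_r = 0.3[/tex] and the "narrow" alternative model for [tex]n_t[/tex] (a rounded Gaussian with standard deviation 5) are all made up for illustration and are not taken from the experiment. It compares the sampled variance of [tex]n_r[/tex] and the covariance of [tex]n_r[/tex] and [tex]n_g[/tex] with what the law of total variance gives for an arbitrary distribution of [tex]n_t[/tex].

```python
import numpy as np


def split_dots(n_t, p_r, rng):
    """Colour the n_t dots on each bead: each dot is red with probability p_r."""
    n_r = rng.binomial(n_t, p_r)
    return n_r, n_t - n_r


def predicted_var_red(mean_nt, var_nt, p_r):
    """Law of total variance: Var[n_r] = p(1-p) E[n_t] + p^2 Var[n_t]."""
    return p_r * (1.0 - p_r) * mean_nt + p_r**2 * var_nt


def predicted_cov(mean_nt, var_nt, p_r):
    """Cov[n_r, n_g] = p(1-p) (Var[n_t] - E[n_t]); zero when n_t is Poisson."""
    return p_r * (1.0 - p_r) * (var_nt - mean_nt)


rng = np.random.default_rng(1)
n_beads, lam, p_r = 200_000, 400.0, 0.3  # illustrative values only

# Two hypothetical models for the number of dots per bead.
models = {
    "poisson": rng.poisson(lam, n_beads),
    "narrow": np.rint(rng.normal(lam, 5.0, n_beads)).astype(np.int64),
}

results = {}
for name, n_t in models.items():
    n_r, n_g = split_dots(n_t, p_r, rng)
    results[name] = {
        "var_red": n_r.var(),
        "var_red_pred": predicted_var_red(n_t.mean(), n_t.var(), p_r),
        "cov": np.cov(n_r, n_g)[0, 1],
        "cov_pred": predicted_cov(n_t.mean(), n_t.var(), p_r),
    }
    print(name, results[name])
```

In the Poisson case the two contributions to the variance of [tex]n_r[/tex] add up to [tex]\lambda p_r[/tex] and the covariance vanishes, which is why only [tex]D/B[/tex] appears. For any other distribution of [tex]n_t[/tex], its variance enters both the variance of [tex]n_r[/tex] and the covariance term.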
 
  • #6
Theorem 4 of Chapter V in Mood, Graybill, Boes, Introduction to the Theory of Statistics:

[tex]E[X/Y] \approx \frac{E[X]}{E[Y]} - \frac{\mathrm{Cov}[X,Y]}{(E[Y])^2} + \frac{E[X]\,\mathrm{Var}[Y]}{(E[Y])^3}[/tex]

and

[tex]\mathrm{Var}[X/Y] \approx \left(\frac{E[X]}{E[Y]}\right)^2 \left( \frac{\mathrm{Var}[X]}{(E[X])^2} + \frac{\mathrm{Var}[Y]}{(E[Y])^2} - \frac{2\,\mathrm{Cov}[X,Y]}{E[X]\,E[Y]} \right)[/tex]

P.S. E[X] = Mean[X] and E[Y] = Mean[Y].

If X and Y are independent, their Cov = 0. If they are identically distributed, E[X] = E[Y] and Var[X] = Var[Y].

The set of random variables {X, Y, X+Y} has one redundant element. If you know any two, then you know the third.
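
The two approximations quoted in this post are easy to turn into a small helper. This is only a sketch: the function name `ratio_moments` and the example numbers (independent counts with means and variances of 120 and 280, as would arise from Poisson variables) are hypothetical and chosen for illustration.

```python
def ratio_moments(mean_x, mean_y, var_x, var_y, cov_xy=0.0):
    """Approximate mean and variance of Z = X/Y from the moments of X and Y.

    Implements the two approximations quoted above; they rely on a Taylor
    expansion around the means, so mean_x and mean_y must be non-zero and
    the relative fluctuations small.
    """
    r = mean_x / mean_y
    mean_z = r - cov_xy / mean_y**2 + mean_x * var_y / mean_y**3
    var_z = r**2 * (
        var_x / mean_x**2
        + var_y / mean_y**2
        - 2.0 * cov_xy / (mean_x * mean_y)
    )
    return mean_z, var_z


# Illustrative values only: independent X and Y with variance equal to the mean.
mean_z, var_z = ratio_moments(120.0, 280.0, 120.0, 280.0)
print(f"E[Z] ~ {mean_z:.5f}, Var[Z] ~ {var_z:.6f}")
```

With a zero covariance, the variance expression divided by the squared ratio of the means reduces to the sum of squared fractional errors from the first post.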
 

FAQ: Distribution function approach to error propagation

What is the distribution function approach to error propagation?

The distribution function approach is a statistical method used to estimate the uncertainty or error in a calculated quantity based on the uncertainties in the input variables. It takes into account the probability distribution of each input variable and uses statistical techniques to determine the overall uncertainty in the calculated quantity.

How is the distribution function approach different from other methods of error propagation?

The distribution function approach differs from the linear approximation method, which is based on a first-order Taylor expansion, in that it does not reduce each input variable to a mean and a variance. Instead, it propagates the actual probability distribution of each input variable through the calculation, which makes it more accurate when the function is strongly non-linear or the uncertainties are large.

What are the advantages of using the distribution function approach?

The distribution function approach has several advantages over other methods of error propagation. It takes into account the full probability distribution of each input variable, allowing for a more accurate estimation of uncertainty. It also allows for the incorporation of any type of distribution, making it applicable to a wide range of problems. Additionally, it can handle multiple input variables and their correlations, providing a more comprehensive analysis of uncertainty.

What are the limitations of the distribution function approach?

One limitation of the distribution function approach is that it requires knowledge of the probability distributions of each input variable. In some cases, these distributions may not be readily available or may be difficult to determine. Another limitation is that the approach may be computationally intensive, especially when dealing with a large number of input variables.

How is the distribution function approach used in practical applications?

The distribution function approach is commonly used in scientific research and engineering to estimate uncertainties in calculated quantities. It is particularly useful in fields where accurate measurements and predictions are crucial, such as in climate modeling, financial forecasting, and drug development. It is also used in quality control processes to ensure the accuracy and reliability of manufactured products.
