andrewr
Hi,
I am toying around with writing a significant-figures calculator in Python.
According to NIST and other sites, the preferred way of working with uncertainty is to report the uncertain digits in concise form: e.g. 11234(13) kg would be 11234 kg +/- 13 kg, with 13 being one standard deviation.
So I have built a really simple Decimal-based class that can record and work with the mean and the standard deviation, and keep track of them for rounding purposes in simple numerical calculations.
P.S. This isn't for personal coursework -- just experimentation to see how useful the idea would be, so I don't need it to follow textbook rounding rules or anything...
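For concreteness, here is a minimal sketch of the kind of object I mean (class and method names are just placeholders, not the finished calculator): a value that carries a mean and a 1-sigma uncertainty and, for now, adds in quadrature as if the operands were independent.
[code]
from decimal import Decimal

class UncertainValue:
    """A value carrying a mean and a 1-sigma uncertainty (sketch only)."""

    def __init__(self, mean, sigma):
        self.mean = Decimal(str(mean))
        self.sigma = Decimal(str(sigma))

    def __add__(self, other):
        # Independent-variable rule: uncertainties add in quadrature.
        mean = self.mean + other.mean
        sigma = (self.sigma ** 2 + other.sigma ** 2).sqrt()
        return UncertainValue(mean, sigma)

    def __str__(self):
        # Rounding to the concise 11234(13) form is left out of this sketch.
        return f"{self.mean} +/- {self.sigma}"

# Example: 11234 +/- 13 plus 500 +/- 5 (made-up numbers)
print(UncertainValue(11234, 13) + UncertainValue(500, 5))
[/code]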
For example, given two random variables a and b, the expression (a+b)-a obviously has a correlation problem, since 'a' appears twice. Applying the Pythagorean theorem (adding in quadrature) twice is going to produce a resulting standard deviation which is much too large.
I am after a very crude way of compensating for this kind of effect, in which each variable keeps some history of the individual sample deviations. For example, given that 'a' represents the composite of 60 experiments (and so does b), I might record the sign of the sample deviation for each of the 60 experiments, but keep only the numerical values of the mean and standard deviation for 'a' and 'b' (e.g. means a, b and standard deviations s_a, s_b).
Since I know the signs, I figure I can use them to test crudely for correlation (not quite Pearson, but computationally cheaper).
If I added 'a' to itself, the calculator would not necessarily know that the two operands are the 'same' variable, but it would see that their sign histories are identical. That seems like it might be useful in reducing the calculated deviation errors, but I get stuck when trying to work out the resulting standard deviation for adding two variables when I know how many of their sample deviations share the same sign.
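In code, the bookkeeping I have in mind looks roughly like this (helper names are mine, assuming 60 samples per variable):
[code]
import numpy as np

def sign_history(samples):
    """Return (mean, std, signs of deviations) for one variable's samples."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    std = samples.std()                    # population (1/n) definition
    signs = np.sign(samples - mean)        # +1 / -1 / 0 per experiment
    return mean, std, signs

def sign_agreement(signs_a, signs_b):
    """Mean product of the signs, a crude [-1, 1] 'sign correlation'
    (the corSgn quantity defined further down)."""
    return float(np.mean(signs_a * signs_b))

# Adding 'a' to itself: the sign histories match perfectly,
# so sign_agreement(...) comes out as 1.0.
[/code]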
Has this approach been done before, and how is it attacked?
I have found this much information in the literature available on the web:
Error propagation for the addition of random variables (I can figure out all the other operations from this one example...)
Given two random variables:
[tex]a=\mu_{a}\pm\sigma_{a}[/tex]
[tex]b=\mu_{b}\pm\sigma_{b}[/tex]
The variance of a is simply [tex]\sigma_{a}^{2}[/tex] and the variance of b is [tex]\sigma_{b}^{2}[/tex].
When no covariance or correlation is present:
[tex]a+b=\mu_{a}+\mu_{b}\pm(\sigma_{a}^{2}+\sigma_{b}^{2})^{0.5}[/tex]
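As a quick numeric check (numbers picked purely for illustration), independent uncertainties of 3 and 4 combine to 5:
[tex]\sigma_{a+b}=(3^{2}+4^{2})^{0.5}=5[/tex]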
Covariance is non-normalized correlation; it is computed from the individual data points' "errors" as:
[tex]cov(a,b)=\sum_{i=1}^{n}\frac{\left(a_{i}-\mu_{a}\right)\left(b_{i}-\mu_{b}\right)}{n}[/tex]
Normalized covariance is called correlation:
[tex]cor(a,b)=\frac{cov(a,b)}{\sigma_{a}\sigma_{b}}[/tex]
When covariance is present, the sum of variables becomes:
[tex]a+b=\mu_{a}+\mu_{b}\pm(\sigma_{a}^{2}+\sigma_{b}^{2}+2cov(a,b))^{0.5}[/tex]
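Putting those formulas into code, a rough sketch of the covariance-aware sum (helper name is mine, using the 1/n definitions above) would be:
[code]
import numpy as np

def propagate_sum(a_samples, b_samples):
    a = np.asarray(a_samples, dtype=float)
    b = np.asarray(b_samples, dtype=float)
    mu_a, mu_b = a.mean(), b.mean()
    sig_a, sig_b = a.std(), b.std()              # population (1/n) std
    cov = np.mean((a - mu_a) * (b - mu_b))       # cov(a, b)
    cor = cov / (sig_a * sig_b)                  # Pearson correlation
    sigma_sum = np.sqrt(sig_a**2 + sig_b**2 + 2.0 * cov)
    return mu_a + mu_b, sigma_sum, cor

# Adding a variable to itself gives cov = sigma_a**2, so sigma_sum = 2*sigma_a
# rather than the sqrt(2)*sigma_a the independent formula would claim.
[/code]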
When the covariance is 0 (as is possible for independent variables), the formula reduces to the one for independent variables. The goal of my project, then, is to estimate a reasonable covariance based on knowledge of the *correlation* of the signs of the point errors around each variable's mean,
[tex]corSgn(a,b)=\sum_{i=1}^{n}\frac{sgn\left(a_{i}-\mu_{a}\right)*sgn\left(b_{i}-\mu_{b}\right)}{n}[/tex]
and assuming a Gaussian distribution of errors.
Clearly the sign correlation will optimistically estimate the variables' correlation, so that simply saying
[tex]cov(a,b)\approx corSgn(a,b)*\sigma_{a}*\sigma_{b}[/tex]
wouldn't be an unreasonable guess; but I am unsure how far off this would be from the actual covariance of a data set whose errors are Gaussian about the mean.
How would I estimate how much the sign-based correlation over-estimates the actual correlation of the data?
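One way I can imagine checking this numerically (a rough Monte Carlo sketch, not something I have validated): draw Gaussian pairs with a known Pearson correlation rho and compare corSgn against it.
[code]
import numpy as np

rng = np.random.default_rng(0)

def compare(rho, n=100_000):
    # Bivariate Gaussian with unit variances and correlation rho.
    cov = [[1.0, rho], [rho, 1.0]]
    a, b = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    pearson = np.corrcoef(a, b)[0, 1]
    cor_sgn = np.mean(np.sign(a - a.mean()) * np.sign(b - b.mean()))
    return pearson, cor_sgn

for rho in (0.1, 0.3, 0.5, 0.7, 0.9):
    pearson, cor_sgn = compare(rho)
    print(f"rho={rho:.1f}  Pearson ~ {pearson:.3f}  corSgn ~ {cor_sgn:.3f}")
[/code]
Sweeping rho this way would give a table of corSgn versus the true correlation, which could then be inverted (or fit) to correct the estimate before plugging it into the covariance formula.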