Expectation value for CDF fails

In summary, the conversation is about a person trying to compute the expectation value and variance of sub-samples and operations on the normal distribution. They are using Gaussian random generators written in C and are doing numerical experiments with sample sizes ranging from 1 million to 10 billion. They have found that their generators converge accurately to at least 4 decimal places. However, they are having trouble getting results that agree with numerical simulations. They have noticed that their program is wrong and they believe their algebra is right. They are also trying to optimize their generator for speed and accuracy. The conversation also touches on other topics such as the Box-Muller method, computing the standard deviation of two N(0,1) distributions added together, and methods for
  • #36


viraltux said:
I am listening to the X files theme tune while reading this... it's just appropriate.

Well, Wikipedia got it nearly right because first they say:
which is true but then they forget the approximate symbol from that moment on.

So the article should use the ≈ sign for every no-linear case and explain that for the special case [itex]ab[/itex] we have Goodman's exact formula. The discoverer maybe should do the honors and fix it, but if you don't I'll do it myself; I am a big supported of Wikipedia philosophy and I'd like it to have it right.

:smile:

I use them enough that I donate to their fundraisers. One of these days I need to create an account and learn the process of helping them edit their articles...
Go ahead and fix it yourself... If you want to give clues on how you did that, I'd appreciate knowing... Do you edit the live article, or do you make a corrected draft and have it peer reviewed by other volunteers?

I am not civil engineer or EE, so I don't know in which scenarios this formula is applied, but every time σ is big compared to a,b then the approximation is horrible. But anyway, scenarios with big uncertainties can be found a lot in Astronomy though I don't know how often they deal with non-linear scenarios, and so it goes for the LHC at CERN, if they're dealing with big uncertainties in this non-linear scenario then their results are going to be way over-optimistic.

In engineering, most items are macroscopic; but an extreme application might be the estimation of a some property of a the space of a transistor; in such an app, there could be around 20 atoms in diameter (eg: the Atom processor transistors, anyone?) and an atom's outer electron wave packet may extend out 3 to 5 atoms...

so; we might expect sigma to be on the order of 1/6 to 1/4 of a or b. If it were 1/4, then relatively speaking; that gives -- sqrt( 1(1/4)**2 + (1/4*1/4)**2 + 1(1/4)**2 / ...
about 1.5% error...

The error, does, of course grow rapidly and becomes horrible as one approaches one atom... (EE's *ARE* having trouble getting down that small ;) )

Geeee... what a timing! :smile:

:biggrin:

Yes, cern (I'm sure) has reason for concern...
QM packets for single particles are rather, well, a large percentage of the measured value...
But they also have a staff of Mathematicians, I'd bet...

FYI: I will get back to the random generator...
I just have priorities with solving the analytical problems first; but I definitely want to make a good test generator...

I'm about ready to start tackling correlation, kurtosis, and skew...

Those uglies show up whenever multiplication of Gaussians happen, even when the muliplicands are not correlated, as I mentioned in the graphs at the bottom of post 23.

https://www.physicsforums.com/showpost.php?p=3977418&postcount=23

When Goodman produces a result, he is simply giving us the variance and mean of the operation from Gaussians; but as we know, the results themselves are NOT Gaussians...

The purpose of NIST (National Institute of Standards and Technology) is to encourage cooperation (national and international) by setting up/maintaining common and useful standards for measurement and result reporting.

NIST recommends (and even for international cooperation) that once data is collected; the result ought to always be reported (and potentially distorted/corrected) such that the results can be used *as if* they were Gaussian variables.

Since that's the case, I'm interested in a calculator/set of functions which manipulate Gaussians and do the corrections/distortions necessary to preserve the reporting format of a Gaussian.

Ultimately, I'd like to build a calculator which uses the Cyhelsky Skew idea to carry around a pair of Gaussians that can simulate the properties of non-Gaussian distributions. The calculator would carry these values around internally, and use them to improve the precision of intermediate calculation steps -- but always report a "best" Gaussian as the result.

We know that a calculator can approximate functions by polynomials; Since addition of Gaussians always produces Gaussians, addition and subtraction are not a problem, but multiplication *is* -- So, my goal is really making some kind of correction to multiplication results during intermediate calculator steps...

If I just pretend the output of Goodman's formula is a Gaussian, I'm going to get pretty bad results in the next Gaussian operation I perform. So, Here's what I am thinking in General: If I add a near-Gaussian to a true Gaussian, the result will be closer to a true Gaussian than the original (Central limit theorem). Often it is *MUCH* closer to a Gaussian...!

So, what I am thinking is that I could take the output of multiplication, (M), and then add it to an arbitrary but true un-correlated Gaussian (X) (doing so either element by element or using Calculus); Then save the final mu, and the sigma, of the result; This result can be used to back-compute a true Gaussian that will simulate the results of addition better than the sigma and mu of M would in the first place...

So, I'm thinking: set m, s = mu of ( M + X ), sigma of (M + X );
Then, set Mn = N( mu,sigma ) - X.

Then Mn can be used in subsequent calculations with better results.
What do you think, is this a decent approach -- or are there other simpler approaches already in use in the general mathematical community that you are aware of?

:smile:
 
Physics news on Phys.org
  • #37


andrewr said:
:smile:

I use them enough that I donate to their fundraisers. One of these days I need to create an account and learn the process of helping them edit their articles...
Go ahead and fix it yourself...

That's all right, I will, but before I go on commenting with the rest of your post I just let you know that I got a threat of a ban from a mentor. Just telling you in case I happen to decay into photons :-p OK?
 
  • #38


andrewr said:
:smile: If you want to give clues on how you did that, I'd appreciate knowing... Do you edit the live article, or do you make a corrected draft and have it peer reviewed by other volunteers?

Yes you do edit live, but articles are under surveillance of the authors or the people that know about the subject, when someone changes the articles they immediately get a email warning about the change and they disagree they can revert the change immediately.

When disagreement happens then you talk about it and the discussion remains in the talk section, that is why is a good idea to check that section when an article is flag as biased (usually those with political content)

andrewr said:
When Goodman produces a result, he is simply giving us the variance and mean of the operation from Gaussians; but as we know, the results themselves are NOT Gaussians...

The purpose of NIST (National Institute of Standards and Technology) is to encourage cooperation (national and international) by setting up/maintaining common and useful standards for measurement and result reporting.

NIST recommends (and even for international cooperation) that once data is collected; the result ought to always be reported (and potentially distorted/corrected) such that the results can be used *as if* they were Gaussian variables.

Since that's the case, I'm interested in a calculator/set of functions which manipulate Gaussians and do the corrections/distortions necessary to preserve the reporting format of a Gaussian.

Ultimately, I'd like to build a calculator which uses the Cyhelsky Skew idea to carry around a pair of Gaussians that can simulate the properties of non-Gaussian distributions. The calculator would carry these values around internally, and use them to improve the precision of intermediate calculation steps -- but always report a "best" Gaussian as the result.

We know that a calculator can approximate functions by polynomials; Since addition of Gaussians always produces Gaussians, addition and subtraction are not a problem, but multiplication *is* -- So, my goal is really making some kind of correction to multiplication results during intermediate calculator steps...

If I just pretend the output of Goodman's formula is a Gaussian, I'm going to get pretty bad results in the next Gaussian operation I perform. So, Here's what I am thinking in General: If I add a near-Gaussian to a true Gaussian, the result will be closer to a true Gaussian than the original (Central limit theorem). Often it is *MUCH* closer to a Gaussian...!

So, what I am thinking is that I could take the output of multiplication, (M), and then add it to an arbitrary but true un-correlated Gaussian (X) (doing so either element by element or using Calculus); Then save the final mu, and the sigma, of the result; This result can be used to back-compute a true Gaussian that will simulate the results of addition better than the sigma and mu of M would in the first place...

So, I'm thinking: set m, s = mu of ( M + X ), sigma of (M + X );
Then, set Mn = N( mu,sigma ) - X.

Then Mn can be used in subsequent calculations with better results.
What do you think, is this a decent approach -- or are there other simpler approaches already in use in the general mathematical community that you are aware of?

:smile:

Well, I am not sure I understand the problem you are trying to solve here, one thing is to apply transformations to data so that it looks to us like something familiar e.g. a Normal distribution; we do that all the time, and I assume this is what NIST means. Right? So for instance, if X follows a Normal distribution with μ>>0 and μ>>σ, and you have Z = X^2 then NIST would recommend you don't share Z but rather SQRT(Z) explaining the transformation you did, in this case SQRT. Am I right on what NIST asks?

But then it seems like if you were trying to find a way to calculate the most similar Normal distribution to something which is not Normal in which case... this is weird; for instance, no matter how you tune μ and σ you will never get anything close to an Exponential distribution.

But anyway, If you want to know how close are two distributions there is a family of probability distribution metrics that will tell you exactly that (e.g. Fortet-Mourier metric) If this is what you want you can use the metric d(Mn,M) to find the μ and σ that minimize it; This way you will have the Normal distribution Mn that would explain better M.

Now, if you want the fanciest of techniques for this then you go with those using Mixtures of Distributions, which basically are methods that use overlapped distributions. You can integrate this overlapping in a Kalman Filter model (I think EE work with these to treat noise in signals) and have a very nice approximation of the products of Normal distributions by just using Normal distributions overlapped in the filter.

Sooo.. yeah.
 
Last edited:
  • #39


viraltux said:
Well, I am not sure I understand the problem you are trying to solve here, one thing is to apply transformations to data so that it looks to us like something familiar e.g. a Normal distribution; we do that all the time, and I assume this is what NIST means. Right? So for instance, if X follows a Normal distribution with μ>>0 and μ>>σ, and you have Z = X^2 then NIST would recommend you don't share Z but rather SQRT(Z) explaining the transformation you did, in this case SQRT. Am I right on what NIST asks?

No, not quite (Though I may be under-reading them) ; They are being very simplistic. They only want the mu and sigma of a data-set reported. Period.
For example 31.412(5) would have a mu of 31.412 and a sigma of 0.005. (The parenthesis references the significance of the ls-digit..., and represents 1 sigma)

NIST does allow one to report a two sigma mark, etc, using special notation -- but essentially, it's the same thing -- a mere number. One would report the mu and sigma of Z in your example; and consider what operations the client would perform on it -- given that, we could adjust mu and sigma that we report in order to reduce the error the client will experience using our number.

But then it seems like if you were trying to find a way to calculate the most similar Normal distribution to something which is not Normal in which case... this is weird; for instance, no matter how you tune μ and σ you will never get anything close to an Exponential distribution.

Yes, quite true. However, I am operating only on "assumed" Gaussians reported by someone else; (Thankfully NOT exponenetial!) and the only operations I am performing on it are add subtract multiply and divide. For add and subtract, the operation does a convolution on the data elements making the result "look" more and more Gaussian.

Multiplication for items with σ << μ also happens to look very Gaussian...

However, multiplication and division which don't obey the above relationship introduce serious Kurtosis and increasing skew. (A problem...)

Inside my calculator, I only wish to improve the result of doing a series of multiplications and additions compared to the standard of assuming pure Gaussians at every step.

If I wanted to do sin( statistic ), then I have an issue about propagation of error; but as we have seen, the typical industry standard covariance approach is missing an important term... It is generally accurate for typical operations, but it isn't robust and totally fails in some. I don't want a total failure, *ever*; just graceful degradation...

More complicated (typical) operations like computing sin( statistic ) can be done using a polynomial, with coefficients having NIST style values nnnnn(ssss); and then, the basic add subtract multiply divide operations of the hypothetical better calculator will keep track of the error and automatically by +.-,*,/ , properly tracing the error regardless of input; it keeps track, also, of the truncation error :) making the approximation polynomial conservatively accurate (a worthy goal...)

I am not interested in getting "exact" results, but I do want a more robust calculation with improved estimation on all *intermediate* steps inside the calculator -- eg: than if I were to assume pure Gaussians are the result of every operation. BUT -- I'd like a *simple* and robust improvement, and not an extremely fancy and complicated one.

But anyway, If you want to know how close are two distributions there is a family of probability distribution metrics that will tell you exactly that (e.g. Fortet-Mourier metric) If this is what you want you can use the metric d(Mn,M) to find the μ and σ that minimize it; This way you will have the Normal distribution Mn that would explain better M.

Yes, that seems correct; that is basically what I want to do -- although, it isn't always a pure Gaussian I want to convert to; eg: I do want the calculator to handle basic skew and possibly Kurtosis... but only in a very limited way, during intermediate steps.

I'll look up Forted-Mourier later, but I do want your opinion on the basic idea I had earlier and will explain here...

Now, if you want the fanciest of techniques for this then you go with those using Mixtures of Distributions, which basically are methods that use overlapped distributions. You can integrate this overlapping in a Kalman Filter model (I think EE work with these to treat noise in signals) and have a very nice approximation of the products of Normal distributions by just using Normal distributions overlapped in the filter.

Sooo.. yeah.

The Kalman filter might be used in an application such as CNC trajectory control and statistical process control of said CNC. But, even there -- I don't tend to use it as there are more robust methods I developed for high speed CNC work -- eg: there are better ways to calibrate sensors, and use *much* faster computation techniques to avoid needing to worry about interpolating sample times. It's also quite complicated and computation intensive...

(Brag) -- using an older 3000 series Xilinx chip -- which was only a 70 MHz *effective* flip flop rate -- (ignore the 100MHz chip spec if you look it up -- as that's for an un-wired flip-flop!) -- but none the less, I was able to compute 128 bit wide real time trajectories in that chip at well over 16 times the speed of a 200MHZ Intel 486 processor could even compute it, let alone do IO after the computation (That CPU was the fastest of that time period.) The servo loop time of my chip was in the nano-second region. Interpolation using Kalman filters simply isn't necessary...! ;) And a 3000 series chip was only around $13 each.

Anyway, I'm not interested in anything so complicated -- I am using a mixture distribution already. Cyhelsky skew gives a relative measure of how much data is above a mean vs. below. That in turn, automatically relates the sigma on each side of the mean (one sided sigmas, don't know the proper name...) to the skew; and they become increasingly different as skew increases; The final result of a simple mixed distribution can trivially be converted to a single sigma Gaussian, which would be the exact same as if the distribution's data points were used without reference to the side of the mean which they were on.

What I did, is use half a bell curve for each side of the mean -- eg: with the sigma set different for each side. That allows the calculator to handle skew in a basic way -- *but* that method does introduce a discontinuity at the mean.
However, this mixed distribution *did* allow me to develop the mathematics to handle adding skewed distributions.

Using the numerical simulations, I have noticed that even the first or second operation (say addition) smears the discontinuity out very much. Eg: the convolution has that effect... which is very desirable.

The mathematics I developed happens to apply quite accurately to subsequent steps, and since each convolution due to addition causes the subsequent result to look *more* Gaussian, the accuracy does not degenerate with more steps (It beats the **** out of what using ONLY pure Gaussians can do.)

The representation is good enough for use right now for addition of Skewed data sets, although I'd like to replace the mixed distribution of half bell curves with one that is equivalent to that mixed distribution after a *single* operation with a true un-skewed Gaussian. The new representation would automatically smear out the discontinuity, and produce curves which more realistically representing typical data sets with tails going to zero, and a single mode.

Since multiplication tends to have almost the same curve (visually) as some of my mixture distributions after a *single* addition, I figure I can use this new mixture distribution as a basis for representing results of multiplications and divisions, internal to the calculator. (The only problem that stops me from finishing the calculator.)

So, all operations, inside the calculator, can be represented by a single data distribution type -- a mixed distribution based on a Cyhelsky skewed Gaussian...

The final result, of course, has to be reported in NIST format -- so I will need to convert an internal representation to an un-skewed Gaussian; I also need, internal to the calculator, to convert the output of multiplication to the "nearest" skewed Gaussian the calculator can represent . All conversions, then, have the same basic problem -- finding a reasonably "accurate" representation of any other arbitrary distribution.

So this is what I thought might be a good way to convert them ... It will, of course, convert a Gaussian back into the *same* Gaussian -- exactly -- so the calculator automatically performs the industry standard Gaussian error propagation formulas (with multiplication improved, appropriately of course!).

The conversion idea I have is this, the calculator's internal basis distribution is determined by three (or at most 4) statistics.

1 & 2) The Cyhelsky skew value, or equivalently -- both one sided sigmas
3) The mean
4) Potentially a Kurtosis correction value (not developed yet).

So, I thought -- ANY distribution, even if not the calculator's native representation; could be converted into those variables in this way:

Take any arbitrary distribution u; develop mathematics to add it to an uncorrelated skew Gaussian "x", and compute the resulting 4 statistics; ( or only TWO statistics if the result is to be a pure Gaussian...)

Then solve for an "x" such that the resulting sigma of "u+x"is exactly √2 that of x, and the mu of the result is exactly twice that of "x"; Skew result must remain exactly the same as x, and be a "Maximal" value, and Kurtosis, if developed, would also remain exactly that of "x", and be maximal.

If u *were* a true Gaussian, then x would have to be the same distribution with the *same* mu and sigma of "u".
Skew and Kurtosis can't be different from the Gaussian, or the final result will have a different Skew and Kurtosis than "x", and hence they must be zero...

For a "u" of any other distribution besides Gaussian...:
"x" tells us what skewed Gaussian will produce the same results as "u" when operated on by addition...

Since adding a Gaussian to a non-Gaussian, produces something *closer* to a Gaussian -- I conjecture the final result ought to converge toward a "perfect" Gaussian and be better than just naively accepting "u"'s statistics raw...

Now, I'll look up Fortet-Mourier, but I am curious what you think of my idea. (Saying it's bador flawed is fine... I'm just curious as always...)
 
Last edited:
  • #40
Conversion of one CDF into an approximation; Fortet-Mourier

I found this...

Fortet-Mourier or Wasserstein metric:
inf (|X-Y|)=[itex] \int _{ ℝ } | F _1 (x) - F _2 (x) | dx = \int \limits _{0} ^{1} | F _{1} ^{-1} (x) - F _{2} ^ {-1} (x) | [/itex]

Where F1 and F2 are CDFs, eg: in my case, of an original and a converted distribution.

I'm not sure why the infimum is mentioned, but this metric appears to be the total probability of choosing a value that is in in error due to the conversion. So, minimizing this metric would match the distributions so that the total accumulated error of the conversion is minimum. It is more robust to outlier type errors, than say a metric based on the total variation of the error... and it is a decent starting point ... :rolleyes: ; But I am pretty certain I need to use a metric that minimizes the error (in a particular sense) of the original distributions vs. the converted distribution *after* one of the four operators +,-,×,/ are applied to it and another value. :smile:

Since the result is just a number, I'm really wanting to minimize the error of a statistic measured in the converted distribution; eg: the standard deviation and mean for NIST purposes, and possibly the kurtosis and skew for my own records. It doesn't *really* matter to me how much the actual PDF/CDF of the conversion is in error because the actual distribution is not reported in the end.

So, what are your thoughts ? and, of course, others are welcome to answer as well.
I'm not trying to be exclusive... but just share ideas.
 
  • #41


andrewr said:
I found this...

Fortet-Mourier or Wasserstein metric:
inf ��(|X-Y|)=[itex] \int _{ ℝ } | F _1 (x) - F _2 (x) | dx = \int \limits _{0} ^{1} | F _{1} ^{-1} (x) - F _{2} ^ {-1} (x) | [/itex]

Where F1 and F2 are CDFs, eg: in my case, of an original and a converted distribution.

I'm not sure why the infimum is mentioned...

The infimum makes sense when you consider the degrees of freedom given all possible X and Y with CDF F1 and F2, so if F1=F2 then there will be one X=Y and the infimum will be zero. But anyway, the scenario where I am familiar with these metrics (stochastic programming) we already have a particular X and we want the discretized version that better explains such X, so in this case the infimum would be with respect to the parameters of the discretized version.

For your problem you already have X too (it would be M in your example) and you want to find the Normal distribution (let's call it N) that better explains M, that is, to minimize d(M,N) and in this case the infimum will be with respect to the μ and σ of N. Once you have N you can use it instead M for your calculations and the metric will account for whatever kurtosis of skewness M might have, and thus, N should be your best shot to "Normalize" your data.

andrewr said:
But I am pretty certain I need to use a metric that minimizes the error (in a particular sense) of the original distributions vs. the converted distribution *after* one of the four operators +,-,×,/ are applied to it and another value. :smile:

I am not sure I understand the conversion you are trying to do and probably is because I don't quite understand the problem you're trying to solve, let's see, let's say you have the data X,Y,Z... and then you do all sort of operations H=X*Y+Z*Z*Z+X*Z+Z^2 and yet, all these operations are themselves another random variable H. Doing any intermediate step like conversion X*Y into a normal like distribution to then operate is a sin before the eyes of God.

I would just operate normally and if you need to "normalize" anything then do it with the final result H by minimizing d(H,N) and then returning the μ and σ of N.

But then you also say

NIST does allow one to report a two sigma mark

And now seems to me like if you don't even need to normalize anything! Simply return the μ of H and different σ for left and right if H happens to be skewed.

andrewr said:
Since the result is just a number, I'm really wanting to minimize the error of a statistic measured in the converted distribution; eg: the standard deviation and mean for NIST purposes, and possibly the kurtosis and skew for my own records. It doesn't *really* matter to me how much the actual PDF/CDF of the conversion is in error because the actual distribution is not reported in the end.

So, what are your thoughts ? and, of course, others are welcome to answer as well.
I'm not trying to be exclusive... but just share ideas.

Right, that's why I have the feeling I am missing something because I can't even see why you need the calculator at all. Do you have a numerical example? Something that shows where the problem is, how NIST asks you to report it, and how you are trying to solve the problem such procedure has? I think that might help everyone to jump in.

EDIT: In fact, we've been talking about so many different issues in this thread that people might feel they cannot really join in without reading the whole thing, maybe a good idea would be for you to open a new thread with this particular problem and then you'll have new blood and ideas on it.
 
Last edited:
  • #42
No, I don't have a numerical example. You would need to specify exactly what you would like to see -- it is impossible to generalize to all cases except using algebra.

But I simplified the question, and will add detail as people ask pertinent questions.
At least, I hope they do.

Simplified Q. is Here

I can link back to this thread only where apt., but starting a new thread is a good idea.
 
Back
Top