Why Do Physicists Use Gaussian Error Distributions?

In summary, Bailey's study found that errors in scientific measurements have fat tails: large discrepancies are far more frequent than a normal distribution predicts, and uncertainty-normalized differences between measurements are consistent with heavy-tailed Student's t-distributions that are often almost Cauchy. Medical research uncertainties are evaluated about as well as those in physics, but physics uncertainties improve more rapidly. Despite this evidence, physicists continue to use Gaussian distributions to estimate the likelihood of experimental errors, most likely out of convenience and because Gaussian processes have many useful properties. The fat tails themselves may be driven by power-law effects, such as self-similarity and self-organized criticality.
  • #1
ohwilleke
TL;DR Summary
Empirical evidence shows that errors have fat tails relative to the normal distribution. Why do physicists keep using Gaussian error distributions anyway?
Judging the significance and reproducibility of quantitative research requires a good understanding of relevant uncertainties, but it is often unclear how well these have been evaluated and what they imply. Reported scientific uncertainties were studied by analysing 41 000 measurements of 3200 quantities from medicine, nuclear and particle physics, and interlaboratory comparisons ranging from chemistry to toxicology. Outliers are common, with 5σ disagreements up to five orders of magnitude more frequent than naively expected. Uncertainty-normalized differences between multiple measurements of the same quantity are consistent with heavy-tailed Student’s t-distributions that are often almost Cauchy, far from a Gaussian Normal bell curve. Medical research uncertainties are generally as well evaluated as those in physics, but physics uncertainty improves more rapidly, making feasible simple significance criteria such as the 5σ discovery convention in particle physics. Contributions to measurement uncertainty from mistakes and unknown problems are not completely unpredictable. Such errors appear to have power-law distributions consistent with how designed complex systems fail, and how unknown systematic errors are constrained by researchers. This better understanding may help improve analysis and meta-analysis of data, and help scientists and the public have more realistic expectations of what scientific results imply.
David C. Bailey, "Not Normal: the uncertainties of scientific measurements," Royal Society Open Science 4(1): 160600 (2017).

How bad are the tails? According to Bailey in an interview, "The chance of large differences does not fall off exponentially as you'd expect in a normal bell curve," and anomalous five sigma observations happen up to 100,000 times more often than expected.
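For a rough sense of scale, here is a quick numerical check (an illustrative SciPy sketch; ν = 2.5 is simply an assumed value in the 2-3 range the paper reports):

```python
# Two-sided probability of a >5 sigma deviation under a Gaussian versus a
# heavy-tailed Student's t. nu = 2.5 is an assumed, illustrative value.
from scipy import stats

p_gauss = 2 * stats.norm.sf(5.0)      # ~6e-7
p_t = 2 * stats.t.sf(5.0, df=2.5)     # ~2e-2

print(f"Gaussian:          P(|z| > 5) = {p_gauss:.1e}")
print(f"Student t, nu=2.5: P(|z| > 5) = {p_t:.1e}")
print(f"ratio ~ {p_t / p_gauss:.0f}")  # on the order of 10^4 to 10^5
```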

This study and similar ones are no big secret.

Given the overwhelming evidence that systematic error in physics experiments is not distributed according to a normal bell curve, known as a Gaussian distribution, why do physicists almost universally, and without meaningful comment, continue to use it to estimate the likelihood that experimental results are wrong?
 
  • #2
ohwilleke said:
Summary:: Empirical evidence shows that errors have fat tails relative to the normal distribution. Why do physicists keep using Gaussian error distributions anyway?

why do physicists almost universally and without meaningful comment continue to use this means of estimating the likelihood that experimental results are wrong?
I think largely because the statistical methods that they are used to are based on Gaussian distributions.

Especially in the Bayesian community there is growing development of techniques based on the Student’s t distribution. Usually it is called “robust regression” or something similar.
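As a concrete (if non-Bayesian) sketch of the same idea, here is a minimal straight-line fit done twice, once with a Gaussian likelihood and once with a Student's t likelihood; the data, the glitch injection, and the choice ν = 3 are all made up for illustration:

```python
# Fit y = a*x + b by maximum likelihood under two error models. The Gaussian
# fit is noticeably pulled by a few injected one-sided "glitches"; the
# Student-t fit stays close to the true (slope, intercept) = (2, 1).
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
y[::10] += 15.0                     # a handful of large one-sided glitches

def neg_log_like(params, logpdf):
    slope, intercept, log_scale = params
    resid = y - (slope * x + intercept)
    return -np.sum(logpdf(resid, np.exp(log_scale)))

gauss = optimize.minimize(neg_log_like, x0=[1.0, 0.0, 0.0],
                          args=(lambda r, s: stats.norm.logpdf(r, scale=s),))
robust = optimize.minimize(neg_log_like, x0=[1.0, 0.0, 0.0],
                           args=(lambda r, s: stats.t.logpdf(r, df=3, scale=s),))

print("Gaussian fit  (slope, intercept):", gauss.x[:2])
print("Student-t fit (slope, intercept):", robust.x[:2])
```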
 
  • #3
There are several reasons. The first is that systematic errors lie outside of physics itself. Everyone knows they are there, but there can't really be a theory for them; they are just mistakes and failures. No need to comment on this, because I'd say that 90% of a typical physics experiment is dealing with mistakes and failures.

The second is that perfect Gaussian distributions never occur in nature. It's just a useful general model. A sort of folk system has grown up to deal with "outliers." Usually they are simply excluded.

If you add up a large number of results of a repeated experiment, there is a strong tendency for such sums to have a Gaussian distribution, almost no matter what the experiment is. (I'm not going to go into the exceptions.) The question is how large is large enough, and there are tests for whether something is close enough to Gaussian.
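A small numerical illustration of both points, using made-up exponential draws and SciPy's normality test (based on the D'Agostino-Pearson statistic):

```python
# Sums of many draws from a strongly skewed distribution look far more
# Gaussian than the raw draws, and a normality test quantifies how close.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
raw = rng.exponential(size=10_000)                       # skew ~ 2, very non-Gaussian
sums = rng.exponential(size=(10_000, 100)).sum(axis=1)   # each entry: sum of 100 draws

for name, sample in [("raw exponential draws", raw),
                     ("sums of 100 draws    ", sums)]:
    stat, p = stats.normaltest(sample)
    print(f"{name}: skew = {stats.skew(sample):+.2f}, normaltest p = {p:.2g}")
# The raw draws fail the test decisively; the sums (skew ~ 0.2) are much
# closer to Gaussian, though with enough data the test still detects the
# small residual asymmetry.
```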
 
  • #4
Gaussian processes are convenient to study for at least three reasons. 1. The central limit theorem demonstrates (with some exceptions) that the sum of many random variables, each with almost any distribution, will be distributed Gaussian. 2. A linear transformation of a Gaussian-distributed random variable remains Gaussian. 3. There are a lot of spin-offs as far as techniques go. For example, manipulating Gaussians is important in the study of quantum harmonic oscillators and path integrals; many field theory problems are tractable because they involve Gaussians.

Now, I have had a course in robust estimation, and I am aware (the professor told me in the first 5 minutes) that there is more to statistics than Gaussian estimation, but Gaussian processes are a good place to start. I think there was a thread a few months ago asking why linear approximations are used so ubiquitously in physics and whether computer simulation replaces the need for that kind of analysis. I think the answer to the question in this thread is similar to the answer to that earlier question.
 
  • #5
Interesting piece, but the author does not provide much explanation of what drives the fat tails in these results; the CLT is pretty ubiquitous for things like measurement or sampling errors. I am guessing that power-law effects drive this.

There was not much explanation of this statement in the OP article:
None of the data are close to Gaussian, but all can reasonably be described by almost-Cauchy Student’s t-distributions with ν∼2−3. For comparison, fits to these data with Lévy stable distributions have nominal χ2 4–30 times worse than the fits to Student’s t-distributions.
So you cannot obtain the t-distribution as a Lévy stable distribution (though the normal is one). I guess the issue is that the defining characteristic of Lévy stable distributions is that their shape does not change when you add multiple sets of data together, whereas the t-distribution converges to normal under summation.
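A quick simulation of that stability point (illustrative only; the Cauchy stands in for a Lévy stable distribution, t with ν = 3 for a finite-variance heavy tail):

```python
# Averaging does not narrow a Cauchy (Levy stable) distribution at all,
# while averages of t(nu=3) draws tighten and approach a Gaussian.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 20_000, 100   # n samples, each an average of k draws

samples = {
    "single Cauchy draws ": stats.cauchy.rvs(size=n, random_state=rng),
    "means of 100 Cauchy  ": stats.cauchy.rvs(size=(n, k), random_state=rng).mean(axis=1),
    "single t(nu=3) draws ": stats.t.rvs(df=3, size=n, random_state=rng),
    "means of 100 t(nu=3) ": stats.t.rvs(df=3, size=(n, k), random_state=rng).mean(axis=1),
}

for name, s in samples.items():
    q25, q75 = np.percentile(s, [25, 75])      # IQR: width measure insensitive to tails
    print(f"{name} IQR = {q75 - q25:.3f}")
# The Cauchy IQR is ~2 whether or not you average; the t(nu=3) IQR shrinks
# substantially under averaging, as the CLT requires for finite variance.
```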

My other shallow understanding is that you do get a lot of infinite-variance power law effects in physics (like earthquake energies) or more specifically:
https://geography.as.uky.edu/blogs/jdp/dubious-power-power-laws
One is scale invariance and self-similarity. If similar form-process relationships occur across a range of spatial scales, resulting in self-similarity and fractal geometry, this will produce power-law distributions. Fractal, scale-invariance, and self-similarity concepts are widely applied in geomorphology, geography, geology, and geophysics.

Deterministic chaos is also associated with fractal geometry and power-law distributions. So is self-organized criticality, where systems (both real and “toy” hill slopes are a commonly used example) evolve toward critical thresholds. Given the threshold dominance of many geoscience phenomena, the attractiveness of this perspective in our field is obvious. Chaos, fractals, and SOC is where I first started thinking about this issue, as power law distributions were often used as proof, or supporting evidence, for one or more of those phenomena, when in fact power law distributions are a necessary, but by no means sufficient, indicator.

Power laws also arise from preferential attachment phenomena. In economics this is manifested as the rich get richer; in internet studies as the tendency of highly linked sites to get ever more links and hits. Preferential attachment models have been applied in urban, economic, and transportation geography; evolutionary biology; geomicrobiology; soil science; hydrology; and geomorphology.

Various optimization schemes, based on minimizing costs, entropy, etc. can also produce power laws. These have been linked to power laws quite extensively in studies of channel networks and drainage basins, as well as other geophysical phenomena.

Multiplicative cascade models (fractal or multifractal patterns arising from iterative random processes) produce power laws. These have been applied in meteorology, fluid dynamics, soil physics, and geochemistry. Speaking of randomness, Mitzenmacher (2004) even shows how monkeys typing randomly could produce power law distributions of word frequencies.

Diffusion limited aggregation is a process whereby particles (or analogous objects) undergoing random motion cluster together to form aggregates. The size distribution of the aggregates follows—well, you know. DLA has been used to model evolution of drainage networks, escarpments, and eroded plateaus, and applied in several other areas of geosciences and geography.

In financial economics, it's well understood that you can't apply the normal distribution to power-law distributed quantities like wealth or city sizes. Is this less well understood within physics?
 
  • #6
BWV said:
Interesting piece, but the author does not provide much explanation as to what drives fat tails in these results - the CLT is pretty ubiquitous for things like measurement or sampling errors. I am guessing that power-law effects drive this
I think that the CLT assumes that all of the data comes from some fixed statistical distribution. But when there are “glitches” the glitches actually come from completely different distributions than the rest of the data.
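A toy illustration of that mechanism (all numbers invented): mix mostly well-behaved Gaussian errors with a small fraction of glitches drawn from a much wider distribution, and the tails fatten dramatically even though every component is Gaussian.

```python
# 98% of simulated measurements have correctly estimated unit errors; 2% are
# "glitches" whose true errors are ten times larger than reported.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1_000_000
errors = rng.normal(scale=1.0, size=n)        # ordinary, well-estimated scatter
glitch = rng.random(n) < 0.02                 # 2% hidden mistakes
errors[glitch] = rng.normal(scale=10.0, size=glitch.sum())

for z in (3, 5):
    predicted = 2 * stats.norm.sf(z)
    observed = np.mean(np.abs(errors) > z)
    print(f"P(|error| > {z} sigma): Gaussian {predicted:.1e}, mixture {observed:.1e}")
# At 5 sigma the mixture exceeds the pure-Gaussian prediction by roughly four
# orders of magnitude, the same ballpark as the excess Bailey reports.
```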
 

FAQ: Why Do Physicists Use Gaussian Error Distributions?

Why do physicists use Gaussian error distributions?

Physicists use Gaussian error distributions because they provide a simple mathematical model for the random errors that occur in experimental data, and because the central limit theorem implies that the combined effect of many small, independent error sources tends toward a Gaussian. These distributions are also known as normal distributions and are characterized by a bell-shaped curve, making them easy to work with and interpret.

What are the benefits of using Gaussian error distributions?

There are several benefits to using Gaussian error distributions in physics. They allow for easy calculation of probabilities and confidence intervals, and they are fully specified by just two parameters, the mean and the standard deviation. Additionally, many physical phenomena approximately follow a Gaussian distribution, making it a useful tool for modeling and analyzing data.

Can other types of error distributions be used in physics?

Yes, other types of error distributions can be used in physics, such as Poisson distributions for count data, exponential distributions for waiting times, or heavy-tailed Student's t-distributions when outliers are common. However, Gaussian distributions are the most commonly used due to their simplicity and applicability to a wide range of scenarios.

How do physicists determine the parameters of a Gaussian error distribution?

Physicists typically use statistical methods, such as maximum likelihood estimation or least squares fitting, to determine the mean and standard deviation of a Gaussian error distribution from their experimental data. These methods aim to find the parameters that best fit the data and minimize the overall error.
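For a plain Gaussian these maximum-likelihood estimates have simple closed forms; a minimal sketch with invented numbers:

```python
# Maximum-likelihood estimates for a Gaussian: the sample mean, and the RMS
# deviation from it. The measurement values below are made up.
import numpy as np

measurements = np.array([9.8, 10.1, 10.0, 9.9, 10.3, 10.0])
mu_hat = measurements.mean()
sigma_hat = np.sqrt(np.mean((measurements - mu_hat) ** 2))  # ML form (divides by N)

print(f"mean = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
# In practice the unbiased N-1 form, or a weighted least-squares fit with
# per-point uncertainties, is used, but the principle is the same.
```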

Are there any limitations to using Gaussian error distributions in physics?

While Gaussian error distributions are widely used in physics, they do have some limitations. They assume that the errors in the data are independent and normally distributed, which may not always be the case. In particular, they underestimate how often extreme outliers occur when the true error distribution has heavy tails, which can lead to overconfident significance claims.
