Variance & Standard Deviation

  • #1
Agent Smith
TL;DR Summary: What are the uses of variance and standard deviation and how do they differ?
Going through my notes, I see the following:

1. Variance = ##\displaystyle \text{Var(X)} = \sigma^2 = \frac{1}{n - 1} \sum_{i = 1}^n \left(x_i - \overline x \right)^2##
2. Standard Deviation = ##\sigma = \sqrt {\text{Var(X)}} = \sqrt {\sigma^2}##

Both variance and standard deviation are measures of dispersion (colloquially, the spread in the data). The higher their values, the more spread out the data is.
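As a rough sketch (my addition, not part of the original post; the data set is made up), both formulas can be checked with Python's standard library, which uses the same ##n - 1## divisor for the sample variance:

```python
import math
import statistics

# Made-up sample data, purely for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample variance: sum of squared deviations divided by n - 1
var = statistics.variance(data)

# Standard deviation is the square root of the variance
sd = statistics.stdev(data)

print(var, sd, math.isclose(sd, math.sqrt(var)))
```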

Statement B: The square root function is not linear and so standard deviation is biased when compared to variance.

Questions:
1. Do high variance and standard deviation mean greater variability in the data?
2. What does statement B mean?
 
  • #2
The answer to 1 is "Yes".
The answer to 2 is "nothing". It is a meaningless statement because the concept of 'bias' only applies to estimators. A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken". For more detail see https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation.
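A quick simulation (my sketch, not part of the original reply; the parameters are arbitrary) makes that downward bias visible: averaging the ##n - 1## sample standard deviation over many small samples comes out below the true ##\sigma##:

```python
import random
import statistics

random.seed(0)
sigma = 10.0   # true population standard deviation (arbitrary choice)
n = 5          # small samples make the bias easy to see
trials = 20_000

# Average the n-1 sample standard deviation over many repeated samples
mean_sd = sum(
    statistics.stdev([random.gauss(0, sigma) for _ in range(n)])
    for _ in range(trials)
) / trials

# The average sample sd lands below the true sigma (downward bias)
print(mean_sd, "<", sigma)
```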
 
  • Like
Likes Agent Smith and FactChecker
  • #3
andrewkirk said:
A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken".
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
 
  • Like
Likes Agent Smith
  • #4
@andrewkirk , by downward bias, do you mean it underestimates the variability in the data?

@FactChecker I didn't know dividing by ##n - 1## instead of ##n## corrects the bias. Gracias.

Statistics is hard! And I'm merely scratching the surface.

Can someone tell me what "square root function is not linear" means? I know that, if ##\text{Var(X)} = 100## and ##\text{Var(Y)} = 81##, then ##100 - 81 = 19##, but ##\sqrt {100} - \sqrt {81} = 10 - 9 = 1##. A ##19## point difference in variance becomes only a ##1## point difference in standard deviation.
 
  • Haha
Likes hutchphd
  • #5
Agent Smith said:
Can someone tell me what "square root function is not linear" means?
The graph is not a straight line.
[Attached graph of ##y = \sqrt x##, which is visibly not a straight line]

Agent Smith said:
I know that, if ##\text{Var(X)} = 100## and ##\text{Var(Y)} = 81##, then ##100 - 81 = 19##, but ##\sqrt {100} - \sqrt {81} = 10 - 9 = 1##. A ##19## point difference in variance becomes only a ##1## point difference in standard deviation.
That is not a good way to judge if a function is linear. Consider the function, ##y=100 x##. It is linear, but a 1 unit change in ##x## becomes a 100 unit change in ##y##.
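A sketch of an actual linearity check (my addition, with made-up test values): a linear map must satisfy ##f(a + b) = f(a) + f(b)##, which ##y = 100x## passes and ##y = \sqrt x## fails:

```python
import math

def is_additive(f, a, b):
    # Linearity requires f(a + b) == f(a) + f(b)
    return math.isclose(f(a + b), f(a) + f(b))

scale = lambda x: 100 * x   # linear, despite the steep slope
root = math.sqrt            # not linear: its graph is curved

print(is_additive(scale, 81, 19))   # the linear map passes
print(is_additive(root, 81, 19))    # sqrt fails: 10 != 9 + sqrt(19)
```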
 
  • Like
Likes Agent Smith
  • #6
Agent Smith said:
TL;DR Summary: What are the uses of variance and standard deviation and how do they differ?

The square root function is not linear and so standard deviation is biased when compared to variance.
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
 
Last edited:
  • Like
Likes Agent Smith
  • #7
I'd say that they measure the same thing. The advantage of variance is that the variances of independent random variables add meaningfully while their standard deviations don't.
 
  • Skeptical
Likes Agent Smith
  • #8
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.

Dale said:
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
:thumbup:
 
  • #9
FactChecker said:
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
 
  • Like
Likes hutchphd and FactChecker
  • #10
Agent Smith said:
Statistics is hard! And I'm merely scratching the surface.
groan. Actually very nicely done. groan.
 
  • Haha
Likes Agent Smith
  • #11
mjc123 said:
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
I stand corrected. Thanks. I learned something I have had wrong all my life. The relevant part of this backs up what you said.
 
  • Like
Likes Agent Smith
  • #12
Agent Smith said:
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.


:thumbup:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
 
  • Like
Likes Agent Smith
  • #13
Hornbein said:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
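Redoing the arithmetic for this data set in Python (my check, using the ##1/n## population divisor from the post):

```python
data = [4, 4, 4, 5]
mean = sum(data) / len(data)  # 4.25

# Population variance: divide by n, matching the formula in the post
var = sum((x - mean) ** 2 for x in data) / len(data)
sd = var ** 0.5

# The variance is below 1 here, so the sd comes out larger than the variance
print(mean, var, sd)
```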
 
Last edited:
  • #14
@FactChecker
[Attached screenshot on whether ##E[f(X)] = f(E[X])##]

So if we take a linear function ##f(X) = 2X## then ##E[f(X)] = f(E[X])##?
 
  • #15
  • #16
No, expectation itself is linear. E[2X]=2E[X].
 
  • #17
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
What do you mean by variance contradicting the standard deviation?
Re linearity, I suspect both unbiasedness and the maximum likelihood property may only be preserved by linear transformations. Maybe @statdad can confirm or deny this?
 
  • #18
WWGD said:
What do you mean by variance contradicting the standard deviation?
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
 
  • #19
WWGD said:
No, expectation itself is linear. E[2X]=2E[X].
##f(X) = 2X##? So the function/operation must be linear?
 
  • #20
Agent Smith said:
##f(X) = 2X##? So the function/operation must be linear?
What I mean is that if X is a random variable, then the expectation of the RV 2X is twice the expectation of X.
 
  • Like
Likes Agent Smith
  • #21
Agent Smith said:
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
@WWGD ☝️
 
  • #22
Agent Smith said:
Well, you can also argue that when the variance increases, so does the SD, by their functional dependence, at least when the variance is > 1. But what notion or measure other than those two would you use for variability? I was wrong above, though: there are nonlinear transformations that preserve unbiasedness and the maximum likelihood property.
 
  • Like
Likes Agent Smith
  • #23
@WWGD it seems I am unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that about ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
 
  • #24
Agent Smith said:
@WWGD it seems I am unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that about ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
That's close, yes.
 
  • Haha
Likes Agent Smith
  • #25
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
Say your data set 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
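That unit dependence can be sketched directly (my addition): rescaling the data by a factor ##c## multiplies the standard deviation by ##c## but the variance by ##c^2##, so which number is bigger flips with the units:

```python
def pop_var(values):
    # Population variance with the 1/n divisor
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

liters = [4, 4, 4, 5]
milliliters = [1000 * x for x in liters]

v_l, v_ml = pop_var(liters), pop_var(milliliters)
s_l, s_ml = v_l ** 0.5, v_ml ** 0.5

print(s_l > v_l)    # in liters:      sd exceeds variance
print(v_ml > s_ml)  # in milliliters: variance exceeds sd
```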
 
  • Like
Likes Agent Smith and Vanadium 50
  • #26
The units are different, so it makes no sense to compare the size of the variance and the standard deviation. It does make sense to compare the size of the mean and the standard deviation, since they have the same units.
 
  • Like
Likes Agent Smith, Hornbein, Vanadium 50 and 2 others
  • #27
There are two numbers. One is the square of the other. All the usual rules apply.
There is no statistical significance to any of this. Am I missing something??
 
  • Like
Likes Hornbein
  • #28
All this is why statistics are standardized/normalized to the unitless "Z-score."
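A minimal sketch of that standardization (my addition, with a made-up data set): subtract the mean and divide by the standard deviation to get unitless z-scores with mean ##0## and standard deviation ##1##:

```python
import statistics

# Made-up data set with mean 5 and population sd 2
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)

# Z-score: how many standard deviations each point sits from the mean
z = [(x - mu) / sigma for x in data]

print(z)  # unitless; e.g. the 9.0 maps to z = 2.0
```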
 
  • Like
Likes Agent Smith and Dale
  • #29
hutchphd said:
There are two numbers. One is the square of the other. All the usual rules apply.
Including the most important - they have different units. What is bigger, a gallon or a calorie?
 
  • Like
Likes hutchphd and Dale
  • #30
@Dale Gracias, si, it maketh no sense to compare ##m^2## with ##m##, but which is better, the variance or the standard deviation, at giving us an accurate measure of variability, which I presume both are measuring? By squaring (variance) we're amplifying the magnitude of the variability, and by taking the square root (standard deviation) we're downsizing it (both in terms of pure magnitude). However, the magnitude itself conveys no information (vide infra 👇 )
Hornbein said:
Say your data set 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
 
  • Like
Likes hutchphd
  • #31
Agent Smith said:
which is better
Which is better, momentum or energy? Temperature or pressure? Volume or area?
 
  • Haha
Likes Agent Smith
  • #32
Agent Smith said:
which is a better, the variance or the standard deviation, in giving us an accurate measure of variability
The distinctions between them are not in terms of accuracy. Variance is nice because it is additive and you can partition a total variance into separate portions. The standard deviation is nice because it has the same units as the random variable itself and can be meaningfully compared to the mean.
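That additivity can be illustrated with simulated independent variables (my addition, arbitrary parameters): the variances add, ##9 + 16 = 25##, while the standard deviations do not, since ##3 + 4 \ne 5##:

```python
import random
import statistics

random.seed(1)
N = 100_000

# Two independent simulated variables with variances 9 and 16
x = [random.gauss(0, 3) for _ in range(N)]
y = [random.gauss(0, 4) for _ in range(N)]
s = [a + b for a, b in zip(x, y)]

# Var(X + Y) is close to 9 + 16 = 25, but sd(X + Y) is close to 5, not 3 + 4
print(statistics.pvariance(s), statistics.pstdev(s))
```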

Agent Smith said:
By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability
Neither of these statements is true.
 
  • Like
  • Wow
Likes Agent Smith, Vanadium 50 and hutchphd
  • #33
Agent Smith said:
@Dale Gracias, si, it maketh no sense to compare ##m^2## with ##m##, but which is a better, the variance or the standard deviation, in giving us an accurate measure of variability, which both I presume are measuring. By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability (both in terms of pure magnitude). However the magnitude itself conveys no information (vide infra 👇 )
@Dale & @Vanadium 50 ☝️ 🤔

For ##\sigma = 9## (say) meters, the ##9## meters by itself is meaningless, oui?
 
  • #34
One can deduce that the variance is a measure of the spread or dispersion of a probability distribution through Chebyshev's inequality. See E. Parzen, "Modern Probability Theory and Its Applications".

Since I am not well versed in this subject I leave it to the cognoscenti to elaborate.
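Chebyshev's inequality states that ##P(|X - \mu| \ge k\sigma) \le 1/k^2## for any distribution with finite variance. A quick empirical sketch (my addition, using an arbitrary skewed distribution):

```python
import random

random.seed(2)
N = 100_000
# An arbitrary skewed distribution: exponential with rate 1
data = [random.expovariate(1.0) for _ in range(N)]

mu = sum(data) / N
sigma = (sum((x - mu) ** 2 for x in data) / N) ** 0.5

k = 2
# Fraction of points at least k standard deviations from the mean
tail = sum(abs(x - mu) >= k * sigma for x in data) / N

print(tail, "<=", 1 / k**2)  # Chebyshev guarantees at most 0.25
```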
 
  • Like
Likes Agent Smith
  • #35
gleem said:
One can deduce that the variance is a measure of the spread or dispersion of a probability distribution through Chebyshev's inequality. E. Parsen: "Modern Probability Theory and Its Applications"

Since I am not well versed in this subject I leave it to the cognoscenti to elaborate.
Gracias.
 
