Variance & Standard Deviation

  • #1
Agent Smith
TL;DR Summary: What are the uses of variance and standard deviation and how do they differ?
Going through my notes, I see the following:

1. Variance = ##\displaystyle \text{Var(X)} = \sigma^2 = \frac{1}{n - 1} \sum_{i = 1}^n \left(x_i - \overline x \right)^2##
2. Standard Deviation = ##\sigma = \sqrt {\text{Var(X)}} = \sqrt {\sigma^2}##

Both variance and standard deviation are measures of dispersion (colloquially, the spread in the data). The higher their values, the more spread out the data is.
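As a rough sketch (my addition, not part of the original post; the data set is made up), both formulas can be checked with Python's standard library, which uses the same ##n - 1## divisor for the sample variance:

```python
import math
import statistics

# Made-up sample data, purely for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample variance: sum of squared deviations divided by n - 1
var = statistics.variance(data)

# Standard deviation is the square root of the variance
sd = statistics.stdev(data)

print(var, sd, math.isclose(sd, math.sqrt(var)))
```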

Statement B: The square root function is not linear and so standard deviation is biased when compared to variance.

Questions:
1. Do high variance and standard deviation mean greater variability in the data?
2. What does statement B mean?
 
  • #2
The answer to 1 is "Yes".
The answer to 2 is "nothing". It is a meaningless statement because the concept of 'bias' only applies to estimators. A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken". For more detail see https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation.
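A quick simulation (my sketch, not part of the original reply; the parameters are arbitrary) makes that downward bias visible: averaging the ##n - 1## sample standard deviation over many small samples comes out below the true ##\sigma##:

```python
import random
import statistics

random.seed(0)
sigma = 10.0   # true population standard deviation (arbitrary choice)
n = 5          # small samples make the bias easy to see
trials = 20_000

# Average the n-1 sample standard deviation over many repeated samples
mean_sd = sum(
    statistics.stdev([random.gauss(0, sigma) for _ in range(n)])
    for _ in range(trials)
) / trials

# The average sample sd lands below the true sigma (downward bias)
print(mean_sd, "<", sigma)
```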
 
  • Like
Likes Agent Smith and FactChecker
  • #3
andrewkirk said:
A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken".
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
 
  • Like
Likes Agent Smith
  • #4
@andrewkirk , by downward bias, do you mean it underestimates the variability in the data?

@FactChecker I didn't know dividing by ##n - 1## instead of ##n## corrects the bias. Gracias.

Statistics is hard! And I'm merely scratching the surface.

Can someone tell me what "square root function is not linear" means? I know that, if ##\text{Var(X)} = 100## and ##\text{Var(Y)} = 81##, then ##100 - 81 = 19##, but ##\sqrt {100} - \sqrt {81} = 10 - 9 = 1##. A ##19## point difference in variance becomes only a ##1## point difference in standard deviation.
 
  • Haha
Likes hutchphd
  • #5
Agent Smith said:
Can someone tell me what "square root function is not linear" means?
The graph is not a straight line.
[Attached graph of ##y = \sqrt x##, which is visibly not a straight line]

Agent Smith said:
I know that, if ##\text{Var(X)} = 100## and ##\text{Var(Y)} = 81##, then ##100 - 81 = 19##, but ##\sqrt {100} - \sqrt {81} = 10 - 9 = 1##. A ##19## point difference in variance becomes only a ##1## point difference in standard deviation.
That is not a good way to judge if a function is linear. Consider the function, ##y=100 x##. It is linear, but a 1 unit change in ##x## becomes a 100 unit change in ##y##.
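A sketch of an actual linearity check (my addition, with made-up test values): a linear map must satisfy ##f(a + b) = f(a) + f(b)##, which ##y = 100x## passes and ##y = \sqrt x## fails:

```python
import math

def is_additive(f, a, b):
    # Linearity requires f(a + b) == f(a) + f(b)
    return math.isclose(f(a + b), f(a) + f(b))

scale = lambda x: 100 * x   # linear, despite the steep slope
root = math.sqrt            # not linear: its graph is curved

print(is_additive(scale, 81, 19))   # the linear map passes
print(is_additive(root, 81, 19))    # sqrt fails: 10 != 9 + sqrt(19)
```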
 
  • Like
Likes Agent Smith
  • #6
Agent Smith said:
TL;DR Summary: What are the uses of variance and standard deviation and how do they differ?

The square root function is not linear and so standard deviation is biased when compared to variance.
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
 
Last edited:
  • Like
Likes Agent Smith
  • #7
I'd say that they measure the same thing. The advantage of variance is that the variances of independent random variables add meaningfully while their standard deviations don't.
 
  • Skeptical
Likes Agent Smith
  • #8
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.

Dale said:
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
:thumbup:
 
  • #9
FactChecker said:
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
 
  • Like
Likes hutchphd and FactChecker
  • #10
Agent Smith said:
Statistics is hard! And I'm merely scratching the surface.
groan. Actually very nicely done. groan.
 
  • Haha
Likes Agent Smith
  • #11
mjc123 said:
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
I stand corrected. Thanks. I learned something I have had wrong all my life. The relevant part of this backs up what you said.
 
  • Like
Likes Agent Smith
  • #12
Agent Smith said:
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.


:thumbup:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
 
  • Like
Likes Agent Smith
  • #13
Hornbein said:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
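Redoing the arithmetic for this data set in Python (my check, using the ##1/n## population divisor from the post):

```python
data = [4, 4, 4, 5]
mean = sum(data) / len(data)  # 4.25

# Population variance: divide by n, matching the formula in the post
var = sum((x - mean) ** 2 for x in data) / len(data)
sd = var ** 0.5

# The variance is below 1 here, so the sd comes out larger than the variance
print(mean, var, sd)
```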
 
Last edited:
  • #14
@FactChecker
[Attached screenshot on whether ##E[f(X)] = f(E[X])##]

So if we take a linear function ##f(X) = 2X## then ##E[f(X)] = f(E[X])##?
 
  • #15
  • #16
No, expectation itself is linear. E[2X]=2E[X].
 
  • #17
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
What do you mean by variance contradicting the standard deviation?
Re linearity, I suspect both unbiasedness and the maximum likelihood property may only be preserved by linear transformations. Maybe @statdad can confirm or deny this?
 
  • #18
WWGD said:
What do you mean by variance contradicting the standard deviation?
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
 
  • #19
WWGD said:
No, expectation itself is linear. E[2X]=2E[X].
##f(X) = 2X##? So the function/operation must be linear?
 
  • #20
Agent Smith said:
##f(X) = 2X##? So the function/operation must be linear?
What I mean is that if X is a random variable, then the expectation of the RV 2X is twice the expectation of X.
 
  • Like
Likes Agent Smith
  • #21
Agent Smith said:
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
@WWGD ☝️
 
  • #22
Agent Smith said:
Well, you can also argue that when the variance increases, so does the SD, by their functional dependence, at least when the variance is > 1. But what notion or measure other than those two would you use for variability? I was wrong above, though: there are nonlinear transformations that preserve unbiasedness and the maximum likelihood property.
 
  • Like
Likes Agent Smith
  • #23
@WWGD it seems I am unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that about ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
 
  • #24
Agent Smith said:
@WWGD it seems I am unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that about ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
That's close, yes.
 
  • Haha
Likes Agent Smith
  • #25
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt {0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
Say your data set 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
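That unit dependence can be sketched directly (my addition): rescaling the data by a factor ##c## multiplies the standard deviation by ##c## but the variance by ##c^2##, so which number is bigger flips with the units:

```python
def pop_var(values):
    # Population variance with the 1/n divisor
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

liters = [4, 4, 4, 5]
milliliters = [1000 * x for x in liters]

v_l, v_ml = pop_var(liters), pop_var(milliliters)
s_l, s_ml = v_l ** 0.5, v_ml ** 0.5

print(s_l > v_l)    # in liters:      sd exceeds variance
print(v_ml > s_ml)  # in milliliters: variance exceeds sd
```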
 
  • Like
Likes Agent Smith and Vanadium 50
  • #26
The units are different, so it makes no sense to compare the size of the variance and the standard deviation. It does make sense to compare the size of the mean and the standard deviation, since they have the same units.
 
  • Like
Likes Agent Smith, Hornbein, Vanadium 50 and 2 others
  • #27
There are two numbers. One is the square of the other. All the usual rules apply.
There is no statistical significance to any of this. Am I missing something??
 
  • Like
Likes Hornbein
  • #28
All this is why statistics are standardized/normalized to the unitless "Z-score."
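A minimal sketch of that standardization (my addition, with a made-up data set): subtract the mean and divide by the standard deviation to get unitless z-scores with mean ##0## and standard deviation ##1##:

```python
import statistics

# Made-up data set with mean 5 and population sd 2
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)

# Z-score: how many standard deviations each point sits from the mean
z = [(x - mu) / sigma for x in data]

print(z)  # unitless; e.g. the 9.0 maps to z = 2.0
```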
 
  • Like
Likes Agent Smith and Dale
  • #29
hutchphd said:
There are two numbers. One is the square of the other. All the usual rules apply.
Including the most important - they have different units. What is bigger, a gallon or a calorie?
 
  • Like
Likes hutchphd and Dale
  • #30
@Dale Gracias, si, it maketh no sense to compare ##m^2## with ##m##, but which is better, the variance or the standard deviation, at giving us an accurate measure of variability, which I presume both are measuring? By squaring (variance) we're amplifying the magnitude of the variability, and by taking the square root (standard deviation) we're downsizing it (both in terms of pure magnitude). However, the magnitude itself conveys no information (vide infra 👇 )
Hornbein said:
Say your data set 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
 
  • Like
Likes hutchphd
  • #31
Agent Smith said:
which is better
Which is better, momentum or energy? Temperature or pressure? Volume or area?
 
  • Haha
Likes Agent Smith
  • #32
Agent Smith said:
which is a better, the variance or the standard deviation, in giving us an accurate measure of variability
The distinctions between them are not in terms of accuracy. Variance is nice because it is additive and you can partition a total variance into separate portions. The standard deviation is nice because it has the same units as the random variable itself and can be meaningfully compared to the mean.
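That additivity can be illustrated with simulated independent variables (my addition, arbitrary parameters): the variances add, ##9 + 16 = 25##, while the standard deviations do not, since ##3 + 4 \ne 5##:

```python
import random
import statistics

random.seed(1)
N = 100_000

# Two independent simulated variables with variances 9 and 16
x = [random.gauss(0, 3) for _ in range(N)]
y = [random.gauss(0, 4) for _ in range(N)]
s = [a + b for a, b in zip(x, y)]

# Var(X + Y) is close to 9 + 16 = 25, but sd(X + Y) is close to 5, not 3 + 4
print(statistics.pvariance(s), statistics.pstdev(s))
```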

Agent Smith said:
By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability
Neither of these statements is true.
 
  • Like
  • Wow
Likes Agent Smith, Vanadium 50 and hutchphd
  • #33
Agent Smith said:
@Dale Gracias, si, it maketh no sense to compare ##m^2## with ##m##, but which is a better, the variance or the standard deviation, in giving us an accurate measure of variability, which both I presume are measuring. By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability (both in terms of pure magnitude). However the magnitude itself conveys no information (vide infra 👇 )
@Dale & @Vanadium 50 ☝️ 🤔

For ##\sigma = 9## (say) meters, the ##9## meters by itself is meaningless, oui?
 
  • #34
One can deduce that the variance is a measure of the spread or dispersion of a probability distribution through Chebyshev's inequality. See E. Parzen, "Modern Probability Theory and Its Applications".

Since I am not well versed in this subject I leave it to the cognoscenti to elaborate.
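Chebyshev's inequality states that ##P(|X - \mu| \ge k\sigma) \le 1/k^2## for any distribution with finite variance. A quick empirical sketch (my addition, using an arbitrary skewed distribution):

```python
import random

random.seed(2)
N = 100_000
# An arbitrary skewed distribution: exponential with rate 1
data = [random.expovariate(1.0) for _ in range(N)]

mu = sum(data) / N
sigma = (sum((x - mu) ** 2 for x in data) / N) ** 0.5

k = 2
# Fraction of points at least k standard deviations from the mean
tail = sum(abs(x - mu) >= k * sigma for x in data) / N

print(tail, "<=", 1 / k**2)  # Chebyshev guarantees at most 0.25
```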
 
  • Like
Likes Agent Smith
  • #35
gleem said:
One can deduce that the variance is a measure of the spread or dispersion of a probability distribution through Chebyshev's inequality. E. Parsen: "Modern Probability Theory and Its Applications"

Since I am not well versed in this subject I leave it to the cognoscenti to elaborate.
Gracias.
 
