Variance & Standard Deviation

  • #36
Agent Smith said:
@Dale & @Vanadium 50 ☝️ 🤔

For ##\sigma = 9## (say) meters, the ##9## meters by itself is meaningless, oui?
##9 \mathrm{\ m}## is meaningful since it is never "by itself" but is in the context of the SI system. What I was objecting to was your claim that "By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability". Neither of these is true.

The numerical point you are alluding to is wrong. Yes, ##9^2 = 81 > 9##, but ##0.9^2 = 0.81 < 0.9##. So squaring a number does not always "amplify" it, nor does taking the square root always "downsize" it. Yesterday I was working with a data set with means around ##-0.010## and standard deviations around ##0.005##, so the variances were even smaller numbers that were annoying to format. The variances were decidedly not "amplified".

Also, the units don't work out. ##(9\mathrm{\ m})^2 = 81 \mathrm{\ m^2}## cannot be compared to ##9\mathrm{\ m}## at all. So you cannot say that ##81 \mathrm{\ m^2}## is "amplified" from ##9 \mathrm{\ m}##.

It doesn't make sense to me to talk about the magnitude or the size of the variability as something different from the standard deviation and the variance.
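This point is easy to check numerically. A minimal sketch using only Python's standard library (the data set below is made up to mimic the small-spread case described above):

```python
import statistics

# Squaring "amplifies" only numbers greater than 1; values below 1 shrink.
assert 9**2 > 9
assert 0.9**2 < 0.9

# A data set with a small spread: here the variance is *smaller*
# than the standard deviation, not "amplified".
data = [-0.010, -0.005, -0.012, -0.008, -0.015]
sd = statistics.stdev(data)
var = statistics.variance(data)
print(f"sd = {sd:.6f}, variance = {var:.8f}")
assert var < sd  # holds whenever sd < 1
```

Of course, as noted above, the real objection is dimensional: a variance in ##\mathrm{m^2}## cannot be compared with a standard deviation in ##\mathrm{m}## at all, which no amount of arithmetic fixes.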
 
  • Like
Likes Vanadium 50 and Agent Smith
  • #37
Dale said:
##9 \mathrm{\ m}## is meaningful since it is never "by itself" but is in the context of the SI system. What I was objecting to was your claim that "By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability". Neither of these is true.

The numerical point you are alluding to is wrong. Yes, ##9^2 = 81 > 9##, but ##0.9^2 = 0.81 < 0.9##. So squaring a number does not always "amplify" it, nor does taking the square root always "downsize" it. Yesterday I was working with a data set with means around ##-0.010## and standard deviations around ##0.005##, so the variances were even smaller numbers that were annoying to format. The variances were decidedly not "amplified".

Also, the units don't work out. ##(9\mathrm{\ m})^2 = 81 \mathrm{\ m^2}## cannot be compared to ##9\mathrm{\ m}## at all. So you cannot say that ##81 \mathrm{\ m^2}## is "amplified" from ##9 \mathrm{\ m}##.

It doesn't make sense to me to talk about the magnitude or the size of the variability as something different from the standard deviation and the variance.
Gracias for the clarification, but it's still not clear to me.

Say we have a statistical report on whale shark lengths with ##\mu = 20 \text{ m}## and ##\sigma = 3 \text{ m}##. For me, ##\sigma## only makes sense if linked to ##\mu##, and one also has to have a fair idea of what a ##\text{meter}## is. I think someone mentioned that we need to keep our units in mind before interpreting statistical information.

Correct?
 
  • #38
Agent Smith said:
for me ##\sigma## only makes sense if linked to ##\mu##
Why? I could have a random variable with ##\sigma = 3 \mathrm{\ m}## regardless of ##\mu##. The only thing that ##\sigma=3\mathrm{\ m}## tells us about ##\mu## is that ##\mu## has dimensions of length.
 
  • Like
Likes Agent Smith
  • #39
A follow-up question.
Watched a video on Pearson's correlation coefficient: ##r = \frac{\sum (x_i - \overline x)(y_i - \overline y)}{\sqrt {\sum (x_i - \overline x)^2 \sum (y_i - \overline y)^2}}##. The denominator, says the author, performs standardization. I don't know what that means. Is the z-score ##\frac{x - \overline x}{\sigma}## also standardization?
 
  • #40
Agent Smith said:
A follow-up question.
Watched a video on Pearson's correlation coefficient: ##r = \frac{\sum (x_i - \overline x)(y_i - \overline y)}{\sqrt {\sum (x_i - \overline x)^2 \sum (y_i - \overline y)^2}}##. The denominator, says the author, performs standardization. I don't know what that means. Is the z-score ##\frac{x - \overline x}{\sigma}## also standardization?
It normalizes the result to lie within ##[-1, 1]##, where ##1## is perfectly positively correlated, ##-1## is perfectly negatively correlated, and anything else is in between. When you want to know how strongly two things are related, you want to be able to ignore the scale of variation of each one, so you divide each one by its standard deviation.
Consider what happens without the denominators to normalize the calculations. Suppose I wanted to know how strongly the length of a person's left toe tended to imply other size characteristics.
In one case, I want to see how well it implied the length of the same person's right toe. Without the denominators, all the numbers and variations from the mean would be small and the calculation would be a small number even though you know the relationship is very strong.
In another case, I want to see how well it implied the heights of the same people. You know that the relationship is not as strong, but the calculation will give a larger number because the heights are larger and have more variation.
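The scale-invariance described above can be demonstrated directly. A minimal Python sketch (the toe-length and height numbers are invented purely for illustration):

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

toe = [2.1, 2.3, 2.0, 2.4, 2.2]               # cm, small numbers
height = [165.0, 172.0, 160.0, 175.0, 168.0]  # cm, much larger numbers

r1 = pearson_r(toe, height)
# Rescaling either variable (e.g. heights in metres) leaves r unchanged,
# because the denominator rescales by exactly the same factor:
r2 = pearson_r(toe, [h / 100 for h in height])
print(r1, r2)
```

Without the denominator, the toe-vs-height number would dwarf the toe-vs-toe number simply because heights vary over centimetres rather than millimetres.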
 
Last edited:
  • Like
Likes Dale and Agent Smith
  • #41
The words "standardized" and "normalized" get tossed around quite a bit in these discussions, rather like "derivative" and "integral" get tossed around in analysis: the words can mean different procedures while describing things that are quite similar in concept.

Think about the correlation coefficient you have posted. The expression $$\frac 1 n \sum{\left(x-\overline{x}\right)\left(y-\overline{y}\right)}$$ is the covariance of the variables, and its size depends on the units of both variables, so it can be arbitrarily large or small, depending on the scales of those variables.

Remember that the (biased) standard deviation for x is $$\sqrt{\frac 1 n \sum{\left(x-\overline{x}\right)^2}}$$, with a similar expression for y. The correlation coefficient is the covariance divided by the product of the two standard deviations: this has the effect of cancelling out the units of both variables so that correlation is a pure number. That process is usually referred to as normalization, but unfortunately standardization is also used.

How does it ensure the correlation is between ##-1## and ##1##? It comes from the Cauchy-Schwarz inequality, which says $$\left(\sum{ab}\right)^2 \le \left(\sum{a^2}\right)\left(\sum{b^2}\right).$$ If you divide both sides by the right-hand side you get $$\frac{\left(\sum{ab}\right)^2}{\left(\sum{a^2}\right)\left(\sum{b^2}\right)} \le 1,$$ and taking square roots gives ##-1 \le r \le 1##. Use ##x_i - \overline{x}## for ##a## and ##y_i - \overline{y}## for ##b##.
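The Cauchy-Schwarz step can be spot-checked numerically. A quick sketch (the vector length, value range, and number of trials are arbitrary):

```python
import random

random.seed(1)

# Numerically verify (sum ab)^2 <= (sum a^2)(sum b^2) on random vectors.
for _ in range(1000):
    a = [random.uniform(-5, 5) for _ in range(10)]
    b = [random.uniform(-5, 5) for _ in range(10)]
    lhs = sum(x * y for x, y in zip(a, b)) ** 2
    rhs = sum(x * x for x in a) * sum(y * y for y in b)
    assert lhs <= rhs
```

Equality holds exactly when ##a## and ##b## are proportional, which for the correlation coefficient is the case ##r = \pm 1##: all the points lie on a straight line.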
 
  • Like
Likes Agent Smith and FactChecker
  • #42
What is this?
[Attached image: Capture.PNG]
 
  • #43
  • Like
Likes Agent Smith
