Variance & Standard Deviation

  • #36
Agent Smith said:
@Dale & @Vanadium 50 ☝️ 🤔

For ##\sigma = 9## (say) meters, the ##9## meters by itself is meaningless, oui?
##9 \mathrm{\ m}## is meaningful since it is never "by itself" but is in the context of the SI system. What I was objecting to was your claim that "By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability". Neither of these is true.

The numerical point you are alluding to is wrong. Yes, ##9^2 = 81 > 9##, but ##0.9^2 = 0.81 < 0.9##. So squaring a number does not always "amplify" it, nor does taking the square root always "downsize" it. Yesterday I was working with a data set with means around ##-0.010## and standard deviations around ##0.005##, so the variances were even smaller numbers that were annoying to format. The variances were decidedly not "amplified".

Also, the units don't work out. ##(9\mathrm{\ m})^2 = 81 \mathrm{\ m^2}## cannot be compared to ##9\mathrm{\ m}## at all. So you cannot say that ##81 \mathrm{\ m^2}## is "amplified" from ##9 \mathrm{\ m}##.

It doesn't make sense to me to talk about the magnitude or the size of the variability as something different from the standard deviation and the variance.
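This point is easy to check numerically. A minimal sketch using only Python's standard library (the data set below is made up to mimic the small-spread case described above):

```python
import statistics

# Squaring "amplifies" only numbers greater than 1; values below 1 shrink.
assert 9**2 > 9
assert 0.9**2 < 0.9

# A data set with a small spread: here the variance is *smaller*
# than the standard deviation, not "amplified".
data = [-0.010, -0.005, -0.012, -0.008, -0.015]
sd = statistics.stdev(data)
var = statistics.variance(data)
print(f"sd = {sd:.6f}, variance = {var:.8f}")
assert var < sd  # holds whenever sd < 1
```

Of course, as noted above, the real objection is dimensional: a variance in ##\mathrm{m^2}## cannot be compared with a standard deviation in ##\mathrm{m}## at all, which no amount of arithmetic fixes.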
 
  • Like
Likes Vanadium 50 and Agent Smith
  • #37
Dale said:
##9 \mathrm{\ m}## is meaningful since it is never "by itself" but is in the context of the SI system. What I was objecting to was your claim that "By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability". Neither of these is true.

The numerical point you are alluding to is wrong. Yes, ##9^2 = 81 > 9##, but ##0.9^2 = 0.81 < 0.9##. So squaring a number does not always "amplify" it, nor does taking the square root always "downsize" it. Yesterday I was working with a data set with means around ##-0.010## and standard deviations around ##0.005##, so the variances were even smaller numbers that were annoying to format. The variances were decidedly not "amplified".

Also, the units don't work out. ##(9\mathrm{\ m})^2 = 81 \mathrm{\ m^2}## cannot be compared to ##9\mathrm{\ m}## at all. So you cannot say that ##81 \mathrm{\ m^2}## is "amplified" from ##9 \mathrm{\ m}##.

It doesn't make sense to me to talk about the magnitude or the size of the variability as something different from the standard deviation and the variance.
Gracias for the clarification, but it's still not clear to me.

Say we have a statistical report on whale shark lengths with ##\mu = 20 \text{ m}## and ##\sigma = 3 \text{ m}##. For me, ##\sigma## only makes sense if linked to ##\mu##, and one also has to have a fair idea of what a ##\text{meter}## is. I think someone mentioned that we need to keep our units in mind before interpreting statistical information.

Correct?
 
  • #38
Agent Smith said:
for me ##\sigma## only makes sense if linked to ##\mu##
Why? I could have a random variable with ##\sigma = 3 \mathrm{\ m}## regardless of ##\mu##. The only thing that ##\sigma=3\mathrm{\ m}## tells us about ##\mu## is that ##\mu## has dimensions of length.
 
  • Like
Likes Agent Smith
  • #39
A follow-up question.
Watched a video on Pearson's correlation coefficient: ##r = \frac{\sum (x_i - \overline x)(y_i - \overline y)}{\sqrt {\sum (x_i - \overline x)^2 \sum (y_i - \overline y)^2}}##. The denominator, says the author, performs standardization. I don't know what that means. Is the z-score ##\frac{x - \overline x}{\sigma}## also standardization?
 
  • #40
Agent Smith said:
A follow-up question.
Watched a video on Pearson's correlation coefficient: ##r = \frac{\sum (x_i - \overline x)(y_i - \overline y)}{\sqrt {\sum (x_i - \overline x)^2 \sum (y_i - \overline y)^2}}##. The denominator, says the author, performs standardization. I don't know what that means. Is the z-score ##\frac{x - \overline x}{\sigma}## also standardization?
It normalizes the result to lie within ##[-1, 1]##, where ##1## is perfectly positively correlated, ##-1## is perfectly negatively correlated, and anything else is in between. When you want to know how strongly two things are related, you want to be able to ignore the scale of variation of each one, so you divide each one by its standard deviation.
Consider what happens without the denominators to normalize the calculations. Suppose I wanted to know how strongly the length of a person's left toe tended to imply other size characteristics.
In one case, I want to see how well it implied the length of the same person's right toe. Without the denominators, all the numbers and variations from the mean would be small and the calculation would be a small number even though you know the relationship is very strong.
In another case, I want to see how well it implied the heights of the same people. You know that the relationship is not as strong, but the calculation will give a larger number because the heights are larger and have more variation.
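The scale-invariance described above can be demonstrated directly. A minimal Python sketch (the toe-length and height numbers are invented purely for illustration):

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

toe = [2.1, 2.3, 2.0, 2.4, 2.2]               # cm, small numbers
height = [165.0, 172.0, 160.0, 175.0, 168.0]  # cm, much larger numbers

r1 = pearson_r(toe, height)
# Rescaling either variable (e.g. heights in metres) leaves r unchanged,
# because the denominator rescales by exactly the same factor:
r2 = pearson_r(toe, [h / 100 for h in height])
print(r1, r2)
```

Without the denominator, the toe-vs-height number would dwarf the toe-vs-toe number simply because heights vary over centimetres rather than millimetres.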
 
Last edited:
  • Like
Likes Dale and Agent Smith
  • #41
The words "standardized" and "normalized" get tossed around quite a bit in these discussions, rather like "derivative" and "integral" get tossed around in analysis: the words can mean different procedures while describing things that are quite similar in concept.

Think about the correlation coefficient you have posted. The expression $$\frac 1 n \sum{\left(x-\overline{x}\right)\left(y-\overline{y}\right)}$$ is the covariance of the variables, and its size depends on the units of both variables, so it can be arbitrarily large or small, depending on the scales of those variables.

Remember that the (biased) standard deviation for x is $$\sqrt{\frac 1 n \sum{\left(x-\overline{x}\right)^2}}$$, with a similar expression for y. The correlation coefficient is the covariance divided by the product of the two standard deviations: this has the effect of cancelling out the units of both variables so that correlation is a pure number. That process is usually referred to as normalization, but unfortunately standardization is also used.

How does it ensure the correlation is between ##-1## and ##1##? It comes from the Cauchy-Schwarz inequality, which says $$\left(\sum{ab}\right)^2 \le \left(\sum{a^2}\right)\left(\sum{b^2}\right).$$ If you divide both sides by the right-hand side you get $$\frac{\left(\sum{ab}\right)^2}{\left(\sum{a^2}\right)\left(\sum{b^2}\right)} \le 1,$$ and taking square roots gives ##-1 \le r \le 1##. Use ##x_i - \overline{x}## for ##a## and ##y_i - \overline{y}## for ##b##.
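The Cauchy-Schwarz step can be spot-checked numerically. A quick sketch (the vector length, value range, and number of trials are arbitrary):

```python
import random

random.seed(1)

# Numerically verify (sum ab)^2 <= (sum a^2)(sum b^2) on random vectors.
for _ in range(1000):
    a = [random.uniform(-5, 5) for _ in range(10)]
    b = [random.uniform(-5, 5) for _ in range(10)]
    lhs = sum(x * y for x, y in zip(a, b)) ** 2
    rhs = sum(x * x for x in a) * sum(y * y for y in b)
    assert lhs <= rhs
```

Equality holds exactly when ##a## and ##b## are proportional, which for the correlation coefficient is the case ##r = \pm 1##: all the points lie on a straight line.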
 
  • Like
Likes Agent Smith and FactChecker
  • #42
What is this?
[Attached image: Capture.PNG]
 
  • #43
  • Like
Likes Agent Smith
