Why is standard deviation the preferred method for measuring dispersion?

  • Thread starter: striphe
  • Tags: Dispersion
In summary: standard deviation (SD) and mean absolute deviation (MD) differ in their properties and their use in measuring dispersion. SD is commonly used because of its linearity properties and its behavior under the assumption of normality, while MD may provide a more robust measure for non-normal data. The population SD is typically represented by the Greek letter sigma, [tex]\sigma = \sqrt{{1 \over n} \sum_{i=1}^{n} (x_i - \mu)^2}[/tex], and for a normal population it equals the MD multiplied (not divided) by [tex]\sqrt{\pi/2}[/tex].
  • #1
striphe
I'm wondering why standard deviation is used as the main method of measuring dispersion, when I would think that more ergonomic (user-friendly) measures are possible.

An example of such would be the sum of |x - mean| divided by n. I would think that the mean plus and minus this value would account for 50% of a normal distribution.
 
  • #2
striphe said:
An example of such would be the sum of |x - mean| divided by n. I would think that the mean plus and minus this value would account for 50% of a normal distribution.

What is the x in this formula?
 
  • #3
An observation value.
 
  • #4
striphe said:
An example of such would be the sum of |x - mean| divided by n.

This would be the average absolute deviation, which is useful for distributions without a second moment, but it doesn't have the same linearity properties as the variance (squared stdev). Other measures, such as the median absolute deviation, are useful for being more robust to outliers.
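As a concrete comparison, here is a minimal sketch (assuming NumPy; the sample and variable names are illustrative, not from the thread) computing all three measures on one normal sample:

[code]
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=10_000)

# Standard deviation: square root of the average squared deviation from the mean.
sd = np.sqrt(np.mean((x - x.mean()) ** 2))

# Average (mean) absolute deviation: average of |x - mean|.
aad = np.mean(np.abs(x - x.mean()))

# Median absolute deviation: median of |x - median|, the most outlier-robust of the three.
mad = np.median(np.abs(x - np.median(x)))

print(sd, aad, mad)  # for N(10, 2^2): roughly 2.0, 1.6, 1.35
[/code]

For a normal sample the average absolute deviation comes out near [tex]\sigma\sqrt{2/\pi}\approx 0.8\,\sigma[/tex], a relationship that comes up later in the thread.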
 
  • #5
What are the advantages of stdev over average absolute deviation?
 
  • #6
Much of the reason the standard deviation and variance are prevalent is due to one thing: for a long time the notion that data come from normally distributed populations ruled (some may argue it still rules). In probability/mathematical statistics, as soon as the form of the population distribution is assumed, certain statistics are "better" than others because they are motivated by the distribution itself. If you believe data are normally distributed, then:
* the biased version of the sample variance is the maximum likelihood estimate of population variance
* the unbiased version is independent of the sample mean


The quantity

[tex]
\frac 1 n \sum_{i=1}^n |x_i - \overline x|
[/tex]

doesn't have a simple analog in the normal model, and as given it is not an unbiased estimate of the standard deviation or variance. It is worth noting, however, that the question of whether the sample standard deviation or the average absolute deviation is the more appropriate measure of dispersion was once seriously debated (the physicist Eddington and the statistician Fisher were involved). Fisher's argument, again based on the assumption of normality, was considered better at the time.
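Fisher's efficiency point can be illustrated with a small Monte Carlo sketch (my own illustration, assuming NumPy and normal data): the SD-based estimate of [tex]\sigma[/tex] fluctuates less around its target than the rescaled average-absolute-deviation estimate.

[code]
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 20_000

samples = rng.normal(0.0, 1.0, size=(reps, n))          # true sigma = 1
sd_est = samples.std(axis=1, ddof=1)
amd = np.mean(np.abs(samples - samples.mean(axis=1, keepdims=True)), axis=1)
amd_scaled = amd * np.sqrt(np.pi / 2)                    # rescaled so its target is sigma

print(sd_est.std())      # ~0.071: spread of the SD estimator around sigma
print(amd_scaled.std())  # ~0.076: larger spread, i.e. less efficient under normality
[/code]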

This write-up
http://www.leeds.ac.uk/educol/documents/00003759.htm

may give you a better feel for the discussion.
 
  • #7
The two main arguments for standard deviation:
(a) under perfectly normal conditions the sample SD is closer to the population SD
(b) SD is easier to manipulate algebraically than MD

Argument (a) is cut down rather easily by the paper, but I am unsure why argument (b) exists. Taking the absolute value of a figure is no different from squaring the figure and then taking the square root. Using this logic there is minimal difference between the two:

[tex]SD = \sqrt{\frac{1}{n}\sum (x - \text{mean})^2}[/tex]

[tex]MD = \frac{1}{n}\sum \sqrt{(x - \text{mean})^2}[/tex]

To me there would be no difference in difficulty in working with these algebraically.

Am I right in suggesting that mean ± SD represents 50% of a normal distribution?
 
  • #8
striphe said:

Am I right in suggesting that mean ± SD represents 50% of a normal distribution?

You are not right.

By Chebyshev's theorem, the fraction of the data lying within k standard deviations of the mean is at least [tex]1-\frac{1}{k^2}[/tex]. For at least 50%:

[tex]1-\frac{1}{k^2}=1-\frac{1}{2}[/tex]

So [tex]k=\sqrt{2}\approx 1.414[/tex] SD of the mean
 
Last edited:
  • #9
striphe said:
Am I right in suggesting that mean ± SD represents 50% of a normal distribution?

No, that would be 68.2% of a normal distribution.
 
  • #10
CRGreathouse said:
No, that would be 68.2% of a normal distribution.

Yes, that would be for a normal distribution. Chebyshev's theorem is more general and applies to any distribution. For a central probability of 0.682, Chebyshev's theorem gives approximately [tex]\sqrt{3}[/tex] standard deviations.

http://www.philender.com/courses/intro/notes3/chebyshev.html

In other words, the normal assumption is not needed to refute the suggestion.
 
Last edited:
  • #11
Sorry, I meant MD.
 
  • #12
striphe said:
Sorry, I meant MD.

Since you specifically referred to the normal distribution in your most recent question, Chebyshev's theorem is not needed to answer a question about the percentage within one standard deviation of the mean. It wouldn't even be appropriate, since if you know or are assuming normality, you can leverage that to get the 68% value. (Whether normality is a reasonable assumption is an entirely different question.) However, even this is poorly worded.

* If you are discussing only the population, then you need to work with the parameters, and [tex] \mu \pm \sigma [/tex] contains roughly the central 68% of the distribution

* If you have a sample which you've deemed to be mound-shaped and symmetric in its distribution, then [tex] \overline x \pm sd [/tex] contains roughly 68% of the sample values

Now the population MD is [tex] \sigma \sqrt{\, \frac 2 {\pi}} [/tex] for a normal distribution, so (population again)

[tex]
\mu \pm MD \sqrt{\, \frac{\pi} 2}
[/tex]

will contain roughly the central 68% of the population. A similar comment could be made for the sample versions when the sample distribution is mound-shaped and symmetric.

However, if the sample is skewed, there is not (that I know of) anything like Chebyshev's theorem for using [tex] \overline x [/tex] and MD. (The idea above won't work, since the
simple relationship between SD and MD doesn't hold without the normality assumption).
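A quick numerical check of the normal-case relationship above (a sketch, assuming SciPy for the normal CDF):

[code]
import numpy as np
from scipy.stats import norm

sigma = 1.0
md = sigma * np.sqrt(2 / np.pi)       # population MD of a normal: ~0.798 * sigma

k = md * np.sqrt(np.pi / 2) / sigma   # equals 1 exactly: MD * sqrt(pi/2) is just sigma
print(norm.cdf(k) - norm.cdf(-k))     # ~0.6827, the usual central 68%
[/code]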
 
  • #13
statdad said:
* If you are discussing only the population, then you need to work with the parameters, and [tex] \mu \pm \sigma [/tex] contains roughly the central 68% of the distribution

* If you have a sample which you've deemed to be mound-shaped and symmetric in its distribution, then [tex] \overline x \pm sd [/tex] contains roughly 68% of the sample values

Now the population MD is [tex] \sigma \sqrt{\, \frac 2 {\pi}} [/tex] for a normal distribution, so (population again)

[tex]
\mu \pm MD \sqrt{\, \frac{\pi} 2}
[/tex]

will contain roughly the central 68% of the population. A similar comment could be made for the sample versions when the sample distribution is mound-shaped and symmetric.

Did you mean:

[tex]AMD=\sigma \sqrt{\, \frac 2{\pi}}[/tex] ?

Then for 1 AMD: 0.682 × 0.798 = 0.544 of the population.
 
  • #14
Yes, it is true that
[tex]
MD = \sigma \sqrt{\, \frac{2}{\pi}}
[/tex]

This means that

[tex]
\sigma = MD \sqrt{\, \frac{\pi}2}
[/tex]

so that

[tex]
\mu \pm \sigma
[/tex]

is the same as

[tex]
\mu \pm MD \sqrt{\, \frac{\pi}{2}}
[/tex]

The percentage of area trapped between these limits does not get multiplied by the square root term.

In other words,

[tex]
MD \sqrt{\, \frac{\pi}{2}}
[/tex]

is simply another way of writing the population SD and can be used interchangeably with it.
 
Last edited:
  • #15
statdad said:
In other words,

[tex]
MD \sqrt{\, \frac{\pi}{2}}
[/tex]

is simply another way of writing the population SD and can be used interchangeably with it.

Yes, but the OP was asking (after correcting him/herself in post 11) if 50% of the population was within 1 AMD of the mean with a normal distribution. In fact it's 54.4%. No?

EDIT: Also, shouldn't we be talking about AMD, not MD? The latter is signed and is used in the calculation of covariance.

EDIT: BTW I misread the OP's post 7 and thought he/she was talking about the AMD anyway. I jumped to Chebyshev's theorem because I've never related the AMD to the normal distribution. I didn't know of the relation until you pointed it out and I checked it for myself.
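For reference, the exact central area within 1 AMD of the mean under normality can be computed directly rather than by rescaling the 68.2% figure; a minimal sketch, assuming SciPy:

[code]
from math import sqrt, pi
from scipy.stats import norm

k = sqrt(2 / pi)             # 1 AMD expressed in SD units, ~0.798
print(2 * norm.cdf(k) - 1)   # ~0.575: exact central area within 1 AMD of the mean
[/code]

The direct value is about 57.5% rather than 0.682 × 0.798: central areas don't simply rescale with the interval width, which is statdad's point above.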
 
Last edited:
  • #16
SW VandeCarr said:
Yes, but the OP was asking (after correcting him/herself in post 11) if 50% of the population was within 1 AMD of the mean with a normal distribution. In fact it's 54.4%. No?

EDIT: Also, shouldn't we be talking about AMD, not MD? The latter is signed and is used in the calculation of covariance.

EDIT: BTW I misread the OP's post 7 and thought he/she was talking about the AMD anyway. I jumped to Chebyshev's theorem because I've never related the AMD to the normal distribution. I didn't know of the relation until you pointed it out and I checked it for myself.

I guess I've been using MD to mean absolute mean deviation; sorry.

The question in post 11: unless the AMD is modified to represent the standard deviation, I don't believe there is a simple way to provide an answer. That's why I wrote what I did.
 
  • #17
I considered that 50% of all values lie on each side of the mean, and that if one was to split a population in two, so that there exists one set of values below the mean and one set above the mean, you could calculate a mean for each of them, which I will refer to as the half means.

The observations between the half means represent 50% of the population. The average mean deviation is the distance between the half means divided by 2, and so I would have thought that if a population is not skewed, then 50% would be the correct value for how much of a population lies between mean ± MD.
 
  • #18
striphe said:
I considered that 50% of all values lie on each side of the mean, and that if one was to split a population in two, so that there exists one set of values below the mean and one set above the mean, you could calculate a mean for each of them, which I will refer to as the half means.

The observations between the half means represent 50% of the population. The average mean deviation is the distance between the half means divided by 2, and so I would have thought that if a population is not skewed, then 50% would be the correct value for how much of a population lies between mean ± MD.

If you are talking about a distribution where the concept of a standard deviation applies (it doesn't apply to all distributions) then, using Chebyshev's theorem, at least 50% of observations will lie within 1.414 SD of the mean. That's the total of both sides of the mean.

In a standard normal distribution 68.2% of the area under the curve lies within 1 SD of the mean (both sides). If you want to use the AMD with the normal distribution, I calculate that 54.4% of the area under the curve will lie within 1 AMD of the mean. See statdad's and my previous posts. I've never considered using AMD in hypothesis testing.

In my own experience, we only used the normal curve SD when a good symmetrical normal distribution of the data was at hand. Otherwise, some of us preferred using Chebyshev's theorem (CT) instead of normalization techniques. CT is more conservative for hypothesis testing in the tails of the distribution when the population is skewed. A two sided p=0.05 alpha requires only 1.96 SD for the normal distribution, while the two sided CT bound requires 3.16 SD.

The two sided test is based on both tails having 0.05 of the area under the curve, so the total area in the tails is 0.10 and the central area is 0.90; so with CT: [tex]1-\frac{1}{k^2}=0.90 \;\Rightarrow\; k=\sqrt{10}\approx 3.16[/tex] SD.

http://webusers.globale.net/josborne/Stats/ChebychevTheorem.PDF
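The two cutoffs can be tabulated side by side with a short sketch (assuming SciPy):

[code]
from math import sqrt
from scipy.stats import norm

for central in (0.50, 0.90, 0.95):
    k_cheb = sqrt(1 / (1 - central))      # Chebyshev: 1 - 1/k^2 >= central coverage
    k_norm = norm.ppf((1 + central) / 2)  # exact two-sided cutoff for a normal
    print(central, round(k_cheb, 3), round(k_norm, 3))

# 0.50 -> 1.414 (Chebyshev) vs 0.674 (normal)
# 0.90 -> 3.162 vs 1.645
# 0.95 -> 4.472 vs 1.960
[/code]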
 
Last edited by a moderator:
  • #19
Finally, does there exist terminology for these half means that I was talking about?
 
  • #20
striphe said:
I considered that 50% of all values lie on each side of the mean, and that if one was to split a population in two, so that there exists one set of values below the mean and one set above the mean, you could calculate a mean for each of them, which I will refer to as the half means.

The observations between the half means represent 50% of the population. The average mean deviation is the distance between the half means divided by 2, and so I would have thought that if a population is not skewed, then 50% would be the correct value for how much of a population lies between mean ± MD.

If I understand you correctly, your situation defines the quartiles. As a very crude picture, imagine the data stretched along the number line:

Min ---- Q1 ---- Median ---- Q3 ---- Max


Q1 = first quartile (same as 25th percentile)
Q3 = third quartile (same as 75th percentile)

Then 25% of the data is between Min and Q1
25% of the data is between Q1 and median
25% of the data is between median and Q3
25% of the data is between Q3 and max

Does this sound like Q1 and Q3 are the half-means you are thinking of?
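As a quick illustration of that picture, a minimal NumPy sketch (the sample is illustrative):

[code]
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=100_000)

q1, med, q3 = np.percentile(x, [25, 50, 75])
print(q1, med, q3)                     # ~ -0.674, 0.0, 0.674 for N(0, 1)
print(np.mean((x >= q1) & (x <= q3)))  # ~0.5: half the sample lies between Q1 and Q3
[/code]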
 
  • #21
SW VandeCarr said:
If you are talking about a distribution where the concept of a standard deviation applies (it doesn't apply to all distributions)…

In my own experience, we only used the normal curve SD when a good symmetrical normal distribution of the data was at hand. Otherwise, some of us preferred using Chebyshev's theorem (CT) instead of normalization techniques. CT is more conservative for hypothesis testing in the tails of the distribution when the population is skewed. A two sided p=0.05 alpha requires only 1.96 SD for the normal distribution, while the two sided CT bound requires 3.16 SD.

The two sided test is based on both tails having 0.05 of the area under the curve, so the total area in the tails is 0.10 and the central area is 0.90; so with CT: [tex]1-\frac{1}{k^2}=0.90 \;\Rightarrow\; k=\sqrt{10}\approx 3.16[/tex] SD.

http://webusers.globale.net/josborne/Stats/ChebychevTheorem.PDF

I'd be very cautious in using Chebyshev's theorem this way: even though you trap the central 90% of the area with Chebyshev's rule, if the distribution is skewed there is no way at all to know that the two tails split the remaining 10% equally: it could be 2% and 8%, or any other combination. If you assume it's equally split, you are essentially assuming symmetry.

In fact, since Chebyshev's theorem gives a lower bound for the central area, the proper statement is that you've trapped at least 90% of the central area: if it's higher, there is less than 10% of the area in the tails, and your 90% confidence level goes by the wayside.
 
Last edited by a moderator:
  • #22
When I made my 50% assumption, I made the mistake of thinking that 50% of observations always lie on each side of the mean, and then passed that on to these half means.

As the median and mean are different, so are these half means different from the quartiles.
 
  • #23
statdad said:
In fact, since Chebyshev's theorem gives a lower bound for the central area, the proper statement is that you've trapped at least 90% of the central area: if it's higher, there is less than 10% of the area in the tails, and your 90% confidence level goes by the wayside.

That's my point. We were much more concerned with alpha error than beta error, so Chebyshev's theorem is more conservative. We know at least 90% of the area is between the tails. Moreover, we know the shape of the distribution and which tail is in the direction of the alternative hypothesis.
 
Last edited:
  • #24
SW VandeCarr said:
That's my point. We were much more concerned with alpha error than beta error, so Chebyshev's theorem is more conservative. We know at least 90% of the area is between the tails. Moreover, we know the shape of the distribution and which tail is in the direction of the alternative hypothesis.

Then I misunderstood, although I'm not sure I quite have it still. If you know which tail is of interest, why use a two-sided procedure? (Mostly rhetorical question here, not a burning issue: as long as you and your colleagues have sorted things out, there is no reason to use your time explaining to me.)
 
  • #25
statdad said:
Then I misunderstood, although I'm not sure I quite have it still. If you know which tail is of interest, why use a two-sided procedure? (Mostly rhetorical question here, not a burning issue: as long as you and your colleagues have sorted things out, there is no reason to use your time explaining to me.)

It's straightforward. If you have a skewed distribution in, say, a clinical trial population (not really a random sample of a population), you can normalize the distribution with some transformation. However, some of us thought to use Chebyshev's theorem instead of transforming data points (which can be problematic). It wasn't intended for publication, but to satisfy ourselves that we had solid statistical significance. AFAIK, Chebyshev's theorem doesn't provide for a one-sided evaluation. We mainly focused on getting (for the actual trial data) at least 95% between the tails, i.e.:
[tex]1-\frac{1}{k^2}=0.95 \;\Rightarrow\; k=\sqrt{20}\approx 4.472[/tex] SD.
 
Last edited:
  • #26
Concerning the initial question of why variance (and its square root, the standard deviation) is the most commonly used measure of uncertainty:

The use of variance is linked to the use of the mean of a random variable as an estimator of the variable, since choosing an estimator that minimizes the expected squared distance to the true value is equivalent to choosing the mean value [itex]\mu[/itex] as the estimator:

[tex]\operatorname{argmin}_{\hat{x}} E\left[(X-\hat{x})^2\right] = \mu[/tex]

If you use some other measure of uncertainty, such as the expected absolute distance to the true value, the mean will no longer be the estimator that minimizes the uncertainty (for expected absolute distance, the median is).
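A minimal numerical sketch of this equivalence (my own illustration, assuming NumPy): grid-search the estimate that minimizes each loss and compare with the mean and median.

[code]
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=50_000)     # a deliberately skewed sample

grid = np.linspace(0.0, 3.0, 3001)
sq_loss = [np.mean((x - g) ** 2) for g in grid]
abs_loss = [np.mean(np.abs(x - g)) for g in grid]

print(grid[np.argmin(sq_loss)], x.mean())       # both ~1.0: the mean minimizes squared loss
print(grid[np.argmin(abs_loss)], np.median(x))  # both ~0.69: the median minimizes absolute loss
[/code]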
 

FAQ: Why is standard deviation the preferred method for measuring dispersion?

What is standard deviation and why is it important in measuring dispersion?

Standard deviation is a statistical measure of how spread out the data is from the mean. It is important in measuring dispersion because it gives us a better understanding of the variability in a data set.

How is standard deviation calculated?

Standard deviation is calculated by taking the square root of the variance, which is the average of the squared differences from the mean.
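In symbols, using the population form quoted at the top of the thread:

[tex]\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}[/tex]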

What are the advantages of using standard deviation over other measures of dispersion?

Standard deviation is often preferred over other measures of dispersion because it takes into account all of the data points in a data set, has convenient algebraic (linearity) properties, and is the most efficient estimator of spread when the data are normally distributed.

Is standard deviation affected by outliers in the data?

Yes, standard deviation is affected by outliers in the data. Because it squares the deviations, it is actually more sensitive to extreme values than robust measures such as the interquartile range or the median absolute deviation, though less sensitive than the range.

In what situations is standard deviation not the preferred method for measuring dispersion?

Standard deviation may not be the preferred method for measuring dispersion when the data is significantly skewed or when the data contains extreme outliers. In these cases, other measures such as median absolute deviation or interquartile range may be more appropriate.
