Which is better? Mean with standard deviation or Median with IQR?

In summary, the conversation discusses the dilemma of choosing between using mean with standard deviation or median with IQR as a representation for data. The group discusses the advantages and disadvantages of each statistic and how they help in understanding the distribution of the data. The mean is considered the ideal or expected score, while the standard deviation measures the spread of the data. The median, on the other hand, is not affected by outliers and is more suitable for skewed data. Ultimately, the choice of statistic depends on the purpose and interpretation of the data.
  • #1
3vo
Hi guys,

I hope someone is able to help, I'm currently stuck on a problem.

I'm having trouble justifying which representation is more accurate for my data; either mean with standard deviation or median with IQR.
I've calculated both averages for my data; however, someone advised me that the mean with standard deviation was the better representation. I don't understand how this could be the case, as my ogive graph seems to indicate that the distribution is wide and uneven. I was under the impression that whenever a distribution is uneven, you should always use the median rather than the mean, because the median is not affected by outliers.

Why in this case would the mean be better than the median? Or is the median the better representation, and if so, why?
My mean value was 8.6 with a standard deviation of 14.5, whilst the median was 3.44 with an IQR of 9.5.

TBH I don't know what to do with the standard deviation and IQR values. I understand they are measures of spread, but what do they tell me about the data? I would appreciate it if anyone could also explain what I should do with them so I can justify which would be the better representation.

I've included some links to a copy of my data table and ogive graph if that helps.

Any advice given would be appreciated. Thanks in advance.

P.S. Please correct me if I've done my ogive graph wrong; it's my first real attempt at one.

Data table:
http://www.talkstats.com/attachment.php?attachmentid=3930&d=1385031179

Ogive:
http://www.talkstats.com/attachment.php?attachmentid=3931&d=1385031208

Calculation of standard deviation, in case I've gone wrong:
http://www.talkstats.com/attachment.php?attachmentid=3932&d=1385031219
 
  • #2
Welcome to MHB, 3vo!... Before deciding whether it is better to use the mean and variance or the median and interquartile range, it is important to remember that the median and interquartile range exist for any PDF, whereas the mean and variance do not always exist. As an example, consider the so-called Cauchy distribution, which, when symmetric around x=0, has the form...

$\displaystyle f(x) = \frac{1}{\pi} \frac{1}{1+x^{2}}\ (1)$

If you try to find the mean and variance in the standard fashion, you obtain...

$\displaystyle \mu = \frac{1}{\pi}\ \int_{- \infty}^{+ \infty} \frac{x}{1+x^{2}}\ dx\ (2)$

$\displaystyle \sigma^{2} = \frac{1}{\pi}\ \int_{- \infty}^{+ \infty} \frac{x^{2}}{1+x^{2}}\ dx\ (3)$

... and neither integral converges.
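A short Python sketch (an illustration, not part of the original reply) of both halves of this point for the density in (1). The quartiles come straight from the Cauchy quantile function $\tan(\pi(p - 1/2))$, so the median (0) and IQR (2) are finite, while truncated versions of (2) and (3) keep growing as the cut-off $R$ increases. For (2), the improper integral exists only if each half converges on its own, and the right half grows like $\ln(1+R^2)/(2\pi)$. The function names are hypothetical.

```python
import math

# Standard Cauchy density from (1): f(x) = 1 / (pi * (1 + x^2)).
# Its CDF is F(x) = 1/2 + arctan(x) / pi, so the quantile function
# (inverse CDF) is Q(p) = tan(pi * (p - 1/2)).


def cauchy_quantile(p: float) -> float:
    """Inverse CDF of the standard Cauchy distribution, for 0 < p < 1."""
    return math.tan(math.pi * (p - 0.5))


def mean_integral_right_half(R: float) -> float:
    """(1/pi) * integral_0^R x / (1 + x^2) dx = ln(1 + R^2) / (2 pi)."""
    return math.log1p(R * R) / (2 * math.pi)


def second_moment_integral(R: float) -> float:
    """(1/pi) * integral_{-R}^{R} x^2 / (1 + x^2) dx = (2R - 2 arctan R) / pi."""
    return (2 * R - 2 * math.atan(R)) / math.pi


if __name__ == "__main__":
    q1, median, q3 = (cauchy_quantile(p) for p in (0.25, 0.5, 0.75))
    print(f"Q1 = {q1:.6f}, median = {median:.6f}, Q3 = {q3:.6f}, IQR = {q3 - q1:.6f}")
    # Both truncated integrals keep growing as the cut-off R grows,
    # so the improper integrals (2) and (3) have no finite value.
    for R in (10.0, 1e3, 1e5):
        print(f"R = {R:>8g}: right half of (2) = {mean_integral_right_half(R):.3f}, "
              f"(3) = {second_moment_integral(R):.1f}")
```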

Kind regards

$\chi$ $\sigma$
 
  • #3
Hi 3vo,

Welcome to MHB!

I've been trying to make sense of the table you were given for a while now and think I finally understand what is going on, although this isn't how I normally see these problems set out. You are essentially supposed to use the midpoint of each range as the value of $x$ for that range; then you multiply by the corresponding probability to get the mean and S.D.

You aren't given probabilities, though; you are given frequencies, so we are calculating the probabilities indirectly. With $k$ classes, midpoints $x_i$, frequencies $f_i$ and total frequency $n = \sum_{i=1}^{k} f_i$, you can think of the mean as:

\(\displaystyle \sum_{i=1}^{k}x_i \left( \frac{f_i}{n} \right)\) or, as in the problem, \(\displaystyle \frac{\sum_{i=1}^{k}f_i x_i}{n}\)

Anyway, the above isn't too important. It's just interesting to see how this is presented to you. I think I got the table now. :)
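To make the midpoint method concrete, here is a minimal Python sketch. The class boundaries and frequencies are hypothetical placeholders (the attached table isn't reproduced here), and the helper name `grouped_mean_sd` is made up for illustration. It divides by the total frequency $n$, matching the formula above; some courses divide by $n-1$ for a sample standard deviation instead.

```python
import math


def grouped_mean_sd(classes, freqs):
    """Mean and standard deviation of grouped data, dividing by n.

    classes: list of (lower, upper) class boundaries
    freqs:   list of frequencies f_i, one per class
    Each class is represented by its midpoint x_i = (lower + upper) / 2.
    """
    mids = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)                                    # total frequency
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    # Average squared deviation of the midpoints, weighted by frequency.
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical classes and frequencies, NOT the table from post #1.
    classes = [(0, 5), (5, 10), (10, 20), (20, 60)]
    freqs = [10, 5, 3, 2]
    mean, sd = grouped_mean_sd(classes, freqs)
    print(f"mean = {mean:.3f}, sd = {sd:.3f}")
```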

You are correct that the mean is easily affected by outliers, so in those cases we usually use the median instead. The same is true when the data is skewed left or right. For a normal distribution, 3 S.D. either side of the mean covers about 99.7% of the data. For your data set, one S.D. below the mean ($8.6 - 14.5 = -5.9$) is already outside the possible values of $t$, so the S.D. only has meaning on one side of the mean. That does, however, suggest the data are skewed (to the right, since the mean of 8.6 is well above the median of 3.44).
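The arithmetic in this reply can be checked with a few lines of Python, assuming only the summary values quoted in post #1. The normal coverage uses the standard identity $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$. As an extra check not used in the thread, Pearson's second skewness coefficient $3(\bar{x} - \text{median})/s$ is also computed; a positive value points to right skew. The helper names are hypothetical.

```python
import math


def normal_coverage(k: float) -> float:
    """Share of a normal distribution within k SDs of the mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


def pearson_median_skewness(mean: float, median: float, sd: float) -> float:
    """Pearson's second skewness coefficient, 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} SD of the mean: {normal_coverage(k):.4f}")

    # Summary values quoted in post #1.
    mean, sd, median = 8.6, 14.5, 3.44
    # One SD below the mean is negative, i.e. outside the possible values of t.
    print(f"mean - 1 SD = {mean - sd:.1f}")
    print(f"Pearson skewness = {pearson_median_skewness(mean, median, sd):.2f}")
```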

It depends on what information you are expected to present with your choice; for example, can we state that $0 \le t < 60$? I would probably go with the median and IQR because they show where the bulk of the data lies, but I don't know all the details of what your teacher expects.
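If the median and quartiles are read off an ogive, the usual calculation is linear interpolation within the class where the cumulative frequency reaches the target: $Q_p \approx L + \frac{pn - F}{f}\,w$, where $L$ is the lower boundary of that class, $F$ the cumulative frequency below it, $f$ its frequency and $w$ its width. A minimal Python sketch under that assumption follows; the classes and frequencies are hypothetical, not 3vo's table, and the function name is illustrative.

```python
def grouped_quantile(classes, freqs, p):
    """Estimate the p-th quantile (0 <= p <= 1) of grouped data from its ogive.

    classes: (lower, upper) class boundaries in increasing order
    freqs:   frequency of each class
    Uses linear interpolation inside the class where the cumulative
    frequency first reaches p * n, i.e. L + (p*n - F) / f * w.
    """
    n = sum(freqs)
    target = p * n          # cumulative frequency we are looking for
    below = 0               # cumulative frequency below the current class (F)
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * (upper - lower)
        below += f
    return classes[-1][1]   # only reached if p > 1 or all frequencies are zero


if __name__ == "__main__":
    # Hypothetical grouped data, NOT the table attached to post #1.
    classes = [(0, 5), (5, 10), (10, 20), (20, 60)]
    freqs = [10, 5, 3, 2]
    q1, med, q3 = (grouped_quantile(classes, freqs, p) for p in (0.25, 0.5, 0.75))
    print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {q3 - q1}")
```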

Hope this helps some, even if it's not that definitive.
 
  • #4
Since this has been posted in BASIC Probability and Statistics, we should just discuss what each of the statistics does, and the situations when each statistic is better to be used.

We need to remember that to calculate the mean, we need to add all the scores, then divide by the number of scores. This means that EVERY score is taken into account.
The mean could then be considered your "ideal" or "expected" score. It also gives us an idea of what happens in the "centre" of the distribution.
Obviously not every score will equal this ideal score, and the mean only tells us about the centre, so we would also like a way to measure the "spread" of the data.
A deviation is the difference between an observed score and the expected (mean) score. You might think that spread about the mean could be measured just by averaging all the deviations, and it could, but the positive and negative deviations cancel each other out (in fact they always sum to exactly zero), so that average tells us nothing. To get around this, we square each deviation first (making everything nonnegative) and then average. This gives the "variance". The standard deviation is then the square root of the variance, which undoes the squaring and puts the measure of spread back into the same units as the data.
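A minimal Python sketch of the steps just described, using a small made-up set of scores: the deviations sum to zero, the variance is the average squared deviation (dividing by $n$), and the standard deviation is its square root. The helper names are hypothetical; the standard library's `statistics.pvariance` and `statistics.pstdev` are used only as a cross-check.

```python
import math
import statistics


def deviations(data):
    """Each score minus the mean (observed minus expected)."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]


def population_variance(data):
    """Average of the squared deviations (dividing by n, not n - 1)."""
    devs = deviations(data)
    return sum(d * d for d in devs) / len(devs)


if __name__ == "__main__":
    scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
    print("deviations:", deviations(scores))
    print("sum of deviations:", sum(deviations(scores)))  # positives and negatives cancel
    var = population_variance(scores)
    print("variance:", var, "SD:", math.sqrt(var))
    # Cross-check against the standard library's population versions.
    print("statistics module:", statistics.pvariance(scores), statistics.pstdev(scores))
```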

As for the median, it is found by putting the data in order and then finding the middle score. This splits the data into two halves. Note that the size of the other scores does not matter, only their position in the ordering: making the largest score even larger leaves the median unchanged. This means the median is not affected by outliers the way the mean is.
We can then go further and find the median of each half of the data, giving us the quartiles (which cut the data into quarters). The difference between the upper (3rd) quartile and the lower (1st) quartile is the InterQuartile Range (IQR), another measure of spread. The IQR also gives us a way to check for outliers: by the usual convention, any value more than 1.5 x IQR below the lower quartile, or more than 1.5 x IQR above the upper quartile, is considered an outlier.

So, to answer your question of when it is better to use each pair of statistics: you really should always calculate both pairs, and then check whether you have outliers. If you do, use the median and IQR. If not, use the mean and SD.
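The procedure in this post can be sketched in Python as follows. The quartiles are computed with the "median of each half" method described above (leaving out the middle score when the count is odd); be aware that textbooks and software packages use several slightly different quartile conventions, so results may differ a little from a calculator. The helper names and the example data are hypothetical.

```python
import statistics


def quartiles_by_halves(data):
    """(Q1, median, Q3) using the "median of each half" method.

    With an odd number of scores, the middle score is left out of both halves.
    Other textbooks and software use slightly different quartile conventions.
    """
    s = sorted(data)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]
    return statistics.median(lower), statistics.median(s), statistics.median(upper)


def summarize(data):
    """Compute both pairs, flag 1.5 x IQR outliers, and pick a pair as in post #4."""
    q1, median, q3 = quartiles_by_halves(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "mean": statistics.mean(data),
        "sd": statistics.pstdev(data),
        "median": median,
        "iqr": iqr,
        "outliers": outliers,
        "use": "median and IQR" if outliers else "mean and SD",
    }


if __name__ == "__main__":
    print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))         # hypothetical, no outliers
    print(summarize([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))  # hypothetical, 40 is an outlier
```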
 
  • #5
Thanks to all those that replied!
 

FAQ: Which is better? Mean with standard deviation or Median with IQR?

Which measure of central tendency is more commonly used in statistical analysis?

The mean with standard deviation is more commonly used in statistical analysis as it takes into account all the values in a dataset and provides a measure of the overall average.

When should I use median with IQR instead of mean with standard deviation?

Median with IQR should be used when the data is skewed or has outliers. This is because the median is not affected by extreme values, unlike the mean which can be greatly influenced by outliers.
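A tiny illustration of this point, using two hypothetical datasets that differ only in their largest value: the median stays put while the mean jumps.

```python
import statistics

# Two hypothetical datasets that differ only in their largest value.
typical = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

for name, data in (("typical", typical), ("with outlier", with_outlier)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:>12}: mean = {mean}, median = {median}")
```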

How do I interpret the results when using mean with standard deviation?

When using mean with standard deviation, the mean represents the average value of the dataset and the standard deviation represents the spread of the data around the mean. A higher standard deviation indicates a larger spread of the data points, while a lower standard deviation indicates a more clustered dataset.

Is one measure of central tendency more accurate than the other?

Neither measure of central tendency is inherently more accurate than the other. The choice between mean with standard deviation and median with IQR depends on the type of data and the research question being addressed. Both measures have their own strengths and limitations, and it is important to consider the context before deciding which one to use.

Can I use both mean with standard deviation and median with IQR in my analysis?

Yes, it is possible to use both measures in your analysis. In fact, using both can provide a more comprehensive understanding of the data. However, it is important to clearly state which measure you are using and why in order to avoid confusion and ensure the accuracy of your results.
