Finding Quartiles for Ungrouped Data: Correct Method and Precision

  • Thread starter stpmmaths
In summary: When n is even, the median sits between the two middle values and splits the data into a lower half and an upper half. The quartiles are then the median of the lower half and the median of the upper half, which is why you go 2.5 and 7.5 from the median value. When calculating quartiles, it is important to use the correct formula and understand the concept of quantiles. Quartiles divide a set of data into four equal parts, with the first quarter containing the lowest 25% and the fourth quarter containing the highest 25%. The second quartile is the median.
  • #1
stpmmaths
From the question, is the way to find Lower Quartiles and Upper Quartiles correct? I have seen books taking the 3rd and 8th (from the question) as Lower Quartiles and Upper Quartiles respectively. Which should be the correct Quartiles?
 

Attachments

  • DSC00198.jpg
  • #2
stpmmaths said:
From the question, is the way to find Lower Quartiles and Upper Quartiles correct? I have seen books taking the 3rd and 8th (from the question) as Lower Quartiles and Upper Quartiles respectively. Which should be the correct Quartiles?

Are you asking how to calculate quartiles or how to interpret them? First, there's no such thing as an eighth quartile. By definition, quartiles partition the data into four ordered sets. The generic term is quantile, which can refer to any number of equal divisions of the data points. The first quartile contains the lowest 25% of the data points, the last quartile the highest 25%. All data values must be ranked, putting them into correspondence with the integers 1 through n, where n is the total number of ranked data points. If k is the rank of a chosen data point x, with [itex] k \leq n[/itex], then:

[itex]P[X < x] \leq k/n; P[X\geq x] \geq 1 - (k/n)[/itex]

So by the first inequality, if x is ranked 5th highest out of 100 data points, then k=95 and P=0.95, which is the 95th percentile. It seems you want the upper quartile (top 25%) and the lower quartile (bottom 25%). The meaning of the term 75th percentile is that 75% of all data points are less than the lowest data point of the upper quartile.
 
  • #4
stpmmaths said:
Based on the attachment https://www.physicsforums.com/attachment.php?attachmentid=44365&d=1330184818, is this the correct way to interpret quartile?

I'm having a hard time reading it, but to establish quartiles it's the number of data points and their rank that matter, not their actual values. So if n=15, the rank of the median satisfies k/n = 0.5. Solve for k to get 7.5. For the quartile: k/n = 0.25, so k = 3.75. So the lower boundary of the upper quartile would be 15 - 3.75 = 11.25. This would include your top four ranked values, which would be your last four data points in counting order: the 12th, 13th, 14th and 15th data points.

If you type out what you're doing, I can tell you more. You seem to be doing it correctly. For an even number of values, some people use k+1, as you have, so quantile boundaries do not fall on data points. The rank of your median is then 5.5 and the quartile boundaries would be calculated using 2.75. So 5.5 - 2.75 = 2.75. Your answer could be this or 2.25. I'm not sure which.
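
The rank arithmetic in this post can be checked with a few lines of Python. This is only a sketch: the function name `quartile_ranks` and the `plus_one` flag are made up for illustration, and it returns fractional rank positions only, not data values.

```python
def quartile_ranks(n, plus_one=False):
    """Return the 1-based fractional ranks of Q1, the median and Q3.

    With plus_one=False the ranks are p * n (the n=15 example).
    With plus_one=True the ranks are p * (n + 1), the adjustment
    some people apply so boundaries do not fall on data points.
    """
    m = n + 1 if plus_one else n
    return tuple(m * p for p in (0.25, 0.5, 0.75))


print(quartile_ranks(15))                 # (3.75, 7.5, 11.25)
print(quartile_ranks(10, plus_one=True))  # (2.75, 5.5, 8.25)
```

For n=15 the upper boundary 11.25 matches 15 - 3.75, and for ten values with the plus-one adjustment the boundaries 2.75 and 8.25 sit 2.75 ranks either side of the median rank 5.5.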
 
  • #5
There are 10 data values in my attached example.

{51, 55, 57, 61, 62, 67, 70, 72, 73, 74}

Q1 = 56.5
Q3 = 72.25

But
Even-sized population

Consider an ordered population of 10 data values {3, 6, 7, 8, 8, 10, 13, 15, 16, 20}.

The rank of the first quartile is 10×(1/4) = 2.5, which rounds up to 3, meaning that 3 is the rank in the population (from least to greatest values) at which approximately 1/4 of the values are less than the value of the first quartile. The third value in the population is 7.
The rank of the second quartile (same as the median) is 10×(2/4) = 5, which is an integer, while the number of values (10) is an even number, so the average of both the fifth and sixth values is taken—that is (8+10)/2 = 9, though any value from 8 through to 10 could be taken to be the median.
The rank of the third quartile is 10×(3/4) = 7.5, which rounds up to 8. The eighth value in the population is 15.

from http://en.wikipedia.org/wiki/Quantile

Applying that method to my data gives
Q1 = 57
Q3 = 72 instead
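
A minimal Python sketch of the rounding-up rule quoted from Wikipedia in this post. The function name `quartiles_round_up` is hypothetical, and averaging a value with its successor whenever the rank is a whole number is an assumption generalised from the median case in the quote.

```python
import math


def quartiles_round_up(data):
    """Q1, median, Q3 using rank = n * p, rounded up when fractional."""
    xs = sorted(data)
    n = len(xs)

    def value_at(p):
        rank = n * p  # 1-based rank, possibly fractional
        if rank.is_integer():
            # Whole-number rank: average this value and the next one
            k = int(rank)
            return (xs[k - 1] + xs[k]) / 2
        # Fractional rank: round up and take that value
        return xs[math.ceil(rank) - 1]

    return value_at(0.25), value_at(0.5), value_at(0.75)


print(quartiles_round_up([3, 6, 7, 8, 8, 10, 13, 15, 16, 20]))       # (7, 9.0, 15)
print(quartiles_round_up([51, 55, 57, 61, 62, 67, 70, 72, 73, 74]))  # (57, 64.5, 72)
```

The first call reproduces the Wikipedia example (7, 9, 15); the second gives Q1 = 57 and Q3 = 72 for the ten values in this thread.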
 
  • #6
stpmmaths said:
There are 10 data values in my attached example.

{51, 55, 57, 61, 62, 67, 70, 72, 73, 74}

Q1 = 56.5
Q3 = 72.25

But Q1 = 57
Q3 = 72 instead

As far as I know, with sparse data like this, you can't be very precise in placing quantile boundaries in terms of extrapolations of the actual data values. All you can say is the median falls between 62 and 67. The quartile boundaries fall on 57 and 72. If you use k+1 and center the rank distribution on the median, using 2.75 ranks as the quartile width, then 57 will fall into the second quartile while 72 will fall into the third quartile when strictly observing the boundaries 2.75 and 8.25. With n=10+1, you can't be more precise than that IMO. Note I'm using Q4 for the quartile with the highest values and Q1 as the one with the lowest values as you did.
 
  • #7
SW VandeCarr said:
As far as I know, with sparse data like this, you can't be very precise in placing quantile boundaries in terms of extrapolations of the actual data values. All you can say is the median falls between 62 and 67. The quartile boundaries fall on 57 and 72. If you use k+1 and center the rank distribution on the median, using 2.75 ranks as the quartile width, then 57 will fall into the second quartile while 72 will fall into the third quartile when strictly observing the boundaries 2.75 and 8.25. With n=10+1, you can't be more precise than that IMO. Note I'm using Q4 for the quartile with the highest values and Q1 as the one with the lowest values as you did.

I made a mistake with the "k+1" adjustment for even n. It should be n+1 of course.
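
For what it's worth, here is a Python sketch of the n+1 convention applied to the ten values in this thread. The function name `quartiles_n_plus_1` is hypothetical, and linear interpolation between neighbouring ranks is an assumption about how a fractional rank is turned into a value; the thread does not spell that step out.

```python
def quartiles_n_plus_1(data):
    """Q1, median, Q3 using rank = p * (n + 1) with linear interpolation."""
    xs = sorted(data)
    n = len(xs)

    def value_at(p):
        rank = (n + 1) * p
        rank = min(max(rank, 1), n)   # keep the rank inside 1..n
        lo = int(rank)                # rank of the value just below
        frac = rank - lo              # how far towards the next value
        hi = min(lo + 1, n)
        return xs[lo - 1] + frac * (xs[hi - 1] - xs[lo - 1])

    return value_at(0.25), value_at(0.5), value_at(0.75)


data = [51, 55, 57, 61, 62, 67, 70, 72, 73, 74]
print(quartiles_n_plus_1(data))  # (56.5, 64.5, 72.25)
```

Under these assumptions the ranks 2.75 and 8.25 give Q1 = 56.5 and Q3 = 72.25, the values from post #5. For comparison, `statistics.quantiles(data, n=4)` in the Python standard library (3.8+) defaults to `method='exclusive'`, which as far as I know uses the same n+1 positions.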
 

FAQ: Finding Quartiles for Ungrouped Data: Correct Method and Precision

1. What is the definition of quartiles in ungrouped data?

Quartiles refer to the values that divide a dataset into four equal parts. The first quartile (Q1) is the 25th percentile, the second quartile (Q2) is the 50th percentile (also known as the median), and the third quartile (Q3) is the 75th percentile.

2. How do you calculate quartiles for ungrouped data?

To calculate quartiles for ungrouped data, you need to first arrange the data in ascending order. Then, find the median (Q2) of the dataset. Next, find the median of the lower half of the data, which will be the first quartile (Q1). Lastly, find the median of the upper half of the data, which will be the third quartile (Q3).
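
The steps above can be sketched in Python as follows. The function name `quartiles_median_of_halves` is made up for illustration, and leaving the middle value out of both halves when n is odd is an assumption, since the answer does not say how to split an odd-sized dataset.

```python
from statistics import median


def quartiles_median_of_halves(data):
    """Q1, median, Q3 as the medians of the lower half, all data, upper half.

    Needs at least two values so that both halves are non-empty.
    """
    xs = sorted(data)                 # step 1: ascending order
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Assumption: for odd n the middle value belongs to neither half
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return median(lower), median(xs), median(upper)


print(quartiles_median_of_halves([51, 55, 57, 61, 62, 67, 70, 72, 73, 74]))
# (57, 64.5, 72)
```

For the ten values discussed in the thread this gives Q1 = 57 and Q3 = 72, the same as the rounding-up rule but different from the n+1 interpolation result of 56.5 and 72.25.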

3. What is the purpose of quartiles in ungrouped data?

Quartiles in ungrouped data are used to help analyze the spread and distribution of a dataset. They can also be used to identify outliers and understand the central tendency of the data.

4. How are quartiles related to the interquartile range (IQR)?

The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1). It is the width of the interval that covers the middle 50% of the data and is a measure of the spread or variability in the dataset.
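
As a small illustration, the IQR can be computed for both pairs of quartiles that came up in the thread. The helper name `iqr` is hypothetical, and the point is only that the result depends on which quartile convention was used.

```python
def iqr(q1, q3):
    """Interquartile range: third quartile minus first quartile."""
    return q3 - q1


print(iqr(57, 72))       # 15     (rounding-up rule or median of halves)
print(iqr(56.5, 72.25))  # 15.75  (n+1 interpolation)
```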

5. Can quartiles be used to compare two or more datasets?

Yes, quartiles can be used to compare two or more datasets. By looking at the quartiles of each dataset, you can compare the spread and central tendency of the data. However, it is important to note that quartiles should not be used as the only method of comparison, and other statistical measures should also be considered.
