Finding the skewness and Kurtosis of grouped data

  • Thread starter chwala
In summary, the conversation is discussing the correct methods for calculating skewness and kurtosis of grouped data. The participants are using different formulas and approaches, and there is some confusion over the correct values. They are also discussing the use of SPSS and PSPP for these calculations.
  • #1
chwala
Homework Statement
See attached below
Note that this is an original problem created by myself.
Relevant Equations
Skewness and kurtosis- Statistics
See the grouped data below; I just want to be certain that I have followed the correct steps in finding the skewness of the grouped data.

1638258802517.png


1638258853711.png
 
  • #2
Your calculation of the variance is slightly off; it is correct for the variance of a population but not for a sample - you need to multiply it by n/(n-1). (Compare your definitions of Var and S.)
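A minimal Python sketch of the distinction, assuming the class midpoints and frequencies from the table posted later in the thread (post #8); the variable names are only illustrative.

```python
# Class midpoints and frequencies, as given in the table in post #8.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [2, 6, 8, 14, 7, 3]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Frequency-weighted sum of squared deviations from the mean.
ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))

pop_var = ss / n             # population variance: divide by n
sample_var = ss / (n - 1)    # sample variance: divide by n - 1

print(mean, pop_var, sample_var)
```

The two variances differ exactly by the factor n/(n-1) mentioned above.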
 
  • #3
OK, I will check on that. On a side note, I am also trying to teach myself SPSS; I entered the data as per the table below:

1638280490837.png


and the results were as follows:

1638280560348.png


It looks like my calculation of skewness was on point.
 
  • #4
Well, the same to 2 sig figs (which may be perfectly adequate for your purposes). There are several slightly different formulas for sample skewness (see the Wikipedia article on skewness) and yours is not identical to any of them. It says that the one it calls G1 is the one used in SPSS, and I calculate its value as -0.1824 - very similar to yours but not identical.
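As a rough check, here is a Python sketch of the moment coefficient g1 and the adjusted coefficient G1 as defined in the Wikipedia article, assuming the midpoints and frequencies from the table in post #8. The names `g1` and `G1` follow that article; everything else is illustrative.

```python
from math import sqrt

# Class midpoints and frequencies, as given in the table in post #8.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [2, 6, 8, 14, 7, 3]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Second and third central moments (both divided by n).
m2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
m3 = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints)) / n

g1 = m3 / m2 ** 1.5                      # moment coefficient of skewness
G1 = g1 * sqrt(n * (n - 1)) / (n - 2)    # small-sample adjusted version

print(g1, G1)
```

With these inputs G1 comes out at about -0.182, in line with the value quoted above.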
 
  • #5
I have just checked; the sample variance is now OK and fully understood:

1638282134679.png
 
  • #6
mjc123 said:
Well, the same to 2 sig figs (which may be perfectly adequate for your purposes). There are several slightly different formulas for sample skewness (see the Wikipedia article on skewness) and yours is not identical to any of them. It says that the one it calls G1 is the one used in SPSS, and I calculate its value as -0.1824 - very similar to yours but not identical.
Yes, I saw something like that: Pearson's measures of skewness, using the median and mode. Which is the generally accepted method worldwide? And for kurtosis, which method is preferred? I will try to calculate kurtosis tomorrow...but hey, I do like the two methods that I have indicated...I think I will stick with them...:cool:
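For comparison, a rough Python sketch of Pearson's second (median) coefficient of skewness, 3(mean - median)/s, for the same grouped data. It assumes the frequencies from the table in post #8, class boundaries at 0.5, 10.5, ... (a continuity assumption, not stated in the original problem), a median found by linear interpolation within the median class, and the sample standard deviation for s.

```python
from math import sqrt

# Assumed lower class boundaries (0.5-10.5, 10.5-20.5, ...) and class width.
lower_bounds = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5]
width = 10
freqs = [2, 6, 8, 14, 7, 3]  # frequencies from the table in post #8

midpoints = [lb + width / 2 for lb in lower_bounds]
n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
s = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1))

# Find the class containing the n/2-th value and interpolate linearly within it.
cumulative = 0
for lb, f in zip(lower_bounds, freqs):
    if cumulative + f >= n / 2:
        median = lb + (n / 2 - cumulative) / f * width
        break
    cumulative += f

pearson2 = 3 * (mean - median) / s
print(median, pearson2)
```

This measure is on a different scale from the moment-based coefficients, so the two should not be expected to agree numerically.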
 
  • #7
I am now trying to calculate kurtosis. Allow me to share an example that serves as my point of reference.
This is an example from my research:

1638361764895.png

1638361790176.png


Now, having shown this, and going back to our problem, it follows that the kurtosis ##K## for a sample is given by

##K = \frac{\sum f(x-μ)^4}{(n-1)s^4} = \frac{1,024,059.383 + 472,291.898 + 16,607.53125 + 1,561.929 + 215,755.34 + 876,623.45}{39 × 12.89^4}##

##K = \frac{2,606,899.531}{1,076,654.293} = 2.42##

But checking this with SPSS, I am getting ##K = -0.42##,

1638363008576.png


Where could the mistake be?
 
  • #8
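| Marks | x | f | fx | (x-X) | f(x-X)^2 = m | f(x-X)^3 |
|---|---|---|---|---|---|---|
| 1-10 | 5.5 | 2 | 11 | -26.75 | 1,431.13 | -38,282.59 |
| 11-20 | 15.5 | 6 | 93 | -16.75 | 1,683.38 | -28,196.53 |
| 21-30 | 25.5 | 8 | 204 | -6.75 | 364.5 | -2,460.38 |
| 31-40 | 35.5 | 14 | 497 | 3.25 | 147.88 | 480.59 |
| 41-50 | 45.5 | 7 | 318.5 | 13.25 | 1,228.93 | 16,283.42 |
| 51-60 | 55.5 | 3 | 166.5 | 23.25 | 1,621.69 | 37,704.23 |
| Total | | 40 | 1290 | | 6477.51 | -14,471.26 |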

Also, looking at this and calculating skewness, I am getting

Skewness ##= \frac{-14,471.26}{39 × 12.89^3} = -0.173##

and not ##-0.18## as we had expected...

Method 2: using Joanes and Gill's approach

Let ##n = \sum f(x-μ)^4 = 2,606,899.531##; then
kurtosis ##= \frac{n}{m^2}## and excess kurtosis ##=## kurtosis ##- 3##

kurtosis ##= \frac{2,606,899.531}{6,477.51^2} = \frac{2,606,899.531}{41,958,135.8} = 0.062##
excess kurtosis ##= 0.062 - 3 = -2.938##
Then the sample excess kurtosis ##= \frac{40-1}{38 × 37} × [41 × (-2.938) + 6] = 0.02773 × (-114.458) = -3.17##
 
  • #9
Skewness: I think this comes from using your formula with the correct value of S. As I said, your formula is not the same as that used by SPSS; the fact that you previously got (using your formula with the wrong value of S) the same value as SPSS (to 2 sig figs at least) was a coincidence.
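A small Python sketch of that coincidence, assuming the data from the table in post #8 and the formula used in this thread, ##\sum f(x-\bar{x})^3 / ((n-1)s^3)##, evaluated once with the population standard deviation and once with the sample standard deviation. The variable names are illustrative.

```python
from math import sqrt

# Class midpoints and frequencies, as given in the table in post #8.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [2, 6, 8, 14, 7, 3]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
ss2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
ss3 = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))

s_pop = sqrt(ss2 / n)           # population standard deviation
s_sample = sqrt(ss2 / (n - 1))  # sample standard deviation

# The thread's formula with each choice of s.
skew_with_pop_s = ss3 / ((n - 1) * s_pop ** 3)
skew_with_sample_s = ss3 / ((n - 1) * s_sample ** 3)

print(skew_with_pop_s, skew_with_sample_s)
```

With the population s the result is about -0.180, which matches the software output to 2 significant figures; with the sample s it is about -0.173.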

Kurtosis: again there are several definitions. See the Wikipedia page; the quantity used by SPSS is the one they call G2. (You could also check the help for the KURT function in Excel.) Note that this is a formula for excess kurtosis = kurtosis - 3. In your second method, note that ##m = (n-1)s^2##, so your formula is a factor of (n-1) too small. (By the way, too many n's meaning different things!)
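A rough Python sketch of G2 as defined on the Wikipedia page, assuming the data from the table in post #8. The names `g2` and `G2` follow that page; the rest is illustrative.

```python
# Class midpoints and frequencies, as given in the table in post #8.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [2, 6, 8, 14, 7, 3]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Second and fourth central moments (both divided by n).
m2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
m4 = sum(f * (x - mean) ** 4 for f, x in zip(freqs, midpoints)) / n

kurtosis = m4 / m2 ** 2   # plain moment ratio
g2 = kurtosis - 3         # excess kurtosis

# Small-sample adjusted excess kurtosis.
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(kurtosis, g2, G2)
```

Here the plain ratio is about 2.49 and G2 is about -0.42, which is consistent with the software output reported in post #7.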
 
  • #10
Actually, I am using PSPP, which presumably ought not to be different from SPSS. I will dedicate this day to studying kurtosis :cool::cool:...weighted tails...cheers mate
 
  • #11
I have read about a better way of measuring kurtosis and skewness: using moments, which is quite straightforward. I need clarity on the highlighted part...

In general,
1639985641932.png

1639985677509.png


1639985747343.png


##d = \frac{Δ\text{Mid-point}}{20}##
Now, we need to establish the ##d## values from the class intervals. My point is that the textbook has a mistake in the highlighted part of the class intervals; they ought to be ##100-120, 120-140, 140-160##... This is the part that I need clarity on; otherwise the other steps of the working are clear, i.e.
1639986246549.png


1639986275660.png


Therefore,

##β_1## and ##β_2## =

1639986387276.png
implying a 'symmetrical distribution', as the skewness is almost zero, and a leptokurtic distribution.

I think the errors are typos...
In the last part, we should have
##β_1 = \frac{μ_3^2}{μ_2^3}##...
 
  • #12
chwala said:
Homework Statement:: See attached below
Note that this is an original problem created by myself.
Relevant Equations:: Skewness and kurtosis- Statistics

See the grouped data below; I just want to be certain that i have followed the correct step in trying to find skewness of the grouped data.

View attachment 293330

View attachment 293331
Now I will go ahead and calculate the skewness and kurtosis of this data...give me a moment. Our table will look like this:

| Marks | Frequency | ##x_i## | ##d## | ##fd## | ##fd^2## | ##fd^3## | ##fd^4## |
|---|---|---|---|---|---|---|---|
| 1-10 | 2 | 5.5 | -3.33 | -6.66 | 22.178 | -73.852 | 245.927 |
| 11-20 | 6 | 15.5 | -2.22 | -13.32 | 29.57 | -65.646 | 145.735 |
| 21-30 | 8 | 25.5 | -1.11 | -8.88 | 9.857 | -10.941 | 12.145 |
| 31-40 | 14 | 35.5 | 0 | 0 | 0 | 0 | 0 |
| 41-50 | 7 | 45.5 | 1.11 | 7.77 | 8.625 | 9.573 | 10.626 |
| 51-60 | 3 | 55.5 | 2.22 | 6.66 | 14.785 | 32.823 | 72.867 |
| Total | | | | -14.43 | 85.015 | -108.043 | 487.103 |

I hope I am doing it right...I have finished my calculations and am getting different values, i.e.
##β_1=4.98## and ##β_2=2.48##...I may need to recheck my calculations, and at the same time I would like to know whether my table of values is correctly done...

A rough check on the mean: Mean = Assumed mean + ##\frac{\sum fd}{\sum f} × i##
##= 35.5 + \frac{-14.43}{40} × 9 = 35.5 - 3.24675 = 32.25##, which agrees with what we had found earlier in post ##1##.
I may need to check moments ##2, 3## and ##4## later...
 
  • #13
Hello again,

I don't agree with your calculation for the ##d##. You want to use the distance between the bin centers as denominator, i.e. 10, not 9. Think of the marks as a continuous distribution, or else you get gaps.
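A minimal Python sketch of the step-deviation (coding) method with a class width of 10, assuming the midpoints and frequencies from the table in post #12 and an assumed mean of 35.5. The names `A`, `h`, and `r1`...`r4` are illustrative. Since ##β_1## and ##β_2## are dimensionless, their values should not depend on ##h## as long as the same ##h## is used throughout; a width of 10 simply makes the ##d## values integers.

```python
# Class midpoints and frequencies, as given in the table in post #12.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [2, 6, 8, 14, 7, 3]

A = 35.5   # assumed mean (midpoint of the modal class)
h = 10     # distance between adjacent class midpoints

d = [(x - A) / h for x in midpoints]   # coded deviations: -3, -2, -1, 0, 1, 2
n = sum(freqs)

# Raw moments of the coded variable d about zero.
r1 = sum(f * di for f, di in zip(freqs, d)) / n
r2 = sum(f * di ** 2 for f, di in zip(freqs, d)) / n
r3 = sum(f * di ** 3 for f, di in zip(freqs, d)) / n
r4 = sum(f * di ** 4 for f, di in zip(freqs, d)) / n

# Convert to central moments and scale back to the original units.
mean = A + r1 * h
mu2 = (r2 - r1 ** 2) * h ** 2
mu3 = (r3 - 3 * r1 * r2 + 2 * r1 ** 3) * h ** 3
mu4 = (r4 - 4 * r1 * r3 + 6 * r1 ** 2 * r2 - 3 * r1 ** 4) * h ** 4

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2

print(mean, mu2, mu3, mu4, beta1, beta2)
```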

 
  • #14
BvU said:
Hello again,

I don't agree with your calculation for the ##d##. You want to use the distance between the bin centers as denominator, i.e. 10, not 9. Think of the marks as a continuous distribution, or else you get gaps.
BvU, nice talking again. I'll amend my Excel sheet and recalculate everything. This means that for continuous data we are thinking along the lines of ##0.5 ≤ x < 10.5##, with ##0.5## as the lower class boundary, so that our class interval is ##10##...cheers
 
  • #15
BvU said:
Hello again,

I don't agree with your calculation for the ##d##. You want to use the distance between the bin centers as denominator, i.e. 10, not 9. Think of the marks as a continuous distribution, or else you get gaps.

Just a minute, what about ##x_i##? Will I now use ##5## instead of ##5.5##, using the first class interval ##[0.5≤x<10.5]## as my point of reference?
 
  • #16
I do not think it would be possible to calculate skewness and kurtosis because of the discontinuity at the class intervals...I may need to create another suitable example...
 

FAQ: Finding the skewness and Kurtosis of grouped data

What is the purpose of finding the skewness and kurtosis of grouped data?

The purpose of finding the skewness and kurtosis of grouped data is to understand the shape of the distribution. Skewness measures the asymmetry of the data, while kurtosis measures how heavy the tails are relative to a normal distribution (often loosely described as peakedness or flatness). These measures complement the mean and standard deviation when summarising or comparing datasets.

How do you calculate the skewness and kurtosis of grouped data?

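To calculate the skewness and kurtosis of grouped data, take the class midpoints ##x## and frequencies ##f##, compute the mean ##\bar{x} = \sum fx / \sum f##, and then the central moments ##m_k = \sum f(x-\bar{x})^k / \sum f##. The moment coefficient of skewness is ##m_3 / m_2^{3/2}## and the kurtosis is ##m_4 / m_2^2## (subtract 3 for the excess kurtosis). Statistical packages such as SPSS apply small-sample adjustments to these, giving the quantities that Wikipedia calls G1 and G2. A simpler alternative for skewness is Pearson's second coefficient, 3(mean - median) / standard deviation.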

What do positive and negative values of skewness and kurtosis indicate?

A positive skewness value indicates that the data is skewed to the right, meaning that the tail of the distribution is longer on the right side. A negative skewness value indicates that the data is skewed to the left, meaning that the tail of the distribution is longer on the left side. Similarly, a positive excess kurtosis value indicates heavier tails than a normal distribution (leptokurtic), while a negative excess kurtosis value indicates lighter tails than a normal distribution (platykurtic).

How can you interpret the values of skewness and kurtosis?

The values of skewness and kurtosis can help you interpret the shape of the distribution. If the skewness and the excess kurtosis are both close to 0, the data is consistent with an approximately normal shape, although this alone does not prove normality. If the values are significantly different from 0, it indicates that the data is not normally distributed and may be skewed or have heavier or lighter tails. Additionally, comparing the skewness and kurtosis values of different datasets can provide insights into their relative shapes.

Are there any limitations to using skewness and kurtosis for grouped data?

Yes, there are some limitations to using skewness and kurtosis for grouped data. These measures are sensitive to the grouping of data and may not accurately reflect the underlying distribution. Additionally, they may not be appropriate for all types of data, such as highly skewed or non-numerical data. It is important to consider the context and characteristics of the data when interpreting the values of skewness and kurtosis.
