Finding the skewness and kurtosis of grouped data

chwala · Nov 30, 2021

See the grouped data below; I just want to be certain that I have followed the correct steps in trying to find the skewness of the grouped data.

mjc123 · Nov 30, 2021

Your calculation of the variance is slightly off; it is correct for the variance of a population but not for a sample - you need to multiply it by n/(n-1). (Compare your definitions of Var and S.)
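
To make the n/(n-1) correction concrete, here is a minimal Python sketch. It assumes the class midpoints and frequencies from the table chwala posts later in this thread; the variable names are illustrative only.

```python
# Class midpoints and frequencies, taken from the table posted later in the thread.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [2, 6, 8, 14, 7, 3]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Sum of squared deviations from the mean, weighted by class frequency.
ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))

pop_var = ss / n            # population variance: divide by n
sample_var = ss / (n - 1)   # sample variance: divide by n - 1

print(n, mean)
print(round(pop_var, 4), round(sample_var, 4))
print(round(sample_var ** 0.5, 2))  # sample standard deviation s
```

The two variances differ exactly by the factor n/(n-1), which is small for n = 40 but still enough to change the third significant figure of a skewness or kurtosis value.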

chwala · Nov 30, 2021

OK, I will check on that. On a side note, I am also trying to teach myself SPSS; I entered the data as per the table below,

and the results were as follows.

Looks like my calculation of skewness was on point.

mjc123 · Nov 30, 2021

Well, the same to 2 sig figs (which may be perfectly adequate for your purposes). There are several slightly different formulas for sample skewness (see the Wikipedia article on skewness) and yours is not identical to any of them. It says that the one it calls G₁ is the one used in SPSS, and I calculate its value as -0.1824 - very similar to yours but not identical.
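
For reference, a small sketch of the ##G_1## calculation. It assumes the adjusted Fisher-Pearson form given in the Wikipedia article, ##G_1 = g_1\sqrt{n(n-1)}/(n-2)## with ##g_1 = m_3/m_2^{3/2}##, and uses the midpoints and frequencies from the table posted later in this thread. The quantity `adhoc` is the ratio ##\sum f(x-\bar{x})^3/((n-1)s^3)## that chwala uses further down, included only for comparison.

```python
import math

# Class midpoints and frequencies, taken from the table posted later in the thread.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [2, 6, 8, 14, 7, 3]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Sums of weighted deviations, then the biased (divide-by-n) central moments.
s2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
s3 = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))
m2 = s2 / n
m3 = s3 / n

g1 = m3 / m2 ** 1.5                         # moment coefficient of skewness
G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)  # adjusted form (assumed to be what SPSS reports)

# The ad hoc ratio used in this thread, with the sample standard deviation s.
s = math.sqrt(s2 / (n - 1))
adhoc = s3 / ((n - 1) * s ** 3)

print(round(g1, 4), round(G1, 4), round(adhoc, 4))
```

On this data the three numbers all round to roughly -0.18 or -0.17, which is why the different definitions are easy to confuse.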

chwala · Nov 30, 2021

I have just checked; the sample variance is now OK and fully understood.

chwala · Nov 30, 2021

mjc123 said:

Well, the same to 2 sig figs (which may be perfectly adequate for your purposes). There are several slightly different formulas for sample skewness (see the Wikipedia article on skewness) and yours is not identical to any of them. It says that the one it calls G₁ is the one used in SPSS, and I calculate its value as -0.1824 - very similar to yours but not identical.

Yes, I saw something like that: Pearson's measures of skewness, using the median and the mode. Which is the generally accepted method worldwide? And for kurtosis, which method is preferred? I will try to calculate kurtosis tomorrow, but I do like the two methods that I have indicated, so I think I will stick with those.

chwala · Dec 1, 2021

I am now trying to calculate kurtosis. Allow me to share the example from my research that I am using as my point of reference:

Having shown this, and going back to our problem, it follows that the kurtosis ##K## for a sample is given by

##K=\frac{\sum f(x-μ)^4}{(n-1)s^4}=\frac{1,024,059.383+472,291.898+16,607.53125+1,561.929+215,755.34+876,623.45}{39×12.89^4}##

##K=\frac{2,606,899.531}{1,076,654.293}=2.42##

But checking this with SPSS, I am getting ##K=-0.42##.

Where could the mistake be?

chwala · Dec 1, 2021

Marks	x	f	fx	(x-X)	f(x-X)^2 =m	f(x-X)^3
1-10	5.5	2	11	-26.75	1,431.13	-38,282.59
11-20	15.5	6	93	-16.75	1,683.38	-28,196.53
21-30	25.5	8	204	-6.75	364.5	-2,460.38
31-40	35.5	14	497	3.25	147.88	480.59
41-50	45.5	7	318.5	13.25	1,228.93	16,283.42
51-60	55.5	3	166.5	23.25	1,621.69	37,704.23
Total		40	1,290		6,477.51	-14,471.26

Also, looking at this and calculating the skewness, I am getting

Skewness = ##\frac{-14,471.26}{39×12.89^3}=-0.173##

and not ##-0.18## as we had expected.

Method 2: using Joanes and Gill's approach

Let ##n=\sum f(x-μ)^4=2,606,899.531##; then
kurtosis = ##\frac{n}{m^2}## and excess kurtosis = kurtosis ##-3##

kurtosis = ##\frac{2,606,899.531}{6,477.51^2}=\frac{2,606,899.531}{41,958,135.8}=0.062##
excess kurtosis = ##0.062-3=-2.938##
then the sample excess kurtosis = ##\frac{40-1}{38×37}×[41×(-2.938)+6]=0.02773×(-114.458)=-3.17##

mjc123 · Dec 1, 2021

Skewness: I think this comes from using your formula with the correct value of S. As I said, your formula is not the same as that used by SPSS; the fact that you previously got (using your formula with the wrong value of S) the same value as SPSS (to 2 sig figs at least) was a coincidence.

Kurtosis: again, there are several definitions. See the Wikipedia page; the quantity used by SPSS is the one they call G₂. (You could also check the help for the KURT function in Excel.) Note that this is a formula for excess kurtosis = kurtosis - 3. In your second method, note that m = (n-1)s², so your formula is a factor of (n-1) too small. (By the way, too many n's meaning different things!)
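
A short sketch of the ##G_2## calculation may help here. It assumes the Wikipedia form ##G_2=\frac{n-1}{(n-2)(n-3)}[(n+1)g_2+6]## with ##g_2=m_4/m_2^2-3##, where ##m_2## and ##m_4## are the divide-by-n central moments, and uses the midpoints and frequencies from the table above. `K_thread` reproduces the first calculation in this thread for comparison; the names are illustrative.

```python
# Class midpoints and frequencies from the table above.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [2, 6, 8, 14, 7, 3]

n = sum(freqs)  # sample size (40), not the fourth-power sum
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

s2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))  # "m" in the table
s4 = sum(f * (x - mean) ** 4 for f, x in zip(freqs, midpoints))

m2 = s2 / n
m4 = s4 / n

g2 = m4 / m2 ** 2 - 3                                    # excess kurtosis from biased moments
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)  # sample excess kurtosis

# The first calculation in the thread: fourth-power sum over (n - 1) * s^4.
sample_var = s2 / (n - 1)
K_thread = s4 / ((n - 1) * sample_var ** 2)

print(round(m4 / m2 ** 2, 3), round(g2, 3), round(G2, 3), round(K_thread, 2))
```

Under these assumptions ##G_2## comes out near -0.42, in line with the SPSS output, while `K_thread` is a plain (non-excess) kurtosis near 2.42, so the two are not directly comparable.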

chwala · Dec 1, 2021

Actually, I am using PSPP, which presumably ought not to be different from SPSS. I will dedicate this day to studying kurtosis.

...weighted tails...cheers mate

chwala · Dec 20, 2021

I have read about a better way of measuring kurtosis and skewness: using moments, which is quite straightforward. I need clarity on the highlighted part.

In general,

##d=\frac{Δ\,\text{Mid-point}}{20}##
Now, we need to establish the ##d## values from the class intervals. My point is that the textbook has a mistake in the highlighted part of the class intervals; they ought to be ##100-120, 120-140, 140-160##, and so on. This is the part that I need clarity on; the other steps of the working are clear, i.e.

Therefore,

##β_1## and ##β_2## =

implying a 'symmetrical distribution', as the skewness is almost zero, and a leptokurtic distribution.

I think the errors are typos.
In the last part, we should have
##β_1=\frac{(μ_3)^2}{(μ_2)^2}##...

chwala · Dec 20, 2021

chwala said:

Homework Statement: See attached below.
Note that this is an original problem created by myself.
Relevant Equations: Skewness and kurtosis (statistics)

See the grouped data below; I just want to be certain that I have followed the correct steps in trying to find the skewness of the grouped data.

View attachment 293330

View attachment 293331

Now I will go ahead and calculate the skewness and kurtosis of this data. Give me a moment. Our table will look like this:

Marks	Frequency	##x_i##	##d##	##fd##	##fd^2##	##fd^3##	##fd^4##
1-10	2	5.5	-3.33	-6.66	22.178	-73.852	245.927
11-20	6	15.5	-2.22	-13.32	29.57	-65.646	145.735
21-30	8	25.5	-1.11	-8.88	9.857	-10.941	12.145
31-40	14	35.5	0	0	0	0	0
41-50	7	45.5	1.11	7.77	8.625	9.573	10.626
51-60	3	55.5	2.22	6.66	14.785	32.823	72.867
Total	40			-14.43	85.015	-108.043	487.103

I hope I am doing it right. I have finished my calculations and am getting different values, i.e.
##β_1=4.98## and ##β_2=2.48##. I may need to recheck my calculations, and at the same time I would like to know whether my table of values is correct.

A rough check on the mean: Mean = Assumed mean + ##\frac{\sum fd}{\sum f}×i##
= ##35.5+\frac{-14.43}{40}×9=35.5-3.24675=32.25##, which agrees with what we had found earlier in post 1.
I may need to check moments 2, 3 and 4 later.
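
As a way of checking moments 2, 3 and 4, here is a hedged sketch of the step-deviation method. It assumes the usual definitions ##β_1=μ_3^2/μ_2^3## and ##β_2=μ_4/μ_2^2## and the standard conversion from raw moments about the assumed mean to central moments. The function name `central_moments` and its parameters are made up for this illustration, and `width=9` simply mirrors the table above.

```python
def central_moments(midpoints, freqs, assumed_mean, width):
    """Return (mean, mu2, mu3, mu4) using step deviations d = (x - A) / width."""
    n = sum(freqs)
    d = [(x - assumed_mean) / width for x in midpoints]
    # Raw moments of d about zero, i.e. about the assumed mean.
    r1, r2, r3, r4 = (
        sum(f * di ** k for f, di in zip(freqs, d)) / n for k in (1, 2, 3, 4)
    )
    mean = assumed_mean + r1 * width
    # Convert raw moments to central moments, then scale back to the units of x.
    mu2 = (r2 - r1 ** 2) * width ** 2
    mu3 = (r3 - 3 * r2 * r1 + 2 * r1 ** 3) * width ** 3
    mu4 = (r4 - 4 * r3 * r1 + 6 * r2 * r1 ** 2 - 3 * r1 ** 4) * width ** 4
    return mean, mu2, mu3, mu4


midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [2, 6, 8, 14, 7, 3]

mean, mu2, mu3, mu4 = central_moments(midpoints, freqs, assumed_mean=35.5, width=9)
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2

print(round(mean, 2), round(beta1, 3), round(beta2, 3))
```

With these definitions ##β_2## comes out close to the 2.48 quoted above, but ##β_1## is only about 0.03. Dividing ##μ_3^2## by ##μ_2^2## instead of ##μ_2^3## gives about 4.99 on this data, so the exponent in the denominator of ##β_1## may be worth rechecking.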

BvU · Dec 20, 2021

Hello again,

I don't agree with your calculation for the ##d##. You want to use the distance between the bin centers as denominator, i.e. 10, not 9. Think of the marks as a continuous distribution, or else you get gaps.
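
A quick sketch of what the table looks like with a class width of 10, assuming the same midpoints, frequencies and assumed mean of 35.5 as in the post above. The variable names are illustrative.

```python
# Midpoints and frequencies from the table above; width is the spacing of the midpoints.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]
freqs = [2, 6, 8, 14, 7, 3]
assumed_mean = 35.5
width = 10

# Step deviations are now whole numbers, so no rounding error enters the table.
d = [(x - assumed_mean) / width for x in midpoints]

# Column totals: sum of f*d, f*d^2, f*d^3, f*d^4.
sums = {k: sum(f * di ** k for f, di in zip(freqs, d)) for k in (1, 2, 3, 4)}

mean = assumed_mean + sums[1] / sum(freqs) * width

print(d)
print(sums)
print(mean)
```

The mean check still gives 32.25, and the column totals are exact integers rather than values built from a rounded ##d## such as -3.33.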


chwala · Dec 20, 2021

BvU said:

Hello again,

I don't agree with your calculation for the ##d##. You want to use the distance between the bin centers as denominator, i.e. 10, not 9. Think of the marks as a continuous distribution, or else you get gaps.

BvU, nice talking again. I'll amend my Excel sheet and recalculate everything. So for continuous data we are thinking along the lines of ##0.5≤x<10.5## (with ##0.5## as the lower class boundary), to have our class interval as ##10##. Cheers.

chwala · Dec 21, 2021

BvU said:

Hello again,

I don't agree with your calculation for the ##d##. You want to use the distance between the bin centers as denominator, i.e. 10, not 9. Think of the marks as a continuous distribution, or else you get gaps.


Just a minute, what about ##x_i##? Will I now use ##5## instead of ##5.5##, using the first class interval ##[0.5≤x<10.5]## as my point of reference?
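
One way to look at this question is to compute the midpoints directly from the class boundaries. The sketch below assumes the marks are whole numbers, so that each stated class limit is extended by 0.5 on either side; that assumption and the variable names are not from the thread.

```python
# Stated class limits from the marks table.
limits = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60)]

# Assumption: whole-number marks, so boundaries sit halfway between adjacent classes.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]

midpoints = [(lo + hi) / 2 for lo, hi in boundaries]
widths = [hi - lo for lo, hi in boundaries]

print(boundaries[0], midpoints[0], widths[0])
print(midpoints)
```

Under that assumption the first class runs from 0.5 to 10.5, its midpoint is still 5.5, and every class has width 10 with no gaps between neighbouring classes.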

chwala · Dec 21, 2021

I do not think it would be possible to calculate skewness and kurtosis because of the discontinuity at the class intervals...I may need to create another suitable example...
