- #1
kimberley
I've been conducting a series of natural experiments and examining their distributions for normality and departures from it. One distribution in particular led to a conversation with a friend, and to some confusion about whether its primary infirmity is a skewness problem or a kurtosis problem. The question is likely to be pedestrian to most of you, but it's an important basic distinction that I obviously need to grasp, so your comments would be much appreciated.
The distribution at issue has 377 data points (N=377). The arithmetic mean is 8.56. The standard deviation is .77. Skewness is -.37. Excess kurtosis is .4. The distribution's Jarque-Bera test statistic is 11.23, which exceeds the Chi-square (.005;2df) critical value of about 10.6 and so seriously challenges the likelihood of normality.
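As a check on the figures above, here is a minimal sketch of the Jarque-Bera calculation from the summary statistics alone. It assumes the reported kurtosis of .4 is excess kurtosis and uses the rounded skewness and kurtosis, so it gives roughly 11.1 rather than the 11.23 reported, presumably because of that rounding. The function names are illustrative only.

```python
import math


def jarque_bera(n, skew, excess_kurt):
    """Jarque-Bera statistic: n/6 * (S^2 + K^2/4), with K as excess kurtosis."""
    return n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 df, exp(-x/2)."""
    return math.exp(-x / 2.0)


# Summary statistics as reported in the post (rounded values).
jb = jarque_bera(n=377, skew=-0.37, excess_kurt=0.4)
p_value = chi2_sf_2df(jb)

# Critical value at the .005 level with 2 df: -2 * ln(.005), about 10.6.
critical_005 = -2.0 * math.log(0.005)

print(f"JB = {jb:.2f}, p = {p_value:.4f}, .005 critical value = {critical_005:.2f}")
```

The chi-square reference distribution is an asymptotic approximation, so the p-value should be read as approximate at N=377.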
With these descriptives in mind, my confusion relates to one particular feature of the distribution: the number of data points (15) that lie more than 2 standard deviations below the arithmetic mean. There are 7 data points above 10.1, which is 2 standard deviations ABOVE the arithmetic mean, but 15 data points below 7.02, which is 2 standard deviations BELOW it. In a normal distribution, with skewness and excess kurtosis of 0, I understand that we'd expect about 19 (377 x .05) total data points beyond +/- 2 standard deviations (strictly, 5% corresponds to +/- 1.96 standard deviations), with 9 or 10 at each extreme. In this distribution we have slightly more, with 22 total data points lying more than 2 standard deviations from the arithmetic mean, and we also have a "non-normal" number of data points at the lower extreme.
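The expected tail counts can be sketched directly from the normal distribution. Using the exact tail probability beyond 2 standard deviations (about 4.55% two-sided, rather than the rounded 5%) gives roughly 8.6 points per tail, or about 17 in total. The binomial tail probability below is only a rough gauge of how unusual 15 points in one tail would be: it assumes independent observations and treats the sample mean and standard deviation as if they were the true values. The function names are illustrative.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def binom_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(
        math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k)
    )
    return 1.0 - below


n = 377
p_tail = normal_upper_tail(2.0)      # probability of one tail beyond 2 SD
expected_per_tail = n * p_tail       # expected count in each tail
expected_total = 2.0 * expected_per_tail

# Chance of 15 or more points in a single tail if the data were normal.
p_15_or_more = binom_upper_tail(n, p_tail, 15)

print(f"expected per tail: {expected_per_tail:.2f}")
print(f"expected total:    {expected_total:.2f}")
print(f"P(at least 15 in one tail): {p_15_or_more:.4f}")
```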
In the discussion I reference above, I expressed the view that the normality of this distribution is challenged more by its skewness than by its being leptokurtic ("fat tails"). It certainly is leptokurtic, as shown by its positive excess kurtosis of .4 and by the 3 additional data points at the extremes (22 as opposed to 19 in total), but I don't think we would have discussed this distribution if there had been 11 data points beyond 2 standard deviations on each side.
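To make the two definitions concrete, here is a minimal sketch of sample skewness and excess kurtosis as standardized third and fourth moments. The sample values are hypothetical and are not the 377 data points discussed here. Mirroring the sample about a fixed point flips the sign of the skewness but leaves the kurtosis unchanged, because the third moment keeps the sign of each deviation while the fourth moment does not.

```python
def standardized_moment(data, k):
    """k-th central moment divided by the k-th power of the (population) SD."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return (sum((x - mean) ** k for x in data) / n) / variance ** (k / 2)


def skewness(data):
    """Odd moment: deviations below the mean count negatively."""
    return standardized_moment(data, 3)


def excess_kurtosis(data):
    """Even moment minus 3: both tails contribute with the same sign."""
    return standardized_moment(data, 4) - 3.0


# Hypothetical sample with a longer lower tail (not the data from this post).
sample = [5.0, 7.0, 8.0, 8.5, 8.5, 9.0, 9.0, 9.5]
# The same sample reflected about 8.0, so the long tail is now on the upper side.
mirrored = [2 * 8.0 - x for x in sample]

print("skewness:", skewness(sample), skewness(mirrored))
print("excess kurtosis:", excess_kurtosis(sample), excess_kurtosis(mirrored))
```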
So, with all of this in mind, in the most definitional sense (if not all others), are the 15 data points that lie more than 2 standard deviations below the arithmetic mean a skewness problem or a kurtosis (lepto-"fat tails") problem? Beyond that, I'd also be really interested to know what additional thoughts, if any, you have about this distribution based on the descriptives.
Thank you again.
Kimberley