Is It a Skewness or Kurtosis Issue in My Data Distribution?

kimberley · Nov 25, 2007

I've been conducting a series of natural experiments and examining their distributions for normality and departures from it. One distribution in particular led to a conversation with a friend, and to some confusion about whether its primary infirmity is a skewness problem or a kurtosis problem. The question is likely to be pedestrian to most of you, but it's an important basic distinction that I obviously need to grasp, so your comments would be much appreciated.

The distribution at issue has 377 data points (N=377). The arithmetic mean is 8.56 and the standard deviation is .77. Skewness is -.37 and excess kurtosis is .4. The distribution's Jarque-Bera test statistic is 11.23, which exceeds the chi-square critical value at the .005 level with 2 df (about 10.6) and thus seriously challenges the likelihood of normality.
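As a rough check on these figures, the Jarque-Bera statistic can be rebuilt from the summary values alone, which also shows how much of it comes from skewness and how much from kurtosis. This is only a sketch: it assumes the reported kurtosis of .4 is excess kurtosis and that the usual large-sample form JB = n/6 * (S^2 + K^2/4) was used, and the function name is made up for illustration. For 2 degrees of freedom the chi-square upper-tail probability is exp(-x/2), so no statistics library is needed.

```python
import math

def jarque_bera_from_summary(n, skew, excess_kurt):
    """Split the Jarque-Bera statistic into its skewness and kurtosis parts."""
    skew_part = n * skew ** 2 / 6.0           # contribution of the 3rd moment
    kurt_part = n * excess_kurt ** 2 / 24.0   # contribution of the 4th moment
    jb = skew_part + kurt_part
    # Chi-square with 2 df: P(X > x) = exp(-x / 2)
    p_value = math.exp(-jb / 2.0)
    return skew_part, kurt_part, jb, p_value

# Summary values as posted (rounded to two decimals in the thread).
skew_part, kurt_part, jb, p_value = jarque_bera_from_summary(377, -0.37, 0.4)
critical_005 = -2.0 * math.log(0.005)  # chi-square (2 df) critical value at .005

print(f"skewness part: {skew_part:.2f}")
print(f"kurtosis part: {kurt_part:.2f}")
print(f"JB: {jb:.2f}, p-value: {p_value:.4f}, critical value: {critical_005:.2f}")
```

With the rounded inputs this gives a statistic of roughly 11.1 rather than the 11.23 quoted, a gap that could simply come from rounding of the posted skewness and kurtosis. About 8.6 of it comes from the skewness term and about 2.5 from the kurtosis term.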

With these descriptives in mind, the confusion I speak of relates to a particular feature of the distribution: the number of data points (15) that lie more than 2 standard deviations below the arithmetic mean. That is, there are 7 data points above 10.1, which is 2 standard deviations ABOVE the arithmetic mean. As noted, however, there are 15 data points below 7.02, which is 2 standard deviations BELOW the arithmetic mean. In a normal distribution, with skewness and excess kurtosis of 0, I understand that we'd expect about 19 (377 x .05) total data points beyond +/- 2 standard deviations, with 9 or 10 at each extreme. In this distribution we have slightly more, with 22 total data points lying more than 2 standard deviations above or below the arithmetic mean, but we also have a "non-normal" number of data points at the lower extreme.
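The expected tail counts can be checked with the exact normal tail area instead of the rounded .05. The sketch below also asks how surprising 15 points in one tail would be if each of the 377 points independently had the normal tail probability of falling there. That binomial model is an assumption made for illustration (it ignores that the mean and standard deviation were estimated from the same data), and the helper names are hypothetical.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def binomial_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    below = sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k))
    return 1.0 - below

n = 377
p_tail = normal_upper_tail(2.0)       # probability of one tail beyond 2 SD
expected_per_tail = n * p_tail

print(f"P(beyond 2 SD, one tail): {p_tail:.4f}")
print(f"expected per tail: {expected_per_tail:.1f}, both tails: {2 * expected_per_tail:.1f}")
print(f"P(15 or more in one tail): {binomial_at_least(15, n, p_tail):.4f}")
print(f"P(7 or fewer in one tail): {1.0 - binomial_at_least(8, n, p_tail):.4f}")
```

With the exact tail area (about .0228 per side), the expected count is about 8.6 per tail, or about 17 in total. On that basis the 7 points in the upper tail are close to expectation, while the 15 in the lower tail are well above it.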

In the discussion I reference above, I expressed the view that the normality of this distribution is challenged more by its skewness than by its being leptokurtic ("fat tails"). It is certainly leptokurtic, as shown by its positive excess kurtosis of .4 and the 3 additional data points at the extremes (22 as opposed to 19 in total), but I don't think we would have discussed this distribution if there had been 11 data points beyond 2 standard deviations on each side.

So, with all of this in mind, in the most definitional sense (if not all others), are the 15 data points that lie more than 2 standard deviations below the arithmetic mean a skewness problem or a kurtosis (leptokurtic, "fat tails") problem? Beyond that, I'd also be really interested to know what additional thoughts, if any, you have about this distribution based on the descriptives.

Thank you again.

Kimberley

EnumaElish · Nov 25, 2007

Those 15 points are a 4th-order (kurtosis) problem in addition to contributing to a 3rd-order (skewness) problem.
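One way to see the point about 3rd- and 4th-order moments is to add extreme values to a symmetric sample and watch both statistics. The sketch below uses purely synthetic data (evenly spaced normal quantiles plus a few hand-placed points at 3 standard deviations), not the data from this thread; the sample sizes and the helper name are illustrative assumptions.

```python
from statistics import NormalDist

def standardized_moments(data):
    """Return (skewness, excess kurtosis) from population (biased) moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n   # odd power keeps the sign
    m4 = sum((x - mean) ** 4 for x in data) / n   # even power discards it
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Synthetic, roughly normal base sample built from evenly spaced quantiles.
n_base = 360
base = [NormalDist().inv_cdf((i + 0.5) / n_base) for i in range(n_base)]

samples = {
    "base": base,
    "8 extra points, all low": base + [-3.0] * 8,
    "8 extra points, 4 low and 4 high": base + [-3.0] * 4 + [3.0] * 4,
}

results = {label: standardized_moments(data) for label, data in samples.items()}
for label, (skew, kurt) in results.items():
    print(f"{label}: skewness = {skew:+.2f}, excess kurtosis = {kurt:+.2f}")
```

In this synthetic setup the one-sided addition pushes skewness negative and raises kurtosis at the same time, whereas the symmetric addition raises kurtosis and leaves skewness at essentially zero. Extra points in one tail therefore feed both moments, which is consistent with the reply above.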

tacman · Dec 2, 2007

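Based on the information provided, this distribution appears to have both skewness and kurtosis issues. The negative skewness value indicates that the data are skewed to the left, meaning that the lower tail is longer or heavier than the upper tail. (It does not mean that more data points lie below the mean than above it; in a left-skewed distribution the mean usually falls below the median, so somewhat more than half of the points tend to lie above the mean.) This is the skewness problem.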

However, the positive excess kurtosis value indicates that the distribution has "fat tails", meaning there are more extreme values than would be expected in a normal distribution. This could be considered a kurtosis problem.

In terms of the 15 data points that are 2 standard deviations below the mean, this could be seen as a combination of both skewness and kurtosis issues. It is possible that these data points are contributing to the negative skewness, but they could also be considered as part of the "fat tails" in the distribution.

Overall, it is difficult to say definitively whether the distribution has a skewness or kurtosis problem without further analysis. It is possible that both factors are contributing to the departure from normality. It may be helpful to plot the data and visually examine the shape of the distribution to get a better understanding of its overall characteristics.

In terms of additional thoughts on this distribution, it would be helpful to know more about the nature of the data and the natural experiments being conducted. This could provide insight into potential reasons for the observed skewness and kurtosis. Additionally, it may be useful to conduct further statistical tests or transformations to see if the data can be brought closer to a normal distribution.


