Is It a Skewness or Kurtosis Issue in My Data Distribution?

  • Thread starter kimberley
In summary, the conversation concerns a distribution of 377 data points with an arithmetic mean of 8.56, a standard deviation of 0.77, skewness of -0.37, and excess kurtosis of 0.4. The Jarque-Bera test statistic of 11.23 suggests a departure from normality. The confusion arises from the 15 data points that lie more than 2 standard deviations below the mean, which is more than a normal distribution would predict for that tail. The question is whether this is a skewness problem or a problem with the "fat tails" of the distribution. The distribution appears to be both leptokurtic and left-skewed.
  • #1
kimberley
I've been conducting a series of natural experiments and examining their distributions for normality/departures therefrom. One distribution, in particular, resulted in a conversation with a friend and some resulting confusion about whether its primary infirmity is a skewness problem or a kurtosis problem. The question at hand is likely to be pedestrian to most of you, but it's an important basic distinction that I obviously need to grasp and, therefore, your comments will be very appreciated.

The distribution at issue has 377 data points (N=377). The arithmetic mean is 8.56. The standard deviation from the arithmetic mean is .77. Skewness is -.37. Kurtosis is .4. The distribution's Jarque-Bera test statistic is 11.23--thus seriously challenging the likelihood of normality at about the Chi-square (.005;2df) critical level.
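As a rough check on the figures above, the Jarque-Bera statistic can be recomputed from the rounded descriptives. This is a minimal sketch that assumes the quoted kurtosis of .4 is excess kurtosis and uses the standard form JB = n/6 * (S^2 + K^2/4); the function names are illustrative only. Because the inputs are rounded to two decimals, the result lands near, not exactly on, the reported 11.23.

```python
import math

def jarque_bera(n, skewness, excess_kurtosis):
    """Jarque-Bera statistic from sample size, skewness, and excess kurtosis."""
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)

def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For exactly 2 degrees of freedom this has the closed form exp(-x / 2).
    """
    return math.exp(-x / 2.0)

# Statistic rebuilt from the rounded descriptives quoted in the post.
jb_from_rounded = jarque_bera(377, -0.37, 0.4)

# Asymptotic p-value for the statistic as reported in the post.
p_reported = chi2_sf_2df(11.23)

print(f"JB from rounded inputs: {jb_from_rounded:.2f}")
print(f"p-value for JB = 11.23: {p_reported:.4f}")
```

The p-value is asymptotic, so it is only an approximation at N=377.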

With these descriptives in mind, the confusion that I speak of relates to a particular feature of the distribution: the number of data points (15) that lie more than 2 standard deviations below the arithmetic mean. That is, there are 7 data points above 10.1, which is 2 standard deviations ABOVE the arithmetic mean. As noted, however, there are 15 data points below 7.02, which is 2 standard deviations BELOW the arithmetic mean. In a normal distribution, with skewness and excess kurtosis of 0, I understand that we'd expect about 19 (377 x .05) total data points beyond +/- 2 standard deviations, with 9 or 10 at each extreme. In this distribution, we have slightly more, with 22 total data points located more than 2 standard deviations above or below the arithmetic mean, but we also have a "non-normal" number of data points at the lower extreme.
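The expected tail counts can be made a little more precise. The 5% figure corresponds to +/- 1.96 standard deviations; beyond exactly 2 standard deviations a normal distribution puts about 2.275% in each tail. The sketch below computes the expected count per tail for N=377 and, treating each point as an independent draw (an assumption the post does not confirm), the binomial probability of seeing 15 or more points in one tail. The function names are illustrative.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def binomial_at_least(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k))
    return 1.0 - below

n = 377
p_tail = normal_upper_tail(2.0)          # probability mass beyond 2 SD in one tail
expected_per_tail = n * p_tail           # expected count in each tail
expected_total = 2.0 * expected_per_tail

# Chance of 15 or more points in a single tail if the data were truly normal
# and the observations independent (assumption).
p_at_least_15 = binomial_at_least(n, p_tail, 15)

print(f"expected per tail: {expected_per_tail:.2f}, total: {expected_total:.2f}")
print(f"P(at least 15 in one tail): {p_at_least_15:.4f}")
```

This looks at one tail chosen after seeing the data, so the probability should be read as descriptive rather than as a formal test.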

In the discussion I reference above, I expressed the view that the normality of this distribution is most challenged by its skewness as opposed to being leptokurtic ("fat tails"). Surely, it is leptokurtic, as shown by its positive kurtosis of .4, and 3 additional data points at the extremes (22 as opposed to 19 in total), but I don't think we would have discussed this distribution if there were 11 and 11 data points +/- 2 standard deviations respectively.
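One way to weigh the two departures against each other is to standardize each by its large-sample standard error under normality, sqrt(6/n) for skewness and sqrt(24/n) for excess kurtosis. The squares of the resulting z-scores are the two terms that add up to the Jarque-Bera statistic, so their ratio shows how much of the statistic each moment accounts for. This is a sketch using the rounded descriptives from the post; the asymptotic standard errors are approximations at this sample size.

```python
import math

n = 377
skewness = -0.37
excess_kurtosis = 0.4

# Large-sample standard errors under the null hypothesis of normality.
se_skew = math.sqrt(6.0 / n)
se_kurt = math.sqrt(24.0 / n)

z_skew = skewness / se_skew
z_kurt = excess_kurtosis / se_kurt

# z_skew**2 + z_kurt**2 equals the Jarque-Bera statistic for these inputs.
skew_share = z_skew ** 2 / (z_skew ** 2 + z_kurt ** 2)

print(f"z (skewness): {z_skew:.2f}")
print(f"z (excess kurtosis): {z_kurt:.2f}")
print(f"share of JB due to skewness: {skew_share:.0%}")
```

On these numbers the skewness term is the larger of the two, which is consistent with the view that skewness is the primary departure here.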

So, with all of this in mind, in the most definitional sense (if not all others) are the 15 data points that are 2 standard deviations below the arithmetic mean a skewness problem or a kurtosis (lepto-"fat tails") problem? Beyond that, I'd also be really interested to know what additional thoughts, if any, you have about this distribution based on the descriptives.

Thank you again.

Kimberley
 
  • #2
Those 15 are a 4th-order problem in addition to contributing to a 3rd-order problem.
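To illustrate the point about orders: each observation with standardized value z adds z^3/n to the skewness and z^4/n to the kurtosis. Cubes keep their sign, so an excess of low points pulls skewness negative; fourth powers are always positive, so every far-out point on either side raises kurtosis. The sketch below uses hypothetical z-scores (all tail points placed at +/- 2.5, which is an assumption, since the thread does not give the actual values) with the 15 and 7 counts from the original post.

```python
def tail_contributions(z_scores, n):
    """Sum of z^3/n and z^4/n over the given standardized observations."""
    cubic = sum(z ** 3 for z in z_scores) / n
    quartic = sum(z ** 4 for z in z_scores) / n
    return cubic, quartic

n = 377

# Hypothetical placement: 15 points at z = -2.5 and 7 points at z = +2.5.
lower_tail = [-2.5] * 15
upper_tail = [2.5] * 7

cubic, quartic = tail_contributions(lower_tail + upper_tail, n)

# The cubic terms partly cancel (15 negative vs 7 positive),
# while the quartic terms from both tails accumulate.
print(f"tail contribution to skewness: {cubic:.3f}")
print(f"tail contribution to kurtosis: {quartic:.3f}")
```

The specific outputs depend entirely on the assumed z = 2.5 placement and should not be compared with the reported descriptives.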
 
  • #3


Based on the information provided, this distribution appears to have both skewness and kurtosis issues. The negative skewness value indicates that the data are skewed to the left: the left tail is longer, so the observations far from the mean lie predominantly below it. The 15-versus-7 split beyond 2 standard deviations fits that pattern, which points to a skewness problem.

However, the positive kurtosis value indicates that the distribution has "fat tails", meaning there are more extreme values than would be expected in a normal distribution. This could be considered a kurtosis problem.

In terms of the 15 data points that are 2 standard deviations below the mean, this could be seen as a combination of both skewness and kurtosis issues. It is possible that these data points are contributing to the negative skewness, but they could also be considered as part of the "fat tails" in the distribution.

Overall, it is difficult to say definitively whether the distribution has a skewness or kurtosis problem without further analysis. It is possible that both factors are contributing to the departure from normality. It may be helpful to plot the data and visually examine the shape of the distribution to get a better understanding of its overall characteristics.

In terms of additional thoughts on this distribution, it would be helpful to know more about the nature of the data and the natural experiments being conducted. This could provide insight into potential reasons for the observed skewness and kurtosis. Additionally, it may be useful to conduct further statistical tests or transformations to see if the data can be brought closer to a normal distribution.
 

FAQ: Is It a Skewness or Kurtosis Issue in My Data Distribution?

1. What is skewness and kurtosis?

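Skewness and kurtosis are statistical measures used to describe the shape of a distribution. Skewness measures the asymmetry of a distribution about its mean, while kurtosis measures how heavy its tails are, that is, how prone it is to produce extreme values. Kurtosis is often described as "peakedness", but it is driven mainly by the tails rather than by the height of the peak.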

2. How do you calculate skewness and kurtosis?

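Skewness is the third standardized moment: the average of the cubed deviations from the mean, divided by the cube of the standard deviation. Kurtosis is the fourth standardized moment: the average of the fourth-power deviations from the mean, divided by the fourth power of the standard deviation. Excess kurtosis subtracts 3, the kurtosis of a normal distribution, so that a normal distribution scores 0. (Pearson's mode skewness, the difference between the mean and the mode divided by the standard deviation, is a separate and rougher measure of asymmetry.)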
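The usual moment-based definitions can be sketched in a few lines. This version uses the simple (biased) moment estimators, dividing by n throughout; statistical packages often apply small-sample corrections, so their output may differ slightly. The function name and the sample data are illustrative only.

```python
def skewness_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis using divide-by-n estimators."""
    n = len(data)
    mean = sum(data) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0   # subtract 3 so a normal distribution scores 0
    return skew, excess_kurt

# Small made-up sample with a longer right tail.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
skew, excess_kurt = skewness_and_excess_kurtosis(sample)
print(f"skewness: {skew:.4f}, excess kurtosis: {excess_kurt:.4f}")
```

A positive skewness here reflects the longer right tail of the made-up sample; a left-skewed sample such as the one in this thread would give a negative value.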

3. What do positive and negative values of skewness and kurtosis indicate?

A positive skewness value indicates that the distribution is skewed to the right, with a longer tail on the right side. A negative skewness value indicates that the distribution is skewed to the left, with a longer tail on the left side. Positive excess kurtosis (leptokurtic) indicates heavier tails than a normal distribution, with more extreme values, while negative excess kurtosis (platykurtic) indicates lighter tails. Raw kurtosis is never negative, so a negative value always refers to excess kurtosis.

4. How do skewness and kurtosis affect data analysis?

Skewness and kurtosis affect data analysis because many common procedures assume approximate normality. For example, a high kurtosis value signals heavy tails: extreme values are more likely than a normal model predicts, so tail probabilities and intervals based on normality can be too optimistic. Marked skewness pulls the mean away from the median. Knowing this helps in choosing a transformation, a robust method, or a different distributional model.

5. Can skewness and kurtosis be used to determine if a dataset is normally distributed?

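Yes, skewness and kurtosis can be used as indicators of normality. A skewness value close to 0 and a kurtosis value close to 3 (equivalently, an excess kurtosis close to 0) are consistent with a normal distribution, and the Jarque-Bera test combines the two into a single statistic. However, these measures are not definitive, since non-normal distributions can share those values; other checks, such as the Shapiro-Wilk test or a normal quantile plot, should also be used to assess normality.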
