Unsure about Shapiro-Wilk test

  • Thread starter A_B
  • Tags
    Test
In summary, the Shapiro-Wilk test is a statistical test of the null hypothesis that a data set was drawn from a normal distribution. It computes a test statistic W, where values close to 1 are consistent with normality, and rejects normality when W is small enough that the p-value falls below the chosen significance level. It is typically used to check the normality assumption behind other analyses. With large samples the test detects even small departures from normality, and R's shapiro.test accepts only between 3 and 5,000 observations. With very small samples it still runs but has little power to detect non-normality.
  • #1
A_B

Homework Statement


I have a project for statistics, and one question asks to first investigate the normality of "Wage" and, "if this isn't normally distributed", to check normality for "lnWage", the log of "Wage". I'm supposed to use the program "R" for all this (http://www.r-project.org/).

I've attached the relevant data, and a normal qq plot of lnWage


Homework Equations


none (R command: shapiro.test(data))


The Attempt at a Solution


> shapiro.test(Bwages$Wage)

Shapiro-Wilk normality test

data: Bwages$Wage
W = 0.8675, p-value < 2.2e-16

> shapiro.test(Bwages$lnWage)

Shapiro-Wilk normality test

data: Bwages$lnWage
W = 0.9892, p-value = 5.993e-09


As you can see, the p-values are very small for both, so we must conclude that neither is normally distributed.
It's just that I don't really trust this. If they give you something like this in a project, you'd expect the transformed variable to be normally distributed, so you can say "yippee" and use normality in later questions.


Thanks,
Alex
 

Attachments

  • Wage.txt
  • lnWage.txt
  • qqnormLNWAGES.PNG
  • #2


Dear Alex,

Thank you for sharing your results from the Shapiro-Wilk normality test. Both p-values are far below any usual significance level, so the test rejects normality for both "Wage" and "lnWage". Strictly speaking, neither variable can be treated as exactly normally distributed in your project.

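However, as you mentioned, it is important to consider the transformation of the data. The log transformation does seem to have brought the data much closer to normality: W rises from 0.8675 for "Wage" to 0.9892 for "lnWage", and W close to 1 indicates a close fit to a normal distribution. The p-values themselves are not a measure of how close the data is to normal. The p-value for "lnWage" can still be tiny when the sample is large enough for the test to pick up even small departures. This is worth mentioning in your project, as it affects how you justify any subsequent analyses that rely on normality.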
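The effect of a log transformation on the W statistic can be sketched on simulated data. This is only an illustration, not a rerun on the attached files: the names `wage` and `ln_wage`, the sample size, and the log-normal parameters are all assumptions made up for the example.

```r
# Simulated stand-in for the wage data (hypothetical values, not Bwages).
set.seed(1)
wage    <- rlnorm(500, meanlog = 2.3, sdlog = 0.4)  # right-skewed, like wages
ln_wage <- log(wage)                                # normal by construction

res_raw <- shapiro.test(wage)
res_log <- shapiro.test(ln_wage)

# W is the quantity to compare; values nearer 1 indicate a closer fit.
w_raw <- unname(res_raw$statistic)
w_log <- unname(res_log$statistic)
cat("W (raw):", round(w_raw, 4), " W (log):", round(w_log, 4), "\n")
```

With real data the log is unlikely to be exactly normal, so W for the transformed variable will usually sit a little below what this simulation gives.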

It is also important to keep in mind that the Shapiro-Wilk test is just one way to assess normality and there are other methods that could be used. It may be helpful to explore other statistical tests or visualizations to further investigate the normality of your data.
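One way to do the visual check is sketched below with base R graphics. The vector `ln_wage` here is simulated (an assumption standing in for the real lnWage column), and the quantile correlation `qq_cor` is just an informal numeric summary of the Q-Q plot, not a formal test.

```r
# Hypothetical stand-in for lnWage; replace with the real column if available.
set.seed(2)
ln_wage <- rnorm(500, mean = 2.3, sd = 0.4)

# Normal Q-Q plot: points close to the reference line suggest normality.
qq <- qqnorm(ln_wage, main = "Normal Q-Q plot of ln_wage")
qqline(ln_wage)

# Histogram with a fitted normal density overlaid as a second visual check.
hist(ln_wage, breaks = 30, freq = FALSE, main = "Histogram of ln_wage")
curve(dnorm(x, mean = mean(ln_wage), sd = sd(ln_wage)), add = TRUE)

# Correlation between theoretical and sample quantiles; near 1 for normal data.
qq_cor <- cor(qq$x, qq$y)
```

Departures in the tails of the Q-Q plot are often what drives a small Shapiro-Wilk p-value in a large sample, so it helps to look at where the points leave the line.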

Best of luck with your project!
 

FAQ: Unsure about Shapiro-Wilk test

1. What is the Shapiro-Wilk test and why is it used?

The Shapiro-Wilk test is a statistical test used to determine whether a given data set follows a normal distribution. It is commonly used in research to assess the assumption of normality, which is important for many statistical analyses.

2. How does the Shapiro-Wilk test work?

The Shapiro-Wilk test calculates a test statistic W from the sample data, comparing the ordered observations with what would be expected from a normal distribution. W lies between 0 and 1, and values close to 1 are consistent with normality. If W is smaller than the critical value for the sample size (equivalently, if the p-value is below the chosen significance level), normality is rejected.
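In R the comparison is usually made through the p-value rather than a table of critical values. A minimal sketch, assuming a significance level of 0.05 and using simulated samples; the helper name `rejects_normality` is hypothetical:

```r
set.seed(3)
x_normal <- rnorm(100)  # drawn from a normal distribution
x_skewed <- rexp(100)   # strongly right-skewed

alpha <- 0.05  # assumed significance level

# Helper (hypothetical name): TRUE when normality is rejected at level alpha.
rejects_normality <- function(x, alpha = 0.05) {
  shapiro.test(x)$p.value < alpha
}

rejects_normality(x_normal, alpha)
rejects_normality(x_skewed, alpha)
```

Even for truly normal data the test rejects in about a fraction alpha of samples, so a single result on `x_normal` can go either way.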

3. When should I use the Shapiro-Wilk test?

The Shapiro-Wilk test should be used when you need to determine whether a data set is normally distributed. This is important for many statistical analyses, as they may assume that the data follows a normal distribution in order to produce accurate results. If your data is not normally distributed, alternative methods may need to be used.

4. What are the limitations of the Shapiro-Wilk test?

The main limitation of the Shapiro-Wilk test is that it is sensitive to sample size. For large sample sizes, the test may detect even small departures from normality, including ones too small to matter in practice, leading to a rejection of the assumption of normality. Additionally, R's shapiro.test does not accept data sets with more than 5,000 observations.
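Both points can be illustrated with a short sketch. The gamma sample below is an assumed example of mildly skewed data, chosen only because it is close to, but not exactly, normal.

```r
set.seed(4)
# Mildly skewed data: a gamma with a large shape parameter is close to normal.
x_big   <- rgamma(5000, shape = 20)
res_big <- shapiro.test(x_big)
res_big$statistic  # W typically stays near 1 ...
res_big$p.value    # ... while the p-value is usually tiny at this sample size

# shapiro.test() stops with an error for more than 5000 observations.
too_big <- tryCatch(shapiro.test(rnorm(5001)),
                    error = function(e) conditionMessage(e))
too_big
```

For samples above the limit, a Q-Q plot or a test on a random subsample are common workarounds, though a subsample result depends on which observations are drawn.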

5. Can I use the Shapiro-Wilk test for small sample sizes?

Yes, the Shapiro-Wilk test can be used for small sample sizes. However, the test needs at least 3 observations, and with very small samples it has little power, so a non-significant result is only weak evidence of normality. When testing normality within groups or categories, each group needs at least 3 observations.
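The loss of power in small samples can be sketched with a small simulation. The helper `rejection_rate`, the number of repetitions, and the choice of exponential data are assumptions for illustration only.

```r
set.seed(5)
# Share of simulated exponential samples (clearly non-normal) that the test
# rejects at level alpha. The function name is hypothetical.
rejection_rate <- function(n, reps = 500, alpha = 0.05) {
  mean(replicate(reps, shapiro.test(rexp(n))$p.value < alpha))
}
power_small <- rejection_rate(5)   # very small samples
power_large <- rejection_rate(50)  # moderate samples

# With fewer than 3 observations shapiro.test() stops with an error.
too_small <- tryCatch(shapiro.test(c(4.1, 5.3)),
                      error = function(e) conditionMessage(e))
```

Comparing `power_small` with `power_large` shows how often clearly non-normal data slips through when only a handful of observations is available.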
