Is Normalization Necessary for Chi-Square and K-S Tests?

In summary, the thread asks whether two-digit numbers must be rescaled to values between 0 and 1 before testing them. The poster has read that this is required for the K-S test but is unsure whether it also applies to the chi-square test. The reply argues that "normalization" is ambiguous for the chi-square calculation: it is not clear how it should be done, and the resulting statistic depends on the convention chosen.
  • #1
shivajikobardan
Say two-digit numbers are given, like 10, 20, 30, 55, 95, 85, 12, 13, 52, etc. Is it necessary to normalize them to numbers between 0 and 1, i.e. 0.10 for 10, 0.20 for 20, and so on? I've read that this is the case for the K-S test, but I'm not sure about the chi-square test. I'm not 100% sure about this information, as I haven't seen it stated everywhere.
http://www-i4.informatik.rwth-aache...s/sub/simulation/simulationSS06/slides/05.pdf
It looks like it's the case for the K-S test, as x needs to be between 0 and 1. Is it also the case for the chi-square test?
 
  • #2
It doesn't seem to be necessary, according to https://www.scribbr.com/statistics/chi-square-tests/
(Also, 10 doesn't normalize to 0.10 unless the total is 100.)
They define:
##\chi^2 = \sum \frac{(O-E)^2}{E}##, which isn't quite a normalization. O is the observed value and E is the expected value.
You also need some standard to compare the result against, which they call a critical value.
I am not sure if it matters, though.

If we look at a specific example for 1 data point (using your "normalization", i.e. divide by 100):
O = 15, E = 10 gives ##\chi^2 = \frac{5^2}{10} = 2.5##, versus ##\frac{0.05^2}{0.1} = 0.025## after dividing by 100.
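The single-point comparison above can be checked numerically. This is a minimal sketch; the helper name `chi_square` is made up for illustration and simply implements the formula quoted from the linked page.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all points."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One data point, as in the example: O = 15, E = 10.
raw = chi_square([15], [10])

# The same point after the proposed "normalization" (divide by 100).
scaled = chi_square([15 / 100], [10 / 100])

print(raw, scaled, raw / scaled)
```

Dividing both O and E by a constant c divides the statistic by c, since ##\frac{(O/c - E/c)^2}{E/c} = \frac{(O-E)^2}{cE}##. Here the ratio comes out at about 100.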

Thinking about it, it's not even clear how you should normalize O and E. Is it always necessary that Sum(O) = Sum(E)? I think it depends.
If your list is randomly generated, you should expect 50 from each point; say we have 10 data points, so that's 500 total expected units. It could be the case that in some rare trial the computer generates all 1's or all 99's, in which case the observed total would differ from the expected total. The only way to actually normalize both O and E would be to normalize them separately, like ##\frac{O_i}{\sum_k O_k}## and the same for E. I'm not sure if that's reasonable or not.
Looking at the case where the numbers are all random, we generate 10 points from 1 to 99, and the average is 50, which is what we should expect from each. Suppose the computer generates ten 1's. Our normalized O values would be 1/10, and our normalized E values would be 50/500 = 1/10. This would give ##\chi^2 = 0##, which would imply that our model was right on the money. This is clearly not the case.

If we instead normalize using ##\frac{O_i}{\sum_k E_k}## we get ##\bar{O}_i = \frac{1}{500} = 0.002##, so ##\chi^2 = \sum \frac{(0.002 - 0.1)^2}{0.1} = 0.9604##
Compare that with no "normalization", where we have
##\chi^2 = \sum \frac{(1-50)^2}{50} = 480.2##
Two very different numbers, but interestingly enough the second one is 500 times larger, which is exactly the factor ##\sum_k E_k = 500## that O and E were divided by.
The first one seems to be something to the effect of a percent error, and at the end of the day, I think it's all going to come down to standards. The website I read didn't do it, nor did they mention it, and the Wikipedia article gives a specific example and doesn't do it either.
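The three variants discussed in this post can be reproduced side by side. This is only a sketch of the arithmetic above, using the post's own setup (ten observed values of 1, an expected value of 50 per point); the helper `chi_square` and the variable names are illustrative.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all points."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [1] * 10    # the computer generates ten 1's
expected = [50] * 10   # 50 expected from each of the 10 points

total_o = sum(observed)  # 10
total_e = sum(expected)  # 500

# Variant 1: normalize O and E separately, each by its own total.
separate = chi_square([o / total_o for o in observed],
                      [e / total_e for e in expected])

# Variant 2: normalize both O and E by the expected total.
common = chi_square([o / total_e for o in observed],
                    [e / total_e for e in expected])

# Variant 3: no normalization at all.
unscaled = chi_square(observed, expected)

print(separate, common, unscaled)
```

Variant 1 hides the mismatch entirely because every normalized value becomes 1/10, while variants 2 and 3 differ only by the constant factor 500.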

I haven't seen it written, but I think the concept of "normalization" is ambiguous for this calculation.
https://en.wikipedia.org/wiki/Chi-squared_test#Example_chi-squared_test_for_categorical_data
 

FAQ: Is Normalization Necessary for Chi-Square and K-S Tests?

What is the difference between Chi-square test and K-S test of uniformity?

The chi-square test compares observed frequencies in categories (or bins) with the frequencies expected under a hypothesized distribution. The K-S test of uniformity compares the empirical cumulative distribution function (CDF) of the sample with the CDF of the uniform distribution. In other words, the chi-square test works on counts per category, while the K-S test works on the individual values and asks whether they are consistent with a uniform distribution, not whether they are normally distributed.

When should I use a Chi-square test vs a K-S test of uniformity?

A chi-square test is suited to categorical data, such as yes/no or multiple-choice responses, or to continuous data that has been grouped into bins. A K-S test of uniformity is suited to continuous data when the question is whether the values follow a uniform distribution.

What do the results of a Chi-square test and K-S test of uniformity mean?

A chi-square test yields a test statistic and a p-value. The p-value is the probability of obtaining a statistic at least as large as the observed one if the null hypothesis (the observed frequencies follow the expected ones) is true. By convention, a p-value below 0.05 is often treated as statistically significant. A K-S test yields a D statistic, which is the maximum absolute difference between the empirical CDF of the data and the CDF of the reference distribution. A smaller D indicates a better fit to the reference distribution; D is compared against a critical value, or converted to a p-value, to decide whether to reject the fit.
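As an illustration of the D statistic, here is a minimal sketch of a one-sample K-S statistic against the uniform distribution on 0 to 1. The function name `ks_statistic_uniform` is hypothetical, the sample is just the nine numbers listed in the question, and dividing by 100 is the rescaling the question proposes.

```python
def ks_statistic_uniform(samples):
    """K-S statistic D against the Uniform(0, 1) CDF, F(x) = x.

    Assumes every value already lies between 0 and 1.
    """
    xs = sorted(samples)
    n = len(xs)
    # Largest gap where the empirical CDF sits above F(x) = x ...
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    # ... and where it sits below.
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)

values = [10, 20, 30, 55, 95, 85, 12, 13, 52]  # numbers from the question
rescaled = [v / 100 for v in values]           # assumed rescaling to 0..1

d = ks_statistic_uniform(rescaled)
print(d)
```

The value of D on its own does not decide anything; it still has to be compared with a critical value for the sample size and significance level in use.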

Do I need to normalize values between 0-1 before conducting a Chi-square or K-S test?

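It depends on the test. The chi-square statistic is computed from observed and expected counts, and rescaling them changes the statistic (dividing both O and E by a constant divides ##\chi^2## by the same constant), so it should be computed on the counts themselves. For the K-S test, the data and the reference distribution must be on the same scale: testing against a uniform distribution on 0 to 1 requires the values to lie in that range, so two-digit numbers would first be rescaled, for example by dividing by 100.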
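For comparison, a chi-square test of uniformity is commonly set up by grouping the values into bins and comparing bin counts, in which case the scale of the raw values never enters the statistic. The sketch below assumes 10 equal-width bins over 0 to 99; the function name and bin choice are illustrative and not taken from the thread.

```python
def chi_square_uniformity(values, n_bins=10, low=0, high=100):
    """Bin the values and compare bin counts with equal expected counts."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that a value equal to `high` falls in the last bin.
        idx = min(int((v - low) / width), n_bins - 1)
        counts[idx] += 1
    expected = len(values) / n_bins  # same expected count in every bin
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return counts, stat

# The nine numbers from the question. This is far too few for a real test
# (expected counts per bin should be much larger); it only shows the mechanics.
values = [10, 20, 30, 55, 95, 85, 12, 13, 52]
counts, stat = chi_square_uniformity(values)
print(counts, stat)
```

Because O and E here are counts of values per bin, dividing the raw numbers by 100 (and the bin edges with them) would leave the counts, and therefore the statistic, unchanged.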

Can I use both Chi-square and K-S tests in the same analysis?

Yes, it is possible to use both tests in the same analysis, but it is important to understand the differences between the two and when to use each one. It is also important to consider the assumptions of each test and make sure they are met before interpreting the results.
