Chi-Square Tests for Homogeneity & Association

  • #1
Agent Smith
TL;DR Summary
Chi-square tests for homogeneity and association, how to tell the difference?
Capture1.PNG

Capture2.PNG

Capture 3.PNG


=============================================================================================
Capture4.PNG
Capture5.PNG


From the above example questions, we have ##2## different kinds of Chi-square tests.
1. One for homogeneity
2. One for association (independence/dependence).

The answer guide says that if we take ##1## sample, we're testing for association. The geneticist took ##1## sample of ##500## peeps. If we take ##2## samples then we're testing for homogeneity. The market researcher takes ##2## samples, 180 sedans and 180 trucks.

I can make some sense of a Chi-square test for homogeneity. We take ##2## samples, i.e. there are ##2## different categories (in the questions above, sedans and trucks), and see whether the distribution of colors for these ##2## categories differs in a statistically significant way.

But I have trouble with Chi-square tests for association. In the above example, aren't we checking the "distribution" of handedness (left/right/both) for the ##2## categories, men and women? They look the same to me. Suppose I had taken (could I?) ##2## identical samples and got the exact same data. Does the Chi-square computation (which doesn't seem to distinguish the ##2## cases) now become one for homogeneity?

Gracias.
 
  • #2
:smile:
 
  • #3
Agent Smith said:
TL;DR Summary: Chi-square tests for homogeneity and association, how to tell the difference?

But I have trouble with Chi-square tests for association. In the above example, aren't we checking the "distribution" of handedness (left/right/both) for the 2 categories, men and women? They look the same to me. Suppose I had taken (could I?) 2 identical samples and got the exact same data. Does the Chi-square computation (which doesn't seem to distinguish the 2 cases) now become one for homogeneity?
Do you know how they arrived at the estimates and Chi-2s? If not, I will show you below. But first, let me say that the way I understand the chi-2 test for the association of one set of mutually exclusive categories with two (or more) other mutually exclusive categories is through a format called a contingency table: a matrix with rows for one category and columns for the other.

So for the geneticist experiment, we would have


                    right-handedness   left-handedness   ambidextrous   marginal row sum
men                 207                21                12             240
women               227                27                6              260
marginal col sum    434                48                18             total 500


Here the estimate (expected count) for a cell is the marginal col sum for that cell's handedness divided by the total number in the study, multiplied by the marginal row sum for that cell's gender. E.g., for left-handed men the estimate = (48/500)*240 = 23.04.
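
The rule for the estimates can be sketched in a few lines of Python. This is only an illustration using the counts from the table above; the variable names (`observed`, `row_sums`, `col_sums`, `expected`) are made up for the example.

```python
# Observed counts from the geneticist's table.
# Rows: men, women. Columns: right-handed, left-handed, ambidextrous.
observed = [
    [207, 21, 12],
    [227, 27, 6],
]

row_sums = [sum(row) for row in observed]        # marginal row sums
col_sums = [sum(col) for col in zip(*observed)]  # marginal column sums
total = sum(row_sums)                            # everyone in the study

# Estimate for a cell = (column sum / total) * row sum.
expected = [[c / total * r for c in col_sums] for r in row_sums]

for label, row in zip(["men", "women"], expected):
    print(label, [round(e, 2) for e in row])
# men [208.32, 23.04, 8.64]
# women [225.68, 24.96, 9.36]
```

Each row of estimates adds up to that row's marginal sum, which is a quick check that the table was entered correctly.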

For association, to determine the total Chi-2, you calculate the Chi-2 of each cell in a row and sum them. For homogeneity, you calculate the Chi-2 of each cell in a column and sum them. Then sum those Chi-2s to get the total and divide by the DF.
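
As a rough check on the arithmetic, here is a minimal Python sketch that computes the per-cell Chi-2 terms for the handedness table and adds them up both ways. The names are invented for the example, and the DF is taken as (rows - 1) * (columns - 1), the usual rule for a contingency table. Since the same six cell terms are being added, the row-wise and column-wise totals come out identical.

```python
# Observed counts. Rows: men, women.
# Columns: right-handed, left-handed, ambidextrous.
observed = [
    [207, 21, 12],
    [227, 27, 6],
]
row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total = sum(row_sums)

# Per-cell term: (observed - expected)^2 / expected.
cells = []
for row, r in zip(observed, row_sums):
    cell_row = []
    for o, c in zip(row, col_sums):
        e = r * c / total
        cell_row.append((o - e) ** 2 / e)
    cells.append(cell_row)

chi2_by_rows = sum(sum(row) for row in cells)
chi2_by_cols = sum(sum(col) for col in zip(*cells))
df = (len(row_sums) - 1) * (len(col_sums) - 1)

print(round(chi2_by_rows, 3), round(chi2_by_cols, 3))  # 2.876 2.876
print(df, round(chi2_by_rows / df, 3))                 # 2 1.438
```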
 
  • #4
@gleem did you edit your post? Apologies, I didn't quite get what you were trying to say.

Now that I've "solved" a similar type of question, I kinda sorta understand the part of the post concerning the geneticist.

However, I still don't comprehend the difference between an association test and a homogeneity test.
I believe you confirmed my suspicion that the actual computation in a ##\chi ^2## test does a bad job of distinguishing the two.

Is it right for me to say that in both scenarios (association and homogeneity) we're trying to figure out the distribution of one category (handedness, for example) with respect to another category (gender, for example)?

All of the above (which I suppose I can say that I know) doesn't help me make the distinction between ##\chi ^2## tests for association and ##\chi ^2## tests for homogeneity.
 
  • #5
gleem said:
divide by the DF
🤔 I was taught to use the DF in a table.
 
  • #6
Agent Smith said:
@gleem did you edit your post? Apologies, I didn't quite get what you were trying to say.

Sorry I took so long to try and clarify my post. I was having a difficult time putting my thoughts into words. As they say, the distinction is subtle. Let me try this. You are looking at data on some relationships from two points of view. The analysis is the same. The H0s are different. The method of selecting the samples is different.

For the genetic experiment (association), you have one sample with N members that contains two groups, male and female, and three categories. There is only one sample, so no comparison between samples can be made, although you are comparing your sample to a hypothetical one based on the expected values. You lump all cells together in your mind. You assume (H0) that the difference between the observed and the expected values is from a different distribution, i.e., you expect a large Chi2/DF. For α = 0.05 that would be about 3. For our experiment Chi2/DF = 1.438. We find that we cannot accept H0. The Chi2 is too small to be from different distributions at the 0.05 level.
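
The "about 3" threshold can be reproduced with a short Python sketch. It relies on a fact specific to DF = 2: the upper-tail probability of a chi-square variable is exp(-x/2), so the critical value is -2 ln(α). The value 2.876 is the total Chi2 for the handedness table (1.438 times 2); the variable names are made up for the example. Note that in the usual textbook framing the null hypothesis is that handedness and gender are independent, so a statistic below the critical value means failing to reject that null.

```python
import math

alpha = 0.05
df = 2
chi2 = 2.876  # total Chi2 for the handedness table (1.438 * 2)

# Valid for df = 2 only: P(X > x) = exp(-x / 2),
# so the critical value solves exp(-x / 2) = alpha.
critical = -2 * math.log(alpha)

print(round(critical, 3), round(critical / df, 3))  # 5.991 2.996
print(chi2 / df < critical / df)                    # True
```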

For homogeneity, you select samples from separate groups. Your null hypothesis is that they are from the same distribution, therefore you are looking for a small Chi2/DF. Notice that the ratio of estimates for each column is the same. That's why I was focusing on columns. In your mind, you separate the colors of one vehicle from those of the other and compare each to these ratios. In this case the Chi2/DF is too large. For the 0.05 level, it would be 2.605. For our experiment, it is 4.186. Too much variation. So we reject the null hypothesis with this view in mind. For the homogeneity test the samples need not be the same size. For the genetic case, this ratio is 1.0833.
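
The vehicle numbers can be sanity-checked with a small Python sketch. It assumes DF = 3, which is inferred from the quoted threshold (2.605 times 3 is 7.815, the standard 0.05 critical value for three degrees of freedom); the vehicle counts themselves are not reproduced in the text, so only the quoted ratios are used. The helper `chi2_sf_df3` is a hypothetical name for the closed-form upper-tail probability that holds for DF = 3 only.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 DF."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

df = 3                   # assumption: inferred from the quoted 2.605 threshold
threshold = 2.605 * df   # quoted Chi2/DF threshold, converted back to a Chi2 value
chi2 = 4.186 * df        # quoted Chi2/DF for the vehicle data, converted the same way

# Tail area at the threshold should be close to 0.05.
print(chi2_sf_df3(threshold))

# Tail area at the observed value, and whether it falls below 0.05.
print(chi2_sf_df3(chi2), chi2_sf_df3(chi2) < 0.05)
```

The ratio of 1.0833 for the genetic case is just 260/240, the two row sums, which is why it is the same in every column of estimates.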

If you checked the genetics data for homogeneity you would determine that they were homogeneous to the 0.237 level so about 1 out of 4.2 samples would have given you at least this value.
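
The 0.237 figure can be checked directly. A minimal Python sketch, again using the DF = 2 shortcut that the upper-tail probability is exp(-x/2); `p_value` is just a name chosen for the example.

```python
import math

chi2 = 2.876  # total Chi2 for the handedness table (DF = 2)

# Valid for df = 2 only: P(X > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(round(p_value, 3), round(1 / p_value, 1))  # 0.237 4.2
```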
 
  • #7
@gleem, gracias for the response, but I kinda get what a ##\chi ^2## test is about. Also, the difference in the number of samples with regard to tests of association and homogeneity is not lost on me. However, in both cases we're measuring the distribution of (in this case) 2 or more variables with respect to 2 other variables.
 
  • #8
I wouldn't say measuring but comparing. In the genetic example, you are comparing the values of expected distributions to those observed as an aggregate. It is possible that each Chi2 is related to a different distribution. You don't know, and it is irrelevant. Keep in mind the Chi2 statistic is non-parametric and therefore can be used for any distribution. In the vehicle example, you are comparing the distributions of the observed colors for each type of vehicle to an expected one. I think this is the clearest comparison.
 
