High School Chi-Square Tests for Homogeneity & Association

Agent Smith · Oct 17, 2024

=============================================================================================

From the above example questions, we have ##2## different kinds of Chi-square tests.
1. One for homogeneity
2. One for association (independence/dependence).

The answer guide says that if we take ##1## sample, we're testing for association. The geneticist took ##1## sample of ##500## peeps. If we take ##2## samples then we're testing for homogeneity. The market researcher takes ##2## samples, 180 sedans and 180 trucks.

I can make some sense of a Chi-square test for homogeneity. We take ##2## samples, i.e. there are ##2## different categories (in the questions above, sedans and trucks), and see if the distribution of colors for these ##2## categories differs in a statistically significant way or not.

But I have trouble with Chi-square tests for association. In the above example, aren't we checking for the "distribution" of handedness (left/right/both) for the ##2## categories, men and women? They look the same to me. Suppose I had taken ##2## identical samples (could I?) and got the exact same data. Does the Chi-square computation (which doesn't seem to distinguish the ##2## cases) now become one for homogeneity?

Gracias.

Agent Smith · Oct 18, 2024

gleem · Oct 18, 2024

Agent Smith said:

TL;DR Summary: Chi-square tests for homogeneity and association, how to tell the difference?

But I have trouble with Chi-square tests for association. In the above example, aren't we checking for the "distribution" of handedness (left/right/both) for the 2 categories, men and women? They look the same to me. Suppose I had taken 2 identical samples (could I?) and got the exact same data. Does the Chi-square computation (which doesn't seem to distinguish the 2 cases) now become one for homogeneity?

Do you know how they arrived at the estimates and Chi-2s? If not, I will show you below. But first, let me say that, as I understand it, the Chi-2 test for the association of one set of mutually exclusive categories with two (or more) other mutually exclusive categories uses a format called a contingency table: a matrix with rows for one category and columns for the other.

So for the geneticist experiment, we would have

	right-handedness	left-handedness	ambidextrous	marginal row sum
men	207	21	12	240
women	227	27	6	260
marginal col sum	434	48	18	total 500

The estimate for a cell is given by the marginal col sum of the handedness column containing that cell, divided by the total number in the study, multiplied by the marginal row sum of the gender row containing that cell. E.g., for left-handed men the estimate = (48/500)*240 = 23.04.
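The rule above can be sketched in a few lines of Python. This is only an illustration using the counts from the table; the dictionary layout and the helper name `expected_counts` are my own choices, not something from the answer guide.

```python
# Observed counts from the table above: rows = gender, columns = handedness.
observed = {
    "men":   {"right": 207, "left": 21, "ambidextrous": 12},
    "women": {"right": 227, "left": 27, "ambidextrous": 6},
}

def expected_counts(table):
    """Estimate per cell = (column sum / grand total) * row sum."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, count in cells.items():
            col_totals[c] = col_totals.get(c, 0) + count
    grand_total = sum(row_totals.values())
    return {
        r: {c: col_totals[c] / grand_total * row_totals[r] for c in cells}
        for r, cells in table.items()
    }

expected = expected_counts(observed)
for row, cells in expected.items():
    print(row, {c: round(v, 2) for c, v in cells.items()})
```

The entry for left-handed men reproduces the 23.04 worked out by hand.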

For association, to determine the total Chi-2, you calculate the Chi-2 of each cell in a row and sum them. For homogeneity, you calculate the Chi-2 of each cell in a column and sum them. Either way, you then add those sums to get the total Chi-2 (the same number in both cases, since every cell is counted once) and divide by the DF.
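As a rough check on the arithmetic, here is a small Python sketch that computes each cell's Chi-2 contribution, (observed - estimate)^2 / estimate, for the geneticist's table and totals it both by rows and by columns. The variable names are mine. The DF is taken as (rows - 1) * (columns - 1), the usual rule for a contingency table.

```python
observed = [
    [207, 21, 12],  # men:   right, left, ambidextrous
    [227, 27, 6],   # women: right, left, ambidextrous
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-2 contribution of each cell: (observed - estimate)^2 / estimate
contrib = []
for i, row in enumerate(observed):
    contrib_row = []
    for j, count in enumerate(row):
        estimate = row_totals[i] * col_totals[j] / n
        contrib_row.append((count - estimate) ** 2 / estimate)
    contrib.append(contrib_row)

chi2_by_rows = sum(sum(row) for row in contrib)
chi2_by_cols = sum(sum(col) for col in zip(*contrib))

dof = (len(observed) - 1) * (len(observed[0]) - 1)
reduced_chi2 = chi2_by_rows / dof

print(f"by rows: {chi2_by_rows:.4f}, by columns: {chi2_by_cols:.4f}")
print(f"DF = {dof}, Chi2/DF = {reduced_chi2:.3f}")
```

Summing by rows or by columns gives the same total, which is why the computation alone cannot tell an association test from a homogeneity test.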

Agent Smith · Oct 20, 2024

@gleem did you edit your post? Apologies, I didn't quite get what you were trying to say.

Now that I've "solved" a similar type of question, I kinda sorta understand the part of the post concerning the geneticist.

However, I still don't comprehend the difference between an association test and a homogeneity test.
I believe you confirmed my suspicion that the actual computation in a ##\chi ^2## test does a bad job of distinguishing the two.

Is it right for me to say that in both scenarios (association and homogeneity) we're trying to figure out the distribution of one category (handedness for example) with respect to another category (gender for example)?

All of the above (which I suppose I can say that I know) doesn't help me make the distinction between ##\chi ^2## tests for association and ##\chi ^2## tests for homogeneity.

Agent Smith · Oct 20, 2024

gleem said:

divide by the DF

I was taught to use the DF to look up the critical value in a table.

gleem · Oct 20, 2024

Agent Smith said:

@gleem did you edit your post? Apologies, I didn't quite get what you were trying to say.

Sorry I took so long to try and clarify my post. I was having a difficult time putting my thoughts into words. As they say the distinction is subtle. Let me try this. You are looking at data of some relationships from two points of view. The analysis is the same. The H0s are different. The method of selection of the samples is different.

For the genetic experiment for association, you have one sample with N members that contains two groups, male and female, and three categories. There is only one sample, so no comparison between samples can be made, although you are comparing your sample to a hypothetical one based on the expected values. You lump all cells together in your mind. Your null hypothesis (H0) is that gender and handedness are independent, so under H0 the differences between the observed and the expected values are just sampling noise and you expect a small Chi2/DF; an association would show up as a large Chi2/DF. For α = 0.05 the critical value would be about 3. For our experiment Chi2/DF = 1.438. We find that we cannot reject H0. The Chi2 is too small to indicate an association at the 0.05 level.
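The "about 3" threshold can be checked without a table, as a sketch. It relies on one fact that holds only for 2 degrees of freedom: the probability of a Chi2 larger than x is exp(-x/2). The variable names here are mine.

```python
import math

alpha = 0.05
dof = 2  # (2 genders - 1) * (3 handedness categories - 1)

# For 2 DF only: P(Chi2 > x) = exp(-x / 2), so solve exp(-x / 2) = alpha for x.
critical_chi2 = -2.0 * math.log(alpha)
critical_reduced = critical_chi2 / dof

observed_reduced = 1.438  # Chi2/DF quoted for the geneticist's data
exceeds_critical = observed_reduced > critical_reduced

print(f"critical Chi2 = {critical_chi2:.3f}, critical Chi2/DF = {critical_reduced:.3f}")
print("observed Chi2/DF exceeds the critical value:", exceeds_critical)
```

For other DF there is no such closed form, so a table or a statistics library is the practical route.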

For homogeneity, you select samples from separate groups. Your null hypothesis is that they are from the same distribution, therefore you are looking for a small Chi2/DF. Notice that the ratio of estimates for each column is the same; for the genetic case, this ratio is 1.0833. That's why I was focusing on columns. In your mind, you separate the colors of one vehicle from those of the other and compare each to these ratios. In this case the Chi2/DF is too large. For the 0.05 level, the critical value would be 2.605. For our experiment, it is 4.186. Too much variation. So we reject the null hypothesis with this view in mind. For the homogeneity test the samples need not be the same size.
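To illustrate the constant ratio for the genetic case, here is a short sketch using only the marginal sums from the table. It assumes, as I read the post, that "ratio of estimates" means the women's estimate divided by the men's estimate in each handedness column.

```python
row_totals = {"men": 240, "women": 260}
col_totals = {"right": 434, "left": 48, "ambidextrous": 18}
n = sum(row_totals.values())

ratios = {}
for col, col_total in col_totals.items():
    estimate_men = col_total * row_totals["men"] / n
    estimate_women = col_total * row_totals["women"] / n
    ratios[col] = estimate_women / estimate_men

for col, ratio in ratios.items():
    print(f"{col}: {ratio:.4f}")
```

The column sum cancels, so every column gives 260/240, the ratio of the two group sizes.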

If you checked the genetics data for homogeneity, you would find a p-value of 0.237: if the groups really were homogeneous, about 1 out of 4.2 samples would still give you a Chi2 at least this large.
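The 0.237 figure can be reproduced as a sketch, again using the fact that for 2 degrees of freedom (and only then) the probability of a Chi2 larger than x is exp(-x/2). The input is the Chi2/DF of 1.438 quoted earlier; the names are mine.

```python
import math

dof = 2
reduced_chi2 = 1.438          # Chi2/DF quoted for the geneticist's data
chi2 = reduced_chi2 * dof

# For 2 DF only: P(Chi2 >= x) = exp(-x / 2)
p_value = math.exp(-chi2 / 2)
one_in = 1 / p_value

print(f"p-value = {p_value:.3f}, i.e. about 1 sample in {one_in:.1f}")
```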

Agent Smith · Oct 21, 2024

@gleem, gracias for the response, but I kinda get what a ##\chi ^2## test is about. Also, the difference in the number of samples with regard to tests of association and homogeneity is not lost on me. However, in both cases we're measuring the distribution of (in this case) 2 or more variables with respect to 2 other variables.

gleem · Oct 21, 2024

I wouldn't say measuring but comparing. In the genetic example, you are comparing the values of expected distributions to those observed as an aggregate. It is possible that each Chi2 is related to a different distribution. You don't know, and it is irrelevant. Keep in mind the Chi2 statistic is non-parametric and therefore can be used for any distribution. In the vehicle example, you are comparing the distributions of the observed colors for each type of vehicle to an expected one. I think this is the clearest comparison.
