Goodness of fit: How to decide which ratio to deal with?

  • Thread starter Tyto alba
  • Tags
    Fit Ratio
In summary: The thread asks how to choose the expected ratio when computing a chi-square goodness-of-fit statistic from a given set of observations, since the problems as posed do not state which ratio to test against.
  • #1
Tyto alba
The problem statement, all variables and given/known data with attempts
While solving goodness-of-fit problems, I keep running into one issue: how do I decide which ratio to use when computing the test statistic from a given set of observations?
E.g. 1
A supplied sample contains four types of seeds, 64 in total: large red 42, large white 8, small red 10 and small white 4. Calculate the goodness of fit.

Attempts & Problem: As df = 3, should the ratio be 9:3:3:1 or 1:1:1:1?

E.g.2
You are supplied with two different varieties of plant samples: tall 76 and short 24. Determine the observed number and apply the chi-square test to state whether it is in agreement with the expected ratio.

Attempts & Problem: As df = 1, should the ratio be 3:1 or 1:1?

My professor told me, rather vaguely, that we should choose whichever ratio seems to apply (by logical guess) and use it to determine the expected values and thus the statistic.

I couldn't find any good reading on this, apart from sources that are full of mistakes. I've been reading statistics blogs to understand the concepts, but they didn't cover these typical biological problems, and those that did had the ratio stated.

I have another question in mind: in an experiment it is also possible that one of the four types of seeds does not appear at all (given that mating is random and the progeny come from dihybrid crosses), so determining the underlying ratio becomes more difficult, as df = 2 ≠ 3!
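
For E.g. 2, both candidate ratios can be checked directly. The following is a minimal sketch, assuming the usual Pearson statistic (the sum of (observed - expected)^2 / expected over the categories) and the standard 5% table value of 3.841 for df = 1. The function name `chi_square` is illustrative and not taken from the thread.

```python
def chi_square(observed, ratio):
    """Pearson chi-square statistic for observed counts against a hypothesized ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    # Expected count per category = total * (that category's share of the ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [76, 24]  # tall, short (E.g. 2)

chi2_3_to_1 = chi_square(observed, [3, 1])  # expected 75 : 25
chi2_1_to_1 = chi_square(observed, [1, 1])  # expected 50 : 50

CRITICAL_DF1 = 3.841  # standard table value, df = 1, 5% significance level

for label, value in (("3:1", chi2_3_to_1), ("1:1", chi2_1_to_1)):
    verdict = "consistent" if value < CRITICAL_DF1 else "rejected"
    print(f"ratio {label}: chi-square = {value:.4f} -> {verdict} at the 5% level")
```

Under these assumptions the 3:1 hypothesis gives a statistic far below the critical value and the 1:1 hypothesis one far above it, so the counts are consistent with 3:1 but not with 1:1.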
 
  • #2
The Chi-squared goodness of fit test will test how well the data fits a hypothesized theoretical distribution. So you need to hypothesize a theoretical distribution. In problem 1, I can think of 5 possibilities:
1: Colors and sizes of seed equally likely and independent of each other. That would give 16 seeds expected in each category (small red, small white, large red, large white)
2: Colors as the data shows, sizes equally likely, independent: That would give expected totals of 52 red, 12 white, 32 large and 32 small ( 26 red small, 26 red large, 6 white small, 6 white large)
3: Colors equally likely, sizes as the data shows, independent: That would give expected totals of 32 red, 32 white, 50 large and 14 small (25 red large, 25 white large, 7 red small, 7 white small)
4: Colors and sizes as the data shows, independent: That would give expected totals of 52 red, 12 white, 50 large, 14 small (40.625 red large, 11.375 red small, 9.375 white large, 2.625 white small) rounded to (41 red large, 11 red small, 9 white large, 3 white small)
5: Colors and sizes as the data shows and they are dependent: This would be the same as the sample data and there is nothing to test the data against.

I would pick the first option simply because nothing in it is derived from the sample data. Basing any part of the theoretical distribution on the sample data is complicated and not covered by any statistical test that I know of.
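
The expected counts listed for options 1 to 4 can be reproduced from the margins of the observed table. This is a sketch only: the dictionary layout and the helper names (`marginal`, `expected_counts`, `chi_square`) are assumptions made for illustration, and the statistic is the usual Pearson sum of (observed - expected)^2 / expected.

```python
# Observed counts from E.g. 1, keyed by (size, color)
observed = {
    ("large", "red"): 42,
    ("large", "white"): 8,
    ("small", "red"): 10,
    ("small", "white"): 4,
}
n = sum(observed.values())  # 64 seeds in total


def marginal(index, level):
    """Total count for one level of size (index 0) or color (index 1)."""
    return sum(count for key, count in observed.items() if key[index] == level)


# Proportions "as the data shows" versus "equally likely"
size_data = {s: marginal(0, s) / n for s in ("large", "small")}
color_data = {c: marginal(1, c) / n for c in ("red", "white")}
size_equal = {"large": 0.5, "small": 0.5}
color_equal = {"red": 0.5, "white": 0.5}


def expected_counts(size_p, color_p):
    """Expected counts when size and color are independent."""
    return {(s, c): n * size_p[s] * color_p[c] for (s, c) in observed}


options = {
    1: expected_counts(size_equal, color_equal),
    2: expected_counts(size_equal, color_data),
    3: expected_counts(size_data, color_equal),
    4: expected_counts(size_data, color_data),
}


def chi_square(expected):
    """Pearson chi-square statistic of the observed counts against `expected`."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)


for number, expected in options.items():
    print(number, expected)
print("option 1 statistic:", chi_square(options[1]))
```

For option 1 the statistic works out to 920/16 = 57.5, well above the standard 5% table value of 7.815 for df = 3, so the equal-frequency hypothesis does not fit these counts.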
 
  • #3
FactChecker said:
I can think of 5 possibilities:
Hi @FactChecker:

I think you omitted four other plausible models based on four plausible Mendelian assumptions:
(A) Color has Red (R) recessive and White (W) dominant, or vice versa. (i) With R dominant, the ratio of R to W would be 3:1. (ii) With W dominant, the ratio would be 1:3.
(B) Size has Small (S) recessive and Large (L) dominant, or vice versa. (i) With S dominant, the ratio of S to L would be 3:1. (ii) With L dominant, the ratio would be 1:3.
For all four of these assumptions it would also be assumed that color and size are independent.
The four models are as follows.
6. Ai and Bi: RS 36, RL 12, WS 12, WL 4
7. Ai and Bii: RL 36, RS 12, WL 12, WS 4
8. Aii and Bi: WS 36, WL 12, RS 12, RL 4
9. Aii and Bii: WL 36, WS 12, RL 12, RS 4

Regards,
Buzz
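
Models 6 to 9 can be compared against the observed counts from E.g. 1 with the same Pearson statistic. A sketch, assuming the two-letter keys used in the list above (R/W for color, L/S for size) and the standard 5% table value of 7.815 for df = 3; the variable and function names are illustrative.

```python
# Observed counts from E.g. 1: large red 42, large white 8, small red 10, small white 4
observed = {"RL": 42, "WL": 8, "RS": 10, "WS": 4}

# Expected counts for the four Mendelian models (each sums to 64)
models = {
    6: {"RS": 36, "RL": 12, "WS": 12, "WL": 4},
    7: {"RL": 36, "RS": 12, "WL": 12, "WS": 4},
    8: {"WS": 36, "WL": 12, "RS": 12, "RL": 4},
    9: {"WL": 36, "WS": 12, "RL": 12, "RS": 4},
}


def chi_square(obs, exp):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    return sum((obs[k] - exp[k]) ** 2 / exp[k] for k in obs)


results = {number: chi_square(observed, exp) for number, exp in models.items()}
best = min(results, key=results.get)

CRITICAL_DF3 = 7.815  # standard table value, df = 3, 5% significance level

for number, value in results.items():
    verdict = "consistent" if value < CRITICAL_DF3 else "rejected"
    print(f"model {number}: chi-square = {value:.3f} -> {verdict} at the 5% level")
print("smallest statistic: model", best)
```

With these counts only model 7 (red and large both dominant) stays below the critical value, with a statistic of 8/3, about 2.67. The other three models each give a statistic above 100.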
 
  • #4
Buzz Bloom said:
Hi @FactChecker: I think you omitted four other plausible models based on four plausible Mendelian assumptions:
Good point. I was only thinking in terms of general statistics and forgot about recessive / dominant. I don't really know about that.
 

FAQ: Goodness of fit: How to decide which ratio to deal with?

What is "goodness of fit" in statistics?

"Goodness of fit" is a statistical term that refers to how well a model or theory fits the observed data. It measures the extent to which the data is consistent with the expected values based on the model.

Why is it important to evaluate the goodness of fit?

Evaluating the goodness of fit is important because it helps us determine the validity and usefulness of a model or theory. If the model fits the data well, it can be used to make accurate predictions and draw meaningful conclusions. If the fit is poor, the model may need to be revised or discarded.

How do you decide which ratio to use when evaluating the goodness of fit?

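In a chi-square goodness-of-fit test, the ratio is the set of expected proportions under the null hypothesis, so it has to come from the model being tested rather than from the degrees of freedom or from the observed counts. For the genetics problems in this thread, that means the Mendelian expectation for the cross: for example, 3:1 for a monohybrid cross with complete dominance, 1:1 for a test cross, and 9:3:3:1 for a dihybrid cross with independent assortment. When the problem does not state the cross, choose the ratio that the biology makes plausible and state it explicitly as the hypothesis.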
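
The Mendelian ratios mentioned in the thread (3:1, 1:1 and 9:3:3:1) can be derived by enumerating the gametes of the two parents. The sketch below assumes complete dominance (uppercase allele dominant) and independent assortment between genes. The function name `phenotype_counts` and the genotype notation are made up for illustration.

```python
from collections import Counter
from itertools import product


def phenotype_counts(parent1, parent2):
    """Count offspring phenotypes for a cross.

    Each parent is a list with one two-letter genotype per gene, e.g. ["Aa", "Bb"].
    Assumes complete dominance (uppercase allele dominant) and independent assortment.
    """
    # A gamete carries one allele per gene
    gametes1 = list(product(*parent1))
    gametes2 = list(product(*parent2))
    counts = Counter()
    for g1, g2 in product(gametes1, gametes2):
        # A gene shows the dominant phenotype if either inherited allele is uppercase
        phenotype = tuple(
            "dominant" if (a.isupper() or b.isupper()) else "recessive"
            for a, b in zip(g1, g2)
        )
        counts[phenotype] += 1
    return counts


monohybrid = phenotype_counts(["Aa"], ["Aa"])            # 3 dominant : 1 recessive
test_cross = phenotype_counts(["Aa"], ["aa"])            # 2 : 2, i.e. 1:1
dihybrid = phenotype_counts(["Aa", "Bb"], ["Aa", "Bb"])  # 9 : 3 : 3 : 1

print(dict(monohybrid))
print(dict(test_cross))
print(dict(dihybrid))
```

The counts returned here can be passed as the hypothesized ratio when computing expected values for the chi-square statistic.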

Can a model have a perfect fit?

In theory, it is possible for a model to have a perfect fit, meaning that the observed data perfectly matches the expected values. However, in practice, it is rare for a model to have a perfect fit, and some degree of discrepancy between the observed and expected values is expected. The goal is to have a good fit, rather than a perfect fit, that accurately represents the underlying relationship between the variables.

How do you interpret the results of a goodness of fit test?

The results of a goodness of fit test are typically reported as a p-value, which is the probability of obtaining data at least as far from the expected values as those observed, if the model is true. A low p-value (usually less than 0.05) suggests that the model does not fit the data well, while a high p-value means the data are consistent with the model, although it does not prove the model correct. The interpretation should also take into account the sample size and the chosen significance level.
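
A p-value can be computed from the chi-square statistic without a table. The sketch below uses closed-form upper-tail formulas that hold only for df = 1 and df = 3, the two cases in this thread. The function name `chi2_upper_tail` is illustrative, and the two statistics are the ones obtained for E.g. 1 against 9:3:3:1 (8/3) and for E.g. 2 against 3:1 (4/75).

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable X; this sketch covers df = 1 and df = 3 only."""
    if x < 0:
        raise ValueError("the chi-square statistic cannot be negative")
    # For df = 1, X is the square of a standard normal variable
    tail = math.erfc(math.sqrt(x / 2))
    if df == 1:
        return tail
    if df == 3:
        # Closed form for df = 3 adds one extra term to the df = 1 tail
        return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1 and df = 3 are implemented in this sketch")


p_dihybrid = chi2_upper_tail(8 / 3, 3)    # E.g. 1 tested against 9:3:3:1
p_monohybrid = chi2_upper_tail(4 / 75, 1)  # E.g. 2 tested against 3:1

print(f"E.g. 1 vs 9:3:3:1: p = {p_dihybrid:.3f}")
print(f"E.g. 2 vs 3:1:     p = {p_monohybrid:.3f}")
```

Both p-values are well above 0.05, so neither data set gives grounds to reject its Mendelian ratio at the 5% level. For other degrees of freedom a statistics library or a printed table would be needed.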
