eskimotaro
Hello everyone. I have been given a problem in my Introductory Mathematical Statistics class. Been thinking about this one for a while and I am simply stuck.
1. Homework Statement
"DNA of type S has been found at a crime scene. We will assume a total population of N = 5000000 who are potential contributors to the lead. Next, assume there is a DNA database consisting of n = 30000 individuals. Also assume that there are M = 50 individuals in the whole population who have DNA of type S."
There are six sub-questions (a)-(f), and I am stuck on (d)-(f). I will simply explain what questions (a)-(c) are, and then write up questions (d)-(f).
2. The attempt at a solution [part 1]
In (a) we let X = the number of individuals with type S in the database. Here I am to find the probability distribution of X. I think the possible values of X must be x = 0, 1, 2, ..., 50. To calculate the distribution of X I used MATLAB and the hypergeometric distribution formula. That was no problem.
In (b) I am to use the binomial distribution formula instead to calculate the probability distribution of X. That was also not much of a problem.
For (c) I am just asked to calculate P(X = 1), which was just to take the relevant calculation from (a) or (b). P(X = 1) is approximately 0.22.
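As a sanity check on the value in (c), here is a minimal Python sketch (not the MATLAB code mentioned above) that evaluates P(X = 1) both ways. The function names `hypergeom_pmf` and `binom_pmf` are made up for illustration. The hypergeometric formula is written in its symmetric form, counting which of the M type-S individuals fall in the database, which gives the same distribution as drawing n database members from the population but keeps the binomial coefficients small.

```python
from math import comb

N = 5_000_000  # total population
n = 30_000     # size of the DNA database
M = 50         # individuals in the population with DNA type S


def hypergeom_pmf(k):
    """P(X = k) under the hypergeometric model from (a).

    Symmetric form: of the M type-S individuals, k are among the n
    database members and M - k are among the N - n people outside it.
    """
    return comb(n, k) * comb(N - n, M - k) / comb(N, M)


def binom_pmf(k):
    """P(X = k) under the binomial approximation from (b), with p = n / N."""
    p = n / N
    return comb(M, k) * p**k * (1 - p) ** (M - k)


p_hyper = hypergeom_pmf(1)
p_binom = binom_pmf(1)
print(f"hypergeometric P(X = 1) = {p_hyper:.4f}")
print(f"binomial       P(X = 1) = {p_binom:.4f}")
```

Both values should round to about 0.22, in line with the figure quoted in (c).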
3. Sub-questions
Here are the sub-questions (d)-(f) which I am stuck on:
"(d) Assume that every individual in the population has the same likelihood of being a contributor. Let A be the event that the contributor is one of the individuals in the database. Calculate P(A).
(e) Find P(X = 1 | A).
Hint: When we know that the contributor is in the database, there are M - 1 = 49 individuals left for whom we do not know whether they are in the database or not. Argue that we are then interested in the probability that none of these are in the database.
(f) Find P(A | X = 1). Argue that this corresponds to the probability that the individual with the matching DNA profile in the database is the culprit."
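The quantity the hint in (e) describes can be sketched numerically. This is only one possible reading of the hint, under the assumption (same as in the hypergeometric model of (a)) that the database is a uniformly random subset of the population: once the contributor is fixed inside the database, the other M - 1 type-S individuals are spread over the remaining N - 1 people, of whom N - n are outside the database. The variable name `p_none_others` is made up for illustration.

```python
from math import comb

N = 5_000_000  # total population
n = 30_000     # size of the DNA database
M = 50         # individuals in the population with DNA type S

# Assumption: given that the contributor is in the database, the other
# M - 1 type-S individuals are a uniformly random subset of the remaining
# N - 1 people, and N - n of those people are outside the database.
# Probability that all M - 1 of them fall outside the database:
p_none_others = comb(N - n, M - 1) / comb(N - 1, M - 1)

print(f"P(none of the other {M - 1} are in the database) = {p_none_others:.4f}")
```

Whether this number is the final answer to (e) still has to be argued as the hint asks.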
4. The attempt at a solution [part 2]
I have just not been able to get past these questions. For (d) I think that P(A) might be 1/30000, because that's simply how I interpret the question.
So I would be forever grateful if anyone could give me tips on how to solve this. Excuse my language if anything is unclear; English is my second language.