Discrete Random Variables - Geometric Distribution

In summary, the conversation discusses a question involving the probability of purchasing a cow with a rare genetic disease. The solution involves using the definition of the cumulative distribution function (CDF) for the geometric distribution and solving for the number of cows purchased until the first one with the disease. The conversation also includes a discussion about the different ways of approaching the problem and the importance of understanding the CDF.
  • #1
mrmt
Hi Guys,

Long time reader first time poster...

This simple question has stumped me all day and I think I've finally cracked it! I'm hoping someone can confirm that, or tell me how wrong I am - either is fine :)

One in 1000 cows have a rare genetic disease. The disease is not contagious, therefore cases are independent.

Let X be the number of cows a farmer purchases up to and including the 1st cow with the disease. Find r in each of the following cases:

P(X≤r)=0.05
P(X≤r)=0.90


This is what I've done:

p = 1/1000 = 0.001 (? was unsure if this was in fact my p value)

P(X>r)=(1-p)^r

P(X≤r)=0.05 (given)

P(X≤r) + P(X>r) = 1 for geometric distribution

Therefore:

0.05 + (1-p)^r=1

0.05 + (1-0.001)^r=1

0.999^r=0.95

r·ln(0.999)=ln(0.95)

r=ln(0.95)/ln(0.999)≈51.3

And same again for P(X≤r)=0.90

Can someone tell me if I'm heading in the right direction - or is there a better way?

Thanks
 
  • #2
Hey mrmt and welcome to the forums.

You are right that the distribution is geometric, but I think you are not using the right definition of the cumulative distribution function (CDF) for the geometric distribution.

According to wikipedia, the CDF of the geometric distribution is given by:

P(X <= x) = 1 - (1-p)^x

In the above examples you're given x = r and for this r you're also given the cumulative probability, so you have to figure out r. This means for the 0.05 case we do:

P(X <= r) = 1 - (1-0.001)^r = 0.05 which implies
0.95 = (1-0.001)^r = 0.999^r
log_0.999(0.95) = r = ln(0.95)/ln(0.999) ~ 52 (rounded up)

For the other one we get

P(X <= r) = 1 - (0.999)^r = 0.90 which implies
0.1 = 0.999^r
log_0.999(0.1) = r = ln(0.1)/ln(0.999) ~ 2302 (rounded up)

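Remember to always round up for these kinds of problems: r has to be a whole number of cows, and rounding down would leave the cumulative probability just short of the target.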
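As a quick numerical check of the two results above, here is a minimal Python sketch. The function names `smallest_r` and `geometric_cdf` are made up for illustration, and the sketch assumes the "number of trials" convention, P(X <= r) = 1 - (1-p)^r.

```python
import math

def geometric_cdf(p: float, r: int) -> float:
    """P(X <= r) when X counts trials up to and including the first success."""
    return 1.0 - (1.0 - p) ** r

def smallest_r(p: float, target: float) -> int:
    """Smallest whole number r with P(X <= r) >= target."""
    # Solve (1 - p)**r = 1 - target for r, then round up to a whole cow.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

p = 0.001  # one cow in 1000 has the disease
for target in (0.05, 0.90):
    r = smallest_r(p, target)
    # The CDF at r - 1 falls short of the target; the CDF at r reaches it.
    print(target, r, geometric_cdf(p, r - 1), geometric_cdf(p, r))
```

Printing the CDF at r - 1 and at r shows why rounding up is the safe choice here: the value one step lower sits just below the target probability.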

You got the right answer but I feel you got it for the wrong reasons. Remember that the CDF is well defined for this problem and we need to use that definition: there is no need to split the probabilities like you have done and I think it's the wrong way to think about it. Remember that you are using the definition of the CDF directly and don't try and improvise.

If you need to understand my comments, then I will do my best to answer your questions.
 
  • #3
Thank you chiro,

Your response helped me greatly and solidified my (lack of) understanding of a cdf.

However, in the interest of discussion/further learning only, I do have to take you up on one point:

chiro said:
You got the right answer but I feel you got it for the wrong reasons. Remember that the CDF is well defined for this problem and we need to use that definition: there is no need to split the probabilities like you have done and I think it's the wrong way to think about it. Remember that you are using the definition of the CDF directly and don't try and improvise.

The definition of a cdf is in fact a splitting of the probabilities. From wikipedia (which I'm now going to use more often because of your advice) a cdf "describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x"

i.e. splitting the probabilities, as you put it, into two categories - what is ≤ x and what is > x.

I think you'll find that my working, although some of it was unnecessary given a solid understanding of a cdf and where to find it, is still in fact the cdf for the geometric distribution slightly rearranged:

P(X≤r) + P(X>r) = 1

P(X≤r) = 1 - P(X>r)

= 1 - (1-p)^r
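For anyone following along, the two routes can be written side by side. This is only a sketch, assuming X counts purchases up to and including the first diseased cow and that cases are independent: the event X > r says the first r cows are all healthy, and summing the probability mass function as a finite geometric series gives the same closed form.

```latex
\begin{align*}
P(X > r)   &= P(\text{first } r \text{ cows are all healthy}) = (1-p)^r, \\
P(X \le r) &= \sum_{k=1}^{r} (1-p)^{k-1}\, p
            = p \cdot \frac{1 - (1-p)^r}{1 - (1-p)}
            = 1 - (1-p)^r .
\end{align*}
```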

I guess my slightly ego driven point is that there is no "right" way to view problems, even in mathematics...
 
  • #4
mrmt said:
Thank you chiro,

The definition of a cdf is in fact a splitting of the probabilities. From wikipedia (which I'm now going to use more often because of your advice) a cdf "describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x"

i.e. splitting the probabilities, as you put it, into two categories - what is ≤ x and what is > x.

I think you'll find that my working, although some of it was unnecessary given a solid understanding of a cdf and where to find it, is still in fact the cdf for the geometric distribution slightly rearranged:

You are right in your definition when you split up the probabilities using complementarity. The only reason I made that comment was that you might get into a situation where it is misused. This was an interpretation on my part, and it may or may not have standing. The important point was to treat the CDF in this context as a general thing for all events, which means that you don't need to split things up. The real crux of my response was that I didn't understand why you split it up, because it was unnecessary for solving the problem. It doesn't necessarily mean that you don't understand what you are doing, but from my point of view it was unnecessary, and as a consequence I wondered whether you might be misunderstanding either the question or probability. Also remember correlation doesn't imply causation.

Don't stress about my comments on this issue though: if you understand the answer and agree with it, then whatever you did should be enough, as long as you can put it into context with my suggestion. As you said, mathematics can have multiple ways of getting to the answer, and all of these are just as correct as one another. That is more true than most people (sometimes even mathematicians) realize.
 

Related to Discrete Random Variables - Geometric Distribution

1. What is a discrete random variable?

A discrete random variable is a type of random variable that can only take on a finite or countably infinite number of possible values. These values are often represented by integers.

2. What is the geometric distribution?

The geometric distribution is a probability distribution that models the number of trials needed to achieve the first success in a series of independent trials with a constant probability of success on each trial. A second common convention instead counts the number of failures before the first success, which shifts every value down by one.

3. What is the formula for the geometric distribution?

The formula for the geometric distribution is P(X = x) = (1-p)^{x-1}p, where x represents the number of trials needed to achieve the first success and p represents the probability of success on each trial.
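A short Python sketch can confirm that this formula agrees with the CDF used in the thread. The helper names `geometric_pmf` and `geometric_cdf` are illustrative, and p = 0.001 and r = 52 are simply the values from the cow problem.

```python
def geometric_pmf(p: float, x: int) -> float:
    """P(X = x): x - 1 failures followed by one success, for x = 1, 2, ..."""
    return (1.0 - p) ** (x - 1) * p

def geometric_cdf(p: float, r: int) -> float:
    """Closed form P(X <= r) = 1 - (1 - p)**r."""
    return 1.0 - (1.0 - p) ** r

p = 0.001
r = 52
# Adding up P(X = 1) + ... + P(X = r) should reproduce the closed-form CDF.
cdf_by_sum = sum(geometric_pmf(p, x) for x in range(1, r + 1))
print(cdf_by_sum, geometric_cdf(p, r))
```

The two printed numbers should agree up to floating-point rounding.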

4. How does the geometric distribution differ from the binomial distribution?

While both the geometric distribution and the binomial distribution are used to model the outcomes of repeated independent trials, the geometric distribution only looks at the number of trials needed to achieve the first success, while the binomial distribution looks at the number of successes in a fixed number of trials.

5. What is the expected value of a geometric random variable?

The expected value of a geometric random variable is equal to 1/p, where p is the probability of success on each trial. This means that on average, it takes 1/p trials to achieve the first success in a series of repeated trials.
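For the cow problem this gives 1/0.001 = 1000 purchases on average. The sketch below checks that numerically by truncating the infinite sum of x * P(X = x); the cutoff `n_terms` is an arbitrary choice for illustration, picked large enough that the neglected tail is negligible for p = 0.001.

```python
def geometric_pmf(p: float, x: int) -> float:
    """P(X = x) = (1 - p)**(x - 1) * p for x = 1, 2, ..."""
    return (1.0 - p) ** (x - 1) * p

p = 0.001
n_terms = 50_000  # truncation point for the infinite sum (illustrative choice)

# E[X] is the sum over x of x * P(X = x); the terms beyond n_terms are tiny here.
approx_mean = sum(x * geometric_pmf(p, x) for x in range(1, n_terms + 1))
print(approx_mean, 1.0 / p)
```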
