Proof that an interval is a confidence interval for Geom(q)

In summary: The conversation is about proving that a given interval is a confidence interval for a parameter, using the geometric distribution and the definition of a confidence interval. The poster shares the hint and the equations they have tried, is still unsure how to proceed, and asks for further guidance.
  • #1
Alex_Doge
Hello Physicsforum

Homework Statement


I have a problem proving this:
Given [itex]C(x)=[0, 3/x][/itex] for all [itex]x\in\chi[/itex], with [itex]\chi=\Omega[/itex] being the sample space and [itex]P_q=Geom(q)[/itex] being the geometric distribution.

I have to show that C(x) is a confidence interval for q but I don't know how to get started.

I've been given the tip [itex]P_q([0,3/q])=P_q(x\in[0,3/q])=P_q(\{1,2,\ldots,\lfloor3/q\rfloor\})[/itex] and then to use the geometric series. It also says that the resulting function won't be continuous and that I should bound it between two continuous ones.

Homework Equations


The definition of a confidence interval [itex]P_q(u(X)<q<v(X))=\gamma[/itex] for all [itex]q\in(0,1][/itex] and [itex]\gamma[/itex] close to 1.
Geometric summation formula and sigma additivity for disjoint sets.

The Attempt at a Solution


I tried using the definition but don't know how to continue. I think I have to prove the equalities:
[itex]P_q(u(X)<q)=\gamma[/itex] and [itex]P_q(q<v(X))=\gamma[/itex] but I don't know what I'm supposed to use for X. And I don't know what they mean with functions. I can't seem to see any dependency of a variable anywhere.
Any tips are very welcome!

Kind regards
Alex
 
  • #2
This is most definitely a calculus problem, so does not belong in the precalculus forum.

Here is how I would approach it. I would operate as a Bayesian, and suppose the Geometric parameter ##q## is governed by a prior distribution ##f_0(q)## for ##0 <q<1##. Let ##X \in \{1,2,3, \ldots\}## be the Geometric random variable under observation. The probability of seeing ##X = k##, given a value of ##q##, is
[tex] P(X = k\,| \, q) = q\,(1-q)^{k-1}, \: k = 1,2,3, \ldots [/tex]
The posterior probability density of ##q##, given the observation ##X = k##, is
[tex] f(q \,| \, k) = \frac{f_0(q) P(X=k\, | \, q)}{P(k)}, [/tex]
where
[tex] P(k) = \int_0^1 f_0(q)P(X = k\,| \, q) \, dq = \int_0^1 f_0(q) \, q\,(1-q)^{k-1} \, dq [/tex]
Note that ##P(k)## is the prior probability of observing ##X=k##.

Things become much easier if we use the so-called uninformative prior, which in this case means that ##f_0(q) = 1## is the uniform distribution on ##(0,1)##; that is, we assume initially that ##q## is equally likely to take any value between 0 and 1. Basically, we know nothing at all about ##q##, except that it must be between 0 and 1.

In this case we can do the integrals:
[tex] P(k) = \int_0^1 q (1-q)^{k-1} dq = \frac{1}{k(k+1)} , [/tex]
so the posterior probability density of ##q## is
##f(q|k) = k(k+1) q (1-q)^{k-1}, \; 0 < q < 1##.

You can now look at the interval ##(0,3/k)##. Clearly, the probability that the (random quantity) ##q## lies in ##(0,3/k)## is 1 for ##k = 1, 2, 3##. For ##k \geq 4## the probability that ##q## lies in ##(0,3/k)## is
[tex] P(0 < q < 3/k) = \int_0^{3/k} k(k+1) q (1-q)^{k-1} \, dq [/tex]
You can evaluate this as a function of ##k## and plot it out for ##k = 3, 4, 5, 6, \ldots ## to see if it is near 1 or not.
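
A minimal sketch of that evaluation in Python, assuming the uniform prior above. Integrating by parts gives the closed form ##1 - (1 + ka)(1-a)^k## with ##a = \min(1, 3/k)##; the function names and the midpoint-rule cross-check are illustrative choices, not part of the original post.

```python
def posterior_prob_closed_form(k: int) -> float:
    """P(0 < q < 3/k | X = k) under the uniform prior, via integration by parts."""
    a = min(1.0, 3.0 / k)  # the posterior density lives on (0, 1)
    return 1.0 - (1.0 + k * a) * (1.0 - a) ** k


def posterior_prob_numeric(k: int, steps: int = 20_000) -> float:
    """Same probability from a midpoint Riemann sum of k(k+1) q (1-q)^(k-1)."""
    a = min(1.0, 3.0 / k)
    h = a / steps
    total = 0.0
    for i in range(steps):
        q = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += k * (k + 1) * q * (1.0 - q) ** (k - 1)
    return total * h


for k in range(1, 11):
    print(k, round(posterior_prob_closed_form(k), 6), round(posterior_prob_numeric(k), 6))
```

The two columns should agree to several decimal places, and the value is exactly 1 for ##k = 1, 2, 3## because the interval then covers all of ##(0,1)##.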
 
  • #3
First of all thanks for the detailed answer. Sorry I didn't know it would turn out to be a calculus problem.
The integral gives
[itex]P(0 < q < 3/k) = \int_0^{3/k} k(k+1) q (1-q)^{k-1} \, dq=\frac{12\cdot3^k k^{-k} (-1)^k\left(1-\tfrac{k}{3}\right)^{k+1}+k-3}{k-3}[/itex]
It converges towards 0.8 if I'm not mistaken.
upload_2016-1-28_22-12-58.png

What does this mean?
 
  • #4
Why are you plotting it for negative values of ##k##? We need ##k = 1,2,3,4, \ldots##, so plotting it for ##k \geq 4## has meaning. Negative values of ##k## have no meaning at all in this problem.
 
  • #5
Oh sorry, I'm getting tired, been stuck on this problem all day now.
upload_2016-1-28_22-33-36.png

That's a strange plot. What does this mean then? :)
Is it something similar to a delta function, or have I made a mistake plotting it?
 
  • #6
Part of your problem is that you have a result for your integral that seems to work for all ##k##, but when you specify that ##k## is a positive integer, it simplifies a lot; in particular, the pesky factors ##(-1)^k## disappear, giving you a formula that works well for all integers ##k \geq 3## (no division by 0 anymore). Then it plots out nicely.
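
As a hedged illustration of that simplification: for integer ##k## the two sign factors combine to ##-1##, and the expression from post #3 reduces to ##1 - 4(1 - 3/k)^k##, which tends to ##1 - 4e^{-3} \approx 0.80##. The sketch below compares the two forms numerically; it assumes the term written as `1/3k` in post #3 means ##k/3##, and the function names are made up for this example.

```python
import math


def general_form(k: int) -> float:
    """Expression from post #3, reading the ambiguous '1/3k' as k/3."""
    return (12 * 3**k * k ** (-k) * (-1) ** k * (1 - k / 3) ** (k + 1) + k - 3) / (k - 3)


def integer_form(k: int) -> float:
    """Simplified form for integer k >= 3: 1 - 4 (1 - 3/k)^k."""
    return 1.0 - 4.0 * (1.0 - 3.0 / k) ** k


limit = 1.0 - 4.0 * math.exp(-3.0)  # value approached as k grows

for k in (4, 5, 10, 50, 1000):
    print(k, integer_form(k))
print("limit", limit)
```

Note that `general_form` still divides by zero at ##k = 3##, while `integer_form` simply returns 1 there.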
 
  • #7
Yea it looks better now:
upload_2016-1-28_23-6-34.png

But how do I continue from here?
 
  • #8
So because q lies in C with probability 1, C is a confidence interval? For [itex]k \geq 4[/itex] the probability is no longer 1, at least for large k. Is that a problem?
 
  • #9
A confidence interval (with confidence level ##p \in (0,1)##) is an interval that contains the unknown parameter of interest with probability at least ##p##. So, if the parameter we want to estimate is ##q##, we want an interval that contains the unknown ##q## with probability at least ##p##.

In most problems there is not much difference between the Bayesian approach (with non-informative prior) that I outlined above, and the classical (non-Bayesian) confidence-interval method; the interpretations are different, but usually the computations are almost the same. However, that is not the case in your problem (because the alleged confidence interval is a bit unusual). So: the confidence-interval method will deliver different results in your problem.

In your case (without yet specifying ##p##) the claim is that for observation ##\{X=k\}## the interval ##[0,3/k]## contains ##q## with a probability of ##p## or more. Note that the interval contains ##q## if and only if ##q \leq 3/k##, so the probability is ##P(3/k \geq q) = P(k \leq 3/q)##. Here ##q## is fixed and the observation is random. For a geometric random variable ##X## with parameter ##q## this probability is
[tex] P(X \leq 3/q) = \sum_{k=1}^{\lfloor 3/q \rfloor} q (1-q)^{k-1} [/tex]
where ##\lfloor u \rfloor## is the greatest integer ##\leq u##.

The problem is asking you to figure out a value of ##p## (hopefully, near 1.0) that is a lower bound on that probability (so that you can be at least ##100 p\%## sure the interval contains the true parameter value).
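
To get a feel for that probability before bounding it, here is a small sketch that sums the geometric probabilities term by term. It assumes the closed interval ##C(x) = [0, 3/x]## from the problem statement, so coverage means ##X \leq 3/q##; the function name is hypothetical.

```python
import math


def coverage_by_sum(q: float) -> float:
    """P_q(X <= 3/q) for X ~ Geom(q) on {1, 2, 3, ...}, summed term by term."""
    n = math.floor(3.0 / q)  # largest k for which the interval [0, 3/k] still contains q
    return sum(q * (1.0 - q) ** (k - 1) for k in range(1, n + 1))


for q in (0.9, 0.5, 0.1, 0.01):
    print(q, coverage_by_sum(q))
```

Because `3.0 / q` is computed in floating point, values of ##q## for which ##3/q## is very close to an integer can land on the wrong side of the floor; exact rationals (for example `fractions.Fraction`) avoid that if it matters.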
 
  • #10
I calculated that probability:
[itex]P(X \leq 3/q) = \sum_{k=1}^{\lfloor 3/q \rfloor} q (1-q)^{k-1}=q\sum_{k=1}^{\lfloor3/q\rfloor}(1-q)^{k-1}=(1-q)^{\lfloor3/q\rfloor}[/itex]
How do I calculate this lower bound? Like this: [itex](1-q)^{\lfloor3/q\rfloor}=1[/itex] and then solve for q?
Or do I minimize and maximize to find lower and upper bound?
 
  • #11
Actually, ##\sum_{k=1}^n q (1-q)^{k-1} = 1 - (1-q)^n##, NOT ##(1-q)^n##.

You are not "solving for ##q##"; you do not know the value of ##q##, but want to know a value ##\alpha## (called ##p## before), such that
[tex] \sum_{k=1}^{\lfloor3/q\rfloor} q (1-q)^{k-1} \geq \alpha [/tex]
for all ##q \in (0,1)##. If it happens that ##\alpha## is "large" (near 1) then you have a useful ##100 \alpha \%## confidence interval.
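
The partial-sum identity ##\sum_{k=1}^n q(1-q)^{k-1} = 1 - (1-q)^n## can be checked with exact rational arithmetic. This is only a sanity check for a few values, not a proof, and the helper names are illustrative.

```python
from fractions import Fraction


def partial_sum(q: Fraction, n: int) -> Fraction:
    """Sum_{k=1}^{n} q (1-q)^(k-1), computed exactly with rationals."""
    return sum((q * (1 - q) ** (k - 1) for k in range(1, n + 1)), Fraction(0))


def closed_form(q: Fraction, n: int) -> Fraction:
    """1 - (1-q)^n, the value given by the geometric series formula."""
    return 1 - (1 - q) ** n


q = Fraction(1, 4)
for n in range(1, 6):
    print(n, partial_sum(q, n), closed_form(q, n))
```

Both columns print the same fractions, whereas ##(1-q)^n## alone would give the complementary probability ##P(X > n)##.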
 
  • #12
That means I have this:
[itex]\sum_{k=1}^{\lfloor3/q\rfloor} q (1-q)^{k-1} =1-(1-q)^{\lfloor3/q\rfloor}\geq \alpha[/itex] and need to find alpha.
Can I look at [itex]1-(1-q)^{\lfloor3/q\rfloor}[/itex] and see where its minimum is, for [itex]q \in (0,1)[/itex], and then define alpha to be just lower? What do you mean by "large"?
Thanks for the help so far
 
  • #13
Thanks a lot for the help. I solved it now.
upload_2016-1-29_22-18-56.png

upload_2016-1-29_22-24-2.png

The red graph is the probability. The other two are the bounds obtained from the floor function. The level is then 0.05, i.e. the coverage probability stays above [itex]1-e^{-3}\approx 0.95[/itex].
The plots helped me understand it and made sense of the hint.
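
A rough sketch of the sandwich argument, using ##3/q - 1 < \lfloor 3/q \rfloor \leq 3/q##. The function names and the grid are assumptions made for illustration; the plots in the attachments are not reproduced here.

```python
import math


def coverage(q: float) -> float:
    """Exact coverage 1 - (1-q)^floor(3/q) for q in (0, 1]."""
    return 1.0 - (1.0 - q) ** math.floor(3.0 / q)


def lower_envelope(q: float) -> float:
    """Continuous lower bound, from floor(3/q) > 3/q - 1."""
    return 1.0 - (1.0 - q) ** (3.0 / q - 1.0)


def upper_envelope(q: float) -> float:
    """Continuous upper bound, from floor(3/q) <= 3/q."""
    return 1.0 - (1.0 - q) ** (3.0 / q)


floor_value = 1.0 - math.exp(-3.0)  # common limit of both envelopes as q -> 0

grid = [i / 1000 for i in range(1, 1000)]  # q = 0.001, ..., 0.999
print("smallest coverage on grid:", min(coverage(q) for q in grid))
print("1 - exp(-3):", floor_value)
```

The lower envelope increases away from ##q = 0##, where it approaches ##1 - e^{-3}##, so the coverage never drops below roughly 0.9502, which is consistent with a level of 0.05.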
 

FAQ: Proof that an interval is a confidence interval for Geom(q)

What is a confidence interval?

A confidence interval is a range of values, computed from the sample, that is constructed to contain the true value of a population parameter with at least a stated probability (the confidence level), whatever the true value is. It quantifies the uncertainty of an estimate.

How is a confidence interval calculated?

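For many common estimators, a confidence interval is calculated from a sample statistic, such as the mean or proportion, and a margin of error based on the standard error of the statistic and the desired level of confidence: sample statistic ± margin of error. Other intervals, such as [0, 3/x] in this thread, are not of that form and are checked directly from the definition, by showing that the coverage probability is at least the confidence level for every value of the parameter.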

What is the purpose of a confidence interval?

The purpose of a confidence interval is to provide a range of values that is likely to contain the true value of a population parameter. It helps to quantify the uncertainty in our estimate and allows us to make more accurate and reliable conclusions about the population.

How does a confidence interval relate to Geom(q)?

Geom(q) is a distribution used to model the number of independent Bernoulli trials needed to obtain the first success; in this thread the count includes the successful trial, so the possible values are 1, 2, 3, .... A confidence interval for Geom(q) provides a range of values for the probability of success (q), with a certain level of confidence.

How is the confidence level chosen for a confidence interval?

The confidence level for a confidence interval is typically chosen by the researcher based on the level of certainty they want in their estimate. The most common confidence levels are 90%, 95%, and 99%. A higher confidence level means a wider interval, and a lower confidence level means a narrower interval.
