Estimating the Mean Christmas Spending: A Comparison of Two Sampling Methods

  • MHB
  • Thread starter mathmari
In summary, two employees, A and B, discuss the best procedure for estimating the average amount of money adults spend on Christmas presents. A suggests questioning a random sample of $n$ people and using their average spend as the estimate, while B proposes separately surveying $np$ employed and $n(1-p)$ unemployed people and combining the two group averages with weights $p$ and $1-p$. Both estimators are unbiased, but B's variance is never larger than A's, and it is strictly smaller whenever the two groups differ in mean spending, so B's estimator is the better one.
  • #1
mathmari
Hey! :eek:

The variable $ Y $ denotes the amount of money that an adult person gives out for Christmas presents.
The distribution of $ Y $ depends on whether the person is employed ($ E = 1 $) or not ($ E = 0 $).
It holds that $ P (E = 1) = p $, i.e., a randomly selected person is employed with probability $ p $.

We have the following:
\begin{align*}&E(Y\mid E=1)=\mu_1 \\ &V(Y\mid E=1)=\sigma_1^2 \\ &E(Y\mid E=0)=\mu_0 \\ &V(Y\mid E=0)=\sigma_0^2 \\ &E(Y)=\mu=p\mu_1+(1-p)\mu_0 \\ &V(Y)=\sigma^2=p\sigma_1^2+(1-p)\sigma_0^2+G\end{align*}
where \begin{equation*}G=p(\mu_1-\mu)^2+(1-p)(\mu_0-\mu)^2\geq 0\end{equation*}
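
For reference, the formula for $V(Y)$ is an instance of the law of total variance, conditioning on the employment status $E$. A short sketch, which also rewrites $G$ using $\mu_1-\mu=(1-p)(\mu_1-\mu_0)$ and $\mu_0-\mu=-p(\mu_1-\mu_0)$:
\begin{align*} V(Y) &= E\left[V(Y\mid E)\right] + V\left(E[Y\mid E]\right) = p\sigma_1^2+(1-p)\sigma_0^2 + G \\ G &= p(1-p)^2(\mu_1-\mu_0)^2 + (1-p)p^2(\mu_1-\mu_0)^2 = p(1-p)(\mu_1-\mu_0)^2 \end{align*}
So $G$ measures how far apart the two group means are, weighted by how balanced the two groups are.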

A research institute would like to estimate $\mu $ based on a $ n $-sized sample. The parameter $ p $ is known to the institute. Two employees of the institute, A and B, discuss the procedure.

  • A suggests questioning $n$ randomly selected people and using their average spend as an estimate for $\mu $.
  • B proposes to separately survey $ n p $ employed persons and $ n (1 - p) $ unemployed persons, and then use the estimator \begin{equation*} \overline{Y}_B = p \overline{Y}_1 + (1-p) \overline{Y}_0 \end{equation*} where $ \overline {Y}_1 $ and $ \overline{Y}_0 $ are the average spend of the employed and unemployed persons, respectively. For the sake of simplicity, we assume that $ n p $ and $ n (1 - p) $ are integers.
If I understand B's proposal correctly, we have a sample of size $n$ with $np$ employed and $n(1-p)$ unemployed persons. $Y_{1i}$ is the answer that employed person $i$ gives and $Y_{0i}$ is the answer that unemployed person $i$ gives. We calculate the mean of what the employed people spend, according to the survey, and call that average $\overline{Y}_1$, i.e. $ \overline{Y}_1=\frac{1}{np}\sum_{i=1}^{np}Y_{1i}$. Similarly, it holds that $ \overline{Y}_0=\frac{1}{n(1-p)}\sum_{i=1}^{n(1-p)}Y_{0i}$.
Adding these two averages, each multiplied by the respective probability, we get the estimate of the average over all people.
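
To make the two procedures concrete, here is a minimal Python sketch of both estimators. The function names `estimate_a` and `estimate_b` and the survey answers are made up for illustration; they are not part of the exercise.

```python
from statistics import fmean

def estimate_a(sample):
    """A's estimator: plain average over one random sample of adults."""
    return fmean(sample)

def estimate_b(employed, unemployed, p):
    """B's estimator: p * mean(employed) + (1 - p) * mean(unemployed)."""
    return p * fmean(employed) + (1 - p) * fmean(unemployed)

# Hypothetical answers with p = 0.6 and n = 5, so np = 3 and n(1-p) = 2.
employed = [300.0, 350.0, 400.0]
unemployed = [100.0, 200.0]

y_b = estimate_b(employed, unemployed, p=0.6)
# Pooling B's sample and averaging it the way A would:
y_pooled = estimate_a(employed + unemployed)
```

In this toy data both values come out at about 270, because B's sample contains the two groups in exactly the proportions $p$ and $1-p$. The difference between the methods is that in A's random sample the number of employed respondents varies from sample to sample, while B fixes it in advance.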

Have I understood that correctly? (Wondering) I want to calculate the expected values and the variances of the estimates of A and B.

How could we do that? Could you give me a hint? (Wondering)
 
  • #2
Hey mathmari! (Wave)

For A we have $E(\overline {Y_A}) = E(Y)$ and $\sigma^2(\overline {Y_A}) = \frac{\sigma^2(Y)}{n}$ don't we?
And for B we have $E(\overline {Y_B}) = E(Y)$ and $\sigma^2(\overline {Y_B}) = \sigma^2(p \overline{Y_1} + (1-p) \overline{Y_0}) = p^2\frac{\sigma_1^2}{np} + (1-p)^2\frac{\sigma_0^2}{n(1-p)}$ don't we? (Wondering)
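
Both variance formulas rest on the respondents being sampled independently, so that variances of sums add up and constant factors come out squared. A sketch of the two steps, where $Y_{A,i}$ is notation introduced here for the answer of the $i$-th person in A's sample:
\begin{align*} \sigma^2(\overline{Y_A}) &= \sigma^2\left(\frac{1}{n}\sum_{i=1}^{n} Y_{A,i}\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2(Y_{A,i}) = \frac{\sigma^2(Y)}{n} \\ \sigma^2(\overline{Y_B}) &= p^2\,\sigma^2(\overline{Y_1}) + (1-p)^2\,\sigma^2(\overline{Y_0}), \quad \sigma^2(\overline{Y_1}) = \frac{\sigma_1^2}{np}, \quad \sigma^2(\overline{Y_0}) = \frac{\sigma_0^2}{n(1-p)} \end{align*}
The second line additionally assumes that the employed and unemployed subsamples are drawn independently of each other.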
 
  • #3
I like Serena said:
For A we have $E(\overline {Y_A}) = E(Y)$ and $\sigma^2(\overline {Y_A}) = \frac{\sigma^2(Y)}{n}$ don't we?
And for B we have $E(\overline {Y_B}) = E(Y)$ and $\sigma^2(\overline {Y_B}) = \sigma^2(p \overline{Y_1} + (1-p) \overline{Y_0}) = p^2\frac{\sigma_1^2}{np} + (1-p)^2\frac{\sigma_0^2}{n(1-p)}$ don't we? (Wondering)

Does it hold that $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ because $\overline {Y_A}$ and $\overline {Y_B}$ each describe the average amount of money?

Why do we not use here that $\overline{Y}_B=p \overline{Y_1} + (1-p) \overline{Y_0}$ ? From $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ we get that both estimators are unbiased, right? (Wondering) To check which estimate is better we have to compare the two variances, right? The variance of the estimate A is equal to the variance of the sample mean of the amounts of money. Does this mean that it is better than the variance of the estimate B?
Or can we not compare them? (Wondering)
 
  • #4
mathmari said:
Does it hold that $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ because $\overline {Y_A}$ and $\overline {Y_B}$ each describe the average amount of money?

Why do we not use here that $\overline{Y}_B=p \overline{Y_1} + (1-p) \overline{Y_0}$ ?

It follows mathematically.
Let's go through the steps for B, using indeed the formula for $\overline{Y_B}$.
$$
E(\overline{Y_B})=E\left(p\overline{Y_1}+(1-p)\overline{Y_0}\right)
=pE(\overline{Y_1})+(1-p)E(\overline{Y_0})
=p\mu_1 + (1-p)\mu_0
=\mu
= E(Y)
$$
Yes? (Thinking)
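
The same linearity argument gives the result for A. As a sketch, writing $Y_{A,i}$ (notation introduced here) for the answer of the $i$-th of the $n$ randomly selected people, each of which has the same distribution as $Y$:
\begin{equation*} E(\overline{Y_A}) = E\left(\frac{1}{n}\sum_{i=1}^{n} Y_{A,i}\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_{A,i}) = \frac{1}{n}\cdot n\mu = \mu \end{equation*}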

mathmari said:
From $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ we get that both estimators are unbiased, right?

Yep.

mathmari said:
To check which estimate is better we have to compare the two variances, right? The variance of the estimate A is equal to the variance of the sample mean of the amounts of money. Does this mean that it is better than the variance of the estimate B?
Or can we not compare them? (Wondering)

Let's compare them.

$$\sigma^2(\overline {Y_A}) = \frac{\sigma^2(Y)}{n} = \frac{p\sigma_1^2+(1-p)\sigma_0^2+G}{n} \\
\sigma^2(\overline {Y_B}) = \sigma^2(p \overline{Y_1} + (1-p) \overline{Y_0}) = p^2\frac{\sigma_1^2}{np} + (1-p)^2\frac{\sigma_0^2}{n(1-p)}
=\frac{p\sigma_1^2 + (1-p)\sigma_0^2}{n}
$$
So $\sigma^2(\overline {Y_B})$ is smaller than $\sigma^2(\overline {Y_A})$ isn't it? (Wondering)
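
Subtracting the two expressions makes the gap explicit (a one-line check using only the formulas above):
\begin{equation*} \sigma^2(\overline{Y_A}) - \sigma^2(\overline{Y_B}) = \frac{G}{n} \geq 0 \end{equation*}
Since $G$ is a sum of squares, for $0<p<1$ it is zero only when $\mu_1=\mu_0$, so B's variance is strictly smaller whenever the two groups differ in mean spending.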

And if the standard deviations $\sigma_1$ and $\sigma_0$ are small compared to $|\mu_1 - \mu_0|$, then the standard deviation of B will be much smaller than that of A; if they are merely comparable to $|\mu_1 - \mu_0|$, the gain is more modest.
That's assuming that neither $p$ nor $1-p$ is close to 0.
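
A quick Monte Carlo check can illustrate the comparison. This is only a sketch under assumptions that are not part of the exercise: spending within each group is taken to be normally distributed, and the parameter values and the helper name `simulate` are made up for illustration.

```python
import random
from statistics import fmean, pvariance

def simulate(p, mu1, sigma1, mu0, sigma0, n, reps, seed=0):
    """Return (mean_A, var_A, mean_B, var_B) over `reps` simulated surveys."""
    rng = random.Random(seed)
    n1 = round(n * p)   # employed respondents in B's design
    n0 = n - n1         # unemployed respondents in B's design
    est_a, est_b = [], []
    for _ in range(reps):
        # Method A: each respondent is employed with probability p.
        sample = []
        for _ in range(n):
            if rng.random() < p:
                sample.append(rng.gauss(mu1, sigma1))
            else:
                sample.append(rng.gauss(mu0, sigma0))
        est_a.append(fmean(sample))
        # Method B: fixed group sizes, group means weighted by p and 1 - p.
        y1 = fmean([rng.gauss(mu1, sigma1) for _ in range(n1)])
        y0 = fmean([rng.gauss(mu0, sigma0) for _ in range(n0)])
        est_b.append(p * y1 + (1 - p) * y0)
    return fmean(est_a), pvariance(est_a), fmean(est_b), pvariance(est_b)

# Hypothetical parameters: p = 0.6, group means 400 and 150, both sds 50.
p, mu1, sigma1, mu0, sigma0, n = 0.6, 400.0, 50.0, 150.0, 50.0, 50
mu = p * mu1 + (1 - p) * mu0
within = p * sigma1**2 + (1 - p) * sigma0**2
G = p * (mu1 - mu) ** 2 + (1 - p) * (mu0 - mu) ** 2
theory_var_a = (within + G) / n   # formula for A from the thread
theory_var_b = within / n         # formula for B from the thread

mean_a, var_a, mean_b, var_b = simulate(p, mu1, sigma1, mu0, sigma0, n, reps=4000)
print(f"A: mean {mean_a:.1f}, variance {var_a:.1f} (theory {theory_var_a:.1f})")
print(f"B: mean {mean_b:.1f}, variance {var_b:.1f} (theory {theory_var_b:.1f})")
```

With these made-up numbers the formulas give a variance of 350 for A and 50 for B, and the simulated variances should land close to those values, with both simulated means close to $\mu = 300$.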
 
  • #5
I like Serena said:
It follows mathematically.
Let's go through the steps for B, using indeed the formula for $\overline{Y_B}$.
$$
E(\overline{Y_B})=E\left(p\overline{Y_1}+(1-p)\overline{Y_0}\right)
=pE(\overline{Y_1})+(1-p)E(\overline{Y_0})
=p\mu_1 + (1-p)\mu_0
=\mu
= E(Y)
$$

Ah ok!
I like Serena said:
Let's compare them.

$$\sigma^2(\overline {Y_A}) = \frac{\sigma^2(Y)}{n} = \frac{p\sigma_1^2+(1-p)\sigma_0^2+G}{n} \\
\sigma^2(\overline {Y_B}) = \sigma^2(p \overline{Y_1} + (1-p) \overline{Y_0}) = p^2\frac{\sigma_1^2}{np} + (1-p)^2\frac{\sigma_0^2}{n(1-p)}
=\frac{p\sigma_1^2 + (1-p)\sigma_0^2}{n}
$$
So $\sigma^2(\overline {Y_B})$ is smaller than $\sigma^2(\overline {Y_A})$ isn't it? (Wondering)

And if the standard deviations $\sigma_1$ and $\sigma_0$ are small compared to $|\mu_1 - \mu_0|$, then the standard deviation of B will be much smaller than that of A; if they are merely comparable to $|\mu_1 - \mu_0|$, the gain is more modest.
That's assuming that neither $p$ nor $1-p$ is close to 0.
So, since the variance of A is bigger than that of B, A's estimates will fluctuate more than B's. Thus, the estimate of B is better, isn't it? (Wondering)
 
  • #6
mathmari said:
Ah ok!

So, since the variance of A is bigger than that of B, A's estimates will fluctuate more than B's. Thus, the estimate of B is better, isn't it?

Yep. (Nod)
 
  • #7
I like Serena said:
Yep. (Nod)

Ok! Thank you very much! (Yes)
 
