Estimating Population Variance from Observations

In summary: the population variance [tex]\theta[/tex] can be estimated by the sample variance [tex]\hat{\theta}=\frac{1}{n}\sum_{i=1}^n (Y_i-\bar{Y})^2[/tex]. The bias of this estimator is the difference between its expectation and [tex]\theta[/tex]; it arises because the unknown population mean is replaced by the sample mean [tex]\bar{Y}[/tex]. The expectation of [tex]\hat{\theta}[/tex] can be worked out without knowing the underlying distribution, using only the fact that the observations are independent and identically distributed, and it comes out to [tex]\frac{n-1}{n}\theta[/tex], giving a bias of [tex]-\theta/n[/tex].
  • #1
ghostyc
Suppose the population variance [tex]\theta[/tex] is to be estimated from observations [tex] Y_1, Y_2, \dots, Y_n [/tex] using

[tex] \hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2 [/tex]

where [tex] \bar{Y} [/tex] is the sample mean.

How do I find the bias of [tex]\hat{\theta}[/tex] when it doesn't say which distribution the [tex]Y_i[/tex] are?

thanks
 
  • #2
ghostyc said:
population variance [tex]\theta[/tex]

estimated using [tex]\hat{\theta}=\frac{1}{n}\sum_{i=1}^n (Y_i-\bar{Y})^2[/tex]

how do I find the bias of [tex]\hat{\theta}[/tex]

when it doesn't say which distribution the [tex]Y_i[/tex] are?

thanks

Bias arises from an incorrect sampling technique, such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means, however, may not converge to the population mean if the sampling is biased. You cannot detect bias from just one sample, or from too few.

Bias arises when the population changes during the sampling period, or when part of the population is not accessible by the sampling method, for example.
 
  • #3
SW VandeCarr said:
Bias arises from an incorrect sampling technique, such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means, however, may not converge to the population mean if the sampling is biased. You cannot detect bias from just one sample, or from too few.

Bias arises when the population changes during the sampling period, or when part of the population is not accessible by the sampling method, for example.

hmmmmmm

that's how far I can go...

[tex] \frac{1}{n} E \left( \sum_{i=1}^n Y_i^2 - n \bar{Y}^2 \right) [/tex]

and I think from the CLT I have (approximately) [tex] \bar{Y} \sim N\left(\mu,\frac{\theta}{n} \right) [/tex]

then I am stuck...
thanks
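
As a quick numeric sanity check of the identity behind that expression, [tex] \sum_{i=1}^n (Y_i - \overline Y)^2 = \sum_{i=1}^n Y_i^2 - n\overline Y^2 [/tex], here is a minimal Python sketch (numpy and the random values are assumptions for illustration only):

[code]
import numpy as np

# Check: sum (Y_i - Ybar)^2 == sum Y_i^2 - n * Ybar^2
rng = np.random.default_rng(5)
Y = rng.normal(size=12)   # arbitrary data
n = len(Y)

lhs = ((Y - Y.mean()) ** 2).sum()
rhs = (Y ** 2).sum() - n * Y.mean() ** 2
print(np.isclose(lhs, rhs))   # True
[/code]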
 
  • #4
"Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means however may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples."

slightly true, but not relevant here. We say an estimator of a parameter is unbiased if the expectation of the estimator is the parameter. So, for all distributions that have means, [tex] \overline X [/tex] is an unbiased estimator of [tex] \mu [/tex], since

[tex]
E(\overline X) = \mu
[/tex]

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution; the assumption is only that the required expectations exist). This expectation will be a function of [tex] \theta [/tex], the population variance.

2) The bias is the difference between the result of step 1 and [tex] \theta [/tex].
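
As an illustration of these two steps, here is a minimal simulation sketch in Python (numpy assumed; the normal distribution and all parameter values are arbitrary choices, since the answer does not depend on the distribution). Over repeated samples, the average of [tex]\hat{\theta}[/tex] settles near [tex]\frac{n-1}{n}\theta[/tex] rather than [tex]\theta[/tex]:

[code]
import numpy as np

rng = np.random.default_rng(0)
n = 10            # sample size
theta = 4.0       # population variance of the chosen distribution
reps = 200_000    # number of repeated samples

# Each row is one sample of size n; ddof=0 divides by n.
samples = rng.normal(loc=5.0, scale=np.sqrt(theta), size=(reps, n))
theta_hat = samples.var(axis=1, ddof=0)

print(theta_hat.mean())        # approx (n-1)/n * theta = 3.6
print((n - 1) / n * theta)     # 3.6
[/code]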
 
  • #5
statdad said:
"Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means however may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples."

slightly true, but not relevant here. We say an estimator of a parameter is unbiased if the expectation of the estimator is the parameter. So, for all distributions that have means, [tex] \overline X [/tex] is an unbiased estimator of [tex] \mu [/tex], since

[tex]
E(\overline X) = \mu
[/tex]

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution; the assumption is only that the required expectations exist). This expectation will be a function of [tex] \theta [/tex], the population variance.

2) The bias is the difference between the result of step 1 and [tex] \theta [/tex].

Hey there,

I know what 'bias' means.
The only problem is that I can't simplify the expression any further than I did in my previous post.
I have tried various ways...
and I know it looks like a simple question, I just can't get it right...
maybe I will try tomorrow with a fresh brain LOL

thanks
 
  • #6
statdad said:
[tex]
E(\overline X) = \mu
[/tex]

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution; the assumption is only that the required expectations exist). This expectation will be a function of [tex] \theta [/tex], the population variance.

2) The bias is the difference between the result of step 1 and [tex] \theta [/tex].

I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know [tex]\mu,\theta[/tex]. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?
 
  • #7
Sorry, I didn't get your first point.

ONE approach (not the only one):

You need to simplify

[tex]
E\left(\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2\right) = \frac 1 n \sum_{i=1}^n E[(Y_i - \overline Y)^2]
[/tex]

First: since [tex] Y_1, Y_2, \dots, Y_n [/tex] are independent and identically distributed,

[tex]
E[(Y_i - \overline Y)^2] = E[(Y_1-\overline Y)^2], \quad i = 1, 2, \dots, n
[/tex]

so all the terms in the sum are equal.

Second:

[tex]
Y_1 - \overline Y = \left(\frac{n-1} n\right)Y_1 - \frac 1 n \sum_{i = 2}^n Y_i
[/tex]

use this to "simplify" the squares: it won't be pretty, but the expectations will be a little easier to deal with.
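
A quick numeric check of this decomposition (a Python sketch; numpy and the sample values are illustrative assumptions):

[code]
import numpy as np

# Check: Y_1 - Ybar == ((n-1)/n) * Y_1 - (1/n) * sum_{i>=2} Y_i
rng = np.random.default_rng(1)
Y = rng.uniform(size=8)   # arbitrary data
n = len(Y)

lhs = Y[0] - Y.mean()
rhs = (n - 1) / n * Y[0] - Y[1:].sum() / n
print(np.isclose(lhs, rhs))   # True
[/code]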
 
  • #8
SW VandeCarr said:
I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know [tex]\mu,\theta[/tex]. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?

No, the issue here is not sampling. Bias in this context has to do with whether or not the expectation of an estimator equals the target parameter. If the expectation equals the target parameter, the estimator is unbiased; if the expectation does not equal the target, the estimator is biased. That has nothing to do with how the sample was obtained; it has to do with the estimator itself.

And it is (should be) well known that

[tex]
\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2
[/tex]

is a biased estimator of [tex] \sigma^2 [/tex].
 
  • #9
SW VandeCarr said:
I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know [tex]\mu,\theta[/tex]. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?

I am totally lost now...

Are you implying that it's an UNBIASED estimator of the variance?

Let me check my question in the textbook...

There is a follow-up question that says,

When a Bootstrap sample of size n is taken, the Bootstrap estimate [tex]\hat{\theta}^*[/tex] is a biased estimator of [tex]\hat{\theta}[/tex]. State its bias.

From what I guess, the first one should be biased?
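
For a feel of what that follow-up question is getting at, here is a rough simulation sketch (Python with numpy assumed; the data and sizes are arbitrary): averaging the bootstrap estimates [tex]\hat{\theta}^*[/tex] over many resamples lands near [tex]\frac{n-1}{n}\hat{\theta}[/tex], i.e. a bias of [tex]-\hat{\theta}/n[/tex] relative to [tex]\hat{\theta}[/tex]:

[code]
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=10)          # arbitrary observed data
n = len(Y)
theta_hat = Y.var(ddof=0)        # the 1/n estimate on the original data

# Each row is one bootstrap sample: n draws with replacement from Y.
boots = rng.choice(Y, size=(100_000, n), replace=True)
theta_star = boots.var(axis=1, ddof=0)

print(theta_star.mean())            # approx (n-1)/n * theta_hat
print((n - 1) / n * theta_hat)
[/code]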
 
  • #10
statdad said:
Sorry, I didn't get your first point.

ONE approach (not the only one):

You need to simplify

[tex]
E\left(\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2\right) = \frac 1 n \sum_{i=1}^n E[(Y_i - \overline Y)^2]
[/tex]

First: since [tex] Y_1, Y_2, \dots, Y_n [/tex] are independent and identically distributed,

[tex]
E[(Y_i - \overline Y)^2] = E[(Y_1-\overline Y)^2], \quad i = 1, 2, \dots, n
[/tex]

so all the terms in the sum are equal.

Second:

[tex]
Y_1 - \overline Y = \left(\frac{n-1} n\right)Y_1 - \frac 1 n \sum_{i = 2}^n Y_i
[/tex]

use this to "simplify" the squares: it won't be pretty, but the expectations will be a little easier to deal with.
To be honest, I now remember doing something similar in my lectures;
however, the question is
====================================================

Suppose the population variance [tex]\theta[/tex] is to be estimated from observations [tex] Y_1, Y_2, \dots, Y_n [/tex] using

[tex] \hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2 [/tex]

where [tex] \bar{Y} [/tex] is the mean.

There is no mention of i.i.d. at all...

maybe it's supposed to mean that... thanks
 
  • #11
statdad said:
No, the issue here is not sampling. Bias in this context has to do with whether or not the expectation of an estimator equals the target parameter. If the expectation equals the target parameter, the estimator is unbiased; if the expectation does not equal the target, the estimator is biased. That has nothing to do with how the sample was obtained; it has to do with the estimator itself.

And it is (should be) well known that

[tex]
\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2
[/tex]

is a biased estimator of [tex] \sigma^2 [/tex].

You are correct that this estimator is biased. I was thinking of the estimator with Bessel's correction as an unbiased estimator of population variance.

http://mathworld.wolfram.com/BesselsCorrection.html

http://en.wikipedia.org/wiki/Bessel's_correction
 
  • #12
ghostyc said:
To be honest, I now remember doing something similar in my lectures;
however, the question is
====================================================

Suppose the population variance [tex]\theta[/tex] is to be estimated from observations [tex] Y_1, Y_2, \dots, Y_n [/tex] using

[tex] \hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2 [/tex]

where [tex] \bar{Y} [/tex] is the mean.

There is no mention of i.i.d. at all...

maybe it's supposed to mean that... thanks

The estimator you have is biased; the goal of your problem is to find an expression for that bias. And yes, the context of this problem is that the [tex]Y_i[/tex] are i.i.d.

The bigger picture is this: while it is true that [tex] \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2 [/tex] converges in probability to [tex] \theta [/tex] (it is consistent), the fact that it is biased means that when you take repeated samples of a fixed size and calculate this sample variance for each, the mean of those sample variances will not converge to [tex] \theta [/tex].

The estimator

[tex]
\frac{1}{n-1} \sum_{i=1}^n (Y_i - \overline Y)^2
[/tex]

is both unbiased and consistent.
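
A small simulation sketch of this contrast (Python, numpy assumed; the exponential distribution and its parameters are arbitrary choices): averaged over repeated samples of fixed size, the [tex]1/n[/tex] estimator stays below [tex]\theta[/tex] while the [tex]1/(n-1)[/tex] estimator centers on it.

[code]
import numpy as np

rng = np.random.default_rng(3)
n, reps = 5, 200_000
samples = rng.exponential(scale=3.0, size=(reps, n))  # variance theta = 9

print(samples.var(axis=1, ddof=0).mean())   # biased: approx (n-1)/n * 9 = 7.2
print(samples.var(axis=1, ddof=1).mean())   # unbiased: approx 9
[/code]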
 
  • #13
At last, I think I got it:

[tex]\operatorname{E}(\hat{\theta})=\frac{1}{n}\left(\sum_{i=1}^n \operatorname{E}(Y_i^2) - n\operatorname{E}(\bar{Y}^2)\right)=\frac{1}{n}\left(n(\theta+\mu^2)-n\left(\frac{\theta}{n}+\mu^2\right)\right)=\theta-\frac{\theta}{n} \quad \implies \quad \operatorname{Bias}(\hat{\theta})=\operatorname{E}(\hat{\theta})-\theta=-\frac{\theta}{n}[/tex]
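
A quick Monte Carlo check of this result (a Python sketch; numpy assumed, with the uniform distribution on [0, 1] chosen arbitrarily, so [tex]\theta = 1/12[/tex]):

[code]
import numpy as np

rng = np.random.default_rng(4)
n, reps = 6, 500_000
theta = 1 / 12                     # variance of Uniform(0, 1)

samples = rng.uniform(size=(reps, n))
theta_hat = samples.var(axis=1, ddof=0)

print(theta_hat.mean() - theta)    # empirical bias, approx -theta/n
print(-theta / n)                  # -0.01389...
[/code]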
 

FAQ: Estimating Population Variance from Observations

What is population variance and why is it important to estimate it?

Population variance is a measure of how spread out a population's data is from its mean. It is important to estimate because it allows us to understand the variability and distribution of the population's data, which can help in making informed decisions and predictions.

How do you calculate population variance from observations?

To calculate population variance from observations, you first need to calculate the mean of the data. Then, for each observation, subtract the mean from the observation and square the result. Next, find the sum of all the squared differences and divide it by the total number of observations. This will give you the population variance.
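
For example, this calculation as a short Python sketch (numpy assumed; the data values are arbitrary):

[code]
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean = data.mean()
pop_var = ((data - mean) ** 2).sum() / len(data)

print(pop_var)            # 4.0
print(data.var(ddof=0))   # same result via numpy's n-divisor variance
[/code]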

What is the difference between population variance and sample variance?

Population variance is calculated using data from an entire population, while sample variance is calculated using data from a smaller subset of the population. Sample variance is used when the entire population data is not available, and it is an estimate of the population variance.
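
In code the two differ only in the divisor, n versus n - 1 (a sketch with arbitrary data):

[code]
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(data.var(ddof=0))   # 4.0    population variance (divide by n)
print(data.var(ddof=1))   # ~4.571 sample variance (divide by n - 1)
[/code]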

How do you interpret the value of population variance?

The value of population variance represents the average squared distance of the population's data points from the mean. A higher value indicates that the data points are more spread out, while a lower value indicates that the data points are closer to the mean. It is important to consider other measures of variability, such as the standard deviation (the square root of the variance), in conjunction with population variance when interpreting data.

Can population variance be negative?

No, population variance cannot be negative. It is always a non-negative value, as it represents the squared differences from the mean. If the calculated value is negative, then it is an indication of an error in the calculation.
