Estimating Population Variance from Observations

In summary: the population variance [tex]\theta[/tex] can be estimated by the sample variance [tex]\hat{\theta}=\frac{1}{n}\sum_{i=1}^n (Y_i-\bar{Y})^2[/tex]. The bias of this estimator is the difference between its expectation and [tex]\theta[/tex]; it arises because the unknown population mean is replaced by the sample mean [tex]\bar{Y}[/tex]. The expectation of [tex]\hat{\theta}[/tex] can be worked out without knowing the underlying distribution, using only the fact that the observations are independent and identically distributed, and it comes out to [tex]\frac{n-1}{n}\theta[/tex], giving a bias of [tex]-\theta/n[/tex].
  • #1
ghostyc
Suppose the population variance [tex]\theta[/tex] is to be estimated from observations [tex] Y_1, Y_2, \dots, Y_n [/tex] using

[tex] \hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2 [/tex]

where [tex] \bar{Y} [/tex] is the sample mean.

How do I find the bias of [tex]\hat{\theta}[/tex] when it doesn't say which distribution the [tex]Y_i[/tex] are?

thanks
 
  • #2
ghostyc said:
population variance [tex]\theta[/tex]

estimated using [tex]\hat{\theta}=\frac{1}{n}\sum_{i=1}^n (Y_i-\bar{Y})^2[/tex]

how do I find the bias of [tex]\hat{\theta}[/tex]

when it doesn't say which distribution the [tex]Y_i[/tex] are?

thanks

Bias arises from an incorrect sampling technique, such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means, however, may not converge to the population mean if the sampling is biased. You cannot detect bias from just one sample, or from too few.

Bias arises when the population changes during the sampling period, or when part of the population is not accessible by the sampling method, for example.
 
  • #3
SW VandeCarr said:
Bias arises from an incorrect sampling technique, such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means, however, may not converge to the population mean if the sampling is biased. You cannot detect bias from just one sample, or from too few.

Bias arises when the population changes during the sampling period, or when part of the population is not accessible by the sampling method, for example.

hmmmmmm

that's how far I can go...

[tex] \frac{1}{n} E \left( \sum_{i=1}^n Y_i^2 - n \bar{Y}^2 \right) [/tex]

and I think from the CLT I have (approximately) [tex] \bar{Y} \sim N\left(\mu,\frac{\theta}{n} \right) [/tex]

then I am stuck...
thanks
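
As a quick numeric sanity check of the identity behind that expression, [tex] \sum_{i=1}^n (Y_i - \overline Y)^2 = \sum_{i=1}^n Y_i^2 - n\overline Y^2 [/tex], here is a minimal Python sketch (numpy and the random values are assumptions for illustration only):

[code]
import numpy as np

# Check: sum (Y_i - Ybar)^2 == sum Y_i^2 - n * Ybar^2
rng = np.random.default_rng(5)
Y = rng.normal(size=12)   # arbitrary data
n = len(Y)

lhs = ((Y - Y.mean()) ** 2).sum()
rhs = (Y ** 2).sum() - n * Y.mean() ** 2
print(np.isclose(lhs, rhs))   # True
[/code]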
 
  • #4
"Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means however may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples."

slightly true, but not relevant here. We say an estimator of a parameter is unbiased if the expectation of the estimator is the parameter. So, for all distributions that have means, [tex] \overline X [/tex] is an unbiased estimator of [tex] \mu [/tex], since

[tex]
E(\overline X) = \mu
[/tex]

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution; the assumption is only that the required expectations exist). This expectation will be a function of [tex] \theta [/tex], the population variance.

2) The bias is the difference between the result of step 1 and [tex] \theta [/tex].
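
As an illustration of these two steps, here is a minimal simulation sketch in Python (numpy assumed; the normal distribution and all parameter values are arbitrary choices, since the answer does not depend on the distribution). Over repeated samples, the average of [tex]\hat{\theta}[/tex] settles near [tex]\frac{n-1}{n}\theta[/tex] rather than [tex]\theta[/tex]:

[code]
import numpy as np

rng = np.random.default_rng(0)
n = 10            # sample size
theta = 4.0       # population variance of the chosen distribution
reps = 200_000    # number of repeated samples

# Each row is one sample of size n; ddof=0 divides by n.
samples = rng.normal(loc=5.0, scale=np.sqrt(theta), size=(reps, n))
theta_hat = samples.var(axis=1, ddof=0)

print(theta_hat.mean())        # approx (n-1)/n * theta = 3.6
print((n - 1) / n * theta)     # 3.6
[/code]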
 
  • #5
statdad said:
"Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means however may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples."

slightly true, but not relevant here. We say an estimator of a parameter is unbiased if the expectation of the estimator is the parameter. So, for all distributions that have means, [tex] \overline X [/tex] is an unbiased estimator of [tex] \mu [/tex], since

[tex]
E(\overline X) = \mu
[/tex]

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution; the assumption is only that the required expectations exist). This expectation will be a function of [tex] \theta [/tex], the population variance.

2) The bias is the difference between the result of step 1 and [tex] \theta [/tex].

Hey there,

I know what 'bias' means.
The only problem is that I can't simplify the expression any further than I did in my previous post.
I have tried various ways...
and I know it looks like a simple question, I just can't get it right...
maybe I will try tomorrow with a fresh brain LOL

thanks
 
  • #6
statdad said:
[tex]
E(\overline X) = \mu
[/tex]

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution; the assumption is only that the required expectations exist). This expectation will be a function of [tex] \theta [/tex], the population variance.

2) The bias is the difference between the result of step 1 and [tex] \theta [/tex].

I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know [tex]\mu,\theta[/tex]. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?
 
  • #7
Sorry, I didn't get your first point.

ONE approach (not the only one):

You need to simplify

[tex]
E\left(\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2\right) = \frac 1 n \sum_{i=1}^n E[(Y_i - \overline Y)^2]
[/tex]

First: since [tex] Y_1, Y_2, \dots, Y_n [/tex] are independent and identically distributed,

[tex]
E[(Y_i - \overline Y)^2] = E[(Y_1-\overline Y)^2], \quad i = 1, 2, \dots, n
[/tex]

so all the terms in the sum are equal.

Second:

[tex]
Y_1 - \overline Y = \left(\frac{n-1} n\right)Y_1 - \frac 1 n \sum_{i = 2}^n Y_i
[/tex]

use this to "simplify" the squares: it won't be pretty, but the expectations will be a little easier to deal with.
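
A quick numeric check of this decomposition (a Python sketch; numpy and the sample values are illustrative assumptions):

[code]
import numpy as np

# Check: Y_1 - Ybar == ((n-1)/n) * Y_1 - (1/n) * sum_{i>=2} Y_i
rng = np.random.default_rng(1)
Y = rng.uniform(size=8)   # arbitrary data
n = len(Y)

lhs = Y[0] - Y.mean()
rhs = (n - 1) / n * Y[0] - Y[1:].sum() / n
print(np.isclose(lhs, rhs))   # True
[/code]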
 
  • #8
SW VandeCarr said:
I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know [tex]\mu,\theta[/tex]. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?

No, the issue here is not sampling. Bias in this context has to do with whether or not the expectation of an estimator equals the target parameter. If the expectation equals the target parameter, the estimator is unbiased; if the expectation does not equal the target, the estimator is biased. That has nothing to do with how the sample was obtained; it has to do with the estimator itself.

And it is (should be) well known that

[tex]
\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2
[/tex]

is a biased estimator of [tex] \sigma^2 [/tex].
 
  • #9
SW VandeCarr said:
I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know [tex]\mu,\theta[/tex]. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?

I am totally lost now...

Are you implying that it's an UNBIASED estimator of the variance?

Let me check my question in the textbook...

There is a follow-up question that says,

When a Bootstrap sample of size n is taken, the Bootstrap estimate [tex]\hat{\theta}^*[/tex] is a biased estimator of [tex]\hat{\theta}[/tex]. State its bias.

From what I guess, the first one should be biased?
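
For a feel of what that follow-up question is getting at, here is a rough simulation sketch (Python with numpy assumed; the data and sizes are arbitrary): averaging the bootstrap estimates [tex]\hat{\theta}^*[/tex] over many resamples lands near [tex]\frac{n-1}{n}\hat{\theta}[/tex], i.e. a bias of [tex]-\hat{\theta}/n[/tex] relative to [tex]\hat{\theta}[/tex]:

[code]
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=10)          # arbitrary observed data
n = len(Y)
theta_hat = Y.var(ddof=0)        # the 1/n estimate on the original data

# Each row is one bootstrap sample: n draws with replacement from Y.
boots = rng.choice(Y, size=(100_000, n), replace=True)
theta_star = boots.var(axis=1, ddof=0)

print(theta_star.mean())            # approx (n-1)/n * theta_hat
print((n - 1) / n * theta_hat)
[/code]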
 
  • #10
statdad said:
Sorry, I didn't get your first point.

ONE approach (not the only one):

You need to simplify

[tex]
E\left(\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2\right) = \frac 1 n \sum_{i=1}^n E[(Y_i - \overline Y)^2]
[/tex]

First: since [tex] Y_1, Y_2, \dots, Y_n [/tex] are independent and identically distributed,

[tex]
E[(Y_i - \overline Y)^2] = E[(Y_1-\overline Y)^2], \quad i = 1, 2, \dots, n
[/tex]

so all the terms in the sum are equal.

Second:

[tex]
Y_1 - \overline Y = \left(\frac{n-1} n\right)Y_1 - \frac 1 n \sum_{i = 2}^n Y_i
[/tex]

use this to "simplify" the squares: it won't be pretty, but the expectations will be a little easier to deal with.
To be honest, I now remember doing something similar in my lectures;
however, the question is
====================================================

Suppose the population variance [tex]\theta[/tex] is to be estimated from observations [tex] Y_1, Y_2, \dots, Y_n [/tex] using

[tex] \hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2 [/tex]

where [tex] \bar{Y} [/tex] is the mean.

There is no mention of i.i.d. at all...

maybe it's supposed to mean that... thanks
 
  • #11
statdad said:
No, the issue here is not sampling. Bias in this context has to do with whether or not the expectation of an estimator equals the target parameter. If the expectation equals the target parameter, the estimator is unbiased; if the expectation does not equal the target, the estimator is biased. That has nothing to do with how the sample was obtained; it has to do with the estimator itself.

And it is (should be) well known that

[tex]
\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2
[/tex]

is a biased estimator of [tex] \sigma^2 [/tex].

You are correct that this estimator is biased. I was thinking of the estimator with Bessel's correction as an unbiased estimator of population variance.

http://mathworld.wolfram.com/BesselsCorrection.html

http://en.wikipedia.org/wiki/Bessel's_correction
 
  • #12
ghostyc said:
To be honest, I now remember doing something similar in my lectures;
however, the question is
====================================================

Suppose the population variance [tex]\theta[/tex] is to be estimated from observations [tex] Y_1, Y_2, \dots, Y_n [/tex] using

[tex] \hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2 [/tex]

where [tex] \bar{Y} [/tex] is the mean.

There is no mention of i.i.d. at all...

maybe it's supposed to mean that... thanks

The estimator you have is biased; the goal of your problem is to find an expression for that bias. And yes, the context of this problem is that the [tex]Y_i[/tex] are i.i.d.

The bigger picture is this: while it is true that [tex] \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2 [/tex] converges in probability to [tex] \theta [/tex] (it is consistent), the fact that it is biased means that when you take repeated samples of a fixed size and calculate this sample variance for each, the mean of those sample variances will not converge to [tex] \theta [/tex].

The estimator

[tex]
\frac{1}{n-1} \sum_{i=1}^n (Y_i - \overline Y)^2
[/tex]

is both unbiased and consistent.
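
A small simulation sketch of this contrast (Python, numpy assumed; the exponential distribution and its parameters are arbitrary choices): averaged over repeated samples of fixed size, the [tex]1/n[/tex] estimator stays below [tex]\theta[/tex] while the [tex]1/(n-1)[/tex] estimator centers on it.

[code]
import numpy as np

rng = np.random.default_rng(3)
n, reps = 5, 200_000
samples = rng.exponential(scale=3.0, size=(reps, n))  # variance theta = 9

print(samples.var(axis=1, ddof=0).mean())   # biased: approx (n-1)/n * 9 = 7.2
print(samples.var(axis=1, ddof=1).mean())   # unbiased: approx 9
[/code]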
 
  • #13
At last, I think I got it:

[tex]\operatorname{E}(\hat{\theta})=\frac{1}{n}\left(\sum_{i=1}^n \operatorname{E}(Y_i^2) - n\operatorname{E}(\bar{Y}^2)\right)=\frac{1}{n}\left(n(\theta+\mu^2)-n\left(\frac{\theta}{n}+\mu^2\right)\right)=\theta-\frac{\theta}{n} \quad \implies \quad \operatorname{Bias}(\hat{\theta})=\operatorname{E}(\hat{\theta})-\theta=-\frac{\theta}{n}[/tex]
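
A quick Monte Carlo check of this result (a Python sketch; numpy assumed, with the uniform distribution on [0, 1] chosen arbitrarily, so [tex]\theta = 1/12[/tex]):

[code]
import numpy as np

rng = np.random.default_rng(4)
n, reps = 6, 500_000
theta = 1 / 12                     # variance of Uniform(0, 1)

samples = rng.uniform(size=(reps, n))
theta_hat = samples.var(axis=1, ddof=0)

print(theta_hat.mean() - theta)    # empirical bias, approx -theta/n
print(-theta / n)                  # -0.01389...
[/code]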
 

FAQ: Estimating Population Variance from Observations

What is population variance and why is it important to estimate it?

Population variance is a measure of how spread out a population's data is from its mean. It is important to estimate because it allows us to understand the variability and distribution of the population's data, which can help in making informed decisions and predictions.

How do you calculate population variance from observations?

To calculate population variance from observations, you first need to calculate the mean of the data. Then, for each observation, subtract the mean from the observation and square the result. Next, find the sum of all the squared differences and divide it by the total number of observations. This will give you the population variance.
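
For example, this calculation as a short Python sketch (numpy assumed; the data values are arbitrary):

[code]
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean = data.mean()
pop_var = ((data - mean) ** 2).sum() / len(data)

print(pop_var)            # 4.0
print(data.var(ddof=0))   # same result via numpy's n-divisor variance
[/code]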

What is the difference between population variance and sample variance?

Population variance is calculated using data from an entire population, while sample variance is calculated using data from a smaller subset of the population. Sample variance is used when the entire population data is not available, and it is an estimate of the population variance.
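
In code the two differ only in the divisor, n versus n - 1 (a sketch with arbitrary data):

[code]
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(data.var(ddof=0))   # 4.0    population variance (divide by n)
print(data.var(ddof=1))   # ~4.571 sample variance (divide by n - 1)
[/code]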

How do you interpret the value of population variance?

The value of population variance represents the average squared distance of the population's data points from the mean. A higher value indicates that the data points are more spread out, while a lower value indicates that the data points are closer to the mean. It is important to consider other measures of variability, such as the standard deviation (the square root of the variance), in conjunction with population variance when interpreting data.

Can population variance be negative?

No, population variance cannot be negative. It is always a non-negative value, as it represents the squared differences from the mean. If the calculated value is negative, then it is an indication of an error in the calculation.
