Distribution of Log of Variance

In summary: For large N, [itex]\log(S^2_N)[/itex] is approximately normal with mean [itex]\log(\sigma^2)[/itex] and variance [itex](\mu_4 - \sigma^4)/(N\sigma^4)[/itex], where [itex]\mu_4 = E(Y - \mu)^4[/itex] is assumed finite. This follows from the central limit theorem applied to [itex]\frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu)^2[/itex], together with the delta method for the logarithm.
  • #1
Yagoda

Homework Statement


If [itex]Y_1, Y_2, \ldots[/itex] are iid with cdf [itex]F_Y[/itex], find a large-sample approximation for the distribution of [itex]\log(S^2_N)[/itex], where [itex]S^2_N[/itex] is the sample variance.


Homework Equations





The Attempt at a Solution


The law of large numbers says that [itex]S^2_N[/itex] converges in probability to [itex]\sigma^2[/itex] as N grows. However, because I don't know the distribution of the Y's, I don't know what [itex]\sigma^2[/itex] is.

Also [itex]\log S^2_N = \log(\frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2) = \log(\frac{1}{N}) + \log(\sum_{i=1}^{N} (Y_i - \bar{Y})^2)[/itex], but I am not sure if this helps me find an approximation for the distribution.
 
  • #2


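To approximate the distribution of [itex]\log(S^2_N)[/itex], we can combine the central limit theorem with the delta method. The central limit theorem says that the mean of a large iid sample with finite variance is approximately normally distributed; here we apply it to the average of the [itex](Y_i - \mu)^2[/itex], which requires a finite fourth moment. The delta method then carries the normal approximation over to a smooth function of that average, in this case the logarithm.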

Let's denote the sample mean of the Y's as [itex]\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i[/itex], and write [itex]\mu = E(Y_i)[/itex] and [itex]\sigma^2 = \mathrm{Var}(Y_i)[/itex]. Then, we have [itex]\log(S^2_N) = \log(\frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2)[/itex]. We can rewrite this as [itex]\log(S^2_N) = \log(\frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu + \mu - \bar{Y})^2)[/itex]. By expanding the square, we get [itex]\log(S^2_N) = \log(\frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu)^2 + \frac{1}{N} \sum_{i=1}^{N} (\mu - \bar{Y})^2 + \frac{2}{N} \sum_{i=1}^{N} (Y_i - \mu)(\mu - \bar{Y}))[/itex].
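The three-term expansion can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `decompose_sample_variance` and the choice of exponential data with known mean 2 are illustrative assumptions, not part of the original problem.

```python
import numpy as np

def decompose_sample_variance(y, mu):
    """Return (S2, first, second, third) for the expansion of the 1/N sample variance.

    `mu` is the true mean, assumed known here purely for illustration.
    """
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    s2 = np.mean((y - ybar) ** 2)                   # S^2_N with divisor N
    first = np.mean((y - mu) ** 2)                  # (1/N) sum (Y_i - mu)^2
    second = (mu - ybar) ** 2                       # (1/N) sum (mu - Ybar)^2
    third = 2.0 * np.mean((y - mu) * (mu - ybar))   # cross term
    return s2, first, second, third

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=1000)  # Exponential with true mean mu = 2
s2, first, second, third = decompose_sample_variance(y, mu=2.0)

print(s2, first + second + third)  # the two values agree up to rounding
print(third, -2.0 * second)        # the cross term equals -2 * (Ybar - mu)^2
```

The second printed pair shows why the cross term is negligible: it is exactly minus twice the second term, so the three terms collapse to the first term minus [itex](\bar{Y} - \mu)^2[/itex].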

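Now, as [itex]N \to \infty[/itex], the first term converges in probability to [itex]\sigma^2[/itex] by the law of large numbers, and the second term, [itex](\mu - \bar{Y})^2[/itex], converges to 0. The third term equals [itex]-2(\bar{Y} - \mu)^2[/itex], because [itex]\frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu) = \bar{Y} - \mu[/itex], so it also converges to 0. Combining the last two terms, [itex]S^2_N = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu)^2 - (\bar{Y} - \mu)^2[/itex], and since [itex](\bar{Y} - \mu)^2[/itex] is of order [itex]1/N[/itex], it does not affect the limiting distribution.

Assuming [itex]\mu_4 = E(Y_i - \mu)^4[/itex] is finite, the variables [itex](Y_i - \mu)^2[/itex] are iid with mean [itex]\sigma^2[/itex] and variance [itex]\mu_4 - \sigma^4[/itex], so the central limit theorem gives [itex]\sqrt{N}(S^2_N - \sigma^2) \to N(0, \mu_4 - \sigma^4)[/itex] in distribution.

Using the delta method with [itex]g(x) = \log x[/itex] and [itex]g'(\sigma^2) = 1/\sigma^2[/itex], we get [itex]\sqrt{N}(\log(S^2_N) - \log(\sigma^2)) \to N(0, (\mu_4 - \sigma^4)/\sigma^4)[/itex]. In other words, for large N, [itex]\log(S^2_N)[/itex] is approximately normal with mean [itex]\log(\sigma^2)[/itex] and variance [itex](\mu_4/\sigma^4 - 1)/N[/itex]. Since [itex]\sigma^2[/itex] and [itex]\mu_4[/itex] are unknown, in practice they can be replaced by their sample estimates.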
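A large-sample approximation of this kind can be checked by simulation. The sketch below assumes the delta-method result that [itex]\log(S^2_N)[/itex] is approximately normal with mean [itex]\log(\sigma^2)[/itex] and variance [itex](\mu_4/\sigma^4 - 1)/N[/itex]. The Exponential(1) distribution, the sample sizes, and the helper name `simulate_log_variance` are illustrative assumptions, not something taken from the original problem.

```python
import numpy as np

def simulate_log_variance(n, reps, rng):
    """Draw `reps` samples of size n from Exponential(1); return log(S^2_N) for each."""
    y = rng.exponential(scale=1.0, size=(reps, n))
    s2 = y.var(axis=1)  # default ddof=0, i.e. the 1/N sample variance
    return np.log(s2)

rng = np.random.default_rng(1)
n, reps = 1000, 2000
log_s2 = simulate_log_variance(n, reps, rng)

# For Exponential(1): sigma^2 = 1 and mu_4 = 9, so the predicted
# mean is log(1) = 0 and the predicted variance is (9 - 1) / N.
predicted_mean = 0.0
predicted_sd = np.sqrt(8.0 / n)

print("empirical mean:", log_s2.mean(), "predicted:", predicted_mean)
print("empirical sd:  ", log_s2.std(), "predicted:", predicted_sd)
```

The empirical values should land close to the predictions, with a small negative bias in the mean that shrinks as N grows.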
 

FAQ: Distribution of Log of Variance

What is the distribution of log of variance?

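The distribution of log of variance is the sampling distribution of [itex]\log(S^2_N)[/itex], the logarithm of the sample variance. For large samples it is approximately normal and centered near [itex]\log(\sigma^2)[/itex]. Log variances appear, for example, in volatility models for financial data, where variability can change by orders of magnitude.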

How is the distribution of log of variance calculated?

The log of variance is calculated in two steps. First compute the sample variance: take the sum of the squared differences between each data point and the sample mean, and divide by the number of data points (or by one less than that for the unbiased version). Then take the natural logarithm of the result.
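As a concrete illustration of the calculation, here is a minimal sketch in plain Python. The function name `log_sample_variance` and its `ddof` parameter are hypothetical and chosen for this example; `ddof=0` divides by the number of data points and `ddof=1` divides by one less.

```python
import math

def log_sample_variance(data, ddof=0):
    """Natural log of the sample variance of `data` (divisor is len(data) - ddof)."""
    n = len(data)
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
    return math.log(sum_sq / (n - ddof))

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
# mean = 5, sum of squared deviations = 32, variance = 32 / 8 = 4
print(log_sample_variance(values))          # log(4)
print(log_sample_variance(values, ddof=1))  # log(32 / 7)
```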

What is the purpose of using the distribution of log of variance?

The purpose of taking the log of the variance is to work with a statistic whose sampling distribution is less skewed and closer to normal than that of the variance itself. The log also turns ratios of variances into differences, which makes comparison between different datasets easier.

How does the distribution of log of variance differ from other distributions?

The distribution of log of variance is a sampling distribution: it describes how the statistic [itex]\log(S^2_N)[/itex] varies from sample to sample, whereas the distribution of the data describes the individual observations. Its large-sample form is normal for any underlying distribution with a finite fourth moment, although its spread depends on that underlying distribution.

Are there any limitations to using the distribution of log of variance?

One limitation is that the normal approximation is a large-sample result and requires the data to have a finite fourth moment. If the data are normally distributed, the approximate variance of [itex]\log(S^2_N)[/itex] simplifies to [itex]2/N[/itex]; for other distributions it depends on the kurtosis, so using the normal-theory value can be misleading. The approximation may also be poor for heavy-tailed data or datasets with extreme outliers, as these can heavily influence the sample variance.
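How much the shape of the data matters can be seen from the delta-method variance [itex](\kappa - 1)/N[/itex], where [itex]\kappa = \mu_4/\sigma^4[/itex] is the kurtosis, assumed finite. The sketch below compares normal data ([itex]\kappa = 3[/itex]) with Laplace data ([itex]\kappa = 6[/itex]). The choice of distributions, the sample sizes, and the helper name `empirical_sd_log_variance` are illustrative assumptions.

```python
import numpy as np

def empirical_sd_log_variance(sampler, n, reps):
    """Standard deviation of log(S^2_N) over `reps` simulated samples of size n."""
    y = sampler(size=(reps, n))
    return np.log(y.var(axis=1)).std()  # var uses divisor N (ddof=0)

rng = np.random.default_rng(2)
n, reps = 500, 2000

sd_normal = empirical_sd_log_variance(rng.standard_normal, n, reps)
sd_laplace = empirical_sd_log_variance(rng.laplace, n, reps)

# Delta-method predictions: sqrt((kurtosis - 1) / N)
pred_normal = np.sqrt((3.0 - 1.0) / n)
pred_laplace = np.sqrt((6.0 - 1.0) / n)

print("normal : empirical", sd_normal, "predicted", pred_normal)
print("laplace: empirical", sd_laplace, "predicted", pred_laplace)
```

The heavier-tailed Laplace data give a noticeably wider spread for the same N, which is why applying the normal-theory value [itex]2/N[/itex] to non-normal data can understate the uncertainty.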
