- #1
madilyn
I've been working through the nonparametric bootstrap and, if I understand correctly, the procedure is:
1. Take an original sample, a vector x = (x1, ..., xn)
2. Generate k vectors, each called a 'bootstrap sample', of the same length as x by sampling with replacement from the original vector x, e.g. b1 = (x3, xn, x1, ..., x13), b2 = (x2, x5, ..., xn), etc.
3. Calculate a statistic [itex] \hat{\theta} = f(\mathbf{x}) [/itex] on each bootstrap sample. My bootstrapped statistic is then the mean of the statistic across the k bootstrap samples, and confidence intervals on that bootstrapped statistic can be found using the inverse cdf of a normal distribution.
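To make the three steps concrete, here is a minimal sketch in plain Python. Everything named in it is my own choice for illustration: the function `bootstrap_replicates`, the simulated exponential data, the sample mean as the statistic, and k = 2000. The normal interval is centred on the statistic of the original sample, which is one common convention; the gap between that value and the mean of the replicates is the usual bootstrap estimate of bias. A percentile interval is included only for comparison.

```python
import random
import statistics
from statistics import NormalDist


def bootstrap_replicates(x, stat, k=2000, seed=0):
    """Return k values of stat, each computed on one resample of x."""
    rng = random.Random(seed)
    n = len(x)
    # rng.choices draws n items from x with replacement (step 2),
    # and stat is evaluated on each resample (step 3).
    return [stat(rng.choices(x, k=n)) for _ in range(k)]


# Step 1: an original sample (simulated here, purely for illustration).
data_rng = random.Random(42)
x = [data_rng.expovariate(0.5) for _ in range(50)]

reps = bootstrap_replicates(x, statistics.mean)

theta_hat = statistics.mean(x)        # statistic on the original sample
boot_mean = statistics.mean(reps)     # mean of the bootstrap replicates
se_boot = statistics.stdev(reps)      # bootstrap standard error

# Normal-approximation 95% interval via the inverse cdf of N(0, 1).
z = NormalDist().inv_cdf(0.975)
normal_ci = (theta_hat - z * se_boot, theta_hat + z * se_boot)

# Percentile interval: 2.5% and 97.5% cut points of the replicates.
cuts = statistics.quantiles(reps, n=40)
percentile_ci = (cuts[0], cuts[-1])

print("theta_hat:", theta_hat, "bias estimate:", boot_mean - theta_hat)
print("normal CI:", normal_ci)
print("percentile CI:", percentile_ci)
```

If the two intervals disagree noticeably, that suggests the replicates are skewed and the normal approximation in step 3 is doing some work that may not be justified.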
If everything is correct above, I have two questions:
i. How do I determine the number of bootstrap samples to take, k? Is there a principled way to determine this? Without one, I would have to keep repeating the same procedure with increasing k until the mean [itex] \bar{\hat{\theta}} [/itex] converges, which seems computationally taxing.
ii. What assumptions must be correct for this procedure to work? I'm guessing that [itex] \hat{\theta} [/itex] must have finite variance? What else?
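On question i, one rough check (a sketch, not a rule for choosing k) is to rerun the bootstrap with different random seeds at a fixed k and see how much the result moves. That run-to-run spread is pure resampling noise, which shrinks roughly like 1/sqrt(k), so it does not require k to keep growing until something visibly converges. The helper `boot_se`, the simulated normal data, and the values k = 50 and k = 800 are assumptions made for illustration.

```python
import random
import statistics


def boot_se(x, k, seed):
    """Bootstrap standard error of the sample mean from k resamples."""
    rng = random.Random(seed)
    n = len(x)
    reps = [sum(rng.choices(x, k=n)) / n for _ in range(k)]
    return statistics.stdev(reps)


# One fixed original sample (simulated, for illustration only).
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(50)]

# For each k, repeat the whole bootstrap under 20 different seeds and
# measure how much the estimated standard error varies between runs.
spread = {}
for k in (50, 800):
    estimates = [boot_se(x, k, seed) for seed in range(20)]
    spread[k] = statistics.stdev(estimates)
    print(f"k={k}: run-to-run spread of the SE estimate = {spread[k]:.4f}")
```

Once the spread is small compared with the standard error itself, a larger k only reduces resampling noise; it does nothing about the sampling error already present in the original x.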