Sampling Distribution of Mean for Discrete Uniform Distribution with Replacement

In summary: the OP combines pairs of sample means (each from a sample of size n = 10) into 25 means of samples of size 20. He then compares their mean and variance with the theoretical values μ = 4.5 and σ^2/n = 8.25/20 = 0.4125. Replies point out that the question could also be read as asking for the "sample variance" of the listed means rather than the "theoretical variance". Combining each pair of means by averaging them is correct. The negative variance the OP obtained came from dividing by 20 and 19 (the size of each sample) instead of by 25 and 24 (the number of combined means).
  • #1
toothpaste666

Homework Statement


Suppose that 50 random samples of size n = 10 are to be taken from a population having the discrete uniform distribution
f(x) = 1/10 for x = 0, 1, 2, ..., 9
f(x) = 0 elsewhere
Sampling is with replacement, so that we are sampling from an infinite population. We get 50 random samples whose means are ... (they list 50 means)

Suppose that we convert the 50 samples into 25 samples of size n = 20 by combining the first two, the next two, and so on. Find the means of these samples and calculate their mean and their standard deviation. Compare this mean and this standard deviation with the corresponding values expected in accordance with the following theorem: if a random sample of size n is taken from a population having the mean μ and the variance σ^2, then X̄ is a random variable whose distribution has the mean μ. For samples from infinite populations, the variance of this distribution is σ^2/n.

The Attempt at a Solution


I just want to make sure my method is correct. I think "combining" means finding the mean of the two means being combined. So if the first two of the 50 listed means are 4.4 and 3.2, I combine them as (4.4 + 3.2)/2 = 3.8, and this is now the mean of a sample of size 20 instead of 10. Once I have combined the 50 samples into 25 this way, I find the mean and standard deviation of the 25 means using the formulas x̄ = Σx/n and s^2 = Σ(x - x̄)^2/(n - 1). Then they want me to compare these with the values from the theorem. I find those using
μ = Σ(from 0 to 9) x(1/10) = 4.5
and
σ^2 = Σ(from 0 to 9) (x - 4.5)^2(1/10) = 8.25
Since n = 20, the variance of the sample mean is
8.25/20 = 0.4125

Am I doing this the right way?
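The theoretical values in this post can be checked exactly. Below is a minimal Python sketch (not from the thread) that uses exact fractions. `combine_pairs` is a hypothetical helper for the pairing step. The pairing works because both samples in each pair have the same size (10), so the average of their two means equals the mean of the pooled 20 values.

```python
from fractions import Fraction

# Discrete uniform population: f(x) = 1/10 for x = 0, 1, ..., 9
values = range(10)
p = Fraction(1, 10)

mu = sum(x * p for x in values)                  # population mean
sigma2 = sum((x - mu) ** 2 * p for x in values)  # population variance

n = 20
var_of_mean = sigma2 / n  # theorem: Var(X-bar) = sigma^2 / n (infinite population)

print(float(mu), float(sigma2), float(var_of_mean))  # 4.5 8.25 0.4125


def combine_pairs(means):
    """Average consecutive pairs of means from equal-size samples.

    With equal sample sizes, the average of the two means equals
    the mean of the pooled sample (here 10 + 10 = 20 values).
    """
    return [(a + b) / 2 for a, b in zip(means[0::2], means[1::2])]


print(combine_pairs([Fraction("4.4"), Fraction("3.2")]))  # [Fraction(19, 5)], i.e. 3.8
```

Fractions are used only to avoid floating-point noise in the comparison; ordinary floats give the same values up to rounding.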
 
  • #2
$$\mu = \sum\limits_{x=0}^9 \frac{x}{10} = 4.5$$

$$\sigma^2 = \sum\limits_{x=0}^9 \frac{(x-4.5)^2}{10} = 8.25$$

Where are you getting the 10 in the denominator from? You have 25 numbers in your data table...
 
  • #3
krebs said:
$$\mu = \sum\limits_{x=0}^9 \frac{x}{10} = 4.5$$

$$\sigma^2 = \sum\limits_{x=0}^9 \frac{(x-4.5)^2}{10} = 8.25$$

Where are you getting the 10 in the denominator from? You have 25 numbers in your data table...

He has 10 x 50 = 500 numbers ##X_1, X_2, \ldots, X_{500}##, with each ##X_i## being an independent sampled value from UNIF{0,1,...,9}. I think he is taking ##\mu## and ##\sigma^2## to be ## EX_i## and ##\text{Var} X_i##, which do, indeed, have '10' in the denominator. Then, he is computing
[tex] \text{Var} \left( \frac{1}{20} \sum_{i=1}^{20} X_i \right) = \sigma^2/20 [/tex]
I don't think the wording of the question is crystal clear, but his interpretation is one defensible reading.
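The step behind ##\sigma^2/20## can be spelled out as a short worked derivation (added for illustration). It assumes, as stated above, that the ##X_i## are independent with common variance ##\sigma^2 = 8.25##. It uses the fact that variances of independent terms add, together with ##\text{Var}(aY) = a^2\,\text{Var}(Y)##:

$$\text{Var}\left(\frac{1}{20}\sum_{i=1}^{20} X_i\right) = \frac{1}{20^2}\sum_{i=1}^{20}\text{Var}(X_i) = \frac{20\,\sigma^2}{400} = \frac{\sigma^2}{20} = \frac{8.25}{20} = 0.4125$$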
 
  • #4
Is it correct that I combine the samples like that? For some reason my variance is coming out negative.
 
  • #5
Ray Vickson said:
He has 10 x 50 = 500 numbers ##X_1, X_2, \ldots, X_{500}##, with each ##X_i## being an independent sampled value from UNIF{0,1,...,9}. I think he is taking ##\mu## and ##\sigma^2## to be ## EX_i## and ##\text{Var} X_i##, which do, indeed, have '10' in the denominator. Then, he is computing
[tex] \text{Var} \left( \frac{1}{20}\sum_{i=1}^{20} X_i \right) = \sigma^2/20 [/tex]
I don't think the wording of the question is crystal clear, but his interpretation is one defensible reading.

Oh, that could be. The way I read it is that he was just given a list of 50 means, and he needed to calculate the mean and standard deviation of that list, and then repeat for a list of 25 made by combining every set of two means. I find it hard to believe that his data table has 500 numbers for him to deal with.
 
  • #6
krebs said:
Oh, that could be. The way I read it is that he was just given a list of 50 means, and he needed to calculate the mean and standard deviation of that list, and then repeat for a list of 25 made by combining every set of two means. I find it hard to believe that his data table has 500 numbers for him to deal with.

Note the typo in the above; I have corrected it in Post # 3. I should have written Var(1/20 sum X_i), not Var (sum X_i).
 
  • #7
krebs said:
Oh, that could be. The way I read it is that he was just given a list of 50 means, and he needed to calculate the mean and standard deviation of that list, and then repeat for a list of 25 made by combining every set of two means. I find it hard to believe that his data table has 500 numbers for him to deal with.

No, it does not: he was not GIVEN 500 numbers. He was given 50 numbers, each of which is a sample-mean of size 10. However, the data was stated to come from a uniform distribution, and of course came from massaging 500 numbers, 10 at a time.

As I said, the wording of the question (if accurately reported) leaves a lot of room for interpretation. Personally, I would NOT have used the OP's interpretation, because it would have made more sense to me to look at his bundle of 50 numbers ##\{ \bar{x}_i, i=1,2, \ldots, 50 \}## as the data themselves, and to look not at the "theoretical" variance, but rather at the "sample" variance, which would be given by
[tex] \text{Sample Var} = \frac{1}{49} \sum_{i=1}^{50} (\bar{x}_i - \bar{\bar{x}})^2, [/tex]
where ##\bar{\bar{x}} = \sum_{i=1}^{50} \bar{x}_i / 50## is the sample mean of the ##\{ \bar{x}_i \}## data. That would have given rise to the thornier question of what happens when you combine the data into ##y_1 = (\bar{x}_1+\bar{x}_2)/2, \: y_2 = (\bar{x}_3 + \bar{x}_4)/2, \ldots, \: y_{25} = (\bar{x}_{49} + \bar{x}_{50})/2##, and then try to get an appropriate formula for the sample variance of the ##\{ y_j \}## data in terms of sample variances associated with the original data. For example, when we deal with the "theoretical" variance, it does not matter if we combine the x's into y's and then take the variance, because the outcome will be the same either way. However, a question arises whether this remains true of "sample" variances rather than "theoretical" variances.
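The following small Python simulation makes the distinction concrete. The 50 means are randomly generated stand-ins, not the textbook's list, and the seed is arbitrary. The simulation shows that pairing leaves the grand mean unchanged, because the groups have equal sizes. The sample variance of the 25 pair averages, however, is a separate statistic. It matches half the sample variance of the 50 means only on average, not exactly.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-in for the textbook's data: 50 sample means,
# each the mean of 10 independent draws from UNIF{0, 1, ..., 9}
xbars = [statistics.fmean(random.randrange(10) for _ in range(10)) for _ in range(50)]

# y_j = (xbar_{2j-1} + xbar_{2j}) / 2 for j = 1, ..., 25
ys = [(a + b) / 2 for a, b in zip(xbars[0::2], xbars[1::2])]

# Pairing equal-size groups leaves the grand mean unchanged (up to rounding) ...
grand_mean_x = statistics.fmean(xbars)
grand_mean_y = statistics.fmean(ys)

# ... but the two sample variances (divisors 49 and 24) are different statistics.
# Their expected values are sigma^2/10 = 0.825 and sigma^2/20 = 0.4125.
s2_x = statistics.variance(xbars)
s2_y = statistics.variance(ys)

print(f"grand means: {grand_mean_x:.4f} vs {grand_mean_y:.4f}")
print(f"sample variances: {s2_x:.4f} (50 means) vs {s2_y:.4f} (25 pair averages)")
```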
 
  • #8
Sorry toothpaste, I misread your question. I see what you are trying to do now. For this distribution,
μ = 4.5
σ^2 = 8.25
If you take infinitely many samples of size n = 1, the variance of the sample mean X̄ is σ^2 = 8.25.
If you take infinitely many samples of size n = 10, the variance of X̄ is σ^2/10 = 0.825.
If you take infinitely many samples of size n = 20, the variance of X̄ is σ^2/20 = 0.4125.

So you know the expected variance of X̄ for each sample size n if you sample an infinite number of times.

Now you have to verify it using your 50 sample means with n = 10 and your 25 sample means with n = 20. Can you calculate the variance for your 50 and 25 observations?
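The three figures above are just σ^2/n with σ^2 = 8.25. The short Python sketch below tabulates them; the function name is hypothetical. It also prints the standard error σ/√n, which is the standard deviation the exercise asks you to compare against.

```python
import math

SIGMA2 = 8.25  # population variance of UNIF{0, 1, ..., 9}


def expected_var_of_mean(n):
    """Variance of the sample mean for samples of size n from an infinite population."""
    return SIGMA2 / n


for n in (1, 10, 20):
    v = expected_var_of_mean(n)
    print(f"n = {n:2d}: Var(X-bar) = {v:.4f}, standard error = {math.sqrt(v):.4f}")
```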
 
  • #9
I went ahead and modeled this in Excel for you, so you can see that it is true. As the number of sample means increases, their variance approaches the expected values (which depend on the size of each of those samples). You only have 50 samples for n = 10 and 25 for n = 20, so your variances should differ somewhat from the expected values, unless your textbook massaged the numbers to demonstrate this point.
 

Attachment: stats.png (the Excel model described above)
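The Excel sheet itself isn't reproduced here. The following is a rough Python analogue, assuming the model simply drew many sample means of size n = 10 and n = 20 from UNIF{0, ..., 9} and took their sample variance. The helper name and the seed are arbitrary choices for this sketch.

```python
import random
import statistics

random.seed(2024)  # arbitrary seed for reproducibility


def simulate_var_of_means(n, num_samples):
    """Sample variance of num_samples sample means, each of n draws from UNIF{0,...,9}."""
    means = [
        statistics.fmean(random.randrange(10) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.variance(means)


# As the number of sample means grows, the variances settle near 0.825 and 0.4125
for num_samples in (25, 50, 1_000, 10_000):
    v10 = simulate_var_of_means(10, num_samples)
    v20 = simulate_var_of_means(20, num_samples)
    print(f"{num_samples:>6} means: n=10 -> {v10:.4f}, n=20 -> {v20:.4f}")
```

With only 25 or 50 means, the printed variances can land noticeably away from 0.825 and 0.4125. That is the point made above about the textbook's 50 and 25 samples.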
  • #10
It was done for the 50 samples as an example in my book, but they didn't really show the work; they just stated the answers and compared them. The exercise says to combine the 50 from the example into 25 and do it for those. After combining them, these are the 25 values I get:

3.8, 4.3, 4.3, 5.1, 4.9
4.2, 4.1, 4.2, 4.9, 4.2
3.0, 5.2, 4.3, 4.5, 3.8
5.4, 5.6, 5.7, 4.0, 5.1
3.2, 4.5, 3.4, 5.0, 4.5

First I calculated

Σxi = 111.2
and
Σxi^2 = 506.72

Then the mean is
x̄ = Σxi/n = 111.2/20 = 5.56

and the variance is
s^2 = [Σxi^2 - (Σxi)^2/n]/(n-1) = [506.72 - (111.2)^2/20]/19 = -5.87

But I know this can't be right, because it is negative and that would make the standard deviation complex. I know I did something wrong, but I can't figure out what.
 
  • #11
toothpaste666 said:
It was done for the 50 samples as an example in my book but they didn't really show the work they just said the answers and compared them. The exercise says to combine the 50 from the example into 25 and do it for those. after combining them these are the 25 values I get:

3.8, 4.3, 4.3, 5.1, 4.9
4.2, 4.1, 4.2, 4.9, 4.2
3.0, 5.2, 4.3, 4.5, 3.8
5.4, 5.6, 5.7, 4.0, 5.1
3.2, 4.5, 3.4, 5.0, 4.5

first I calculated

Σxi = 111.2
and
Σxi^2 = 506.72

then the mean is
x = Σxi/n = 111.2/20 = 5.56

and the variance is
s^2 =[Σxi^2 - (Σxi)^2/n]/(n-1) = [506.72 - (111.2)^2/20]/19 = -5.87

but I know this can't be right because it is negative and that would mean the standard deviation is complex. I know I did something wrong but I can't figure out what.

You need to be dividing by 25 and 24, not by 20 and 19: you have 25 combined means, so n in these formulas is 25. That will leave you with a positive variance.
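As a quick check of this fix, here is the same shortcut formula in Python. It uses the 25 combined means from post #10 with divisors 25 and 24. The values in the comments were worked out by hand.

```python
import math

# The 25 combined means (each from a sample of size 20) listed in post #10
ys = [
    3.8, 4.3, 4.3, 5.1, 4.9,
    4.2, 4.1, 4.2, 4.9, 4.2,
    3.0, 5.2, 4.3, 4.5, 3.8,
    5.4, 5.6, 5.7, 4.0, 5.1,
    3.2, 4.5, 3.4, 5.0, 4.5,
]

k = len(ys)                      # number of data points: 25, not the sample size 20
sum_y = sum(ys)                  # about 111.2
sum_y2 = sum(y * y for y in ys)  # about 506.72

mean = sum_y / k                           # about 4.448
var = (sum_y2 - sum_y ** 2 / k) / (k - 1)  # 12.1024 / 24, about 0.5043
sd = math.sqrt(var)                        # about 0.7101

print(f"mean = {mean:.3f}, s^2 = {var:.4f}, s = {sd:.4f}")
print("theorem: mean = 4.5, variance = 0.4125, sd =", round(math.sqrt(0.4125), 4))
```

The observed mean (4.448) and standard deviation (about 0.71) are close to, but not equal to, the theoretical 4.5 and about 0.64. That is what you would expect from only 25 values.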
 
  • #12
Variance is a statistic you calculate from a set of numbers, with no other context besides the numbers. Why are you using 20 and 19?
 
  • #13
I got mixed up with the sample size n = 20 for the 25 samples. I see the mistake now. Thank you.
 

FAQ: Sampling Distribution of Mean for Discrete Uniform Distribution with Replacement

What is the sampling distribution of the mean?

The sampling distribution of the mean is the probability distribution of the sample mean X̄ over all possible samples of a given size n drawn from a population. It can be pictured as taking repeated samples from the population and calculating the mean of each sample.

Why is the sampling distribution of the mean important?

The sampling distribution of the mean is important because it allows us to make inferences about a population based on a sample. It describes how much sample means vary from sample to sample, and therefore how accurately a single sample mean estimates the population mean μ.

How is the sampling distribution of the mean different from a population distribution?

A population distribution describes the individual values in the entire population, while the sampling distribution of the mean describes the means of all possible samples of size n from that population. For example, the population in this thread is uniform on {0, 1, ..., 9} with mean 4.5 and variance 8.25. The sampling distribution of the mean for n = 20 has the same mean, 4.5, but a much smaller variance, 8.25/20 = 0.4125.

What factors affect the shape of the sampling distribution of the mean?

The shape and spread of the sampling distribution of the mean depend on the sample size and on the population. For an infinite population its variance is σ^2/n. A more variable population (larger σ^2) therefore gives a wider distribution, and a larger sample size gives a narrower one. As the sample size increases, the shape also becomes closer to normal.

How can the central limit theorem be applied to the sampling distribution of the mean?

The central limit theorem states that, for a population with finite variance, the sampling distribution of the mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution. This justifies using statistical methods that assume the sample mean is normally distributed, even when the population itself is not normally distributed.
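The following Python sketch shows one way to see the central limit theorem at work for this thread's population. It computes the exact distribution of the sample mean by repeated convolution, using exact fractions. The discrete uniform distribution is symmetric, so its skewness is already 0. The sketch therefore tracks excess kurtosis instead, which is 0 for a normal distribution; that choice of measure is this example's, not the FAQ's. The function names are hypothetical.

```python
from fractions import Fraction


def mean_distribution(n):
    """Exact pmf {sum: probability} of the sum of n iid draws from UNIF{0,...,9}.

    The sample mean is sum / n, so this is its sampling distribution up to rescaling.
    """
    pmf = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, p in pmf.items():
            for x in range(10):
                nxt[s + x] = nxt.get(s + x, 0) + p * Fraction(1, 10)
        pmf = nxt
    return pmf


def excess_kurtosis(pmf):
    """Excess kurtosis of a discrete pmf (0 for a normal distribution; scale-invariant)."""
    mu = sum(s * p for s, p in pmf.items())
    m2 = sum((s - mu) ** 2 * p for s, p in pmf.items())
    m4 = sum((s - mu) ** 4 * p for s, p in pmf.items())
    return m4 / m2 ** 2 - 3


for n in (1, 2, 10, 20):
    print(n, float(excess_kurtosis(mean_distribution(n))))
```

For sums of independent draws, excess kurtosis is exactly the single-draw value (-202/165, about -1.22) divided by n. The printed values therefore shrink toward 0, to about -0.06 at n = 20.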
