Sampling theory and random sample

  • Thread starter fog37
In summary, sampling theory is a framework for selecting a subset of individuals from a larger population in order to make inferences about the population as a whole. In a simple random sample, every individual has an equal chance of being selected, which guards against selection bias and makes the sample representative of the population on average. This is what allows researchers to draw valid statistical conclusions and make predictions from the sampled data.
  • #1
fog37
TL;DR Summary
sampling theory and Inference
In inferential statistics, we have a large population, collect data from it to get a random sample of size ##n##, and infer the population parameters from that single sample.

I read that the random sample can be interpreted as the collection of the ##n## realizations of a single random variable ##X##. For example, the height ##H## of individuals in a population can be defined as a random variable, and the height of each individual in the random sample is a realization of that r.v. However, a more correct interpretation of a random sample is the following: each element of the random sample, for example each of the 5 heights ##[6, 5.4, 6.1, 5.5, 6.4]##, is the realization of a different random variable. So the random sample is the realization of a random vector, a sequence of i.i.d. random variables ##[X_1, X_2, X_3, X_4, X_5]## with a joint probability distribution ##f(x_1, x_2, x_3, x_4, x_5)##. Why is this the correct interpretation of the random sample and not the first one with a single r.v.? Are the two interpretations somehow equivalent to each other? How?

When we perform regression analysis on some random sample of data, are we dealing with a pair of random variables, ##X## and ##Y##, i.e. a 2D random vector ##Z=(X,Y)##? Or with two random vectors, ##X=[X_1, X_2, X_3, X_4, X_5]## and ##Y= [Y_1, Y_2, Y_3, Y_4, Y_5]##, where each ##x## value and each ##y## value is the realization of a different random variable ##X_i## or ##Y_i##?
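
As a rough illustration of the two bookkeeping choices in the regression question, the sketch below simulates pairs from a hypothetical linear model. The intercept, slope, noise level, and the helper names `draw_pairs` and `ols_slope` are all assumptions made for this example, not something taken from the thread. It stores the same data once as a list of pairs, i.e. realizations of i.i.d. random vectors ##(X_i, Y_i)##, and once as two columns, and checks that the least-squares slope does not depend on which layout is used.

```python
import random

# Hypothetical data-generating model, assumed only for illustration:
# Y = INTERCEPT + SLOPE * X + noise
INTERCEPT, SLOPE, NOISE_SD = 1.0, 2.0, 1.0


def draw_pairs(n, rng):
    """Draw n i.i.d. realizations (x_i, y_i) of the random vector (X, Y)."""
    pairs = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = INTERCEPT + SLOPE * x + rng.gauss(0.0, NOISE_SD)
        pairs.append((x, y))
    return pairs


def ols_slope(xs, ys):
    """Least-squares slope: sum of cross-deviations over sum of squared x-deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)
pairs = draw_pairs(200, rng)                 # view 1: n realizations of (X_i, Y_i)

xs = [x for x, _ in pairs]                   # view 2: one realization of [X_1..X_n]
ys = [y for _, y in pairs]                   #         and one of [Y_1..Y_n]

slope_from_columns = ols_slope(xs, ys)
slope_from_pairs = ols_slope(*zip(*pairs))   # unzip the pairs on the fly

print(slope_from_columns, slope_from_pairs)
```

The two layouts hold the same numbers, so the estimate is identical; what matters for inference is the joint distribution assumed for the pairs, not how they are arranged.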

Thank you as always for any comment and correction.
 
  • #2
fog37 said:
I read that the random sample can be interpreted as the collection of the ##n## realizations of a single random variable ##X##. For example, the height ##H## of individuals in a population can be defined as a random variable, and the height of each individual in the random sample is a realization of that r.v. However, a more correct
Is "more correct" your phrase or theirs? A restriction of the first interpretation is that every observation is assumed to come from the same population distribution. If the intent is to study things like cluster analysis, importance sampling, or stratified sampling, then there is some freedom to say that more than one distribution is involved in the sample.

CORRECTION: I missed the IID part of the description of the second interpretation. I see no practical difference between the two interpretations.
 
  • #3
FactChecker said:
Is "more correct" your phrase or theirs? A restriction of the first interpretation is that every observation is assumed to come from the same population distribution. If the intent is to study things like cluster analysis, importance sampling, or stratified sampling, then there is some freedom to say that more than one distribution is involved in the sample.
Well, I have found this interpretation in several places. For example:
[Attached image: 1704940604794.png]


The population is an infinite set of values drawn from a random variable ##X##. Sampling from a population is the same as repeatedly drawing new values from ##X##. A random sample of size ##n## is a collection of ##n## individual draws from ##X##.

The point seems to be that ##n## independent draws from a random variable ##X## are equivalent to one draw of ##n## i.i.d. random variables ##X_1, X_2, \ldots, X_n##. Is that really the case? Can you help me appreciate why the two scenarios are equivalent?
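
One way to see the connection, as a sketch under the i.i.d. assumption: if each ##X_i## has the same marginal density as ##X##, written ##f_X## here (a symbol introduced only for this illustration), then independence makes the joint density factor as

$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i).$$

The joint distribution of the random vector is therefore fully determined by the distribution of the single variable ##X##. In that sense "##n## independent draws from ##X##" and "one draw of ##(X_1, \ldots, X_n)##" describe the same probability model; the second phrasing just gives each draw its own name.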
 
  • #4
Sorry. I missed the IID part of the second interpretation. I see no practical difference between the two. So I wonder where you read that the second interpretation was better.
 
  • #5
FactChecker said:
Sorry. I missed the IID part of the second interpretation. I see no practical difference between the two. So I wonder where you read that the second interpretation was better.
Thank you FactChecker for your support. Let me share with you this stats.stackexchange.com answer:
https://stats.stackexchange.com/questions/368492/about-sampling-and-random-variables/368517#368517

The response by shadowtalker discusses how the second interpretation allows the sample statistics to also be random variables, as they are...

So why are the two interpretations really identical? Would you mind sharing your thought process? It is the same random reality but described in two different ways... Is one more technically correct than the other? As mentioned, when we talk about regression analysis, it seems better to regard each pair of ##x## and ##y## values in the random sample as a realization of two random variables ##X## and ##Y##, instead of treating the data as two sequences of random variables, one for the ##x## values and one for the ##y## values...

For example, in the case of tossing a die multiple times, is the outcome of each toss the realization of a single random variable, or are the outcomes realizations of different random variables?
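
For the die, a small exact enumeration may make the bookkeeping concrete. The sketch below assumes a fair six-sided die and three tosses, modelled as equally likely triples; those choices and the names `die_pmf`, `joint`, and `marginal` are made up for this illustration. It checks that each toss ##X_i## has the same distribution as a single die ##X## and that the joint distribution is the product of the marginals.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
N_TOSSES = 3  # assumed number of tosses

# Distribution of a single fair die X.
die_pmf = {face: Fraction(1, 6) for face in FACES}

# Joint distribution of (X_1, X_2, X_3): all 6**3 outcomes equally likely.
outcomes = list(product(FACES, repeat=N_TOSSES))
joint = {outcome: Fraction(1, len(outcomes)) for outcome in outcomes}


def marginal(i):
    """Distribution of the i-th toss, summing the joint over the other tosses."""
    pmf = {face: Fraction(0) for face in FACES}
    for outcome, p in joint.items():
        pmf[outcome[i]] += p
    return pmf


# Identically distributed: every X_i has the same pmf as X.
identically_distributed = all(marginal(i) == die_pmf for i in range(N_TOSSES))

# Independent: the joint pmf equals the product of the marginal pmfs.
independent = all(
    p == die_pmf[a] * die_pmf[b] * die_pmf[c] for (a, b, c), p in joint.items()
)

print(identically_distributed, independent)
```

Under these assumptions both checks hold, so each toss gets its own random variable ##X_i##, but all of them are copies of the same ##X##.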

Thank you!

 
  • #6
fog37 said:
Thank you FactChecker for your support. Let me share with you this stats.stackexchange.com answer:
https://stats.stackexchange.com/questions/368492/about-sampling-and-random-variables/368517#368517

The response by shadowtalker discusses how the second interpretation allows the sample statistics to also be random variables, as they are...
I agree. It is a distinction that I have probably been careless about in the past. There is a difference between a sample, which is an already collected set of data, and the random variables that gave you that sample. I think it is standard to use lower case (##x_i##) for the data and upper case (##X_i##) for the random variables.
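
The lower-case/upper-case distinction can be sketched in code. Below, `draw_sample` plays the role of the random vector ##(X_1, \ldots, X_n)##, and a list it returns plays the role of the data ##(x_1, \ldots, x_n)##. The normal model for heights, its parameters, and all function names are assumptions chosen for illustration.

```python
import math
import random
import statistics

MU, SIGMA, N = 5.8, 0.4, 5   # assumed height model (feet) and sample size


def draw_sample(rng, n=N):
    """One realization (x_1, ..., x_n) of the i.i.d. vector (X_1, ..., X_n)."""
    return [rng.gauss(MU, SIGMA) for _ in range(n)]


rng = random.Random(42)

# An already collected sample: fixed numbers, so its mean is a fixed number.
observed = [6, 5.4, 6.1, 5.5, 6.4]
observed_mean = statistics.mean(observed)

# The sample mean as a random variable: re-running the sampling mechanism
# gives a different value each time.
sample_means = [statistics.mean(draw_sample(rng)) for _ in range(1000)]

# For i.i.d. draws the standard deviation of the sample mean is sigma / sqrt(n).
print(observed_mean)
print(statistics.stdev(sample_means), SIGMA / math.sqrt(N))
```

The spread of `sample_means` is only meaningful if the sample mean is treated as a function of the random variables ##X_i##, which is the point of the second interpretation.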
fog37 said:
So why are the two interpretations really identical? Would you mind sharing your thought process? It is the same random reality but described in two different ways... Is one more technically correct than the other?
IMO, one situation where the distinction is significant is if you talk about collecting data in stages, so that some data has been collected while other data is not yet collected and is still a random variable. You might see this in stopping problems. Suppose that you were doing an experiment where collecting data was expensive or difficult and you needed to decide whether to collect more data. Also, I think that the distinction would be significant in many Bayesian methods with prior and posterior distributions, and in bootstrap methods.
I have no real experience with these types of problems and will have to leave this discussion to others.
 

FAQ: Sampling theory and random sample

What is sampling theory?

Sampling theory is a branch of statistics that deals with the principles and methods used for selecting a subset (sample) from a larger population to estimate characteristics of the entire population. It provides guidelines on how to draw samples so that the sample is representative of the population, allowing for valid inferences and predictions.

What is a random sample?

A random sample is a subset of individuals chosen from a larger population, where each individual has an equal chance of being selected. This method helps to ensure that the sample is representative of the population, minimizing bias and enabling reliable statistical analysis.

Why is random sampling important?

Random sampling is important because it guards against selection bias, so that the sample reflects the population up to sampling error that can be quantified. This allows valid generalizations and conclusions to be drawn about the population based on the sample data, and it underpins the reliability of statistical inferences.

What are the different types of random sampling?

There are several types of random sampling, including simple random sampling, stratified random sampling, systematic sampling, and cluster sampling. Simple random sampling involves selecting individuals purely by chance. Stratified random sampling involves dividing the population into subgroups (strata) and then taking a random sample from each stratum. Systematic sampling involves picking a random starting point and then selecting every k-th individual from a list. Cluster sampling involves dividing the population into clusters and then randomly selecting entire clusters for the sample.
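
The four designs can be sketched with Python's standard `random` module. The population of 100 numbered units, the strata and cluster definitions, and the function names are all hypothetical, chosen only to keep the example small.

```python
import random

population = list(range(100))          # hypothetical units labelled 0..99
rng = random.Random(1)


def simple_random_sample(units, n):
    """Every subset of size n is equally likely."""
    return rng.sample(units, n)


def stratified_sample(strata, n_per_stratum):
    """Take a simple random sample inside each stratum."""
    return [u for stratum in strata for u in rng.sample(stratum, n_per_stratum)]


def systematic_sample(units, k):
    """Random start in the first k units, then every k-th unit."""
    start = rng.randrange(k)
    return units[start::k]


def cluster_sample(clusters, n_clusters):
    """Randomly pick whole clusters and keep all of their units."""
    chosen = rng.sample(clusters, n_clusters)
    return [u for cluster in chosen for u in cluster]


# Assumed groupings: two strata (first/second half), ten clusters of ten units.
strata = [population[:50], population[50:]]
clusters = [population[i:i + 10] for i in range(0, 100, 10)]

srs = simple_random_sample(population, 10)
strat = stratified_sample(strata, 5)
syst = systematic_sample(population, 10)
clus = cluster_sample(clusters, 2)
```

Only the first design produces an i.i.d.-style sample in the sense discussed in the thread; the others deliberately introduce structure that the analysis then has to account for.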

How do you determine the sample size needed for a study?

The sample size needed for a study depends on several factors, including the desired level of precision, the population size, the variability of the population, and the confidence level required. Statistical formulas and software can help determine the appropriate sample size. Commonly used formulas take into account the margin of error, the standard deviation of the population, and the z-score corresponding to the desired confidence level.
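
One commonly quoted version of such a formula, for estimating a mean with a known population standard deviation ##\sigma## and margin of error ##E##, is ##n = (z\sigma/E)^2##, rounded up. A minimal sketch, ignoring any finite-population correction; the function name and the example numbers are assumptions:

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin_of_error (large population)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided z-score
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Example with made-up numbers: sigma = 0.4, margin of error = 0.1, 95% confidence.
n_required = sample_size_for_mean(0.4, 0.1)
print(n_required)
```

Because ##n## grows with the square of ##\sigma/E##, halving the margin of error roughly quadruples the required sample size.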
