- #1
fog37
- TL;DR Summary
- sampling theory and Inference
In inferential statistics, we have a large population, collect data from it to get a random sample of size ##n##, and infer the population parameters from that single sample.
I read that the random sample can be interpreted as a collection of ##n## realizations of a single random variable ##X##. For example, the height ##H## of individuals in a population can be defined as a random variable, and the height of each individual in the random sample is a realization of that r.v. However, a more correct interpretation of a random sample is the following: each element of the random sample, for example each of the 5 heights ##[6, 5.4, 6.1, 5.5, 6.4]##, is the realization of a different random variable. So the random sample is the realization of a random vector, a sequence of i.i.d. random variables ##[X_1, X_2, X_3, X_4, X_5]## with a joint probability distribution ##f(x_1, x_2, x_3, x_4, x_5)##. Why is this the correct interpretation of the random sample and not the first one with a single r.v.? Are the two interpretations somehow equivalent to each other? How?
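To make the second interpretation concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that heights follow a normal distribution with mean 5.8 and standard deviation 0.4; those numbers and the names `draw_sample`, `coord_means`, `coord_sds` and `sd_of_mean` are my own and not part of the question. Each call to `draw_sample` produces one realization of the random vector ##[X_1, ..., X_5]##, and repeating the call many times shows how each coordinate ##X_i## and the sample mean behave across samples.

```python
import random
import statistics

def draw_sample(rng, n=5, mean=5.8, sd=0.4):
    """Return one realization of the random vector (X_1, ..., X_n).

    Assumption for illustration: the X_i are i.i.d. Normal(mean, sd).
    """
    return [rng.gauss(mean, sd) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is reproducible

# Repeat the sampling experiment many times: each entry is one sample of size 5.
samples = [draw_sample(rng) for _ in range(20000)]

# Look at each coordinate X_i across the repeated samples.
# Because the X_i are identically distributed, every coordinate should show
# roughly the same mean and standard deviation as the population variable X.
coord_means = [statistics.fmean(s[i] for s in samples) for i in range(5)]
coord_sds = [statistics.pstdev([s[i] for s in samples]) for i in range(5)]

# The sample mean is a function of the whole random vector, so it is itself
# a random variable; its spread should be close to sd / sqrt(n).
sample_means = [statistics.fmean(s) for s in samples]
sd_of_mean = statistics.pstdev(sample_means)

print("per-coordinate means:", [round(m, 3) for m in coord_means])
print("per-coordinate sds:  ", [round(s, 3) for s in coord_sds])
print("sd of sample mean:   ", round(sd_of_mean, 3))
```

In this sketch the two readings meet in the i.i.d. assumption: every ##X_i## has the same marginal distribution as the single variable ##X##, while treating the sample as a vector is what lets us talk about the distribution of a statistic such as the sample mean.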
When we perform regression analysis on a random sample of data, are we dealing with a pair of random variables, ##X## and ##Y##, i.e. a 2D random vector ##Z=(X,Y)##? Or with two random vectors, ##X=[X_1, X_2, X_3, X_4, X_5]## and ##Y=[Y_1, Y_2, Y_3, Y_4, Y_5]##, where each observed value of ##x## is the realization of a different random variable ##X_i## and each observed value of ##y## is the realization of a different random variable ##Y_i##?
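The regression version of the question can be sketched the same way. The snippet below assumes a hypothetical linear model ##Y = 1 + 2X + \epsilon## with ##X## uniform on ##[0, 10]## and normal noise; the model, its parameters, and the helper names `draw_pairs` and `ols_slope` are illustrative assumptions only. Each call to `draw_pairs` yields one realization of 5 i.i.d. pairs ##(X_i, Y_i)##, each distributed like the single pair ##(X, Y)##, and the fitted slope varies from one sample to the next.

```python
import random

def draw_pairs(rng, n=5, slope=2.0, intercept=1.0, noise_sd=0.5):
    """Return one realization of n i.i.d. pairs (X_i, Y_i).

    Assumed model (illustration only): Y = intercept + slope * X + noise,
    with X uniform on [0, 10] and noise ~ Normal(0, noise_sd).
    """
    pairs = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = intercept + slope * x + rng.gauss(0.0, noise_sd)
        pairs.append((x, y))
    return pairs

def ols_slope(pairs):
    """Ordinary least squares slope for simple linear regression."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

rng = random.Random(1)  # fixed seed so the run is reproducible

# Each sample of 5 pairs gives one slope estimate; across repeated samples
# the estimate is itself a random variable scattered around the true slope.
slopes = [ols_slope(draw_pairs(rng)) for _ in range(5000)]
mean_slope = sum(slopes) / len(slopes)

print("average fitted slope over repeated samples:", round(mean_slope, 3))
```

Under these assumptions both descriptions apply at once: the population is described by the pair ##(X, Y)##, and the data set is one realization of the vectors ##[X_1, ..., X_5]## and ##[Y_1, ..., Y_5]## built from independent copies of that pair.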
Thank you as always for any comment and correction.