fog37
- TL;DR Summary: Statistical modeling and the relationship between a model with three random variables versus two random variables
In statistical modeling, the goal is to come up with a model that describes the relationship between random variables. A function of random variables is also a random variable.
We could have three random variables, ##Y##, ##X##, ##\epsilon##, with the r.v. ##Y## given by ##Y = b_1 X + b_2 + \epsilon##, where ##b_1, b_2## are constants. The expectation value of ##Y## is then ##E[Y] = b_1 E[X] + b_2 + E[\epsilon]##, with ##E[\epsilon] = 0##. This is what simple linear regression is about. A note: one author wrote ##E[Y;X]## instead of ##E[Y|X]##, stating that it is not really a conditional expectation value, but I am not sure about the difference...
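To make the setup concrete, here is a minimal simulation sketch; the coefficients, distributions, and sample size are made-up values, not taken from any textbook:

```python
# Minimal sketch (made-up values): simulate Y = b1*X + b2 + eps with X random,
# and check that the sample mean of Y is close to b1*E[X] + b2.
import numpy as np

rng = np.random.default_rng(0)
b1, b2 = 2.0, 5.0                                     # assumed constants
X = rng.normal(loc=30.0, scale=10.0, size=100_000)    # X as a random variable, E[X] = 30
eps = rng.normal(loc=0.0, scale=1.0, size=X.size)     # noise with E[eps] = 0
Y = b1 * X + b2 + eps

print(Y.mean())          # empirical E[Y]
print(b1 * 30.0 + b2)    # b1*E[X] + b2 = 65
```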
But in most textbooks, the variable ##X## is generally said not to be a random variable but a deterministic one... Why? Clearly, that would simplify the expectation value of ##Y## to ##E[Y] = b_1 X + b_2##.
On the other hand, when ##X## is also an r.v., we need to know its expectation value ##E[X]## in order to proceed. How would we get ##E[X]## from the sample data?
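My naive guess would be to use the sample mean as a plug-in estimate of ##E[X]##; a sketch with purely hypothetical numbers:

```python
# Sketch: estimate E[X] by the sample mean of the observed x values (hypothetical data).
import numpy as np

x_sample = np.array([23, 31, 45, 52, 29, 38, 60, 19, 41, 35], dtype=float)  # hypothetical observed ages
E_X_hat = x_sample.mean()    # plug-in estimate of E[X]
print(E_X_hat)
```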
For example, in practice, if we asked 50 random people out of a population their height ##Y## and age ##X##, both age and height would be r.v.s, correct? That seems the most common scenario for linear regression. What kind of situation would instead have ##X## be deterministic? Maybe if we searched from the start for people of specific, pre-chosen ages and then asked them their height? In that case, we planned what the values of the variable ##X## would be... But in many other cases, it seems that both variables would commonly be random. How would we then handle the case where both variables are random?
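For the height-vs-age scenario, here is a sketch of what I have in mind, with entirely made-up data: both columns are random samples, and an ordinary least-squares line is then fit to the observed pairs:

```python
# Sketch of the height-vs-age scenario with hypothetical data: both variables are
# random, and ordinary least squares is fit to the observed pairs as they come.
import numpy as np

rng = np.random.default_rng(1)
age = rng.uniform(20, 70, size=50)                     # X: ages of 50 randomly chosen people
height = 0.1 * age + 160 + rng.normal(0, 5, size=50)   # Y: heights (made-up relationship)

b1_hat, b2_hat = np.polyfit(age, height, deg=1)        # slope and intercept estimates
print(b1_hat, b2_hat)
print(age.mean())                                      # sample estimate of E[X], if needed
```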