Statistical modeling and relationship between random variables

In summary, statistical modeling involves the use of mathematical frameworks to represent and analyze the relationships between random variables. It incorporates techniques such as regression analysis, probability distributions, and hypothesis testing to understand how variables interact and influence one another. This modeling helps to make predictions, infer causal relationships, and quantify uncertainty, thereby facilitating informed decision-making in various fields such as economics, medicine, and social sciences.
  • #1
fog37
TL;DR Summary
Statistical modeling and the relationship between random variables, comparing models with three random variables vs. two
In statistical modeling, the goal is to come up with a model that describes the relationship between random variables. A function of random variables is also a random variable.
We could have three random variables, ##Y##, ##X##, ##\epsilon## with the r.v. ##Y## given by ##Y=b_1 X + b_2 + \epsilon## where ##b_1, b_2## are constants. The expectation value of ##Y## is simply ##E[Y|X] = b_1 E[X]+ b_2 + E[\epsilon]## with ##E[\epsilon]=0##. This is what simple linear regression is about. A note: an author wrote ##E[Y;X]## instead of ##E[Y|X]##, stating that it is not really a conditional expectation value, but I am not sure about the difference...

But in most textbooks, the variable ##X## is generally said to not be a random variable but a deterministic one...Why? Clearly, that would simplify the expectation value of ##Y## to ##E[Y|X] = b_1 X+ b_2##.

On the other hand, when ##X## is also a r.v., we need to know its expectation value ##E[X]## in order to proceed. How would we get ##E[X]## from the sample data?

For example, in practice, if we asked 50 random people, out of a population, their height ##Y## and age ##X##, both age and height would be r.v., correct? That seems the most common scenario for linear regression. What kind of situation would instead have ##X## be deterministic? Maybe if we searched from the beginning for people of specific ages and then asked them their height? In that case, we planned what the values of the variable ##X## would be... But in many other cases, it seems that both variables would commonly be random. How would we then handle that?
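A minimal numpy sketch of the two sampling schemes above (the coefficients ##b_1 = 0.5##, ##b_2 = 100## and the age range are made up for illustration): whether the ages are drawn at random or planned in advance, the same least-squares formulas apply and recover roughly the same ##b_1, b_2##.

```python
# Sketch: simulate Y = b1*X + b2 + eps once with random X (ages drawn
# at random) and once with a fixed, pre-planned design; fit both by OLS.
import numpy as np

rng = np.random.default_rng(0)
b1, b2 = 0.5, 100.0          # hypothetical true slope and intercept
n = 50

def fit_ols(x, y):
    """Return (slope, intercept) from the usual least-squares formulas."""
    slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    return slope, y.mean() - slope * x.mean()

# Case 1: X is itself random (ages of 50 randomly chosen people).
x_random = rng.uniform(5, 18, size=n)
y1 = b1 * x_random + b2 + rng.normal(0, 2, size=n)

# Case 2: X is deterministic (we chose the ages in advance).
x_fixed = np.linspace(5, 18, n)
y2 = b1 * x_fixed + b2 + rng.normal(0, 2, size=n)

print(fit_ols(x_random, y1))  # both close to (0.5, 100.0)
print(fit_ols(x_fixed, y2))
```

Note that the fit never needs ##E[X]## as a separate input: the formulas only use the observed sample values either way.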
 
  • #2
fog37 said:
The expectation value of ##Y## is simply ##E[Y|X] = b_1 E[X]+ b_2 + E[\epsilon]## with ##E[\epsilon]=0##.
Be careful about this. ##E[Y|X]## is a function of ##X## whereas the right side is just a single number.
fog37 said:
This is what simple linear regression is about. A note: an author wrote ##E[Y;X]## instead of ##E[Y|X]##, stating that it is not really a conditional expectation value, but I am not sure about the difference...
What author? If I am going to say something that is contradicted by the author, then I would like to know what all the surrounding text said.
fog37 said:
But in most textbooks, the variable ##X## is generally said to not be a random variable but a deterministic one...Why?
Clearly, that would simplify the expectation value of ##Y## to ##E[Y|X] = b_1 X+ b_2##.
That equation is to be used with a particular value of ##X##. How ##X## got to have that value, whether deterministic or random, does not matter.
fog37 said:
On the other hand, when ##X## is also a r.v., we need to know its expectation value ##E[X]## in order to proceed.
Suppose that ##X_1, X_2, ... , X_n## are random variables, and more statistical analysis needs to be done with them. That is a more complicated situation. Some variables might be correlated, others might be independent. Added: (actually you need to do this even if the ##X_i##s are all deterministic.)
fog37 said:
How would we get ##E[X]## from the sample data?

For example, in practice, if we asked 50 random people, out of a population, their height ##Y## and age ##X##, both age and height would be r.v. , correct? That seems the most common scenario for linear regression. What kind of situation would instead have ##X## to be deterministic?
You would normally collect a sample, ##(x_1,y_1), (x_2,y_2), ..., (x_m,y_m)##. Would it matter whether the ##x_i##s were from a random variable? If you have several input variables ##X_i## and want to do a more detailed analysis of their relationship with ##Y##, then you would first need to address the issue of correlated ##X## variables. Added: (actually you need to do this even if the ##X_i##s are all deterministic.)
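A minimal sketch of that first step, with made-up data: two predictors are deliberately generated correlated, and we inspect the predictor correlation before doing the multiple least-squares fit.

```python
# Sketch: check how correlated the predictor columns are before
# regressing y on several X variables.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # deliberately correlated with x1
y = 2.0 * x1 - 1.0 * x2 + 3.0 + rng.normal(0, 0.5, size=n)

# Inspect the predictor correlation first.
print(np.corrcoef(x1, x2)[0, 1])            # roughly 0.8

# Least-squares fit of y on [1, x1, x2].
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)                                  # close to [3.0, 2.0, -1.0]
```

The correlation between the ##X## columns does not bias the fit here, but it inflates the uncertainty of the individual coefficients, which is why it has to be looked at first.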
 
  • #3
I think this is more a matter of convention than anything. Bayesian statistics tends to treat everything as a random variable and assign it prior probability distributions. So you would model $$y\sim \mathcal N (b_1 X + b_0, \sigma)$$ You would not just treat ##y## as a random variable, but everything else too. Each one would have their own prior distribution.
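A sketch of that Bayesian reading in the simplest conjugate case (this assumes a known ##\sigma## and independent normal priors on the coefficients, which is a simplification of the general setup): the coefficients get a prior distribution and their posterior comes out in closed form.

```python
# Sketch: conjugate Bayesian linear regression with known noise sd.
# Prior: [b0, b1] ~ N(0, tau^2 I). Posterior is normal with the
# precision and mean computed below.
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 100, 1.0
x = rng.normal(size=n)
y = 1.5 * x + 0.5 + rng.normal(0, sigma, size=n)

X = np.column_stack([np.ones(n), x])
tau = 10.0                                   # prior sd on each coefficient
prior_prec = np.eye(2) / tau**2

# Conjugate update: posterior precision, mean, and covariance of [b0, b1].
post_prec = prior_prec + X.T @ X / sigma**2
post_mean = np.linalg.solve(post_prec, X.T @ y / sigma**2)
post_cov = np.linalg.inv(post_prec)

print(post_mean)                   # near [0.5, 1.5]
print(np.sqrt(np.diag(post_cov)))  # posterior sds, roughly 1/sqrt(n)
```

With a broad prior (large ##\tau##), the posterior mean is essentially the least-squares estimate, which is one way to see why the two camps compute the same numbers in simple problems.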
 
  • #4
I believe a "deterministic" variable is just one that is not random, such as the date, say, year-wise. You can control your choice of date when plotting, say, inflation vs. year/date, or record high jump vs. year. Notice you can regress one random variable against a deterministic one, but correlation is not defined.
 
  • #5
Dale said:
I think this is more a matter of convention than anything. Bayesian statistics tends to treat everything as a random variable and assign it prior probability distributions. So you would model $$y\sim \mathcal N (b_1 X + b_0, \sigma)$$ You would not just treat ##y## as a random variable, but everything else too. Each one would have their own prior distribution.
Some authors, maybe frequentists, define correlation in terms of conditional expectation (##E(Y|X)##, when regressing ##Y## against ##X##). Is this done with the Bayesian approach?
 
  • #6
WWGD said:
Some authors, maybe frequentists, define correlation in terms of conditional expectation (##E(Y|X)##, when regressing ##Y## against ##X##). Is this done with the Bayesian approach?
I haven’t seen that as a definition, but Bayesians use the same computations to actually calculate correlations as frequentists do.
 
  • #7
Dale said:
I haven’t seen that as a definition, but Bayesians use the same computations to actually calculate correlations as frequentists do.
Thanks, I was also wondering whether regressing ##Y## on ##X## is described as the conditional expectation of ##Y## given ##X##, i.e., ##E[Y|X]##.
 
  • #8
WWGD said:
Thanks, I was also wondering whether regressing ##Y## on ##X## is described as the conditional expectation of ##Y## given ##X##, i.e., ##E[Y|X]##.
Sort of. It is not just the conditional expectation, but you get the entire conditional distribution. So you can get the expectation of the conditional distribution, but you can also get any other measure such as the variance or anything else you like.
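A sketch of that point with simulated data (a normal-error model is assumed here): the fit gives the conditional mean of ##Y## at any ##x##, and the residual spread estimates the conditional standard deviation, i.e., more of the conditional distribution than just ##E[Y|X]##.

```python
# Sketch: from a least-squares fit, recover both the conditional mean
# and an estimate of the conditional sd of Y given X = x_new.
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 10, size=n)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, size=n)

slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()
resid = y - (slope * x + intercept)
sigma_hat = resid.std(ddof=2)        # conditional sd of Y given X = x

x_new = 4.0
cond_mean = slope * x_new + intercept
print(cond_mean, sigma_hat)          # near 9.0 and near 1.5
```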
 

FAQ: Statistical modeling and relationship between random variables

What is statistical modeling?

Statistical modeling is a mathematical framework used to represent, analyze, and predict the relationships between different variables. It involves the use of statistical techniques to create models that can describe the patterns and structures in data. These models help in understanding the underlying processes generating the data and in making predictions about future observations.

What are random variables?

Random variables are quantities whose values are subject to variations due to chance. They can take on different values, each with an associated probability. Random variables can be discrete, taking on specific values, or continuous, taking on any value within a given range. They are fundamental in probability theory and are used to model uncertainty in statistical analyses.
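A toy illustration of the two kinds (not from the FAQ text itself): a discrete random variable (a fair die) and a continuous one (uniform on [0, 1)), whose empirical means approach the expectations 3.5 and 0.5.

```python
# Sketch: sample a discrete and a continuous random variable and
# compare empirical means with the theoretical expectations.
import numpy as np

rng = np.random.default_rng(6)
die = rng.integers(1, 7, size=100_000)      # discrete: values 1..6
unif = rng.random(size=100_000)             # continuous on [0, 1)

print(die.mean(), unif.mean())   # close to 3.5 and 0.5
```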

How do you determine the relationship between random variables?

The relationship between random variables can be determined using various statistical methods. Common techniques include correlation analysis, which measures the strength and direction of the linear relationship between two variables, and regression analysis, which models the relationship between a dependent variable and one or more independent variables. Other methods include joint probability distributions and conditional probability distributions.
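As a small worked illustration with simulated data: the sample correlation coefficient and the simple regression slope measure related things, linked exactly by slope = r · sd(y)/sd(x).

```python
# Sketch: correlation vs. regression slope on simulated data, and the
# identity slope = r * sd(y) / sd(x) (with matching normalizations).
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = 0.7 * x + rng.normal(0, 0.5, size=1000)

r = np.corrcoef(x, y)[0, 1]
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)

print(r, slope)                              # slope close to 0.7
print(np.isclose(slope, r * y.std() / x.std()))  # identity holds exactly
```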

What is the difference between correlation and causation?

Correlation refers to a statistical association between two variables, indicating that they tend to move together in some way. However, correlation does not imply causation, which means that one variable directly influences the other. Establishing causation requires further analysis, such as controlled experiments or longitudinal studies, to rule out other factors and confirm a cause-and-effect relationship.

What are some common types of statistical models?

Some common types of statistical models include linear regression models, which predict a continuous outcome based on one or more predictor variables; logistic regression models, which predict a binary outcome; and time series models, which analyze data points collected or recorded at specific time intervals. Other models include generalized linear models, mixed-effects models, and survival analysis models, each suited for different types of data and research questions.
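A minimal sketch of one of these model types, logistic regression for a binary outcome (simulated data, fit by plain gradient ascent on the log-likelihood rather than by a library routine):

```python
# Sketch: logistic regression, true model logit(p) = 2x - 0.5, fit by
# gradient ascent on the mean log-likelihood.
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(2.0 * x - 0.5)))      # true success probabilities
ybin = rng.binomial(1, p)                   # binary outcomes

X = np.column_stack([np.ones(n), x])
w = np.zeros(2)
for _ in range(500):                        # gradient ascent steps
    pred = 1 / (1 + np.exp(-X @ w))
    w += 1.0 * X.T @ (ybin - pred) / n      # gradient of mean log-likelihood

print(w)   # near [-0.5, 2.0]
```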
