Statistics: independently distributed mean and variance

In summary: The expectation value is linear, so E(X+Y) is always equal to E(X)+E(Y) regardless of how X and Y are distributed. The reason you are not allowed to do E(XY) = E(X^2) is that, although they are distributed in the same way, X and Y are different stochastic variables. In any given outcome, X and Y may take on different values, while in X^2 the value of X is always fully correlated with itself. The correlation coefficient measures the linear relationship between two variables, not their individual variances.
  • #1
joemama69

Homework Statement



Math and verbal SAT scores are each N(500, 10000)

1) If the math and verbal SAT scores were independently distributed, which is not the case, then what would be the distribution of the overall SAT scores? Find its mean and variance.


Homework Equations





The Attempt at a Solution



So originally they are not independent variables, yet they share the same mean and variance. So, if we now assume they are independent, the mean and variance for the two would still be the same, and therefore the answer is still N(500,10000). Is this correct?
 
  • #2
How is the overall SAT score determined from the math and verbal scores? If it is their sum or their arithmetic average, then NO, you are incorrect.
 
  • #3
"How is the overall SAT score determined from the math and verbal scores? If it is their sum or their arithmetic average, then NO, you are incorrect. "

I'm not sure I follow... I do not know how the OVERALL scores are determined. I gave the problem word for word as it was given.

Are you saying I should treat it as say...

X = Math where X~N(500,10000), Y = Verbal where Y~N(500,10000)

and then I find E(X+Y) and V(X+Y)?

E(X+Y) = E(X)+E(Y) = 500+500=1000

V(X+Y) = V(X) + V(Y) + 2cov(X,Y), where X and Y are independent so cov(X,Y) = 0
... = 10000 + 10000 = 20000

?
 
  • #4
Something like that, yes. Assuming that the total score is the sum, otherwise you need to do the corresponding thing to whatever other formula is used.
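
For anyone who wants a numerical sanity check of the calculation in post #3, here is a minimal Python sketch. It assumes the overall score is the sum X + Y (the problem statement does not actually say so), and the function names, sample size, and seed are arbitrary choices for illustration.

```python
import random

def simulate_total_scores(n, mean=500.0, sd=100.0, seed=0):
    """Draw n totals X + Y with X and Y independent N(mean, sd^2)."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [rng.gauss(mean, sd) + rng.gauss(mean, sd) for _ in range(n)]

def mean_and_variance(values):
    """Return the mean and the population variance (divide by n)."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / n
    return m, v

totals = simulate_total_scores(100_000)
sample_mean, sample_var = mean_and_variance(totals)
# Theory for independent scores: mean 1000, variance 20000.
print(sample_mean, sample_var)
```

The printed values should land close to 1000 and 20000, with small deviations from sampling noise.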
 
  • #5
Ok, thanks...

The question continues...

2) Next, assume that the correlation coefficient between the math and verbal scores is 0.75. Find the mean and variance of the resulting distribution.

so... I got .75 = cov(X,Y)/(sigma_x * sigma_y) = [E(XY) - E(X)E(Y)]/(sigma_x * sigma_y)

where X and Y are from the same distribution, so X = Y...

= [E(X^2) - (E(X))^2]/(sigma_x^2) = V(X)/sigma_x^2 = V(X)/V(X) = 1 = incorrect?
 
  • #6
That ##E(X) = E(Y)## does not imply that ##E(XY) = E(X^2)## (consider the case where they were independent variables). Instead, I suggest solving for the covariance and inserting it instead of 0 in your original expression in post #3.
 
  • #7
Orodruin said:
That ##E(X) = E(Y)## does not imply that ##E(XY) = E(X^2)## (consider the case where they were independent variables). Instead, I suggest solving for the covariance and inserting it instead of 0 in your original expression in post #3.

So cov(X,Y) = E(XY) - E(X)E(Y)

I looked through my book and searched the web but I do not see how to solve for E(XY). I know E(XY) is the sum (or integral) of x*y*p(x,y) over all x and y, but how do I do it when all I know is that X and Y are each N(500,10000)?
 
  • #8
The point is that you do not need to compute E(XY). You have:

joemama69 said:
I got .75 = cov(X,Y)/(sigma_x * sigma_y)

as well as

V(X+Y) = V(X) + V(Y) + 2 cov(X,Y)
 
  • #9
Orodruin said:
The point is that you do not need to compute E(XY). You have .75 = cov(X,Y)/(sigma_x * sigma_y) as well as

V(X+Y) = V(X) + V(Y) + 2 cov(X,Y)

I must be missing something... My thought was I solve V(X+Y) for cov(X,Y) and then plug it into my .75 = ... equation.

cov(X,Y) = [V(X+Y) - V(X) - V(Y)] = V(X+Y) - 20000

.75 = [V(X+Y) - 20000]/(sigma_x*sigma_y) = [V(X+Y) - 20000]/(100*100)

therefore V(X+Y) = .75(10000)+20000 = 27500 I feel like I am off track?

27500 = V(X) + V(Y) + 2 cov(X,Y) = 20000 + 2cov(X,Y)

cov(X,Y) = (27500-20000)/2 = 3750

I don't think I should be using the "old" variance to find the new mean and variance... How does this end up giving me a new mean and variance?
 
  • #10
joemama69 said:
I must be missing something... My thought was I solve V(X+Y) for cov(X,Y) and then plug it into my .75 = ... equation.

cov(X,Y) = [V(X+Y) - V(X) - V(Y)] = V(X+Y) - 20000

.75 = [V(X+Y) - 20000]/(sigma_x*sigma_y) = [V(X+Y) - 20000]/(100*100)

therefore V(X+Y) = .75(10000)+20000 = 27500 I feel like I am off track?

Why? You just computed the variance of X+Y, which was the aim (although you dropped a factor of two when solving for the covariance: the correct relation is cov(X,Y) = [V(X+Y) - V(X) - V(Y)]/2).
The more straightforward way would have been to solve for the covariance first:
$$
0.75 = \frac{{\rm cov}(X,Y)}{\sqrt{V(X)V(Y)}}= \frac{{\rm cov}(X,Y)}{V(X)} \quad \Longrightarrow \quad {\rm cov}(X,Y) = 0.75 V(X) = 7500
$$
It follows that
$$
V(X+Y) = V(X) + V(Y) + 2{\rm cov}(X,Y) = 20000 + 15000 = 35000
$$
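
The same two steps can be written as a small Python sketch. The helper name `variance_of_sum` is made up for illustration, and it assumes, as in this thread, that the overall score is X + Y.

```python
import math

def variance_of_sum(var_x, var_y, rho):
    """V(X+Y) from the two variances and the correlation coefficient rho."""
    # Step 1: cov(X,Y) = rho * sigma_x * sigma_y
    cov = rho * math.sqrt(var_x * var_y)
    # Step 2: V(X+Y) = V(X) + V(Y) + 2 cov(X,Y)
    return var_x + var_y + 2.0 * cov

independent = variance_of_sum(10000.0, 10000.0, 0.0)   # part 1 of the problem
correlated = variance_of_sum(10000.0, 10000.0, 0.75)   # part 2 of the problem
print(independent, correlated)
```

Setting rho to 0 recovers the independent case from post #3, so both parts of the problem come out of the same formula.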
 
  • #11
Ohhh okay, I don't think I really understood what the question was asking. So the question is asking for the TOTAL or COMBINED score distribution given that the correlation coefficient is .75, which is why I solved for V(X+Y).

So then what about E(X+Y), is that simply E(X) + E(Y) = 1000, because the correlation coefficient doesn't really involve the expected values? Probably not the best way of explaining it, but am I on the right track?
 
  • #12
The expectation value is linear so E(X+Y) is always equal to E(X)+E(Y) regardless of how X and Y are distributed. The reason you are not allowed to do E(XY) = E(X^2) is that, although they are distributed in the same way, X and Y are different stochastic variables. In any given outcome, X and Y may take on different values, while in X^2 the value of X is always fully correlated with the value of X.
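
To make the distinction concrete, here is a worked check with the numbers from this thread. It is only a sketch and uses nothing beyond the identities ##E(X^2) = V(X) + E(X)^2## and ##E(XY) = {\rm cov}(X,Y) + E(X)E(Y)##:
$$
E(X^2) = V(X) + E(X)^2 = 10000 + 250000 = 260000
$$
$$
E(XY) = {\rm cov}(X,Y) + E(X)E(Y) =
\begin{cases}
0 + 250000 = 250000 & \text{if } X, Y \text{ are independent} \\
7500 + 250000 = 257500 & \text{if the correlation coefficient is } 0.75
\end{cases}
$$
The two expectations agree only when the correlation coefficient is 1, that is, when ##{\rm cov}(X,Y) = V(X) = 10000##.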
 

FAQ: Statistics: independently distributed mean and variance

What is the meaning of "independently distributed" in statistics?

In statistics, two random variables are independently distributed if the value taken by one carries no information about the value taken by the other. Formally, their joint distribution is the product of their individual (marginal) distributions. Independence implies zero covariance, which is why the cov(X,Y) term drops out of V(X+Y) in the first part of the problem above.

Why is it important to understand the mean and variance of independently distributed data?

Understanding the mean and variance of independently distributed data is important because it allows us to analyze and interpret the data accurately. The mean gives us a measure of central tendency, while the variance measures the spread or variability of the data. These measures help us to understand the characteristics and patterns of the data, and make informed decisions based on the data.

How do you calculate the mean and variance of independently distributed data?

To calculate the mean of a data set, add all the data points together and divide by the number of data points. To calculate the variance, first find the mean, then subtract it from each data point, square the differences, add them together, and divide by the number of data points (the population variance) or by one less than that number (the sample variance, used when estimating from a sample).
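
A minimal sketch of this calculation with Python's standard `statistics` module. The five scores are made-up illustration values, not data from the thread. `pvariance` divides by the number of points, while `variance` divides by one less (the sample variance).

```python
import statistics

# Hypothetical scores, chosen only so the arithmetic is easy to follow.
scores = [480.0, 500.0, 520.0, 610.0, 390.0]

mean_score = statistics.mean(scores)            # sum of scores / number of scores
population_var = statistics.pvariance(scores)   # squared deviations / n
sample_var = statistics.variance(scores)        # squared deviations / (n - 1)

print(mean_score, population_var, sample_var)
```

The squared deviations from the mean add up to 25000 here, so the two conventions give 25000/5 and 25000/4 respectively.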

Can the mean and variance of independently distributed data change over time?

Yes, the mean and variance of independently distributed data can change over time. This can happen if there are changes in the underlying processes that generate the data, or if there are external factors that influence the data. It is important to regularly monitor and update the mean and variance to ensure accurate analysis and decision-making.

How can the mean and variance of independently distributed data be used in statistical tests?

The mean and variance of independently distributed data can be used in statistical tests to determine the significance of differences between groups or variables. For example, the two-sample t-test uses the means and variances of two independent samples to test whether the two group means differ. ANOVA compares the variability between group means with the variability within the groups to test whether at least one group mean differs from the others.
