Correlations with combined sums and products

  • Thread starter: mcorazao
In summary, the thread concerns building a probability-related software package that needs a theoretical framework for some less common issues. The main focus is how to perform symbolic calculations on random variables combining sum and product operations while accounting for arbitrary correlations between the variables. The discussion covers handling correlations between sums and products, and the difficulty of linearizing the log function in correlation calculations. The original poster is seeking pointers to a theoretical framework for these calculations.
  • #1
mcorazao
I was trying to build a probability-related software package and needed to have a theoretical framework to deal with some less common issues (i.e. stuff that you don't find in the average textbook). I was hoping that somebody could give me pointers as to where to find the proper formulae.

Basically I am trying to figure out how to do arbitrary (symbolic) calculations on random variables involving both sum and product operations, fully taking into account arbitrary correlations between the variables. If I have the correlation coefficients of the variables, then summing them is obviously easy. Likewise, if I have the correlation coefficients of their logs, multiplying them is easy (a product amounts to summing the logs, and the log/exponent formulae for normals are well defined). And, of course, determining the resulting correlations between the sums or products and the original variables is straightforward.
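For the "summing is easy" case, here is a quick numerical sketch (the joint-normal setup and all parameter values are made up for illustration, not anything from the thread) checking the closed form Corr(a+b, a) = (Var(a) + Cov(a,b)) / (sd(a) · sd(a+b)) against simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative setup: a, b jointly normal with correlation rho.
sd_a, sd_b, rho = 2.0, 3.0, 0.6
cov = np.array([[sd_a**2, rho * sd_a * sd_b],
                [rho * sd_a * sd_b, sd_b**2]])
a, b = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T

# Closed form: Corr(a+b, a) = (Var(a) + Cov(a,b)) / (sd(a) * sd(a+b))
var_sum = sd_a**2 + sd_b**2 + 2 * rho * sd_a * sd_b
corr_exact = (sd_a**2 + rho * sd_a * sd_b) / (sd_a * np.sqrt(var_sum))

# Monte Carlo estimate of the same quantity.
corr_mc = np.corrcoef(a + b, a)[0, 1]
print(corr_exact, corr_mc)  # the two should agree to a couple of decimals
```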

So where things get murkier (for me at least) is when I go beyond these simple cases. If, say, I want to multiply two sums, how do I handle the correlations? E.g., let z = (a + b) * (c + d). If I know the correlation of a and b, I can easily determine the correlation of their sum with a or b. Similarly for c + d. But if I then multiply those two sums, how do I determine the correlation of z with the original variables a, b, c, and d? And if all the variables originally had some non-zero correlation, how do I account for that in the product? The original summations give me the correlation coefficients with respect to the original variables, but not the correlations of the logarithms, which is what I would need for the multiplication.
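No closed form is attempted here, but the quantity being asked about can at least be estimated by simulation, which is useful as a reference when testing any analytic scheme. A Monte Carlo sketch, under an assumed joint-normal model with made-up parameters (equal means, unit variances, all pairwise correlations 0.3):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: (a, b, c, d) jointly normal, all pairwise correlations 0.3,
# unit variances, means shifted well above zero so the product behaves tamely.
n = 4
corr = np.full((n, n), 0.3) + 0.7 * np.eye(n)
mean = np.array([5.0, 5.0, 5.0, 5.0])
a, b, c, d = rng.multivariate_normal(mean, corr, size=500_000).T

z = (a + b) * (c + d)
corrs = {name: np.corrcoef(z, v)[0, 1] for name, v in zip("abcd", (a, b, c, d))}
for name, c_zv in corrs.items():
    print(f"Corr(z, {name}) = {c_zv:.3f}")
```

By the symmetry of this particular setup all four correlations come out (nearly) equal; with asymmetric parameters they would not.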

Can anybody give me a clue how to start figuring this out?

Thanks.
 
  • #2
Can't you apply the log rule to x*y where x = a+b and y = c+d?
 
  • #3
Again, how? If I do

x*y = e^(log(x) + log(y))​

Then I have to know the correlation of log(x) and log(y). How do I figure that out?
 
  • #4
I was going with your statement "if I have the correlation coefficients of their logs multiplying them is easy." Now I realize that you don't have Corr(Log x, Log y) and the problem looks really difficult, because Corr is a linear operator and Log is a nonlinear function. Which tells me you need to somehow linearize the log function. E.g. if x is near 1, then Log(x) is approximately equal to x - 1. Maybe you can devise some kind of a scaling that would result in [itex]\overline\xi[/itex] = 1 where [itex]\xi[/itex] is the scaled version of x.
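The linearization suggested here is the first-order delta method, and it has a notable consequence: correlation is invariant under linear rescaling, so to first order Corr(log x, log y) ≈ Corr(x, y) whenever the variables are tightly concentrated around their means, with the discrepancy being a higher-order effect. A numerical illustration under an assumed bivariate-lognormal setup (all parameter values made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: (log x, log y) bivariate normal with correlation rho and
# equal standard deviations sigma. For small sigma, log is nearly linear over
# the data's range, so the first-order (delta-method) prediction
# Corr(x, y) ~= Corr(log x, log y) should hold; for large sigma it should not.
rho = 0.4
results = []
for sigma in (0.05, 1.5):
    cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
    lx, ly = rng.multivariate_normal([0.0, 0.0], cov, size=400_000).T
    x, y = np.exp(lx), np.exp(ly)
    results.append((sigma, np.corrcoef(x, y)[0, 1], np.corrcoef(lx, ly)[0, 1]))
    print(results[-1])  # (sigma, Corr(x,y), Corr(log x, log y))
```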
 
  • #5
Well, there are all sorts of approximations I can devise but they would not be precise. Since log(x) and log(y) are both normal (i.e. log of normal is still normal) there should be a simple correlation coefficient that can be calculated and used. I presume, then, there should be a closed form solution for calculating it although I don't know what that would be.

One observation: if the correlation of x and y were zero, then the correlation of log(x) and log(y) would be zero. Also, if x and y have the same mean and standard deviation and their correlation is unity, then the correlation of log(x) and log(y) is also unity. That would seem to point toward the correlations always being equal, although intuitively that doesn't sound right.
 
  • #6
mcorazao said:
log of normal is still normal
Wrong. Log of Lognormal is normal.
If the correlation of x and y was zero, then the correlation of log(x) and log(y) is zero
I am not sure that's necessarily the case. Similarly for the unit correlation case.
 
  • #7
EnumaElish said:
Wrong. Log of Lognormal is normal.
Actually you're right. I'm not thinking clearly.
EnumaElish said:
I am not sure that's necessarily the case. Similarly for the unit correlation case.
This one you're not thinking clearly about. If x and y are uncorrelated then it is not possible that their logs could have any correlation. The log transformation does not introduce any component that they could have in common.
If they are perfectly correlated and they have the same mean and standard deviation then they are, by definition, exactly the same number. Therefore their logs are exactly the same number. Therefore their logs are perfectly correlated.
These are the only obvious cases that I can think of at the moment. Any other cases seem to require a more elaborate proof.
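For the specific model this thread keeps circling, namely x and y jointly lognormal so that (log x, log y) is bivariate normal with correlation ρ and standard deviations σ₁, σ₂, there is in fact a standard closed form: Corr(x, y) = (e^(ρσ₁σ₂) − 1) / √((e^(σ₁²) − 1)(e^(σ₂²) − 1)). It confirms both special cases argued above, and also shows the two correlations are not equal in general. A small sketch (the function name is just illustrative):

```python
import math

def lognormal_corr(rho, s1, s2):
    """Corr(X, Y) for X = exp(U), Y = exp(V), where (U, V) is bivariate
    normal with correlation rho and standard deviations s1, s2
    (the standard bivariate-lognormal correlation formula)."""
    num = math.exp(rho * s1 * s2) - 1.0
    den = math.sqrt((math.exp(s1**2) - 1.0) * (math.exp(s2**2) - 1.0))
    return num / den

print(lognormal_corr(0.0, 1.0, 2.0))  # 0.0: zero log-correlation -> zero correlation
print(lognormal_corr(1.0, 1.0, 1.0))  # approximately 1: perfect, equal-sigma case
print(lognormal_corr(0.5, 1.0, 1.0))  # approximately 0.3775, not 0.5: the two differ in general
```

Inverting this formula numerically would recover Corr(log x, log y) from Corr(x, y) for lognormal inputs, which is exactly the quantity the product step needs.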
 
  • #8
mcorazao said:
If x and y are uncorrelated then it is not possible that their logs could have any correlation. The log transformation does not introduce any component that they could have in common.
The point is, Log is a nonlinear transformation; but corr is a linear operation. In general properties of linear operators are not invariant under a nonlinear transformation.

A trivial example is Corr(a, b) = 0 where some elements of a and/or b are zero (or negative). Then Corr(Log(a), Log(b)) is undefined.

For similar examples, see: http://en.wikipedia.org/wiki/Correlation#Correlation_and_linearity
 
  • #9
EnumaElish said:
The point is, Log is a nonlinear transformation; but corr is a linear operation. In general properties of linear operators are not invariant under a nonlinear transformation.
Of course, that's what I said. But that doesn't prove that there isn't a direct, even linear, relationship between the variable correlations and the log correlations. Seems unlikely but as yet I haven't found a proof one way or the other.

EnumaElish said:
A trivial example is Corr(a, b) = 0 where some elements of a and/or b are zero (or negative). Then Corr(Log(a), Log(b)) is undefined.

Well, by that argument the log of a normal variable is undefined, period, since any normal distribution takes negative values with positive probability.

Anyway, point is, I need a theoretical framework to do the calculation. I know there are other software packages that do this sort of thing without doing Monte Carlo analysis but I don't know what the math behind their calculations is.

Thanks, BTW, for the interest.
 
  • #10
mcorazao said:
by that argument log of normal is undefined period
Precisely.
 
  • #11
EnumaElish said:
Precisely.

Well, not "precisely". The reality is that it is not undefined. It is just not real. E.g.

ln(-1) = iπ ≈ 3.1415927i

Similarly the correlation is not undefined although it could perhaps have imaginary components.

In other words, the question is not moot it is just "complex" (pun intended).
 
  • #12
You are right; that was a non sequitur. Still, my point about non-linearity applies. Just because vectors a and b are uncorrelated does not mean their nonlinear functions cannot be correlated.

Here is a numerical example:
x = {
0.147281700000000,
0.230993647671506,
0.427692041391026,
0.079822616900000,
0.291048299000000,
0.185000000000000,
0.088631713672936,
0.266815063276460,
0.182298600000000,
0.850679700000000
};
y = {
0.872438309154607,
0.186970421455947,
0.738597327308731,
0.598236593500000,
0.462298740000000,
0.330000000000000,
0.115598225897281,
0.107657376896975,
0.207345000000000,
0.325996428800000
};
Corr(x,y) [itex]\approx[/itex] 0 ( = 5.1841*10^-12)
But Corr(Log(x), Log(y)) = 0.0873581.
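The counterexample is easy to check directly; a minimal reproduction using NumPy's sample correlation (presumably equivalent to whatever was used in the post, since Pearson correlation is insensitive to the normalization convention):

```python
import numpy as np

# The ten (x, y) pairs from the post above.
x = np.array([0.1472817, 0.230993647671506, 0.427692041391026, 0.0798226169,
              0.291048299, 0.185, 0.088631713672936, 0.26681506327646,
              0.1822986, 0.8506797])
y = np.array([0.872438309154607, 0.186970421455947, 0.738597327308731,
              0.5982365935, 0.46229874, 0.33, 0.115598225897281,
              0.107657376896975, 0.207345, 0.3259964288])

corr_xy = np.corrcoef(x, y)[0, 1]
corr_log = np.corrcoef(np.log(x), np.log(y))[0, 1]
print(corr_xy, corr_log)  # near zero versus clearly non-zero
```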
 
  • #13
You are absolutely right. The non-linearity can create these oddball situations. When I work these problems out, I normally think of a correlation as splitting a variable into a single correlated component and a single uncorrelated component. For linear operations that picture is valid, but with non-linear operations the simplification breaks down (it is often still a good approximation, but not guaranteed).

Curiouser and curiouser ...

Doesn't get me any closer to an answer, though. :-)
 
  • #14
Somehow, you need to linearize.
 

FAQ: Correlations with combined sums and products

What is the purpose of finding correlations with combined sums and products?

The purpose of finding correlations with combined sums and products is to identify potential relationships between two or more variables. By combining the sums and products of the observations, we can compute a correlation coefficient that measures the strength and direction of the linear relationship between the variables.

How do you calculate the correlation coefficient for combined sums and products?

To calculate the correlation coefficient for combined sums and products, you first need to calculate the sums and products of the variables. Then, you can use the formula for Pearson's correlation coefficient, which is r = (nΣxy - ΣxΣy) / [√(nΣx^2 - (Σx)^2)√(nΣy^2 - (Σy)^2)].
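The quoted computational formula can be implemented directly; a minimal sketch (the function name is just illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the computational formula:
    r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx**2) * math.sqrt(n * syy - sy**2)
    return num / den

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # approximately 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # approximately -1.0 (perfect negative)
```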

Can correlations with combined sums and products determine causation?

No, correlations with combined sums and products can only determine the strength and direction of a relationship between variables. It cannot determine causation, as there may be other factors at play that influence the relationship.

What is the range of values for the correlation coefficient?

The range of values for the correlation coefficient is -1 to 1. A correlation coefficient of -1 indicates a perfect negative relationship, 0 indicates no relationship, and 1 indicates a perfect positive relationship.

How can outliers affect the results of correlations with combined sums and products?

Outliers can significantly affect the results of correlations with combined sums and products. If there are extreme values in the data, they can pull the correlation coefficient towards either -1 or 1, even if there is no real relationship between the variables.
