dhiraj
Based on my understanding of the Pearson correlation coefficient, I have created a visual illustration, and I would like to know whether this understanding is correct.
Say I have a sample with 5 data points:
x y
8 6
16 8
20 16
28 12
32 20
My goal is to calculate the Pearson correlation coefficient between x and y.
This is what the diagram I created looks like:
[Diagram: attachment 6472]
I have used color coding to distinguish the different elements.
In this case, the covariance between x and y is:
\(\displaystyle cov(x,y) = \frac {\sum d_x d_y}{n-1} \)
where \(\displaystyle d_x = x - \bar{x}\) and \(\displaystyle d_y = y - \bar{y}\) are the deviations (not the standard deviations) from \(\displaystyle \bar{x}\) and \(\displaystyle \bar{y}\), respectively, and \(\displaystyle n = 5\) is the number of data points. The mean lines are shown in the diagram (red line for \(\displaystyle \bar{x}\), green line for \(\displaystyle \bar{y}\)).
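As a quick numerical check of the covariance step, here is a minimal Python sketch for the five points above. The variable names are my own choice and it uses plain Python only; it computes the means, the signed deviations, and the sample covariance with the \(n-1\) denominator.

```python
# Sample data from the post
x = [8, 16, 20, 28, 32]
y = [6, 8, 16, 12, 20]
n = len(x)

# Means: the red (x-bar) and green (y-bar) lines in the diagram
x_bar = sum(x) / n
y_bar = sum(y) / n

# Signed deviations d_x = x - x_bar and d_y = y - y_bar (not standard deviations)
d_x = [xi - x_bar for xi in x]
d_y = [yi - y_bar for yi in y]

# Sample covariance: sum of d_x * d_y divided by n - 1
cov_xy = sum(dx * dy for dx, dy in zip(d_x, d_y)) / (n - 1)

print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"cov(x, y) = {cov_xy:.2f}")
```

With this data it should give \(\bar{x} = 20.8\), \(\bar{y} = 12.4\) and \(cov(x,y) = 182.4/4 = 45.6\). Only the points (20, 16) and (28, 12) have \(d_x d_y < 0\) (each product is \(-2.88\)), because for them x and y lie on opposite sides of their means; the other three products are positive.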
The Pearson correlation coefficient is \(\displaystyle r = \frac{cov(x,y)}{S_x S_y} \)
Based on the diagram, the sample standard deviations of x and y are:
\(\displaystyle S_x = \sqrt{ \frac{\sum d_x^2}{n-1} }\)
\(\displaystyle S_y = \sqrt{ \frac{\sum d_y^2}{n-1} }\)
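The two standard-deviation formulas can be checked the same way. This is a small self-contained sketch (again with my own variable names) that sums the squared deviations, divides by \(n-1\), and takes the square root.

```python
import math

# Sample data from the post
x = [8, 16, 20, 28, 32]
y = [6, 8, 16, 12, 20]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations: sqrt(sum of squared deviations / (n - 1))
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

print(f"S_x = {s_x:.4f}, S_y = {s_y:.4f}")
```

This should give \(S_x = \sqrt{364.8/4} = \sqrt{91.2} \approx 9.55\) and \(S_y = \sqrt{131.2/4} = \sqrt{32.8} \approx 5.73\). Python's `statistics.stdev`, which also divides by \(n-1\), can serve as a cross-check.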
Substituting these into the formula for the correlation coefficient gives:
\(\displaystyle r = \frac {\sum d_x d_y} { (n-1) \sqrt{ \frac{\sum d_x^2}{n-1} } \sqrt{ \frac{\sum d_y^2}{n-1} } } \)
Is this interpretation correct with respect to the diagram I have shown? I know that the signs of \(\displaystyle d_x\) and \(\displaystyle d_y\) depend on which side of \(\displaystyle \bar{x}\) and \(\displaystyle \bar{y}\) the values \(\displaystyle x\) and \(\displaystyle y\) fall.
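The \(n-1\) factors in this expression cancel, since \(\sqrt{\frac{\sum d_x^2}{n-1}}\sqrt{\frac{\sum d_y^2}{n-1}} = \frac{\sqrt{\sum d_x^2 \sum d_y^2}}{n-1}\), so it reduces to \(r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}}\). The minimal Python sketch below (my own variable names, standard library only) evaluates both the full and the simplified form on the sample data to show that they agree.

```python
import math

# Sample data from the post
x = [8, 16, 20, 28, 32]
y = [6, 8, 16, 12, 20]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
d_x = [xi - x_bar for xi in x]
d_y = [yi - y_bar for yi in y]

# The three sums that appear in the formula
sum_dxdy = sum(a * b for a, b in zip(d_x, d_y))
sum_dx2 = sum(a * a for a in d_x)
sum_dy2 = sum(b * b for b in d_y)

# Full form: cov(x, y) / (S_x * S_y), each term using n - 1
r_full = sum_dxdy / (
    (n - 1) * math.sqrt(sum_dx2 / (n - 1)) * math.sqrt(sum_dy2 / (n - 1))
)

# Simplified form: the n - 1 factors cancel out
r_simple = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

print(f"r (full form)       = {r_full:.4f}")
print(f"r (simplified form) = {r_simple:.4f}")
```

Both lines should print 0.8337, i.e. \(r = 182.4/\sqrt{364.8 \times 131.2} \approx 0.834\). On Python 3.10 or later, `statistics.correlation(x, y)` should return the same value.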