MHB Visual illustration of Pearson correlation coefficient r

dhiraj
Based on my understanding of the Pearson correlation coefficient, I have created a visual illustration, and I would like to know whether this understanding is correct.

Say I have a sample with 5 data points:-

x y
8 6
16 8
20 16
28 12
32 20

My goal is to calculate Pearson correlation coefficient between x and y.

This is what the diagram I created looks like:-

View attachment 6472

I have done appropriate color coding.

So in this case the covariance between x and y is:-

[math]cov(x,y) = \frac {\sum d_x d_y}{n-1} [/math]

[math]d_x[/math] and [math]d_y[/math] are the deviations (not standard deviations) of each point from [math]\bar{x}[/math] and [math]\bar{y}[/math] respectively. These mean lines are shown in the diagram (red line for [math]\bar{x}[/math] and green line for [math]\bar{y}[/math]).
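To make the deviations concrete, here is a minimal Python sketch that computes them for the five points above and then the covariance with the [math]n-1[/math] denominator. The variable names (`x_bar`, `d_x`, `cov_xy`, and so on) are chosen here for illustration only; they are not taken from the diagram.

```python
from statistics import mean

# The five sample points from the table above.
x = [8, 16, 20, 28, 32]
y = [6, 8, 16, 12, 20]

# Sample means: these are the red (x-bar) and green (y-bar) lines.
x_bar, y_bar = mean(x), mean(y)  # 20.8 and 12.4

# Signed deviations of each point from the means.
d_x = [xi - x_bar for xi in x]
d_y = [yi - y_bar for yi in y]

# One row per point: a product d_x*d_y is positive when the point lies
# on the same side of both mean lines, negative otherwise.
for xi, yi, dx, dy in zip(x, y, d_x, d_y):
    print(f"x={xi:2d} y={yi:2d} d_x={dx:+6.1f} d_y={dy:+6.1f} d_x*d_y={dx * dy:+7.2f}")

# Sample covariance: sum of the products divided by n - 1.
cov_xy = sum(dx * dy for dx, dy in zip(d_x, d_y)) / (len(x) - 1)
print(round(cov_xy, 4))  # 45.6
```

Two of the five products come out negative (the points (20, 16) and (28, 12)), which is where a point sits on opposite sides of the two mean lines.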

Pearson correlation coefficient [math] r = \frac{cov(x,y)}{S_x S_y} [/math]

Based on the diagram, standard deviations of x and y are:-

[math]S_x = \sqrt{ \frac{\sum d_x^2}{n-1} }[/math]

[math]S_y = \sqrt{ \frac{\sum d_y^2}{n-1} }[/math]

Substituting these into the formula for the correlation coefficient, we get:-

[math] r = \frac {\sum d_x d_y} { (n-1) \sqrt{ \frac{\sum d_x^2}{n-1} } \sqrt{ \frac{\sum d_y^2}{n-1} } } [/math]

Is this interpretation correct with respect to the diagram I have shown? I know the signs of [math]d_x[/math] and [math]d_y[/math] will depend on which side of [math]\bar{x}[/math] and [math]\bar{y}[/math] the values [math]x[/math] and [math]y[/math] fall.
 

Hi dhiraj!

It's all correct.
And note that the formula for $r$ can be simplified to:
$$ r = \frac {\sum d_x d_y} {\sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 }}$$
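To spell out the step: each square root in the denominator contributes a factor of $1/\sqrt{n-1}$, and together these cancel the leading $(n-1)$:

$$ (n-1) \sqrt{ \frac{\sum d_x^2}{n-1} } \sqrt{ \frac{\sum d_y^2}{n-1} } = \frac{n-1}{\sqrt{n-1}\,\sqrt{n-1}} \sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 } = \sqrt{ \sum d_x^2 } \sqrt{ \sum d_y^2 } $$

So $r$ does not depend on whether $n$ or $n-1$ is used, as long as the same choice is made for the covariance and both standard deviations. For the five points in the question, $\sum d_x d_y = 182.4$, $\sum d_x^2 = 364.8$ and $\sum d_y^2 = 131.2$, which gives

$$ r = \frac{182.4}{\sqrt{364.8}\,\sqrt{131.2}} \approx 0.834 $$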