The effect of cross validation on correlation coefficient

lyuriedin · Oct 23, 2012

I have two variables for which the fitted regression line is just a constant (the mean), so the correlation is zero. However, when I run k-fold cross-validation (in Weka), the reported correlation becomes non-zero.

I have no idea why this happens. Whatever the test set is, the regression line will always be a constant, so the correlation should be zero. Because some of the data is held out as the validation set in each fold, the mean will differ from fold to fold, but the correlation should stay the same regardless. The only thing I can think of is that it is computing the correlation between the training means and the actual mean, but even then these should sum to zero.

Can anybody clear this up for me?

Valkarie · Oct 23, 2012

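One likely explanation is that the cross-validated correlation is computed between the predicted and actual values pooled over all held-out folds, rather than within each fold separately. Within a single fold the prediction is a constant (the mean of that fold's training data), so the correlation there is undefined or zero. Across folds, however, the training mean changes, so the pooled predictions are no longer constant and their correlation with the actual values can be non-zero. In addition, the model is refit on each training subset: a slope that is exactly zero on the full data will generally not be exactly zero on a subset, so the per-fold fits need not be constant either.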
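To illustrate how fold-dependent training means can produce a non-zero correlation, here is a minimal sketch in Python with NumPy. It assumes the tool pools the held-out predictions from all folds and correlates them with the actual values; whether Weka does exactly this should be checked against its documentation. The function name `cv_mean_predictions` and the synthetic data are made up for the example, and the model is deliberately reduced to "predict the training mean".

```python
import numpy as np

def cv_mean_predictions(y, k, seed=0):
    """Hypothetical helper: predict each held-out value with the mean
    of the remaining (training) folds, then return all predictions."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))      # shuffle before splitting
    folds = np.array_split(indices, k)     # k roughly equal folds
    preds = np.empty_like(y)
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Constant prediction for this fold: the training mean.
        preds[test_idx] = y[train_mask].mean()
    return preds

# Synthetic target values (an assumption, not the original poster's data).
rng = np.random.default_rng(42)
y = rng.normal(size=100)

# 10-fold CV: the prediction is constant within a fold but varies across folds.
preds_10 = cv_mean_predictions(y, k=10)
r_10 = np.corrcoef(preds_10, y)[0, 1]

# Leave-one-out CV: every point gets its own training mean.
preds_loo = cv_mean_predictions(y, k=len(y))
r_loo = np.corrcoef(preds_loo, y)[0, 1]

print("pooled correlation, 10-fold:", r_10)
print("pooled correlation, leave-one-out:", r_loo)
```

With equal-sized folds, the training mean is a decreasing linear function of the held-out fold's mean: a fold with high values leaves a lower mean behind. The pooled correlation therefore comes out negative rather than zero, and in the leave-one-out case it is exactly -1.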

