EngWiPy
Hello all,
This is not particular to Python; it is more of a conceptual question about machine learning algorithms, specifically nearest neighbor classifiers. I have a dataset with m examples, each with n features and one target feature. The dataset is originally ordered so that examples with the same target value are contiguous: AAA..A, BBB...B, etc. I ran 10-fold cross-validation on the original dataset and got an accuracy of 0.86. Then I shuffled the dataset, ran the same cross-validation procedure, and got an accuracy of 0.32. My questions: is this expected, and why, given that I compute the average accuracy over the folds? And is shuffling the data legitimate in the first place?
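For reference, here is a self-contained toy version of the comparison I am running (pure Python, 1-NN on a hypothetical synthetic dataset rather than my real one; all names and numbers here are illustrative, not my actual data or code):

```python
import random

def one_nn_predict(train_X, train_y, x):
    """Label of the nearest training example (1-NN, absolute distance on one feature)."""
    best = min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    return train_y[best]

def kfold_accuracy(X, y, k=10, shuffle=False, seed=0):
    """Average accuracy over k contiguous folds; optionally shuffle the row order first.

    Shuffling permutes indices, so each (X[i], y[i]) pair stays together.
    """
    idx = list(range(len(X)))
    if shuffle:
        random.Random(seed).shuffle(idx)
    size = len(X) // k
    accs = []
    for f in range(k):
        test = idx[f * size:(f + 1) * size]
        train = idx[:f * size] + idx[(f + 1) * size:]
        tX = [X[i] for i in train]
        ty = [y[i] for i in train]
        correct = sum(one_nn_predict(tX, ty, X[i]) == y[i] for i in test)
        accs.append(correct / len(test))
    return sum(accs) / k

# Hypothetical dataset: 10 classes, 30 examples each, sorted so labels are contiguous.
rng = random.Random(42)
y = [c for c in range(10) for _ in range(30)]
X = [c + rng.gauss(0, 0.1) for c in y]

# Unshuffled: each contiguous fold holds one entire class, so that class is
# absent from training and 1-NN cannot predict it -> accuracy near 0.
print(kfold_accuracy(X, y, shuffle=False))
# Shuffled: every class appears in every training split -> accuracy near 1.
print(kfold_accuracy(X, y, shuffle=True))
```

In this toy setup the unshuffled run is the one that collapses, which is the opposite direction from the numbers I am seeing on my real dataset.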
Thanks in advance