- #1
Frank Einstein
- TL;DR Summary
- I have a dataset of people logging into a server and I need to find the most suitable distance measure to cluster them.
Hello everyone.
I have a pandas DataFrame in Python with n+1 columns and t rows. The first column is a timestamp that advances second by second over a time interval, and the other n columns are named after the people who log into the server. In each of those columns, a row holds a "1" if the person is logged in during that exact second and a "0" if not.
I have used hierarchical clustering with Hamming distance and average linkage.
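For reference, here is a minimal sketch of that setup. The data and the user names (`alice`, `bob`, `carol`, `dave`) are made up for illustration, and cutting the tree into two clusters is an arbitrary choice, not something taken from my real data.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: 6 seconds, 4 users (1 = logged in, 0 = not logged in)
df = pd.DataFrame({
    "timestamp": np.arange(6),
    "alice": [1, 1, 1, 0, 0, 0],
    "bob":   [1, 1, 1, 0, 0, 1],
    "carol": [0, 0, 0, 1, 1, 1],
    "dave":  [0, 0, 1, 1, 1, 1],
})

# Clustering users means each user must be a row: transpose to (n users, t seconds)
users = df.columns.drop("timestamp")
X = df[users].to_numpy().T.astype(bool)

# Pairwise Hamming distance: fraction of seconds in which two users disagree
d = pdist(X, metric="hamming")

# Agglomerative clustering with average linkage on the condensed distance vector
Z = linkage(d, method="average")

# Cut the tree into 2 clusters (arbitrary choice for this toy example)
labels = fcluster(Z, t=2, criterion="maxclust")
clusters = dict(zip(users, labels))
print(clusters)
```

Swapping the `metric` argument of `pdist` is enough to try a different binary distance with the same linkage.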
However, I am not sure whether the Hamming distance is the most suitable measure for clustering the users, especially after reading this article, which compares 76 distance measures.
I am not an expert in clustering, so I would like to hear which distance measure others think is most suitable for grouping the users.
As far as I know, both positive matches (both users logged in) and negative matches (both users logged out) matter in this case, so might the Sokal-Michener distance be suitable?
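To make the comparison concrete, here is a small sketch on random data. It assumes the Sokal-Michener similarity is the simple matching coefficient (a + d) / n, where a counts the seconds both users are logged in and d the seconds both are logged out, and it takes the distance as one minus that. The helper `simple_matching_distance` is my own illustrative function, not a library call. Jaccard is included only as a contrast, since it ignores the negative matches.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical random login matrix: 5 users (rows) x 200 seconds (columns)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5, 200)).astype(bool)

def simple_matching_distance(u, v):
    """1 - (a + d) / n, assuming Sokal-Michener is the simple matching coefficient."""
    a = np.sum(u & v)      # positive matches: both logged in
    d = np.sum(~u & ~v)    # negative matches: both logged out
    return 1.0 - (a + d) / u.size

n_users = X.shape[0]
smd = np.array([[simple_matching_distance(X[i], X[j]) for j in range(n_users)]
                for i in range(n_users)])

# Normalized Hamming distance: fraction of seconds in which two users disagree
ham = squareform(pdist(X, metric="hamming"))

# Jaccard distance for contrast: joint absences do not count as agreement
jac = squareform(pdist(X, metric="jaccard"))

same = np.allclose(smd, ham)
print("simple matching distance equals normalized Hamming:", same)
```

Under that definition the two measures coincide, because the number of mismatches is n - (a + d). So switching between them should not change the clustering, whereas a measure such as Jaccard that drops the negative matches can.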
Any recommendation is welcome.
Best regards and thanks for reading.