Looking for advice in clusterization

Frank Einstein · Oct 15, 2023

Hello everyone. I have a machine with a series of sensors. All sensors send a signal each minute. I want to know if any of those sensors are redundant. The data is available as an Excel file, where the columns are the variables and the rows are the measurements. I have 1000 rows.

To do this, I have used DBSCAN in Python as follows:

Data clusterization:

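from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# data: table of shape (n_measurements, n_sensors), loaded from the Excel file
scaler = StandardScaler()
data_normalized = scaler.fit_transform(data)  # standardize each sensor column
data_normalized = data_normalized.T  # transpose so that each sensor becomes one point
dbscan = DBSCAN(eps=15, min_samples=2)
clusters = dbscan.fit_predict(data_normalized)  # one label per sensor; -1 marks noise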

However, I think there has to be a better way to find relationships between variables (that is, between the sensors, which are the columns of the data file).

Could someone please point me towards a methodology more suitable for my goals?
Any answer is appreciated.
Thanks for reading.
Best regards.
Frank.

Dale · Oct 15, 2023

You can just look at the correlation matrix. If two inputs are highly correlated then you can probably drop one.
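As a rough sketch of that idea, assuming the spreadsheet has already been loaded into a pandas DataFrame with one column per sensor: the helper name `correlated_pairs`, the 0.95 threshold, and the synthetic columns `s1`, `s2`, `s3` are all made up for illustration, and the right threshold depends on your sensors.

```python
import numpy as np
import pandas as pd


def correlated_pairs(data: pd.DataFrame, threshold: float = 0.95):
    """Return (sensor_a, sensor_b, r) for every column pair with |r| >= threshold."""
    corr = data.corr()  # Pearson correlation between columns (sensors)
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skip the diagonal
            r = corr.iloc[i, j]
            # A constant column yields NaN here, which fails the comparison
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Synthetic stand-in for the real data: s2 is almost a rescaled copy of s1,
# while s3 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
demo = pd.DataFrame({
    "s1": base,
    "s2": 2.0 * base + 0.01 * rng.normal(size=1000),
    "s3": rng.normal(size=1000),
})

pairs = correlated_pairs(demo)
print(pairs)
```

With the real data you would pass your own DataFrame instead of `demo`. Note that Pearson correlation only captures linear relationships, so two sensors related in a nonlinear way could still pass as unrelated.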

Frank Einstein · Oct 15, 2023

Dale said:

You can just look at the correlation matrix. If two inputs are highly correlated then you can probably drop one.

Thanks. I can calculate them with ease as well.
