Determining the Importance of Certain Data Types (PCA?)

Pighead · Oct 8, 2008

Hello Forum,

My first post...

I'm doing a project that extracts certain features from music files. These "features" will/may become the inputs to a neural network. I have 12 features in total, which will correspond to a maximum of 12 inputs to the neural network.

Essentially I will have 12 columns of data, one column for each feature. E.g. 10 music files will produce 10 rows of data, with one value per feature/column. Amplitude, for example, could be column 1.

Anyway, here comes my maths question. I am not an expert at maths, as I've only done basic maths at university, but I am willing to learn and am a fast learner.

--------------------
I want to decide which input features/columns of data are the most important and find any relationships between them. Maybe some sort of classification as well, but I am not sure.

I have been told that PCA, or Principal Component Analysis, could be the best way of doing this. I don't have any knowledge of it, but a search on Google tells me that it involves working out standard deviations (SD) and other parameters.
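As a rough illustration of what PCA computes, here is a minimal sketch in Python with NumPy. The data are made up (50 "files", 3 "features"), and the function name `pca` is just a label chosen for this example. Note that PCA looks only at the input columns, not at any target score, so it shows which features move together rather than which ones predict the output.

```python
import numpy as np

def pca(X):
    """Return (explained_variance_ratio, components) for the rows of X."""
    # Standardise each column so features with large units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardised data; the rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s**2 / (len(X) - 1)
    return var / var.sum(), Vt

# Made-up data: feature 2 is almost a copy of feature 0.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = rng.normal(size=50)
X = np.column_stack([a, b, a + 0.05 * rng.normal(size=50)])

ratio, components = pca(X)
print(ratio)       # share of total variance per component, largest first
print(components)  # row k holds the feature weights of component k
```

A component with a very small share of the variance, as the last one has here, suggests that some columns are close to redundant with each other.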

Also, I have been told that classifiers such as Bayesian classifiers could be worth a look.

I'm just looking for advice from the maths experts on here. How would you tackle the problem, and what techniques would you use? Is it important to look at the relationships between the input data sets?

Dale · Oct 8, 2008

Hi Pighead, welcome to PF!

When you say "most important" what do you mean? In other words, what is the question you are trying to answer or the task you are trying to accomplish with your data?

Pighead · Oct 8, 2008

Thanks.

The prediction out of the neural network will be the quality of the music. The training data for the neural network will be human scores for certain audio files, i.e. people grade the quality of the audio files and give a score. The NN will try to predict the score humans would give.

The inputs will be taken from the same files the humans used in the quality grading process. The target outputs for training the network will be the scores recorded from the humans.

I want to know three things:

1. How do I assess which inputs are most important in giving an accurate prediction of the human scores? I have the inputs and expected outputs of the neural network, so how do I analyse the inputs to see which ones are most important?

2. Which inputs should be removed because they have no importance?

3. Any other ways of improving the accuracy of the system, e.g. classifiers that will classify some of the inputs in some way. I am not sure about this. Maybe I could have a different neural network for each class. I think I read that a naive Bayesian classifier can independently decide which inputs to use?

Thanks for any help.

Dale · Oct 8, 2008

It sounds to me like you want a multiple regression. That will give you the best linear combination of your features for predicting the scores. You should probably try both a linear regression and a logistic regression.
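To make the multiple regression idea concrete, here is a minimal sketch in Python with NumPy. The data are synthetic, and the helper name `standardised_coefficients` is made up for this example. Each column is scaled to unit standard deviation before fitting, so the size of each coefficient gives a rough indication of how much that feature contributes. This reading becomes unreliable when features are strongly correlated with each other.

```python
import numpy as np

def standardised_coefficients(X, y):
    """Least-squares fit of y on the standardised columns of X.

    Returns (coefficients, r_squared). Each coefficient is the change in y
    per one standard deviation of that feature.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    A = np.column_stack([np.ones(len(Z)), Z])  # intercept column first
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centred = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    return beta[1:], r2

# Synthetic example: only features 0 and 2 actually drive the score.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=200)

coefs, r2 = standardised_coefficients(X, y)
print(coefs)  # one coefficient per feature
print(r2)     # fraction of the score variance explained by the fit
```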

There are also specific methods for including or excluding your features as predictors.
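One common family of such methods is stepwise selection. The sketch below shows backward elimination, which is one possible choice and not necessarily the method meant above. It starts with all features and keeps dropping the one whose removal does not increase the cross-validated prediction error. The data are synthetic, the helper names `cv_mse` and `backward_elimination` are made up for this example, and the folds are taken as contiguous blocks on the assumption that the rows are in no particular order.

```python
import numpy as np

def cv_mse(X, y, cols, k=5):
    """Mean squared error of a linear fit on `cols`, averaged over k folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        A_train = np.column_stack([np.ones(len(train)), X[np.ix_(train, cols)]])
        A_test = np.column_stack([np.ones(len(test)), X[np.ix_(test, cols)]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        errors.append(np.mean((y[test] - A_test @ beta) ** 2))
    return float(np.mean(errors))

def backward_elimination(X, y):
    """Drop one feature at a time while the cross-validated error does not rise."""
    cols = list(range(X.shape[1]))
    best = cv_mse(X, y, cols)
    while len(cols) > 1:
        # Error obtained after removing each remaining feature in turn.
        trials = [(cv_mse(X, y, [c for c in cols if c != d]), d) for d in cols]
        err, drop = min(trials)
        if err > best:
            break
        best = err
        cols.remove(drop)
    return cols, best

# Synthetic example: only features 1 and 4 carry information about y.
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 6))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + 0.3 * rng.normal(size=120)

kept, err = backward_elimination(X, y)
print(kept, err)  # indices of the retained features and their CV error
```

With only a handful of rows (such as the 10 files mentioned earlier), any selection of this kind will be very noisy, so more graded files would help.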
