Determining the Importance of Certain Data Types (PCA?)

  • Thread starter Pighead
In summary, Pighead is asking how to choose which input features matter most in a model that predicts human quality scores for audio files, and how to change the model if some features turn out to be unimportant. He is also asking about other possible methods for predicting the quality of audio files.
  • #1
Pighead
Hello Forum,

My first post...

I'm doing a project that extracts certain features from music files. These "features" may become the inputs to a neural network. I have 12 features in total, which will correspond to a maximum of 12 inputs to the neural network.

Essentially I will have 12 columns of data, one column for each feature, and one row per music file. E.g. 10 music files will produce 10 rows of data, with 10 values in each feature column. Amplitude, for example, could be column 1.

Anyway, here comes my maths question. I am not an expert at maths as I've only done basic maths at university, but I am willing to learn and am a fast learner.

--------------------
I want to decide which input features/columns of data are the most important, and find any relationships between them. Maybe some sort of classification also, but I am not sure.

I have been told that PCA, or Principal Component Analysis, could be the best way of doing this. I don't have any knowledge of it, but a search in Google tells me that it involves working out standard deviations and other parameters.

Also, I have been told that classifiers such as Bayesian classifiers could be worth a look.

I'm just looking for advice from the maths experts on here. How would you tackle the problem, and what techniques would you use? Is it important to look at the relationships between the input data sets?
 
  • #2
Hi Pighead, welcome to PF!

When you say "most important" what do you mean? In other words, what is the question you are trying to answer or the task you are trying to accomplish with your data?
 
  • #3
DaleSpam said:
Hi Pighead, welcome to PF!

When you say "most important" what do you mean? In other words, what is the question you are trying to answer or the task you are trying to accomplish with your data?

Thanks.

The prediction out of the neural network will be the quality of the music. The training data for the neural network will be human scores for certain audio files, i.e. people grade the quality of the audio files and give a score. The NN will try to predict what score humans would give.

The inputs will be extracted from the files used by the humans in the quality grading process. For training the network, the target outputs will be the scores recorded from the humans.

I want to know 3 things:

1. How do I assess which inputs are most important in giving an accurate prediction of the human scores? I have the inputs and expected outputs of the neural network, so how do I analyse the inputs to see which ones are most important?

2. Also, which inputs should be removed because they have no importance?

3. Are there any other ways of improving the accuracy of the system, e.g. classifiers that will classify some of the inputs in some way? I am not sure about this. Maybe I could have a different neural network for each class. I think I read that a naive Bayesian classifier can independently decide which inputs to use?

Thanks for any help.
 
  • #4
It sounds to me like you want a multiple regression. That will give you the best linear combination of your features, in the least-squares sense, for predicting the scores. You should probably try both a linear regression and, if the scores fall into a small number of categories, a logistic regression.

There are also specific methods for including or excluding your features as predictors, such as stepwise selection.
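
As a rough illustration of the multiple regression suggested above, here is a minimal sketch in plain Python. It is not anyone's actual project code: the data are made up, `other_feature` is a hypothetical second column, and the scores are deliberately constructed to depend on amplitude only. The features are standardized first so that the fitted weights can be compared with each other. In practice a numerical library would normally do the fitting.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear(feature_columns, scores):
    """Least-squares fit; returns [intercept, weight_1, weight_2, ...]."""
    cols = [[1.0] * len(scores)] + [standardize(c) for c in feature_columns]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * y for p, y in zip(ci, scores)) for ci in cols]
    return solve(xtx, xty)

# Made-up data: 8 files, 2 features. Scores depend on amplitude only.
amplitude = [0.2, 0.5, 0.9, 0.4, 0.7, 0.1, 0.8, 0.3]
other_feature = [0.3, 0.1, 0.4, 0.9, 0.2, 0.6, 0.5, 0.8]
scores = [2.0 + 3.0 * a for a in amplitude]

weights = fit_linear([amplitude, other_feature], scores)
for name, w in zip(["intercept", "amplitude", "other_feature"], weights):
    print(f"{name:>14}: {w:+.3f}")
```

With standardized inputs, a weight close to zero (here the one for `other_feature`) marks a candidate for removal, although with real, correlated features the weights should be read with more care than in this toy case.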
 

Related to Determining the Importance of Certain Data Types (PCA?)

1. What is PCA and why is it important in data analysis?

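PCA stands for Principal Component Analysis. It is a technique used in data analysis to reduce the dimensionality of high-dimensional data: it replaces the original variables with a smaller set of uncorrelated linear combinations, the principal components, that retain as much of the variance as possible. This can make patterns and relationships in the data easier to interpret and analyze.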

2. How does PCA determine the importance of data types?

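PCA does not rank individual variables directly. It finds the directions (linear combinations of variables) along which the data vary the most, and the loadings of each component show how strongly each variable contributes to that direction. Variables with large loadings on the leading components account for most of the variance in the data. Note that high variance is not the same as importance for predicting an outcome, because PCA never looks at the outcome.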
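
To make the idea of components and loadings concrete, here is a minimal sketch in plain Python that finds only the first principal component, using power iteration on the correlation matrix. The three feature names and all the numbers are hypothetical, and a real analysis would normally use a numerical library that returns every component at once.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]

def covariance_matrix(cols):
    """Covariance of already-centred columns (equals correlation if standardized)."""
    n = len(cols[0])
    return [[sum(p * q for p, q in zip(ci, cj)) / n for cj in cols] for ci in cols]

def leading_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    size = len(matrix)
    v = [1.0 / (i + 1) for i in range(size)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(v[i] * mv[i] for i in range(size))
    return eigenvalue, v

# Made-up data: loudness and brightness move together, roughness does not.
loudness = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
brightness = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
roughness = [4.0, 7.0, 1.0, 6.0, 3.0, 8.0, 2.0, 5.0]

cols = [standardize(c) for c in (loudness, brightness, roughness)]
corr = covariance_matrix(cols)
eigenvalue, component = leading_component(corr)

# Total variance of standardized data is the trace of the correlation matrix.
explained = eigenvalue / sum(corr[i][i] for i in range(len(corr)))
print(f"first component explains {explained:.1%} of the variance")
for name, loading in zip(["loudness", "brightness", "roughness"], component):
    print(f"{name:>10}: loading {loading:+.3f}")
```

The first component here is essentially "loudness plus brightness", which tells you those two columns are nearly redundant. It says nothing about whether either of them predicts the human scores.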

3. Can PCA be used for all types of data?

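PCA is designed for numerical data, because it relies on calculating variances and covariances between variables. Categorical or mixed data must first be encoded numerically, or analyzed with a related method such as multiple correspondence analysis. Variables measured on very different scales are usually standardized before PCA is applied.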

4. How do you interpret the results of a PCA analysis?

The results of a PCA analysis are typically summarized by the fraction of variance each component explains, and presented as a scatter plot or a biplot of the first two components. In a biplot each data point represents a sample and each variable is represented by an arrow. The direction and length of an arrow show how strongly that variable loads on the plotted components, arrows pointing in similar directions indicate positively correlated variables, and the distance between data points reflects the similarity or dissimilarity between samples.

5. Are there any limitations to using PCA in data analysis?

While PCA can be a powerful tool in data analysis, it does have some limitations. It is built on variances and covariances, so it captures only linear relationships between variables and may miss non-linear structure. It is sensitive to the scale of the variables. It is also unsupervised: it ignores any outcome you want to predict, so the components with the most variance are not necessarily the most useful predictors.
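
The non-linear limitation can be seen with a tiny made-up example. In the sketch below the score is an exact function of the feature (its square), yet the linear correlation that PCA and linear regression rely on is zero. The names `feature` and `score` are purely illustrative.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Made-up data: the score is fully determined by the feature, but not linearly.
feature = [-3, -2, -1, 0, 1, 2, 3]
score = [v * v for v in feature]

r_raw = pearson(feature, score)
r_transformed = pearson([v * v for v in feature], score)

print(f"correlation with raw feature:     {r_raw:+.3f}")
print(f"correlation with squared feature: {r_transformed:+.3f}")
```

A purely linear analysis would call this feature useless, while a suitable transformation, or a non-linear model such as a neural network, could use it perfectly.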
