Training Matlab for Predicting Protein Secondary Structure

  • Thread starter chamrik
In summary: Chamrik is seeking help with a neural network classifier for predicting the secondary structure of proteins. The data consists of 357*10766 inputs and 3*10766 targets, with a training set (357*8324) separated from the testing set. The training process does not appear to be working, and Chamrik has shared the code and asked what could be wrong. Potential issues to consider are the size and representativeness of the training set, the structure and parameters of the neural network, and how the performance of the classifier is evaluated.
  • #1
chamrik
Hi all,

Can any of you please help me figure out what could be wrong here? I have built a classifier to predict the secondary structure of my proteins. I have 357*10766 inputs and 3*10766 targets. I tried separating the training set (357*8324) into two sets and left the remaining data for testing. However, I don't seem to be getting anywhere with training. Being able to split the training set is important, as I will be working with larger data.

Suppose I have something like this:
Matlab:
x = inputs;
data = inputstrain;
targets = target_arrayK_Htrain;
testdata = inputstest;
testtargets = target_arrayK_Htest;
% data1/targets1 and data2/targets2 are the two halves of the training set.
mmx = minmax(x);
net = newff(mmx,[2,2],{'logsig','purelin'},'trainrp');
wm1 = net.IW{1,1};   % input-to-hidden weights
wm2 = net.LW{2,1};   % hidden-to-output weights
b1 = net.b{1};       % hidden-layer biases
b2 = net.b{2};       % output-layer biases
for k = 1:100
    [wm1,b1,wm2,b2] = protein_step1(mmx,data2,targets2,wm1,b1,wm2,b2);
    [wm1,b1,wm2,b2] = protein_step1(mmx,data1,targets1,wm1,b1,wm2,b2);
end

and protein_step1 is given as an M-file below:
Matlab:
function [wm1,b1,wm2,b2] = protein_step1(mmx,data,targets,wm1,b1,wm2,b2)

% Inputs: data, targets, weight matrix 1, bias 1, weight matrix 2, bias 2,
%         and mmx = minmax(data) (created outside the function).
% Outputs: new weight matrix 1, new bias 1, new weight matrix 2, new bias 2.

net = newff(mmx,[2,2],{'logsig','purelin'},'trainrp');

net.IW{1,1} = wm1;
net.b{1} = b1;
net.LW{2,1} = wm2;
net.b{2} = b2;

epochnumber = 1;
net.trainParam.epochs = epochnumber;

net = train(net,data,targets);

wm1 = net.IW{1,1}; % new weights and biases obtained after training.
b1 = net.b{1};
wm2 = net.LW{2,1};
b2 = net.b{2};

return

Why doesn't it train on the data the way I want it to? I am just stuck. It's even worse when I want to repeat the training process, because I doubt it ever does that.

Thank you.

Regards,
Chamrik
 
  • #2
Hi Chamrik,

Thank you for sharing your code and the issue you are facing. It seems like you have taken a good approach by separating your data into training and testing sets, and using a neural network classifier to predict the secondary structure of your proteins.

One thing to check is the size of your training set relative to the size of the model. Your training set has 8324 samples, each with 357 inputs and 3 targets. As a rule of thumb, a network should have many more training samples than trainable weights and biases to learn and generalize well, so this ratio matters more than the sample count alone, particularly if you enlarge the hidden layer.
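As a rough check on the size question, the trainable parameters can be counted from the layer sizes. This is only a sketch: the sizes are taken from the newff call in post #1, and the variable names are made up for illustration.

```matlab
% Count weights and biases of a 357-2-2 feedforward network and compare
% with the number of training samples.
nInputs  = 357;          % rows of the input matrix
layers   = [2, 2];       % hidden and output layer sizes passed to newff
nSamples = 8324;         % columns of the training set

fanIn   = [nInputs, layers(1:end-1)];      % number of inputs feeding each layer
nParams = sum(fanIn .* layers + layers);   % weights plus biases, summed over layers

fprintf('parameters: %d, samples per parameter: %.1f\n', ...
        nParams, nSamples / nParams);
% prints "parameters: 722, samples per parameter: 11.5"
```

With only two hidden units the network is small relative to the data, so for this configuration a lack of capacity may be as much of a concern as a lack of samples.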

Additionally, it may be helpful to check the distribution of your data and make sure it is representative of the entire dataset. If the training set is not representative, it may lead to poor performance of the classifier.
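One simple way to check representativeness is to compare the class proportions in the training and test targets. The sketch below assumes one-hot targets (3-by-N, one column per residue) and uses made-up labels; the strided split is just one illustrative alternative to cutting the data at a fixed column.

```matlab
% Compare class proportions for two ways of splitting one-hot targets.
labels  = [ones(1, 400), 2 * ones(1, 200), 3 * ones(1, 400)];  % made-up, sorted by class
targets = double(bsxfun(@eq, (1:3)', labels));                 % one-hot, 3-by-1000

classFrac = @(t) sum(t, 2) / size(t, 2);   % fraction of columns in each class

% Split A: first 800 columns for training, last 200 for testing.
seqTrainFrac = classFrac(targets(:, 1:800));
seqTestFrac  = classFrac(targets(:, 801:end));

% Split B: every 5th column for testing, the rest for training.
testIdx  = 5:5:1000;
trainIdx = setdiff(1:1000, testIdx);
strTrainFrac = classFrac(targets(:, trainIdx));
strTestFrac  = classFrac(targets(:, testIdx));

disp([seqTrainFrac, seqTestFrac]);   % [0.5 0; 0.25 0; 0.25 1]: test set is all class 3
disp([strTrainFrac, strTestFrac]);   % [0.4 0.4; 0.2 0.2; 0.4 0.4]: proportions match
```

If the two columns differ a lot, as in split A, the test score says little about how the classifier behaves on the full dataset.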

Another thing to consider is the structure and parameters of your neural network. The number of hidden layers and nodes, as well as the activation function and training algorithm, can greatly affect the performance of the classifier. It may be helpful to experiment with different configurations to see which one works best for your data.

Lastly, it is important to properly evaluate the performance of your classifier using the testing set. This will give you a better understanding of how well the model is able to generalize to new data. If the performance on the testing set is not satisfactory, it may indicate that there is an issue with the training process.
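A minimal sketch of such an evaluation follows. It assumes one-hot targets and takes the row with the largest output as the predicted class; the small `outputs` matrix is invented and stands in for what sim(net, testdata) would return.

```matlab
% Score 3-class network outputs against one-hot targets.
targets = [1 0 0 1 0 0;
           0 1 0 0 1 0;
           0 0 1 0 0 1];
outputs = [0.8 0.2 0.1 0.3 0.1 0.2;     % made-up stand-in for sim(net, testdata)
           0.1 0.7 0.3 0.6 0.8 0.1;
           0.1 0.1 0.6 0.1 0.1 0.7];

[~, trueClass] = max(targets, [], 1);   % true class index per column
[~, predClass] = max(outputs, [], 1);   % predicted class index per column

% confMat(i, j) counts samples of true class i that were predicted as class j.
confMat  = accumarray([trueClass(:), predClass(:)], 1, [3, 3]);
accuracy = sum(diag(confMat)) / sum(confMat(:));   % overall fraction correct
recall   = diag(confMat) ./ sum(confMat, 2);       % fraction correct per true class

disp(confMat);    % [1 1 0; 0 2 0; 0 0 2]
disp(accuracy);   % 0.8333
```

The per-class figures are worth looking at alongside the overall accuracy, because a classifier that always predicts the most common class can still score well overall.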

I hope these suggestions are helpful in addressing the issue you are facing. Good luck with your research!
 
  • #3


Hi Chamrik,

Thank you for reaching out for help with your Matlab training for predicting protein secondary structure. From the code and information provided, it seems like you are using a neural network approach for your classifier. Neural networks can be a powerful tool for classification tasks, but they require careful tuning and understanding of the data and network architecture.

One potential issue with your code is that protein_step1 creates a new neural network with newff every time it is called. The function does copy the passed-in weights and biases over the freshly randomized ones, but everything else about the network is rebuilt on each call, and any mismatch between the copied matrices and the new network's layer sizes will break the hand-off. It would be better to create the network once outside the function and pass the network itself in and out, so that it is trained and updated consistently.
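A sketch of that restructuring is below. It needs the Neural Network Toolbox and keeps the older newff call style from post #1. The random data, the `chunks` split and the variable names are placeholders, and the output layer is set to 3 units on the assumption that it should match the 3 target rows.

```matlab
% Build the network once, then keep training the same object chunk by chunk.
inputs  = rand(357, 400);                        % placeholder for the real inputs
labels  = randi(3, 1, 400);                      % placeholder class labels
targets = double(bsxfun(@eq, (1:3)', labels));   % one-hot, 3-by-400

chunks = {1:200, 201:400};                       % hypothetical split into two sets

net = newff(minmax(inputs), [2, 3], {'logsig', 'purelin'}, 'trainrp');
net.trainParam.epochs     = 1;                   % one pass over a chunk per call
net.trainParam.showWindow = false;               % suppress the training GUI

for k = 1:100
    for c = 1:numel(chunks)
        idx = chunks{c};
        % train returns the updated network, so weights carry over to the next call
        net = train(net, inputs(:, idx), targets(:, idx));
    end
end
```

Because the network object is passed through the loop, there is no need to copy IW, LW and b by hand. Each call to train still starts the training algorithm afresh, so many one-epoch calls may not behave like one longer run; if the data fits in memory, a single call with more epochs is the simpler option to compare against.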

Additionally, it may be helpful to try different network architectures and training algorithms to see which works best for your data. Also, make sure that your data is properly preprocessed and normalized before training, as this can greatly affect the performance of a neural network.
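For the normalization step, one common choice is to scale each input row to [-1, 1] using statistics from the training set only, and then apply the same scaling to the test set. The sketch below does this in plain MATLAB on made-up numbers; all names are illustrative.

```matlab
% Min-max scale each feature (row) to [-1, 1] using training-set statistics.
xTrain = [0 5 10;
          2 4  6];              % made-up 2-by-3 training inputs
xTest  = [5 20;
          3  8];                % made-up 2-by-2 test inputs

lo   = min(xTrain, [], 2);      % per-row minimum over training columns
hi   = max(xTrain, [], 2);      % per-row maximum over training columns
span = hi - lo;
span(span == 0) = 1;            % constant rows: avoid division by zero

scale   = @(x) 2 * bsxfun(@rdivide, bsxfun(@minus, x, lo), span) - 1;
xTrainN = scale(xTrain);        % [-1 0 1; -1 0 1]
xTestN  = scale(xTest);         % [0 3; -0.5 2], test values may fall outside [-1, 1]
```

The toolbox function mapminmax offers the same idea: `[xn, ps] = mapminmax(xTrain)` returns the settings, and `mapminmax('apply', xTest, ps)` reuses them on new data.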

I hope this helps and good luck with your training process.
 

Related to Training Matlab for Predicting Protein Secondary Structure

1. What is Matlab and why is it used for predicting protein secondary structure?

Matlab is a high-level programming language and interactive environment commonly used in scientific and engineering research. It is particularly useful for analyzing large datasets and developing predictive models. In the context of protein secondary structure prediction, Matlab can be used to process and analyze protein sequence data, and to train machine learning algorithms for predicting secondary structure elements such as alpha helices and beta strands.

2. What are the main steps involved in training a model in Matlab for predicting protein secondary structure?

The main steps are data preprocessing, feature extraction, model selection, and model evaluation. Data preprocessing involves cleaning and formatting the input data to make it suitable for analysis. Feature extraction involves identifying relevant features or characteristics of the protein sequence that can be used in the predictive model. Model selection involves choosing the most appropriate machine learning algorithm for the task, and model evaluation involves testing the performance of the trained model on data that was not used for training.

3. What types of data are typically used to train a model for predicting protein secondary structure?

The most commonly used type of data is protein sequence data, which consists of the linear sequence of amino acids that make up a protein. Other types of data that may be used include structural information, such as the three-dimensional structure of the protein, and physicochemical properties of the amino acids in the sequence.
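Sequence data has to be turned into numbers before a network can use it. A common approach is a sliding window with one-hot encoding. The sketch below assumes a window of 17 residues over 21 symbols (20 amino acids plus a padding symbol), which gives 17 * 21 = 357 inputs per residue. That matches the input size mentioned in the thread, but the thread does not say how its features were built, so the window size, alphabet order and example sequence are all assumptions.

```matlab
% Sliding-window one-hot encoding of a protein sequence.
alphabet = 'ACDEFGHIKLMNPQRSTVWY-';   % 20 amino acids plus '-' for padding
win      = 17;                        % window length (assumed)
half     = (win - 1) / 2;             % residues on each side of the centre

seq    = 'MKTAYIAKQR';                % made-up example sequence
padded = [repmat('-', 1, half), seq, repmat('-', 1, half)];

nSym = numel(alphabet);
X = zeros(win * nSym, numel(seq));    % one 357-element column per residue
for i = 1:numel(seq)
    window = padded(i : i + win - 1);            % window centred on residue i
    [~, symIdx] = ismember(window, alphabet);    % alphabet index of each symbol
    rows = (0:win-1) * nSym + symIdx;            % one active row per window position
    X(rows, i) = 1;
end

disp(size(X));   % 357 10
```

Each column of `X` has exactly 17 ones, one for every window position, and can be paired with a 3-element one-hot target for the residue at the centre of the window.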

4. How accurate are Matlab models at predicting protein secondary structure?

Accuracy is a property of the trained model rather than of Matlab itself. It depends on factors such as the quality and quantity of the training data, the choice of machine learning algorithm, and the features used in the predictive model. It is usually reported as the fraction of residues assigned to the correct class (helix, strand, or coil) on sequences held out from training.

5. Can Matlab be used for other types of protein structure prediction?

Yes, Matlab can be used for other types of protein structure prediction, such as tertiary structure prediction or protein-ligand binding prediction. However, the specific steps and techniques used may differ from those used for predicting secondary structure, as different types of protein structure require different approaches and data. It is important to carefully consider the specific goals and limitations of the predictive model when using Matlab for any type of protein structure prediction.
