Training Matlab for Predicting Protein Secondary Structure

chamrik · Oct 4, 2006

Hi all,

Can anyone please help me figure out what could be wrong here? I have built a classifier to predict the secondary structure of my proteins. My data consist of 357*10766 inputs and 3*10766 targets. I tried separating the training set (357*8324) into two sets and left the remaining data for testing. However, I don't seem to be getting anywhere with training. Separating the training set is most important, as I will be working with larger data.

Suppose I have something like this:

Matlab:

x = inputs;
data = inputstrain;
targets = target_arrayK_Htrain;
testdata = inputstest;
testtargets = target_arrayK_Htest;
mmx = minmax(x);
net = newff(mmx,[2,2],{'logsig','purelin'},'trainrp');
wm1 = net.IW{1,1};
wm2 = net.LW{2,1};
b1 = net.b{1};
b2 = net.b{2};
for k = 1:100
    [wm1,b1,wm2,b2] = protein_step1(mmx,data2,targets2,wm1,b1,wm2,b2);
    [wm1,b1,wm2,b2] = protein_step1(mmx,data1,targets1,wm1,b1,wm2,b2);
end

and protein_step1 is given as an M-file below:

Matlab:

function [wm1,b1,wm2,b2] = protein_step1(mmx,data,targets,wm1,b1,wm2,b2)

% Inputs: mmx = minmax(data) (created outside the function), data, targets,
%         weight matrix 1, bias 1, weight matrix 2, bias 2.
% Outputs: new weight matrix 1, new bias 1, new weight matrix 2, new bias 2.

net = newff(mmx,[2,2],{'logsig','purelin'},'trainrp');

net.IW{1,1} = wm1;
net.b{1} = b1;
net.LW{2,1} = wm2;
net.b{2} = b2;

epochnumber = 1;
net.trainParam.epochs = epochnumber;

net = train(net,data,targets);

wm1 = net.IW{1,1}; % new weights and biases obtained after training.
b1 = net.b{1};
wm2 = net.LW{2,1};
b2 = net.b{2};

return

Why does it not train on the data the way I want it to? I am just stuck. It's even worse when I want to repeat the training process, because I doubt it ever does that.

Thank you.

Regards,
Chamrik

mmwave · Oct 4, 2006

Hi Chamrik,

Thank you for sharing your code and the issue you are facing. It seems like you have taken a good approach by separating your data into training and testing sets, and using a neural network classifier to predict the secondary structure of your proteins.

One potential issue could be the size of your training set relative to the number of inputs and targets. Your training set has 8324 examples, each with 357 inputs and 3 targets, which may not be enough for training a neural network. Generally, the more inputs and targets a network has, the more training examples it needs in order to learn and generalize well.

Additionally, it may be helpful to check the distribution of your data and make sure it is representative of the entire dataset. If the training set is not representative, it may lead to poor performance of the classifier.
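One quick way to check this is to compare the fraction of examples in each class for the two sets. This is only a minimal sketch, assuming the targets are one-hot (exactly one 1 per column, with the three rows standing for the three structure classes). The small matrices below are made-up stand-ins for target_arrayK_Htrain and target_arrayK_Htest.

```matlab
% Toy stand-ins for the 3-by-N target matrices (one-hot columns assumed).
trainTargets = [1 0 0 1 0 1; 0 1 0 0 1 0; 0 0 1 0 0 0];
testTargets  = [1 0 0; 0 1 0; 0 0 1];

% Fraction of examples in each class: row sums divided by the number of columns.
trainFrac = sum(trainTargets, 2) / size(trainTargets, 2);
testFrac  = sum(testTargets, 2)  / size(testTargets, 2);

% Show the two distributions side by side (column 1 = training, column 2 = test).
disp([trainFrac, testFrac]);
```

If the two columns differ a lot, the split is probably not representative and reshuffling the examples before splitting may help.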

Another thing to consider is the structure and parameters of your neural network. The number of hidden layers and nodes, as well as the activation function and training algorithm, can greatly affect the performance of the classifier. It may be helpful to experiment with different configurations to see which one works best for your data.

Lastly, it is important to properly evaluate the performance of your classifier using the testing set. This will give you a better understanding of how well the model is able to generalize to new data. If the performance on the testing set is not satisfactory, it may indicate that there is an issue with the training process.
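As a rough illustration of such an evaluation, the sketch below computes the fraction of test examples assigned to the correct class. It assumes one-hot targets and takes the predicted class to be the row with the largest network output. The numbers are invented; with the real network, the outputs would come from something like sim(net, testdata).

```matlab
% Toy network outputs (3 classes x 4 examples) and the matching one-hot targets.
outputs     = [0.90 0.20 0.10 0.40; 0.05 0.70 0.30 0.50; 0.05 0.10 0.60 0.10];
testTargets = [1 0 0 1; 0 1 0 0; 0 0 1 0];

% Predicted and true class = row index of the largest value in each column.
[~, predicted] = max(outputs, [], 1);
[~, actual]    = max(testTargets, [], 1);

% Fraction of test examples whose predicted class matches the true class.
accuracy = mean(predicted == actual);
```

Comparing this number on the training set and on the test set gives a first impression of whether the network is learning at all and whether it generalizes.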

I hope these suggestions are helpful in addressing the issue you are facing. Good luck with your research!

tacman · Oct 11, 2006

Hi Chamrik,

Thank you for reaching out for help with your Matlab training for predicting protein secondary structure. From the code and information provided, it seems like you are using a neural network approach for your classifier. Neural networks can be a powerful tool for classification tasks, but they require careful tuning and understanding of the data and network architecture.

One potential issue with your code is that you are creating a new neural network within your protein_step1 function every time it is called. This means that the network is rebuilt from scratch with random weights and biases on each call, and anything that is not explicitly copied back in (only wm1, b1, wm2 and b2 are) is lost, which can affect the training results. It would be better to create the network outside of the function and pass it as an input to the function, so that it can be trained and updated consistently.
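A minimal sketch of that idea, assuming the Neural Network Toolbox and the same older newff call used in your post: the network is built once and the same object is trained repeatedly, so nothing has to be copied in and out by hand. The random matrices are toy stand-ins for your data. Sizing the output layer from the number of target rows is my assumption about what is intended, since your targets have 3 rows.

```matlab
% Toy stand-ins for the real data (rows = features, columns = examples).
inputs  = rand(5, 40);
targets = double(rand(3, 40) > 0.5);
data1 = inputs(:, 1:20);    targets1 = targets(:, 1:20);
data2 = inputs(:, 21:40);   targets2 = targets(:, 21:40);

% Build the network once; the output layer has one neuron per target row.
mmx = minmax(inputs);
net = newff(mmx, [2, size(targets, 1)], {'logsig','purelin'}, 'trainrp');
net.trainParam.epochs = 1;   % one epoch per call to train, as in the original

% Keep training the same network object, alternating between the two subsets.
for k = 1:100
    net = train(net, data2, targets2);
    net = train(net, data1, targets1);
end
```

Because train returns the updated network, the weights and biases carry over from one call to the next without being extracted and reassigned.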

Additionally, it may be helpful to try different network architectures and training algorithms to see which works best for your data. Also, make sure that your data is properly preprocessed and normalized before training, as this can greatly affect the performance of a neural network.
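For the normalization step, one common choice is to scale each input row to the range [-1, 1] using statistics from the training set only, and then apply the same scaling to the test set. The sketch below does this by hand on made-up data; the variable names are illustrative, and the toolbox function mapminmax offers a similar mapping.

```matlab
% Toy stand-ins: rows are features, columns are examples.
trainData = [0 5 10; 2 4 6];
testData  = [5; 3];

% Per-feature minimum and span, computed from the training set only.
lo   = min(trainData, [], 2);
span = max(trainData, [], 2) - lo;
span(span == 0) = 1;   % avoid dividing by zero for constant features

% Map each feature to [-1, 1]; the test set reuses the training statistics.
nTrain = size(trainData, 2);
nTest  = size(testData, 2);
trainNorm = 2 * (trainData - repmat(lo, 1, nTrain)) ./ repmat(span, 1, nTrain) - 1;
testNorm  = 2 * (testData  - repmat(lo, 1, nTest))  ./ repmat(span, 1, nTest)  - 1;
```

Reusing the training minimum and span keeps the test data on the same scale the network saw during training.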

I hope this helps and good luck with your training process.
