BRN
Hi everyone,
I have to classify DNA sequences with an LSTM neural network, but I have a problem with the input shapes. Both the sequence and the class are one-hot encoded, and my code is this:
python code:
import pandas as pd
import numpy as np

data = pd.read_csv('splice.data', header = None)
data_shuffled = data.sample(frac = 1).reset_index(drop = True)

# space removing (single .loc call so the assignment reaches the DataFrame)
for i in range(len(data)):
    data.loc[i, 2] = data.loc[i, 2].strip()
raw_data = np.array(data)

def one_hot_encoder(data):
    x = []
    y = []
    for i in range(len(data)):
        # class label: EI, IE, or anything else
        oh_class = np.zeros((1, 3))
        if data[i][0] == 'EI': oh_class[0][0] = 1
        elif data[i][0] == 'IE': oh_class[0][1] = 1
        else: oh_class[0][2] = 1
        y.append(oh_class)
        # sequence: one row per base, columns A, C, G, other
        oh_seq = np.zeros((len(data[0][2]), 4))
        for j in range(len(data[0][2])):
            if data[i][2][j] == 'A': oh_seq[j][0] = 1
            elif data[i][2][j] == 'C': oh_seq[j][1] = 1
            elif data[i][2][j] == 'G': oh_seq[j][2] = 1
            else: oh_seq[j][3] = 1
        x.append(oh_seq)
    return np.array(x), np.array(y)

x_seq, y_class = one_hot_encoder(raw_data)

# Split into validation and training data
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x_seq, y_class, test_size = 0.2, random_state = 1)

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

# Initialize the RNN
model = Sequential()
model.add(LSTM(units = 50, activation = 'relu', return_sequences = True, input_shape = (x_train.shape[1], x_train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units = 60, activation = 'relu', return_sequences = True))
model.add(Dropout(0.3))
model.add(LSTM(units = 80, activation = 'relu', return_sequences = True))
model.add(Dropout(0.4))
model.add(LSTM(units = 120, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(units = 3, activation='softmax'))
model.summary()

model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.fit(x_train, y_train, epochs = 100, batch_size = 80, validation_split = 0.1)
The error I receive is this:
python error:
ValueError: Shapes (None, 1, 3) and (None, 3) are incompatible
Can anyone tell me how to solve it?
Thanks!
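For reference, here is a minimal NumPy-only sketch of where the `(None, 1, 3)` shape in the error seems to come from, and two possible ways to get `(None, 3)` instead. The `labels` list and the `encode_labels` helper are hypothetical stand-ins for the class column and the label part of `one_hot_encoder`; this is a guess at the cause, not a confirmed fix.

```python
import numpy as np

# Hypothetical labels standing in for column 0 of splice.data.
labels = ["EI", "IE", "N", "EI"]

def encode_labels(labels, flat):
    """One-hot encode labels; flat=False mimics the (1, 3) rows used in the post."""
    encoded = []
    for label in labels:
        index = {"EI": 0, "IE": 1}.get(label, 2)  # anything else goes to slot 2
        row = np.zeros(3) if flat else np.zeros((1, 3))
        if flat:
            row[index] = 1
        else:
            row[0][index] = 1
        encoded.append(row)
    return np.array(encoded)

# As in the post: stacking (1, 3) rows yields (N, 1, 3).
y_nested = encode_labels(labels, flat=False)
print(y_nested.shape)

# Possible fix 1: drop the singleton axis after encoding.
y_squeezed = np.squeeze(y_nested, axis=1)
print(y_squeezed.shape)

# Possible fix 2: build each label as a flat length-3 vector from the start.
y_flat = encode_labels(labels, flat=True)
print(y_flat.shape)
```

If this is indeed the cause, applying `np.squeeze(y_class, axis=1)` before `train_test_split` (or creating `oh_class` with `np.zeros(3)`) should make the labels match the `(None, 3)` output of the final `Dense` layer.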