Python Problem with scikit-learn metrics in K-fold cross validation

  • Thread starter: BRN
  • Tags: Cross
AI Thread Summary
The discussion revolves around discrepancies observed in accuracy metrics during K-Fold Cross Validation using Scikit-Learn. The user implemented K-Fold Cross Validation with a model and noted that the accuracy reported during training (ranging from 0.70 to 0.90) differs significantly from the average accuracy calculated post-validation, which was around 0.51. This raises questions about the reliability of the accuracy metrics derived from Scikit-Learn compared to the training accuracy displayed during model training. The user seeks clarification on why these metrics are not aligned, indicating a potential misunderstanding of how validation accuracy is computed versus training accuracy. The conversation highlights the importance of understanding the differences between training performance and validation metrics in machine learning model evaluation.
BRN
Hello everyone,
In my implementation of K-Fold Cross Validation, I find a difference between the accuracy reported during the training of each fold and the accuracy calculated afterwards with the metric functions of Scikit-Learn.

This is my code for the K-Fold Cross Validation and for the calculation of metrics.
[CODE lang="python" title="K-Fold cross validation"]import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

# inceptionV3, get_dataset, merge_tensors, calc_metrics and the constants
# NUM_FOLDS, BATCH_SIZE, NUM_EPOCHS are defined elsewhere in the project.

def kf_validation(images_path_list):

    kfold = KFold(n_splits = NUM_FOLDS, shuffle = True, random_state = 42)

    model = inceptionV3()

    # One list per metric, filled with one value per fold
    acc_list = []
    mse_list = []
    mae_list = []
    auc_list = []

    for fold, (train_index, test_index) in enumerate(kfold.split(images_path_list)):

        print('==================================================================')
        print(f'-----------------------FOLD {fold + 1}, -------------------------')
        print('==================================================================')

        # Only the lengths of the index arrays are used to split the dataset
        dataset = get_dataset(images_path_list, split_dataset = False)
        train_ds = dataset.skip(len(test_index)).batch(BATCH_SIZE)
        test_ds = dataset.skip(len(train_index)).take(len(test_index)).batch(BATCH_SIZE)

        model.fit(train_ds, epochs = NUM_EPOCHS, validation_data = test_ds, verbose = 1)

        # Labels are collected in one pass over test_ds, predictions in a second pass
        y_true = [label for _, label in test_ds]
        y_true = merge_tensors(y_true)
        y_pred = model.predict(test_ds)
        # Convert one-hot labels and class scores to class indices
        y_pred = tf.argmax(y_pred, axis = 1)
        y_true = tf.argmax(y_true, axis = 1)

        # Store this fold's metrics (the original zip(...) line did not
        # accumulate one value per fold in each list)
        results = calc_metrics(y_true, y_pred)
        acc_list.append(results[0])
        mse_list.append(results[1])
        mae_list.append(results[2])
        auc_list.append(results[3])

    print('----------------AVERAGES METRICS AFTER ', NUM_FOLDS,' FOLDS---------------------')
    print(f'average ACC: {np.mean(acc_list):.3f}')
    print(f'average MSE: {np.mean(mse_list):.3f}')
    print(f'average MAE: {np.mean(mae_list):.3f}')
    print(f'average AUC: {np.mean(auc_list):.3f}')
    print('--------------------------------------------------------------------------------')

    # Labels and predictions of the last fold
    return y_true, y_pred[/CODE]

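[CODE lang="python" title="scikit-learn metrics"]from sklearn.metrics import (accuracy_score, mean_squared_error,
                             mean_absolute_error, roc_curve, auc)

def calc_metrics(y_true, y_pred):

    # y_true and y_pred are class indices (after argmax)
    acc = accuracy_score(y_true, y_pred, normalize = True)
    mse = mean_squared_error(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)

    # ROC curve built from the hard class predictions
    fpr, tpr, thresholds = roc_curve(y_true, y_pred)
    auc_val = auc(fpr, tpr)

    print(f'-- ACC={acc}, MSE={mse}, MAE={mae}, AUC={auc_val}, --')

    return acc, mse, mae, auc_val[/CODE]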

For example, this is the output for the first fold:

[CODE title="first fold"]==================================================================
-----------------------FOLD 1, -------------------------
==================================================================
Epoch 1/10
19/19 [==============================] - 55s 486ms/step - loss: 1.7101 - accuracy: 0.7681 - val_loss: 8.6331 - val_accuracy: 0.4371
Epoch 2/10
19/19 [==============================] - 9s 355ms/step - loss: 0.4822 - accuracy: 0.7729 - val_loss: 8.0062 - val_accuracy: 0.4780
Epoch 3/10
19/19 [==============================] - 9s 379ms/step - loss: 0.4996 - accuracy: 0.8312 - val_loss: 7.9097 - val_accuracy: 0.4843
Epoch 4/10
19/19 [==============================] - 9s 357ms/step - loss: 0.3882 - accuracy: 0.8738 - val_loss: 7.9579 - val_accuracy: 0.4811
Epoch 5/10
19/19 [==============================] - 9s 355ms/step - loss: 0.6962 - accuracy: 0.9085 - val_loss: 5.0147 - val_accuracy: 0.5283
Epoch 6/10
19/19 [==============================] - 9s 359ms/step - loss: 1.1334 - accuracy: 0.8896 - val_loss: 2.6646 - val_accuracy: 0.7453
Epoch 7/10
19/19 [==============================] - 9s 349ms/step - loss: 0.3623 - accuracy: 0.9401 - val_loss: 4.8177 - val_accuracy: 0.6792
Epoch 8/10
19/19 [==============================] - 9s 354ms/step - loss: 0.3827 - accuracy: 0.9401 - val_loss: 3.4249 - val_accuracy: 0.7767
Epoch 9/10
19/19 [==============================] - 8s 347ms/step - loss: 0.3808 - accuracy: 0.9180 - val_loss: 4.9683 - val_accuracy: 0.7107
Epoch 10/10
19/19 [==============================] - 9s 361ms/step - loss: 0.3999 - accuracy: 0.9148 - val_loss: 9.7832 - val_accuracy: 0.3396
10/10 [==============================] - 4s 68ms/step
-- ACC=0.5125786163522013, MSE=0.48742138364779874, MAE=0.48742138364779874, AUC=0.5153103611979271, [/CODE]
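A side note on the numbers in this output: after argmax, y_true and y_pred are class indices, and for a two-class problem (an assumption, but one the reported values are consistent with) every wrong prediction has an error of magnitude 1. MSE and MAE then both equal 1 - ACC, and indeed 0.5126 + 0.4874 = 1, so the three metrics carry the same information here. A minimal sketch of that arithmetic, using plain-Python stand-ins for accuracy_score, mean_squared_error and mean_absolute_error and made-up toy labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences (stand-in for the scikit-learn function)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences (stand-in for the scikit-learn function)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy binary class indices, invented for illustration only
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 0]

acc_val = accuracy(y_true, y_pred)
mse_val = mean_squared_error(y_true, y_pred)
mae_val = mean_absolute_error(y_true, y_pred)

# With 0/1 labels each error is exactly 1, so MSE == MAE == 1 - ACC
print(f"ACC={acc_val}, MSE={mse_val}, MAE={mae_val}, 1-ACC={1 - acc_val}")
```

If error magnitudes are of interest, MSE and MAE would have to be computed on the predicted probabilities rather than on the argmax output.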

How is it possible to obtain an accuracy of 0.51 from Scikit-Learn when the accuracy shown during training is in the range 0.70 - 0.90?

Does anyone have an explanation?

Thanks.
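One thing worth checking in the loop of the posted code: train_index and test_index are used only for their lengths (skip/take), so the fold membership chosen by KFold never reaches the dataset, and the model is created once before the loop, so its weights carry over from fold to fold. The sketch below shows the usual pattern of indexing the data with the arrays returned by KFold.split. The file names are hypothetical stand-ins for images_path_list, and the point where a model would be built and trained is only marked by a comment:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical file list standing in for images_path_list
paths = np.array([f"img_{i:03d}.png" for i in range(20)])

kfold = KFold(n_splits=5, shuffle=True, random_state=42)

seen_in_test = []
for fold, (train_index, test_index) in enumerate(kfold.split(paths)):
    # Use the index arrays themselves, not just their lengths
    train_paths = paths[train_index]
    test_paths = paths[test_index]

    # Within a fold, the training and test subsets never share a sample
    assert set(train_paths.tolist()).isdisjoint(test_paths.tolist())
    seen_in_test.extend(test_paths.tolist())

    # A fresh model would be built and trained here for each fold, so that
    # no weights learned on an earlier fold leak into the next one.
    print(f"fold {fold + 1}: {len(train_paths)} train, {len(test_paths)} test")

# Across all folds, every sample is used for testing exactly once
print(sorted(seen_in_test) == sorted(paths.tolist()))
```

Whether this explains the numbers in the post depends on what get_dataset does, which is not shown.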
 
Maybe I did not explain myself clearly. I wonder why the accuracy calculated with the Scikit-Learn metrics does not match the one displayed during training.
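One possible cause, offered as a hypothesis because get_dataset is not shown: if the dataset reshuffles every time it is iterated, the labels collected in one pass over test_ds and the predictions made in a second pass are no longer in the same order, and the accuracy computed from them drops to chance level even for a good model. The sketch below simulates this in plain Python. ReshufflingDataset and perfect_model are hypothetical stand-ins, not TensorFlow or scikit-learn objects:

```python
import random


class ReshufflingDataset:
    """Hypothetical dataset that yields (feature, label) pairs in a new
    random order on every pass."""

    def __init__(self, features, labels, seed=0):
        self._pairs = list(zip(features, labels))
        self._rng = random.Random(seed)

    def __iter__(self):
        pairs = self._pairs[:]
        self._rng.shuffle(pairs)
        return iter(pairs)


def perfect_model(x):
    """Toy classifier that is always right: the label is the parity of x."""
    return x % 2


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


features = list(range(1000))
labels = [x % 2 for x in features]
ds = ReshufflingDataset(features, labels)

# Two passes: labels come from one ordering, predictions from another
y_true_two_pass = [y for _, y in ds]
y_pred_two_pass = [perfect_model(x) for x, _ in ds]
acc_two_pass = accuracy(y_true_two_pass, y_pred_two_pass)

# One pass: each prediction stays paired with its own label
y_true_one_pass, y_pred_one_pass = [], []
for x, y in ds:
    y_true_one_pass.append(y)
    y_pred_one_pass.append(perfect_model(x))
acc_one_pass = accuracy(y_true_one_pass, y_pred_one_pass)

print(f"two passes: {acc_two_pass:.3f}, one pass: {acc_one_pass:.3f}")
```

In the posted code, the equivalent check would be to collect labels and predictions batch by batch in a single loop over test_ds, or to make sure the test dataset is not shuffled, and then compare the result with the val_accuracy printed by Keras for the last epoch.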
 