Problem with scikit-learn metrics in K-fold cross validation

In summary, the conversation concerns an implementation of K-fold cross validation in which the accuracy reported by Keras during each training run differs from the accuracy computed afterwards with scikit-learn's metric functions. The code for the cross validation and for the metric calculation is shown, and the question is why the scikit-learn accuracy is much lower than the accuracy displayed during training.
  • #1
BRN
Hello everyone,
In my implementation of K-fold cross validation, I find a difference between the accuracy reported during each training run and the accuracy calculated afterwards with the metric functions of scikit-learn.

This is my code for the K-Fold Cross Validation and for the calculation of metrics.
K-Fold cross validation:
from sklearn.model_selection import KFold
import numpy as np
import tensorflow as tf

def kf_validation(images_path_list):
    
    kfold = KFold(n_splits = NUM_FOLDS, shuffle = True, random_state = 42)
    
    model = inceptionV3()
    
    acc_list = []
    mse_list = []
    mae_list = []
    auc_list = []
    
    for fold, (train_index, test_index) in enumerate(kfold.split(images_path_list)):
        
        print('==================================================================')
        print(f'-----------------------FOLD {fold + 1}, -------------------------')
        print('==================================================================')
        
        dataset = get_dataset(images_path_list, split_dataset = False)
        train_ds = dataset.skip(len(test_index)).batch(BATCH_SIZE)
        test_ds = dataset.skip(len(train_index)).take(len(test_index)).batch(BATCH_SIZE)
        
        model.fit(train_ds, epochs = NUM_EPOCHS, validation_data = test_ds, verbose = 1)
        
        y_true = [label for _, label in test_ds]
        y_true = merge_tensors(y_true)
        y_pred = model.predict(test_ds)
        y_pred = tf.argmax(y_pred, axis = 1)
        y_true = tf.argmax(y_true, axis = 1)       
        
        # accumulate the per-fold metrics returned by calc_metrics
        results = calc_metrics(y_true, y_pred)
        acc_list.append(results[0])
        mse_list.append(results[1])
        mae_list.append(results[2])
        auc_list.append(results[3])
    
    print('----------------AVERAGE METRICS AFTER ', NUM_FOLDS,' FOLDS---------------------')
    print(f'average ACC: {np.mean(acc_list):.3f}')
    print(f'average MSE: {np.mean(mse_list):.3f}')
    print(f'average MAE: {np.mean(mae_list):.3f}')
    print(f'average AUC: {np.mean(auc_list):.3f}')
    print('--------------------------------------------------------------------------------')

    return y_true, y_pred

scikit-learn metrics:
from sklearn.metrics import accuracy_score, mean_squared_error, mean_absolute_error, roc_curve, auc

def calc_metrics(y_true, y_pred):
    
    acc = accuracy_score(y_true, y_pred, normalize = True)
    mse = mean_squared_error(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)
    
    fpr, tpr, thresholds = roc_curve(y_true, y_pred)
    auc_val = auc(fpr, tpr)
    
    print(f'-- ACC={acc}, MSE={mse}, MAE={mae}, AUC={auc_val}, --')

    return acc, mse, mae, auc_val

For example, this is the result for the first fold:

first fold:
==================================================================
-----------------------FOLD 1, -------------------------
==================================================================
Epoch 1/10
19/19 [==============================] - 55s 486ms/step - loss: 1.7101 - accuracy: 0.7681 - val_loss: 8.6331 - val_accuracy: 0.4371
Epoch 2/10
19/19 [==============================] - 9s 355ms/step - loss: 0.4822 - accuracy: 0.7729 - val_loss: 8.0062 - val_accuracy: 0.4780
Epoch 3/10
19/19 [==============================] - 9s 379ms/step - loss: 0.4996 - accuracy: 0.8312 - val_loss: 7.9097 - val_accuracy: 0.4843
Epoch 4/10
19/19 [==============================] - 9s 357ms/step - loss: 0.3882 - accuracy: 0.8738 - val_loss: 7.9579 - val_accuracy: 0.4811
Epoch 5/10
19/19 [==============================] - 9s 355ms/step - loss: 0.6962 - accuracy: 0.9085 - val_loss: 5.0147 - val_accuracy: 0.5283
Epoch 6/10
19/19 [==============================] - 9s 359ms/step - loss: 1.1334 - accuracy: 0.8896 - val_loss: 2.6646 - val_accuracy: 0.7453
Epoch 7/10
19/19 [==============================] - 9s 349ms/step - loss: 0.3623 - accuracy: 0.9401 - val_loss: 4.8177 - val_accuracy: 0.6792
Epoch 8/10
19/19 [==============================] - 9s 354ms/step - loss: 0.3827 - accuracy: 0.9401 - val_loss: 3.4249 - val_accuracy: 0.7767
Epoch 9/10
19/19 [==============================] - 8s 347ms/step - loss: 0.3808 - accuracy: 0.9180 - val_loss: 4.9683 - val_accuracy: 0.7107
Epoch 10/10
19/19 [==============================] - 9s 361ms/step - loss: 0.3999 - accuracy: 0.9148 - val_loss: 9.7832 - val_accuracy: 0.3396
10/10 [==============================] - 4s 68ms/step
-- ACC=0.5125786163522013, MSE=0.48742138364779874, MAE=0.48742138364779874, AUC=0.5153103611979271,

How is it possible to obtain an accuracy of 0.51 from the scikit-learn metrics when the accuracy during training is in the range 0.70 - 0.90?

Does anyone have an explanation?

Thanks.
 
  • #2
Maybe I explained myself poorly. I am wondering why the accuracy calculated with the scikit-learn metrics is not comparable with the one displayed during training.
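
A minimal way to check this directly is to compute both numbers on exactly the same batched test set: the accuracy from Keras' own evaluate() and the accuracy from scikit-learn on labels and predictions collected from that set. This is only a sketch under assumptions not stated in the thread: the model is compiled with accuracy as its only metric, the labels are one-hot encoded, and test_ds iterates in a deterministic order so that labels and predictions stay aligned.

import numpy as np
from sklearn.metrics import accuracy_score

def compare_accuracies(model, test_ds):
    # Accuracy as Keras reports it (the same metric shown as val_accuracy during training)
    _, keras_acc = model.evaluate(test_ds, verbose = 0)

    # Collect one-hot labels and predicted probabilities over the same batched dataset;
    # test_ds must not reshuffle between iterations, or labels and predictions will misalign
    y_true = np.concatenate([y.numpy() for _, y in test_ds], axis = 0)
    y_prob = model.predict(test_ds, verbose = 0)

    # Convert both to class indices and compute the scikit-learn accuracy
    sk_acc = accuracy_score(np.argmax(y_true, axis = 1), np.argmax(y_prob, axis = 1))

    print(f'Keras accuracy: {keras_acc:.4f} | scikit-learn accuracy: {sk_acc:.4f}')
    return keras_acc, sk_acc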
 

FAQ: Problem with scikit-learn metrics in K-fold cross validation

Why are my scikit-learn metrics inconsistent in K-fold cross validation?

One common reason for inconsistent metrics in K-fold cross validation is the presence of class imbalance. If certain classes are underrepresented in the dataset, the model may struggle to accurately predict those classes in some folds, leading to variability in the evaluation metrics.
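
To see whether this is happening, one can inspect the label distribution inside each test fold. A minimal sketch, assuming a plain integer label array y is available (the names here are illustrative, not taken from the code in the thread):

import numpy as np
from collections import Counter
from sklearn.model_selection import KFold

def show_fold_balance(y, n_splits = 5, seed = 42):
    # Print how the classes are distributed in each test fold of a plain (unstratified) KFold
    y = np.asarray(y)
    kfold = KFold(n_splits = n_splits, shuffle = True, random_state = seed)
    for fold, (_, test_index) in enumerate(kfold.split(y), start = 1):
        counts = Counter(y[test_index].tolist())
        print(f'fold {fold}: {dict(sorted(counts.items()))}')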

How can I address the problem of inconsistent metrics in K-fold cross validation with scikit-learn?

To address the issue of inconsistent metrics, you can use techniques such as stratified K-fold cross validation, which ensures that each fold contains a proportional representation of each class. This can help improve the stability and reliability of the evaluation metrics.
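
A minimal sketch of such a stratified split with scikit-learn, assuming integer class labels y aligned with the list of image paths (illustrative names, not the thread's code):

from sklearn.model_selection import StratifiedKFold

def stratified_folds(images_path_list, y, n_splits = 5, seed = 42):
    # StratifiedKFold preserves the class proportions of y in every train/test split
    skf = StratifiedKFold(n_splits = n_splits, shuffle = True, random_state = seed)
    for fold, (train_index, test_index) in enumerate(skf.split(images_path_list, y), start = 1):
        yield fold, train_index, test_index

Each fold's train_index and test_index can then be used to build the per-fold training and test datasets.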

Are there any other factors that could contribute to the problem of scikit-learn metrics in K-fold cross validation?

Other factors that could contribute to inconsistent metrics in K-fold cross validation include the choice of evaluation metric, the complexity of the model, and the size of the dataset. It's important to carefully consider these factors when interpreting the results of cross validation.
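
For example, scikit-learn's cross_validate can score the same folds with several metrics at once, which makes the effect of the metric choice visible fold by fold. A sketch on synthetic, deliberately imbalanced data (not the thread's model or dataset):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic imbalanced data stands in for the real images
X, y = make_classification(n_samples = 400, weights = [0.8, 0.2], random_state = 0)

# Score the same 5 folds with accuracy, balanced accuracy and ROC AUC
res = cross_validate(RandomForestClassifier(random_state = 0), X, y, cv = 5,
                     scoring = ['accuracy', 'balanced_accuracy', 'roc_auc'])
for name in ['test_accuracy', 'test_balanced_accuracy', 'test_roc_auc']:
    print(name, res[name].round(3))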

Can overfitting be a potential cause of the problem with scikit-learn metrics in K-fold cross validation?

Yes, overfitting can be a potential cause of inconsistent metrics in K-fold cross validation. If the model is too complex or if it is trained on a small dataset, it may memorize the training data and perform poorly on unseen data, leading to variability in the evaluation metrics across folds.
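
One way to see this with scikit-learn is to ask cross_validate for the training scores as well, so the gap between training and validation accuracy is visible in every fold. A sketch on a small built-in dataset, not the thread's model:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y = True)

# return_train_score = True exposes the train/validation gap that signals overfitting
res = cross_validate(DecisionTreeClassifier(random_state = 1), X, y, cv = 5, return_train_score = True)
print('train accuracy:     ', res['train_score'].round(3))
print('validation accuracy:', res['test_score'].round(3))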

How can I improve the reliability of scikit-learn metrics in K-fold cross validation?

To improve the reliability of the metrics in K-fold cross validation, you can use techniques such as hyperparameter tuning, feature selection, and model ensembling. These approaches can help optimize the model's performance and reduce variability in the evaluation metrics across different folds.
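
As an illustration, hyperparameter tuning can be nested inside cross validation so that the reported scores are not biased by the tuning itself. A sketch with a simple scikit-learn estimator on synthetic data (the thread's Keras model would need a wrapper, which is not shown here):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic data stands in for the real problem
X, y = make_classification(n_samples = 500, random_state = 42)

# The inner CV tunes the regularization strength ...
inner_cv = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 42)
search = GridSearchCV(LogisticRegression(max_iter = 1000), {'C': [0.01, 0.1, 1, 10]}, cv = inner_cv)

# ... and an outer CV estimates the accuracy of the tuned model (nested cross validation)
outer_cv = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 0)
scores = cross_val_score(search, X, y, cv = outer_cv, scoring = 'accuracy')
print(f'accuracy per fold: {scores.round(3)}  mean: {scores.mean():.3f}')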
