Interpretable Machine Learning with Python by Serg Masís

Author: Serg Masís
Language: eng
Format: epub
Publisher: Packt Publishing
Published: 2021-03-25T00:00:00+00:00


Assessing the CNN classifier with traditional interpretation methods

We can easily derive accuracies for all three datasets using the model's own evaluate method, like this:

train_score = cnn_fruits_mdl.evaluate(X_train,
                                      ohe.transform(y_train), verbose=0)
test_score = cnn_fruits_mdl.evaluate(X_test,
                                     ohe.transform(y_test), verbose=0)
val_score = cnn_fruits_mdl.evaluate(X_val,
                                    ohe.transform(y_val), verbose=0)
print('Train accuracy: {:.1%}'.format(train_score[1]))
print('Test accuracy: {:.1%}'.format(test_score[1]))
print('Val accuracy: {:.1%}'.format(val_score[1]))

The preceding snippet outputs the following accuracies:

Train accuracy: 100.0%
Test accuracy: 99.9%
Val accuracy: 31.2%

Indeed, you can expect a model to eventually reach 100% training accuracy if you train it for enough epochs with well-chosen hyperparameters. Near-perfect test accuracy is harder to achieve, depending on how different the training and test sets are. We know that the test dataset is simply a sample of images from the same collection, so it's not particularly surprising that such a high accuracy (99.9%) was achieved.

When classification models are discussed in a business setting, lay stakeholders are often only interested in one number: accuracy. It's easy to let this drive the discussion, but there's much more nuance to it. For instance, the disappointing validation accuracy (31.2%) could mean many things. It could mean that 5 of the 16 classes are classified perfectly while the rest are entirely misclassified, or that 10 classes each get half of their samples misclassified while the remaining 6 are entirely wrong. There are many possibilities for what could be going on.
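To make this concrete, here is a quick sketch with hypothetical per-class accuracies (not the model's actual ones) showing how two very different per-class patterns can average out to the same 31.2%, assuming the 16 classes are evenly sized:

import numpy as np

# Hypothetical scenario A: 5 of 16 classes perfectly classified, 11 entirely wrong
per_class_acc_a = np.array([1.0] * 5 + [0.0] * 11)

# Hypothetical scenario B: 10 of 16 classes half right, 6 entirely wrong
per_class_acc_b = np.array([0.5] * 10 + [0.0] * 6)

# With evenly sized classes, overall accuracy is the mean of per-class accuracies
print('Scenario A: {:.1%}'.format(per_class_acc_a.mean()))   # Scenario A: 31.2%
print('Scenario B: {:.1%}'.format(per_class_acc_b.mean()))   # Scenario B: 31.2%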

In any case, when dealing with a multiclass classification problem, an accuracy below 50% might not be as bad as it seems. With 16 classes that are more or less evenly split, we have to take note that the No Information Rate is likely to be only around 6–7% (roughly 1/16), so 31.2% is about five times higher than that. In fact, the remaining leap from 31.2% to 100% is smaller than the leap the model has already made from chance level! To a machine learning practitioner, this means that, judging solely by validation accuracy, the model is still learning something of value that can be improved upon.
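To verify this figure, the No Information Rate can be estimated as the relative frequency of the most common class. The following sketch assumes y_val is an array of class labels, as it is elsewhere in this chapter:

import numpy as np

# The No Information Rate (NIR) is the accuracy a naive classifier achieves
# by always predicting the most frequent class
_, class_counts = np.unique(y_val, return_counts=True)
nir = class_counts.max() / class_counts.sum()
print('No Information Rate: {:.1%}'.format(nir))
# With 16 roughly balanced classes, this lands close to 1/16 = 6.25%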

We will first evaluate the model using the test dataset with the evaluate_multiclass_mdl function. The arguments include the model (cnn_fruits_mdl), our test data (X_test), and corresponding labels (y_test), as well as the class names (fruits_l) and the encoder (ohe). Lastly, we don't need it to plot the ROC curves since they will be perfect (plot_roc=False). This function returns the predicted labels and probabilities, which we can store in variables for later use:

y_test_pred, y_test_prob = mldatasets.evaluate_multiclass_mdl(
    cnn_fruits_mdl, X_test, y_test, fruits_l, ohe, plot_roc=False)

The preceding code generates both Figure 8.4 with a confusion matrix and Figure 8.5 with performance metrics for each class.
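Note that evaluate_multiclass_mdl is a convenience function from the book's mldatasets companion package. If you don't have it handy, a rough equivalent of its tabular output can be sketched with scikit-learn, assuming y_test holds string labels and ohe is a fitted scikit-learn OneHotEncoder whose category order matches the model's output columns:

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Predict class probabilities, then map the most likely index back to a label
y_test_prob = cnn_fruits_mdl.predict(X_test)
y_test_pred = ohe.categories_[0][np.argmax(y_test_prob, axis=1)]

y_true = np.ravel(y_test)  # flatten in case labels come as a column vector

# Confusion matrix (rows: true classes, columns: predicted classes)
print(confusion_matrix(y_true, y_test_pred, labels=fruits_l))

# Precision, recall, and F1 score for each class
print(classification_report(y_true, y_test_pred, labels=fruits_l))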

