Content warning! This post contains photos of insects. If you are uncomfortable with insects, please close the tab now!
Dive into Deep Learning
Lucas Zhang
Train a deep learning model to classify photos of beetles, cockroaches and dragonflies (original images from https://www.insectimages.org/index.cfm), then explain how the neural network classifies the images using SHapley Additive exPlanations (SHAP).
To achieve our aims, we require the packages matplotlib.pyplot, numpy, os, PIL and tensorflow.
Note: the following source code was written for TensorFlow 2.7.0, the latest version at the time of writing.
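The code that produced the image counts is not reproduced here. Below is a minimal stdlib sketch of one way to count the images. It assumes each of the ‘train’ and ‘test’ folders holds one sub-folder per class. The helper `count_images`, the `data_dir` value and the list of image extensions are hypothetical choices, and a different extension list would give different totals.

```python
import os

# Hypothetical set of extensions treated as images (an assumption, not from the post).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images(root):
    """Count image files under `root`, recursing into the per-class sub-folders."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively so "a.JPG" counts as well.
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                total += 1
    return total


if __name__ == "__main__":
    # Assumed layout: <data_dir>/train/<class>/<image> and <data_dir>/test/<class>/<image>.
    data_dir = "."
    for split in ("train", "test"):
        path = os.path.join(data_dir, split)
        if os.path.isdir(path):
            print(split, count_images(path))
```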
Counting the image files shows 1019 images in the ‘train’ folder and 180 images in the ‘test’ folder. Next, let us look at a few example photos of beetles, cockroaches and dragonflies from our dataset.
Configure the neural network model
We need to build the training dataset from the photos in the “train” folder and the test dataset from the photos in the “test” folder. Here we use utilities in tensorflow.keras to load the images, and then check the names of the 3 classes in our dataset.
The loader finds 1021 files belonging to 3 classes in the ‘train’ folder and 187 files belonging to 3 classes in the ‘test’ folder. The class names are [‘beetles’, ‘cockroach’, ‘dragonflies’]. Next, we display some photos from the training dataset.
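Keras' `tf.keras.utils.image_dataset_from_directory` prints a message of the form "Found ... files belonging to 3 classes", so the post most likely loads the folders with that function. This is an assumption, since the code is not shown. With inferred labels, that function assigns one class per sub-folder and orders the class names alphanumerically. That ordering is why the list reads ['beetles', 'cockroach', 'dragonflies']. Here is a stdlib sketch of that label inference. `infer_class_names` is a hypothetical helper, not part of Keras.

```python
import os


def infer_class_names(split_dir):
    """Return (class_names, label_of) for a folder with one sub-folder per class.

    Class names are the sub-folder names in sorted (alphanumerical) order, and
    each name is mapped to its integer label, i.e. its position in that order.
    Plain files such as a README are ignored.
    """
    class_names = sorted(
        entry for entry in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, entry))
    )
    label_of = {name: index for index, name in enumerate(class_names)}
    return class_names, label_of
```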
Train the neural network model
In this section, we set up buffering for the input data, rescale the pixel values from the range [0, 255] to [0, 1], define the layers of the neural network and, finally, learn the parameters of the model.
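Rescaling divides each channel value by 255, which maps the 8-bit range [0, 255] onto [0, 1]. In Keras this is typically done with a `tf.keras.layers.Rescaling(1./255)` layer at the start of the model. Below is a plain-Python sketch of the same arithmetic. It assumes the image is stored as nested lists of RGB values, and `rescale_pixels` is a hypothetical helper.

```python
def rescale_pixels(image):
    """Divide every 8-bit channel value by 255 so [0, 255] maps onto [0.0, 1.0].

    `image` is assumed to be a list of rows, each row a list of [R, G, B] values.
    """
    return [[[channel / 255.0 for channel in pixel] for pixel in row] for row in image]


if __name__ == "__main__":
    # A made-up 1x2 image: one black pixel and one white pixel.
    print(rescale_pixels([[[0, 0, 0], [255, 255, 255]]]))  # → [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]
```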
After these steps, we can inspect the layers of the neural network, including their output shapes and numbers of parameters.
The summary shows that the model has a total of 8,412,707 trainable parameters. Next, we learn these parameters, that is, we train the neural network.
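As a sanity check on that number, the sketch below counts parameters layer by layer. The architecture is an assumption, since the post's layer definitions are not shown. It uses the layer stack from TensorFlow's image-classification tutorial: three 3x3 `Conv2D` layers with `'same'` padding (16, 32 and 64 filters), each followed by 2x2 max pooling, then `Flatten`, `Dense(128)` and a 3-unit output layer. Inputs are assumed to be 256x256 RGB images, which is the default `image_size` of Keras' directory loader. Under these assumptions the arithmetic lands exactly on 8,412,707. This suggests, but does not prove, that the post's model has this shape.

```python
def conv2d_params(kernel_size, in_channels, filters):
    # Each filter has kernel_size * kernel_size * in_channels weights plus one bias.
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    # A fully connected layer has one weight per (input, unit) pair plus one bias per unit.
    return inputs * units + units


def count_params(height=256, width=256, channels=3, num_classes=3):
    """Count trainable parameters of an ASSUMED architecture:
    Rescaling -> [Conv2D(3x3, 'same') -> MaxPooling2D(2)] x 3 (16/32/64 filters)
    -> Flatten -> Dense(128) -> Dense(num_classes).
    Rescaling, pooling and flattening layers have no trainable parameters.
    """
    total = 0
    in_channels = channels
    for filters in (16, 32, 64):
        total += conv2d_params(3, in_channels, filters)
        in_channels = filters
        # 'same' padding keeps the spatial size; 2x2 max pooling halves it.
        height, width = height // 2, width // 2
    flattened = height * width * in_channels  # 32 * 32 * 64 for 256x256 inputs
    total += dense_params(flattened, 128)
    total += dense_params(128, num_classes)
    return total


if __name__ == "__main__":
    print(count_params())  # → 8412707
```

Almost all of the parameters sit in the first `Dense` layer (65,536 × 128 weights plus 128 biases). For this reason, the total depends strongly on the input image size.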
Quality-checking
In this section, we evaluate the model on the “test” dataset. A good model should reach high accuracy and low loss on the training set, but we must also avoid overfitting. Overfitting occurs when the model performs much better on the training set than on the test set.
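The accuracy and loss statistics compared here can be sketched in plain Python. The sketch also shows one way to turn the train/test comparison into an explicit overfitting check. It assumes the model outputs have already been converted to class probabilities and that labels are integer class indices. The helper names and the 0.10 accuracy-gap threshold are hypothetical choices for illustration, not values from the post.

```python
import math


def accuracy(probabilities, labels):
    """Fraction of samples whose highest-probability class equals the true label."""
    correct = sum(
        1 for probs, label in zip(probabilities, labels)
        if max(range(len(probs)), key=probs.__getitem__) == label
    )
    return correct / len(labels)


def cross_entropy(probabilities, labels, eps=1e-12):
    """Mean sparse categorical cross-entropy: the average of -log p(true class).

    `eps` guards against log(0) when a true class gets zero probability.
    """
    return sum(
        -math.log(max(probs[label], eps)) for probs, label in zip(probabilities, labels)
    ) / len(labels)


def looks_overfit(train_acc, test_acc, max_gap=0.10):
    """Flag overfitting when training accuracy beats test accuracy by more than
    `max_gap` (a hypothetical threshold)."""
    return train_acc - test_acc > max_gap
```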
In addition to these statistics, we also check whether the trained model can correctly identify the insect in a few photos, taken from the test folder or from the internet.
These figures show that our model performs well on both the training and test datasets, with high accuracy and low loss. Next, we test the model on two real examples: one photo from our dataset and one from the internet.
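The confidence messages quoted below match the format printed in TensorFlow's image-classification tutorial. In that tutorial, the final `Dense` layer outputs raw scores (logits) and a softmax turns them into class probabilities. Assuming the post follows the same pattern, the computation can be sketched in plain Python. `describe_prediction` is a hypothetical helper, and the logits in the test are made-up numbers, not the model's actual outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits, class_names):
    """Turn raw model outputs (logits) into a message in the format quoted in the post."""
    scores = softmax(logits)
    best = max(range(len(scores)), key=scores.__getitem__)
    return "This image most likely belongs to {} with a {:.2f} percent confidence.".format(
        class_names[best], 100 * scores[best]
    )
```

For example, equal logits for the three classes give 33.33 percent for the first class name. A logit about 10 units above the others gives roughly 99.99 percent.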
For the photo from our dataset, the model outputs “This image most likely belongs to dragonflies with a 99.85 percent confidence.” Next, we test the model on a photo from the internet.
For the internet photo, the model outputs “This image most likely belongs to dragonflies with a 99.93 percent confidence.” Together, these results suggest that our model identifies dragonflies very well. But how does it reach its decision? We use SHAP values to find out.
Explain our model using SHAP
In this section, we explain how our model reaches its prediction by calculating the SHAP value of each part of one of the earlier photos. The source code is adapted from h1ros.
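For reference, the SHAP value of a part is its Shapley value from cooperative game theory. Let N be the set of all parts of the photo. Let f(S) be the model's score for the predicted class when only the parts in S are visible and all other parts are masked. The SHAP value φ_i of part i is then the weighted average of its marginal contribution over all subsets of the other parts. The values of all parts add up to the gap between the full-photo score and the fully masked score:

```latex
\phi_i = \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}
  \bigl[ f(S \cup \{i\}) - f(S) \bigr],
\qquad
\sum_{i \in N} \phi_i = f(N) - f(\varnothing)
```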
Initialization
For this part, we require the packages shap, skimage.segmentation, pandas, numpy, matplotlib.pyplot and warnings.
Note: as above, this source code was written for TensorFlow 2.7.0.
We will again use the dragonfly photo from the internet to calculate the SHAP values for its different parts.
Divide the picture into different parts
We first divide the picture into segments so that we can see which parts play an important role in the model's prediction.
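The post uses skimage.segmentation for this step, presumably a superpixel algorithm such as SLIC, which groups neighbouring pixels of similar colour. As a dependency-free stand-in, the sketch below splits the image into a regular grid of rectangular blocks instead. This is a simpler technique than superpixels, shown only to illustrate what a segment map looks like: a grid of the same size as the image holding one segment id per pixel. `grid_segments` is a hypothetical helper.

```python
def grid_segments(height, width, rows, cols):
    """Return a height x width map giving each pixel a segment id in
    [0, rows * cols), numbered row by row over a rows x cols grid of blocks.

    NOTE: a regular grid is a stand-in for superpixels, not what the post used.
    """
    segment_map = []
    for y in range(height):
        block_row = y * rows // height  # which horizontal band this pixel row falls in
        segment_map.append([block_row * cols + x * cols // width for x in range(width)])
    return segment_map
```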
Set up the SHAP kernel explainer and calculate the SHAP value
We calculate the SHAP values by blocking (masking) specific parts of the photo and observing how the model's output changes.
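Blocking a part means replacing its pixels with a neutral fill before running the model again. The explainer repeats this for many different combinations of visible parts. The sketch below makes several assumptions. The image is a nested list of RGB tuples, the segment map is a per-pixel grid of segment ids such as one produced by a segmentation step, and the white fill colour is an illustrative choice. `mask_image` is a hypothetical helper, not the post's exact code.

```python
def mask_image(image, segment_map, keep, fill=(255, 255, 255)):
    """Return a copy of `image` where every pixel whose segment id is not in
    `keep` is replaced by `fill`; pixels of kept segments are left unchanged.

    `fill` defaults to white, a hypothetical choice; the mean colour of the
    image is another common background.
    """
    return [
        [pixel if segment_map[y][x] in keep else tuple(fill) for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]
```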
We calculate the SHAP values for each part.
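With only a handful of parts, Shapley values can be computed exactly by enumerating every coalition of visible parts. SHAP's KernelExplainer instead approximates them by sampling coalitions and fitting a weighted linear regression, which scales to more parts. Below is a sketch of the exact computation. The `value` callback is an assumption: it stands for "mask every part outside the coalition, run the model, and return the score of the predicted class".

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_parts):
    """Exact Shapley values for `n_parts` parts.

    `value(coalition)` must return the model score when only the parts in the
    frozenset `coalition` are visible. This needs on the order of 2**n_parts
    model calls, so it is only practical for a few parts.
    """
    phi = [0.0] * n_parts
    for i in range(n_parts):
        others = [p for p in range(n_parts) if p != i]
        for size in range(n_parts):
            # Shapley weight for coalitions of this size that exclude part i.
            weight = factorial(size) * factorial(n_parts - size - 1) / factorial(n_parts)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of part i to this coalition.
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi
```

For an additive toy score, where each visible part adds a fixed amount, the Shapley values recover exactly those amounts. This gives a quick sanity check for any implementation.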
Visualize the SHAP values for different parts of the photo
Finally, we examine which parts of the photo contribute most to the model's prediction.
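One common way to draw per-part SHAP values is to paint every pixel with the value of the segment it belongs to. The colour scale is made symmetric around zero, so positive and negative contributions get colours of equal intensity. The sketch assumes a per-pixel grid of segment ids and one SHAP value per segment. `shap_heatmap` and `symmetric_limit` are hypothetical helpers.

```python
def shap_heatmap(segment_map, shap_values):
    """Paint every pixel with the SHAP value of its segment, giving a per-pixel
    map of the same size as the segment map."""
    return [[shap_values[segment] for segment in row] for row in segment_map]


def symmetric_limit(shap_values):
    """Largest absolute SHAP value, used as +/- limit of the colour scale so that
    zero sits in the middle of the colour map."""
    return max(abs(v) for v in shap_values)
```

With matplotlib, the result could then be drawn with something like `plt.imshow(heatmap, cmap="bwr", vmin=-limit, vmax=limit)`, optionally over a faded copy of the photo.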
The resulting plot shows that our model correctly classifies the photo as a dragonfly with a high score. According to the SHAP values, this prediction is driven partly by the background and the middle part of the body, which make the photo look like a “dragonfly”.