Explainable Machine Learning For Land Cover Classification An Introductory Guide - Final
Preface
Why You are Struggling to Improve Your Land Cover Classification
(And How to Fix It)
Many years ago, I struggled to improve my land cover classification results. It was a
time when I was obsessed with trying the latest shiny or advanced machine learning
models in order to improve land cover classification. I invested a lot of time and effort in
learning how to run new and advanced machine learning models in Python or R. Yet,
time and time again, the results were disappointing despite reading about successful
use cases in major remote sensing and GIS journals.
So what was I doing wrong? Why was I struggling? Eventually, I realized that my
mindset was fixated on applying the shiny and advanced machine learning models,
which were in fashion those days. So I took a step back. I then started performing land
cover classification using a simple machine learning model such as k nearest neighbor
(KNN). And guess what? The land cover classification results were similar to, or
even better than, those of the advanced machine learning models. From that moment, I
realized that there was more to it than just tuning or trying to optimize advanced
machine learning models. Much as this sounds like common sense, believe me, some
researchers and students focus only on new machine learning or deep learning models.
This guide is not a cookbook. The guide is for those who are interested in improving
land cover classification using explainable machine learning models. The goal of the
guide is to introduce explainable machine learning in the context of land cover
classification. My hope is that you will go through all modules and try to understand
what it takes to improve land cover classification models.
Table of Contents
Chapter 1. Introduction
1.4 Prerequisites
References
Appendix
Chapter 1. Introduction
1.1 Importance of explainable machine learning for land cover
classification
During the past decades, there have been significant advancements in land cover
classification due to the availability of Earth Observation (EO) data and the rapid
development of machine learning algorithms. A quick search on ScienceDirect or Google
Scholar reveals many journal articles reporting successful applications of machine
learning models for land cover classification. To date, many online and onsite courses,
textbooks, and tutorials on machine learning models and algorithms are available. In
addition, machine learning researchers have developed many machine learning libraries
or packages. Many researchers use libraries and packages in free and open-source
software (FOSS) and commercial off-the-shelf (COTS) software applications. In some
cases, software applications have made it easy to use machine learning models without
understanding machine learning algorithms, data science, or environmental remote
sensing. To date, one can quickly run a machine learning model and produce a land
cover map without a basic understanding of machine learning and remote sensing
principles.
There is no doubt that advanced machine learning algorithms and models have
improved land cover classification. However, most of the advanced machine learning
models and algorithms are very complex. Generally, the machine learning models and
algorithms do not clearly explain how and why they make predictions. In addition, it is
risky to trust evaluation results alone because evaluation measures (overall accuracy,
receiver operating characteristic curve, RMSE, etc.) only compare model output values
with reference data values. They say nothing about how the model arrived at its
predictions. A machine learning model can achieve high accuracy by simply memorizing
features or patterns in the data. Therefore, it is questionable to rely only on model
performance evaluation measures.
In contrast, typical machine learning models focus on model prediction. That is,
standard machine learning models are good at making predictions without a solid
explanation. Hence, the need for explainable machine learning (Roscher et al. 2019).
By the time you finish this guide, you will learn how to:
import satellite imagery and training sample points;
prepare training data;
perform exploratory data analysis;
tune and train machine learning models; and
compute accumulated local effects (ALE).
1.4 Prerequisites
This guide is for people familiar with remote sensing classification, machine learning,
and R. However, beginners can also learn about remote sensing classification, machine
learning models, and R. I provide a link to further materials in the resources (see
appendix 1).
Note: VRE - vegetation red edge; NIR - Near-infrared; and SWIR - short-wave infrared bands
Next, set up your working directory. This is the directory or folder that contains
your Sentinel-2 imagery and training data.
Create a raster object “rvars” (that is, raster variables), which will contain the post-rainy
season Sentinel-2 imagery. Next, load the imagery into the “rvars” object using the
stack() function.
Check the attributes of the post-rainy season Sentinel-2 imagery (compiled between
April and June 2020).
print(rvars)
## class : RasterStack
## dimensions : 2489, 2657, 6613273, 9 (nrow, ncol, ncell, nlayers)
## resolution : 10, 10 (x, y)
## extent : 783740, 810310, 7835490, 7860380 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=35 +south +datum=WGS84 +units=m +no_defs
## names : B2, B3, B4, B5, B6, B7, B8, B11, B12
For this guide, we selected only nine spectral bands. Note that the vegetation red edge
bands (B5, B6, and B7) and the short-wave infrared bands (B11 and B12) were
resampled to 10 m to match the spatial resolution of the visible and near-infrared (NIR)
bands.
Next, create an object "ta_data" (that is, training data), which will contain training
sample data points. We will use the readOGR() function to load the training sample
points into the “ta_data” object.
The training data has one field (column) with 1,911 land cover features.
You can also check the training data using the print() function.
Next, use the viewRGB() function from the mapview package to display Sentinel-2
imagery and the training data (Figure 1). Click on the training points to see the
corresponding land cover classes.
Figure 1. Training data points overlaid on Sentinel-2 imagery (displayed as true color
composite; r = 3, g = 2, b = 1)
First, we are going to use the extract() function to extract spectral reflectance values
from the Sentinel-2 bands (rvars) at the training sample points (ta_data). This creates a
data frame called "ta" (training area) that contains the spectral reflectance values. The
computation can take some time, depending on your computer's memory, so be patient.
# Extract spectral reflectance values from Sentinel-2 bands and training sample points
ta <- as.data.frame(extract(rvars, ta_data))
Next, we are going to plot the mean spectral reflectance from Sentinel-2 bands.
# Remove the first column in order to use the actual land cover names
rownames(msr) <- msr[,1]
msr <- msr[,-1]
# Create a matrix
msr <- as.matrix(msr)
# Custom X-axis
axis(1, at = 1:9, labels = colnames(msr))
# Add the spectral reflectance lines, one per land cover class
for (i in 1:nrow(msr)) {
  lines(msr[i, ], type = "o", lwd = 3, lty = 1, col = mycolor[i])
}
# Add the title
title(main="Spectral Profile from Sentinel-2 bands", font.main = 2)
# Add the legend
legend("topleft", rownames(msr), cex=0.8, col=mycolor, lty = 1, lwd =3, bty = "n")
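The plotting code above assumes that a table of per-class mean reflectance (msr) already exists. As a minimal sketch, assuming a data frame with a Class column and one column per band, such a table could be computed with the base R aggregate() function. The data frame "demo_ta" and its values are made up for illustration and are not the guide's training data.

```r
# Hypothetical training table: a class label plus three band columns
demo_ta <- data.frame(
  Class = c("Water", "Water", "Cropland", "Cropland", "Built-up", "Built-up"),
  B2 = c(300, 340, 600, 640, 1200, 1260),
  B3 = c(280, 300, 800, 860, 1300, 1340),
  B8 = c(150, 170, 2600, 2800, 1800, 1900)
)

# Mean reflectance of every band for each land cover class
demo_msr <- aggregate(. ~ Class, data = demo_ta, FUN = mean)

# Use the class names as row names and keep only the band columns
rownames(demo_msr) <- demo_msr$Class
demo_msr <- as.matrix(demo_msr[, -1])
print(demo_msr)
```

Each row of the resulting matrix is one land cover class and each column is one band, which is the layout that the lines() loop expects.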
Figure 2 shows a spectral profile that represents mean spectral reflectance. For built-up
and bare areas classes, there is a close spectral similarity in the blue (B2), green (B3),
red (B4), vegetation red edge 1 (B5), and vegetation red edge 2 (B6) bands, and narrow
separability in the other bands. However, spectral separability is relatively wide between
built-up and bare areas classes on the one hand and cropland class on the other hand,
especially for bands 2 to 7. There is also a close spectral similarity between cropland
and grass/open areas during the post-rainy season (Kamusoko et al. 2021). In the test
site, most crops are harvested during the post-rainy season. As a result, cropland areas
are left with crop residue or are covered by grass, which has a spectral reflectance
similar to that of grass/open areas. We can also observe a relatively broad spectral
separability between the grass/open areas and woodlands classes. However, for the water
and woodlands classes, there is a close spectral similarity in the blue, green, and red
bands, and wide separability in the other bands.
Note that the spectral plot represents only the mean spectral reflectance for this specific
training data in the test site and during the post-rainy season. Therefore, mean spectral
reflectance should be interpreted with caution since it depends on the particular
imagery, local conditions, and the quantity and quality of the training data. However, spectral
information is vital because it helps us visually assess the separability of land cover
classes. Furthermore, mean spectral reflectance helps select optimal bands and
optimize machine learning models.
Next, create a data frame with labeled training points containing all spectral reflectance
values from the training points.
# Create a data frame with training points containing all spectral reflectance values
ta_data@data = data.frame(ta_data@data,ta[match(rownames(ta_data@data),
rownames(ta)),])
Next, take a look at the whole training data set structure using the str() function.
The data frame consists of 1,911 training data points and ten variables: a response
variable (Class) with six target land cover classes and nine predictor variables. The six
target land cover classes are bare areas, built-up, grass/open areas, cropland,
woodlands, and water. The predictor variables consist of nine Sentinel-2 bands (see
Table 1).
Next, use the complete.cases() function to check if there are missing values or NAs in the
training data. Alternatively, you can use the anyNA() function to check whether the
training data contains missing values. It is important to treat or remove missing values
before model training because some machine learning models cannot handle them.
The training data does not contain missing values or NAs. However, you can use the
na.omit() function if the training data has missing values or NAs.
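As a minimal sketch of these checks, the example below applies anyNA(), complete.cases(), and na.omit() to a small made-up data frame. The object "demo" and its values are hypothetical and only illustrate the behavior of the three base R functions.

```r
# Hypothetical training table with two missing reflectance values
demo <- data.frame(
  Class = c("Water", "Cropland", "Built-up", "Woodlands"),
  B4 = c(310, NA, 1250, 420),
  B8 = c(160, 2700, 1850, NA)
)

# Does the table contain any missing value at all?
has_missing <- anyNA(demo)

# How many rows have at least one missing value?
n_incomplete <- sum(!complete.cases(demo))

# Drop every row that contains a missing value
demo_clean <- na.omit(demo)

print(has_missing)
print(n_incomplete)
print(demo_clean)
```

Note that na.omit() removes whole rows, so it is worth checking how many training points would be lost before applying it.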
Machine learning experts recommend splitting the training data into training and testing
sets. In this guide, the createDataPartition() function from the caret package splits
training data into training and testing sets. We will use 60% of the training data as a
training set and 40% as a testing set. The training set will be used to find the best
model parameters and check initial training model performance based on repeated
cross-validation. The testing set will be used for checking model accuracy. First, we
need to set a seed for the random number generator using the set.seed() function so that
the results are repeatable. Setting the seed to the same number ensures that we get the
same random split each time. Choose any number you like to set the seed (Kamusoko 2019).
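To illustrate the idea behind the split, here is a minimal base R sketch of a 60/40 partition that samples within each class. It is a stand-in for illustration only and is not the createDataPartition() call used in this guide. The data frame "demo" and the seed value are assumptions.

```r
# Fix the random number generator so the split is repeatable
set.seed(27)

# Hypothetical labeled samples with unequal class sizes
demo <- data.frame(
  Class = rep(c("Bare areas", "Built-up", "Water"), times = c(20, 50, 10)),
  B8 = runif(80)
)

# Within each class, randomly pick 60% of the row indices for training
rows_by_class <- split(seq_len(nrow(demo)), demo$Class)
train_idx <- unlist(
  lapply(rows_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(0.6 * length(idx)))]
  }),
  use.names = FALSE
)

demo_training <- demo[train_idx, ]
demo_testing <- demo[-train_idx, ]

table(demo_training$Class)
table(demo_testing$Class)
```

Sampling within each class keeps the class proportions roughly the same in both sets, which matters when some classes, such as water, have few training points.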
We start by checking the descriptive statistics of the training data set using the
summary() function.
The output above shows the number of land cover training points (Class) in the first
column. Columns two to ten provide descriptive statistics for each Sentinel-2 band.
However, it is difficult to understand the nature of the distributions or to detect
outliers and collinearity from these statistics alone.
Next, we will examine the training data using graphical visualizations. Graphs help select
appropriate methods for pre-processing, transformation, and classification. We are going
to use the featurePlot(), ggplot(), corrplot(), and corrplot.mixed() functions.
Next, create density estimation plots for each attribute by land cover class value.
Figure 3 summarizes the distribution of the data for each land cover class and
Sentinel-2 band. Generally, the distributions are skewed and multimodal, indicating that
the data are not normally distributed.
Next, display the box and whisker plots for each attribute by class value.
Figure 4 shows the presence of outliers, especially for the bare areas, built-up, and
grass/open areas classes. Outliers can cause problems for machine learning models.
While researchers recommend removing the outliers from the training data, this could
throw away valuable training data. Therefore, it is critical to understand the reason for
having outliers in the first place. For example, outliers in the bare and grass/open areas
are attributed to spectral confusion between the two land cover classes.
Next, we are going to use the cor() and corrplot() functions to check the correlation
between predictor variables (Sentinel-2 bands).
The R output is also difficult to read. So we will display a mixed plot (numbers and plots)
to understand the correlation between the predictors.
Figure 5 shows positive correlations in dark blue, while the red color shows negative
correlations. The color intensity and the size of the square are proportional to the
correlation coefficients. Bands 2 and 3, 3 and 4, 6 and 7, 7 and 8, and 11 and 12 are
highly correlated (correlation coefficient is above 0.9). Machine learning experts
recommend removing the highly correlated bands or performing a principal component
analysis to reduce redundancy and improve computation efficiency.
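As a minimal sketch of how highly correlated band pairs could be listed programmatically, the example below builds a correlation matrix with cor() and keeps the pairs whose absolute coefficient exceeds 0.9. The synthetic "demo_bands" data frame is an assumption made for illustration. The 0.9 cutoff mirrors the threshold mentioned above.

```r
set.seed(27)
n <- 200

# Synthetic reflectance values: B3 is nearly a copy of B2, B8 is unrelated
b2 <- rnorm(n, mean = 800, sd = 100)
demo_bands <- data.frame(
  B2 = b2,
  B3 = b2 + rnorm(n, sd = 10),
  B8 = rnorm(n, mean = 2500, sd = 300)
)

# Pairwise Pearson correlation between the bands
cor_mat <- cor(demo_bands)

# Keep each pair once (upper triangle) where |r| is above 0.9
high <- which(abs(cor_mat) > 0.9 & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  band_1 = rownames(cor_mat)[high[, "row"]],
  band_2 = colnames(cor_mat)[high[, "col"]],
  r = cor_mat[high]
)
print(high_pairs)
```

A list like this makes it easier to decide which band of each pair to drop, or whether a principal component analysis is the better option.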
Cross-validation refers to the subdivision of training data into several mutually exclusive
groups. For the k-fold cross-validation, the algorithm partitions the initial training data
randomly into k mutually exclusive subsets or folds. Training and validation are then
performed k times. The k-fold cross-validation repeats the model building based on
different subsets of the available training data. Each time, it evaluates the model only on
the fold that was held out during model building. Kuhn and Johnson (2016) recommend a k
value of 5 or 10 to minimize bias.
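The fold assignment described above can be sketched in a few lines of base R. This is an illustration of the partitioning logic only, not the resampling code that the caret package runs internally. The sample size, the seed, and the object names are assumptions.

```r
set.seed(27)
n <- 23   # hypothetical number of training samples
k <- 5    # number of folds

# Randomly assign each sample to one of k folds of near-equal size
fold_id <- sample(rep(seq_len(k), length.out = n))

# Each fold is held out once for validation while the rest is used for training
for (i in seq_len(k)) {
  validation_idx <- which(fold_id == i)
  training_idx <- which(fold_id != i)
  cat("Fold", i, ": train on", length(training_idx),
      "samples, validate on", length(validation_idx), "samples\n")
}

fold_sizes <- as.vector(table(fold_id))
```

Because every sample belongs to exactly one fold, each sample is used for validation exactly once across the k rounds.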
For training the KNN model, we specify the train() function. First, we define "Class~.,"
which denotes a formula for using all attributes in the model. The "Class" is the
response variable, and the "data" contains the Sentinel-2 band reflectance values. The
The output shows that the algorithm used 1,149 training samples for training. There are
nine predictors (that is, nine Sentinel-2 bands) and six land cover classes (bare areas,
built-up, cropland, grass/ open areas, woodlands, and water). The cross-validation
results show the sample sizes and final values that were selected for the best model.
The best model has a kmax = 9 and a distance = 2.
We can also display the model training performance based on overall accuracy using
the command below.
Figure 6 shows the repeated cross-validation profile for the KNN model, which is based
on overall accuracy. The model achieved 76% accuracy with kmax = 9. Accuracy is only
slightly lower at kmax = 5 than at kmax = 9, which means that changing kmax within this
range has little effect on the accuracy.
Figure 6. Repeated cross-validation (based on overall accuracy) profile for the KNN
model
Figure 7 shows that the most significant band contribution is observed for the cropland,
grass/open areas, and woodland classes. The band contribution for the built-up and
bare areas classes is moderate, while the water class is relatively low. Although the
KNN model variable importance provides essential insights, the variable importance
values are suspiciously high for the grass/open areas, cropland, and woodland classes.
In addition, band 8 (NIR) has a low contribution for the woodlands class, which
contradicts the spectral profile (Figure 2). It is not easy to understand how the KNN
model came up with the variable importance ranking. Therefore, we should interpret the
variable importance metrics with caution. Strobl et al. (2008) reported that variable
importance metrics are biased when the predictor variables (Sentinel-2 bands) are
highly correlated, which leads to selecting suboptimal predictor variables.
We will assess the KNN model performance using new testing data and the
confusionMatrix() function. The KNN model results are used for prediction and building
a confusion matrix.
## Overall Statistics
## Accuracy : 0.7651
## 95% CI : (0.7333, 0.7948)
## No Information Rate : 0.3885
## P-Value [Acc > NIR] : < 2.2e-16
## Kappa : 0.6755
The overall accuracy is 76.5%, while the "No Information Rate" is about 39%. The
producer's accuracy (sensitivity) is lower than the user's accuracy (Pos Pred Value) for
bare areas, indicating an underestimation. Concerning the built-up class, the producer's
accuracy (sensitivity) is higher than the user's accuracy (Pos Pred Value), suggesting
that the KNN model overestimated built-up areas. In contrast, the producer's accuracy
(sensitivity) is lower than the user's accuracy (Pos Pred Value) for the cropland class,
which indicates omission errors and thus an underestimation of the cropland areas.
However, the producer's accuracy is substantially higher than the user's accuracy for
the grass/open areas class, indicating an overestimation. Generally, the woodland and
water classes have high individual accuracies. However, bare areas and cropland
classes have low individual accuracies, suggesting spectral confusion and class
imbalance problems. A closer look at the confusion matrix shows high misclassification
errors, especially between bare areas and built-up classes and between grass/open areas
and cropland classes.
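To make the relationship between producer's and user's accuracy concrete, here is a minimal sketch on a made-up confusion matrix. It assumes the layout used by confusionMatrix(), with predictions in rows and reference labels in columns. The counts are hypothetical and are not the KNN results.

```r
# Hypothetical confusion matrix: rows are predictions, columns are reference labels
classes <- c("Bare areas", "Built-up", "Cropland")
cm <- matrix(c(40,  5,  5,
               35, 90,  5,
                5,  5, 60),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

# Producer's accuracy (sensitivity): correct pixels / reference total per class
producers_acc <- diag(cm) / colSums(cm)

# User's accuracy (Pos Pred Value): correct pixels / predicted total per class
users_acc <- diag(cm) / rowSums(cm)

# Overall accuracy: all correct pixels / all pixels
overall_acc <- sum(diag(cm)) / sum(cm)

round(rbind(producers_acc, users_acc), 3)
overall_acc
```

In this toy matrix, bare areas has a producer's accuracy (0.5) below its user's accuracy (0.8) because the model mapped only 50 samples as bare areas against 80 in the reference, an underestimation. Built-up shows the opposite pattern (130 mapped against 100 in the reference), an overestimation.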
Next, use the KNN model to classify the Sentinel-2 imagery. We will use the predict()
function from the raster package to perform the prediction or classification. We are
going to pass on two arguments. The first parameter is the trained model ("knnFit"), and
the second parameter, "newdata", holds the testing data ("testing"). The predict()
function will return a list that we will save as "pred_knnFit."
We will display the land cover map using the levelplot() function (Figure 8).
# Alternatively, you can use plot, spplot or ggplot to display the land cover map
# plot(lc_knn, col = c("grey","red","yellow","green", "blue", "darkgreen"))
# spplot(lc_knn, col.regions=c("grey","red","yellow","green", "blue", "darkgreen"))
Figure 8. Land cover classification based on post-rainy season Sentinel-2 imagery and
a KNN model
Next, calculate the land cover class statistics produced using the KNN model.
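As a minimal sketch of what such class statistics involve, the example below counts pixels per class and converts the counts to hectares, assuming 10 m x 10 m pixels as in the resampled Sentinel-2 imagery. The vector "pixel_class" is a made-up stand-in for the values of a classified map and is not the guide's output.

```r
# Hypothetical classified pixel values (one entry per pixel)
pixel_class <- c(rep("Built-up", 1200), rep("Cropland", 800), rep("Water", 500))

# Area of one pixel in square meters (10 m x 10 m resolution)
pixel_area_m2 <- 10 * 10

# Count pixels per class, then convert to hectares (1 ha = 10,000 square meters)
counts <- table(pixel_class)
area_stats <- data.frame(
  class = names(counts),
  pixels = as.vector(counts),
  area_ha = as.vector(counts) * pixel_area_m2 / 10000,
  percent = round(100 * as.vector(counts) / sum(counts), 1)
)
print(area_stats)
```

Keep in mind that areas obtained by pixel counting inherit the classification errors, so they should be read together with the confusion matrix.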
Finally, save the land cover map so that you can visualize it in GIS software.
# Train a RF model
timeStart <- proc.time()
set.seed(27)
rf_model <- train(Class ~ ., data = training,
                  method = "rf",
                  trControl = fitControl,
                  prox = TRUE,
                  fitBest = FALSE,
                  localImp = TRUE,
                  returnData = TRUE)
proc.time() - timeStart
##    user  system elapsed
## 343.129   0.424 343.514
Next, check the RF model performance. Again we use the print() and plot() functions.
##
## mtry Accuracy Kappa
## 2 0.7777824 0.6893787
## 5 0.7784873 0.6916921
## 9 0.7744061 0.6870036
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 5.
The RF model used 1,149 training samples for training. The model achieved 78%
accuracy with mtry = 5. Note that the RF model accuracy is slightly better than the KNN
model accuracy (76%).
Next, display the RF model training performance based on overall accuracy (Figure 9).
# Display the RF model plot
plot(rf_model)
The output shows a confusion matrix for the best model (after cross-validation). The RF
algorithm used 500 decision trees and randomly sampled five of the nine predictor
Sentinel-2 bands at each split (mtry = 5). The out-of-bag (OOB) estimate of the error rate
is 22%, indicating relatively poor training model performance. Note that the land cover
class errors for the bare areas and cropland are very high. The RF model is severely
affected by spectral confusion between bare areas and built-up, and between cropland and
grass/open areas. For example, the RF model misclassified 35 built-up pixels as bare areas.
Figure 10 shows that the most significant band contribution is observed for the
grass/open areas, built-up, and cropland classes. The band contribution for the bare
areas and woodland classes is moderate, while the water class is relatively low. Again,
the variable importance metrics should be interpreted with caution because the
Sentinel-2 bands are highly correlated (Figure 5). The variable importance values are
suspiciously high for the grass/open areas. In addition, band 8 (NIR) has no contribution
to the woodlands class, suggesting serious model training problems.
Next, assess the RF model performance using new testing data. We are going to use
the RF model results for prediction and then build a confusion matrix.
## Overall Statistics
## Accuracy : 0.7848
## 95% CI : (0.7539, 0.8135)
## No Information Rate : 0.3885
## P-Value [Acc > NIR] : < 2.2e-16
## Kappa : 0.7002
## Mcnemar's Test P-Value : NA
## Statistics by Class:
##
##                      Bare areas Built-up Cropland Grass/ open   Water Woodlands
## Sensitivity             0.45977   0.9155  0.58654      0.8145 0.93750   0.81579
## Specificity             0.97630   0.8991  0.94833      0.8872 1.00000   0.99171
## Pos Pred Value          0.71429   0.8522  0.64211      0.7469 1.00000   0.83784
## Neg Pred Value          0.93343   0.9437  0.93553      0.9213 0.99866   0.99034
## Prevalence              0.11417   0.3885  0.13648      0.2900 0.02100   0.04987
## Detection Rate          0.05249   0.3556  0.08005      0.2362 0.01969   0.04068
## Detection Prevalence    0.07349   0.4173  0.12467      0.3163 0.01969   0.04856
## Balanced Accuracy       0.71803   0.9073  0.76743      0.8509 0.96875   0.90375
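Several rows of this table are derived from the others. As a small worked check, using only the values printed above, balanced accuracy is the mean of sensitivity and specificity, and the detection rate is sensitivity multiplied by prevalence. The object names below are illustrative.

```r
# Values copied from the RF "Statistics by Class" output
classes <- c("Bare areas", "Built-up", "Cropland", "Grass/ open", "Water", "Woodlands")
sensitivity <- c(0.45977, 0.9155, 0.58654, 0.8145, 0.93750, 0.81579)
specificity <- c(0.97630, 0.8991, 0.94833, 0.8872, 1.00000, 0.99171)
prevalence  <- c(0.11417, 0.3885, 0.13648, 0.2900, 0.02100, 0.04987)

# Balanced accuracy: mean of sensitivity and specificity
balanced_accuracy <- (sensitivity + specificity) / 2

# Detection rate: share of all samples that are correctly assigned to the class
detection_rate <- sensitivity * prevalence

names(balanced_accuracy) <- classes
names(detection_rate) <- classes
round(balanced_accuracy, 4)
round(detection_rate, 4)
```

The recomputed values agree with the reported rows up to rounding. For bare areas, for example, (0.45977 + 0.97630) / 2 is about 0.718, so a high specificity can mask a low producer's accuracy in the balanced accuracy figure.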
The confusion matrix shows that the overall classification accuracy is 78%, slightly
higher than that of the KNN model. However, we observe serious classification
problems. The producer's accuracy (sensitivity) is significantly lower than the user's
accuracy (Pos Pred Value) for the bare areas, indicating an underestimation. However,
the producer's accuracy (sensitivity) is higher than the user's accuracy (Pos Pred Value)
for the built-up class, indicating that the RF model overestimated built-up areas. In
contrast, the cropland class has a low producer's accuracy (59%) and a somewhat higher
user's accuracy (64%). For the grass/open areas class, the producer's accuracy is higher
than the user's, indicating high commission errors. Concerning the woodland and water
classes, the producer's accuracy is slightly lower than the user's accuracy. Generally,
the RF model had difficulty separating built-up from bare areas and cropland from
grass/open areas.
Next, we will use the RF model to classify the Sentinel-2 imagery and create a land
cover map using the predict() function.
We will display the land cover map using the levelplot() function (Figure 11).
Figure 11. Land cover classification based on post-rainy season Sentinel-2 imagery and
an RF model
Figure 11 shows that the RF model overestimated built-up areas and underestimated
bare areas. For example, there are many misclassified built-up pixels in the northwest
and south parts of the city. In addition, the RF model misclassified some bare areas,
cropland, and grass/open areas in the south and south-western part of the city as
built-up areas.
Next, calculate the land cover class statistics produced using the RF model.
Finally, save the land cover map so that you can visualize it in GIS software.
Next, check the SVM model performance. Again we use the print() and plot() functions.
## C Accuracy Kappa
## 0.25 0.7309417 0.6168089
## 0.50 0.7632173 0.6663508
## 1.00 0.7905587 0.7075065
## 2.00 0.8049228 0.7288197
## 4.00 0.8138859 0.7419757
## Tuning parameter 'sigma' was held constant at a value of 0.3622052
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.3622052 and C = 4.
The SVM model used 1,149 training samples for training. There are nine predictors (that
is, nine Sentinel-2 bands) and six land cover classes.
Next, display the SVM model training performance based on overall accuracy (Figure
12).
Figure 12. Repeated cross-validation (based on overall accuracy) profile for the SVM
model
Figure 12 shows that the optimal model has a Cost value of four and an accuracy
of about 81%, which is relatively good.
The SVM model selected 667 support vectors from a total of 1,149 training samples.
The training error rate is relatively low.
Figure 13 also shows that the most significant band contribution is observed for the
grass/open areas, cropland, and woodland classes. We noticed the same pattern with
the KNN model variable importance. The band contribution for the built-up, bare areas,
and water classes is moderate. While the SVM model variable importance provides
essential insights, it is difficult to understand how it came up with the variable
importance ranking. Again, we should be cautious interpreting variable importance
when the spectral bands are highly correlated.
Next, we are going to assess the SVM model performance using new testing data. We
are going to use the SVM model results for prediction and then build a confusion matrix.
## Overall Statistics
## Accuracy : 0.8123
## 95% CI : (0.7828, 0.8395)
## No Information Rate : 0.3885
## P-Value [Acc > NIR] : < 2.2e-16
## Kappa : 0.7392
## Mcnemar's Test P-Value : NA
## Statistics by Class:
##
##                      Bare areas Built-up Cropland Grass/ open   Water Woodlands
## Sensitivity             0.56322   0.9324  0.55769      0.8643 0.93750   0.78947
## Specificity             0.97481   0.9378  0.95593      0.8835 1.00000   0.99309
## Pos Pred Value          0.74242   0.9049  0.66667      0.7520 1.00000   0.85714
## Neg Pred Value          0.94540   0.9562  0.93185      0.9409 0.99866   0.98900
## Prevalence              0.11417   0.3885  0.13648      0.2900 0.02100   0.04987
## Detection Rate          0.06430   0.3622  0.07612      0.2507 0.01969   0.03937
## Detection Prevalence    0.08661   0.4003  0.11417      0.3333 0.01969   0.04593
## Balanced Accuracy       0.76902   0.9351  0.75681      0.8739 0.96875   0.89128
The overall accuracy is about 81%, which is higher than that of the KNN and RF models. In
terms of individual accuracies, the producer's accuracy (sensitivity) is again lower than
the user's accuracy (Pos Pred Value) for the bare areas, indicating an underestimation.
However, the producer's accuracy (sensitivity) is higher than the user's accuracy (Pos
Pred Value) for the built-up class, indicating that the SVM model overestimated built-up
areas. In contrast, the cropland class has a low producer's accuracy (56%) and a higher
user's accuracy (67%), meaning omission errors. For the grass/open areas class, the
producer's accuracy is higher than the user's, indicating high commission
errors. Concerning the woodlands class, the producer's accuracy (79%) is somewhat lower
than the user's accuracy (86%). The water class has the highest individual accuracy,
indicating that the SVM model classified water well. While the SVM model had difficulty
separating built-up from bare areas and cropland from grass/open areas, there is a slight
improvement in accuracy.
Next, we will use the SVM model to classify the Sentinel-2 imagery and create a land
cover map using the predict() function.
We will display the land cover map using the levelplot() function (Figure 14).
Figure 14. Land cover classification based on post-rainy season Sentinel-2 imagery and
an SVM model
Figure 14 also shows the overestimation of built-up area and underestimation of bare
areas. There are many misclassified built-up pixels in the northwest and south parts of
the city. In addition, the SVM model misclassified some bare areas, cropland, and
grass/open areas in the south and south-western part of the city as built-up areas.
Next, calculate the land cover class statistics produced using the SVM model.
Finally, save the land cover map so that you can visualize it in GIS software.
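# Collect the resampling results of the three models
# (opening of the call reconstructed from the "Call:" line in the output below)
resamps_ml <- resamples(list(kknn = knnFit,
                             randomForest = rf_model,
                             kernlab = svm_model))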
resamps_ml
##
## Call:
## resamples.default(x = list(kknn = knnFit, randomForest = rf_model, kernlab
## = svm_model))
##
## Models: kknn, randomForest, kernlab
## Number of resamples: 100
## Performance metrics: Accuracy, Kappa
## Time estimates for: everything, final model fit
Finally, check the summary statistics and then display the results in graphic form.
Figure 15 shows that the SVM model (kernlab) had the highest training accuracy,
followed by the RF and KNN models. However, there is only a slight difference in training
model performance between the KNN and RF models.
not understand how the Sentinel-2 bands affected the land cover or the nature of the
effects (positive or negative).
Explainable machine learning methods such as accumulated local effects (ALE) help us
understand the nature and direction of effects. ALE is a model-agnostic method (Apley
and Zhu 2020) that infers the relationship between the predicted land cover and the
spectral bands. In particular, the ALE values answer this question: conditional on a
given value, what is the relative effect of changing the predictor variable value on the
prediction? ALE averages the changes in the predictions and accumulates them over
the local grid. As a result, the effect of a particular factor can be evaluated at different
levels.
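To make the averaging and accumulating steps concrete, here is a minimal base R sketch of
a first-order ALE curve for one predictor. It uses simulated reflectance values and a
logistic regression as a stand-in model. The names toy, B2, B8, and built_up and the
ten-interval quantile grid are assumptions for illustration, and this is not the iml
implementation.

```r
set.seed(27)
n <- 500
toy <- data.frame(B2 = runif(n, 0, 0.5), B8 = runif(n, 0, 0.5))
# Simulated binary target: "built-up" becomes more likely as B2 increases
toy$built_up <- rbinom(n, 1, plogis(-3 + 12 * toy$B2))
model <- glm(built_up ~ B2 + B8, data = toy, family = binomial())

# 1. Split the range of B2 into intervals using quantiles (the local grid)
grid <- unique(quantile(toy$B2, probs = seq(0, 1, length.out = 11)))
n_int <- length(grid) - 1
interval <- cut(toy$B2, breaks = grid, include.lowest = TRUE, labels = FALSE)

# 2. Within each interval, move B2 to the lower and upper edge and
#    average the change in predicted probability (the local effect)
local_effect <- sapply(seq_len(n_int), function(k) {
  obs <- toy[interval == k, ]
  lower <- transform(obs, B2 = grid[k])
  upper <- transform(obs, B2 = grid[k + 1])
  mean(predict(model, upper, type = "response") -
       predict(model, lower, type = "response"))
})

# 3. Accumulate the local effects across the grid
ale_raw <- c(0, cumsum(local_effect))

# 4. Center so that the effect averages to zero over the data
counts <- as.numeric(table(factor(interval, levels = seq_len(n_int))))
ale_mid <- (ale_raw[-1] + ale_raw[-length(ale_raw)]) / 2
ale <- ale_raw - sum(ale_mid * counts) / sum(counts)

plot(grid, ale, type = "l", xlab = "B2 reflectance", ylab = "ALE")
abline(h = 0, lty = 2)
```

Because only observations that actually fall in an interval are used to estimate the
effect in that interval, the model is never asked to predict for unrealistic combinations
of band values.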
This guide will produce ALE plots because they are more robust to predictor collinearity
than partial dependence (PD) plots (Apley and Zhu 2020). Furthermore, ALE plots are
faster to compute than PD plots (Molnar 2019; Apley and Zhu 2020). The ALE plots will
help us understand how the predictor variables (Sentinel-2 bands) influence the machine
learning models' predicted land cover (response variable). More importantly, ALE plots
can also help us gain insights into the underlying predictive probability model based on
our domain knowledge (e.g., environmental remote sensing).
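Since the case for ALE rests on collinearity among bands, it can help to check how
strongly the predictors are correlated before interpreting any plot. The sketch below
uses simulated reflectance values; the band names, the strength of the correlation, and
the 0.9 threshold are assumptions, not values from this guide. With the guide's data, the
same cor() call could be applied to the predictor columns of the training set.

```r
set.seed(27)
n <- 200
base_signal <- runif(n, 0.02, 0.4)
bands <- data.frame(
  B2 = base_signal + rnorm(n, sd = 0.01),
  B3 = base_signal + rnorm(n, sd = 0.01),
  B4 = base_signal + rnorm(n, sd = 0.02),
  B8 = runif(n, 0.1, 0.5)   # simulated independently of the visible bands
)

# Pairwise Pearson correlations between bands
band_cor <- cor(bands)
print(round(band_cor, 2))

# List band pairs whose absolute correlation exceeds a chosen threshold
threshold <- 0.9   # illustrative cut-off, not a rule from the guide
high <- which(abs(band_cor) > threshold & upper.tri(band_cor), arr.ind = TRUE)
high_pairs <- data.frame(
  band_1 = rownames(band_cor)[high[, "row"]],
  band_2 = colnames(band_cor)[high[, "col"]],
  r      = round(band_cor[high], 2)
)
print(high_pairs)
```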
First, we will create predictor objects that hold the machine learning models and the
data. The new objects (predictor_knn, predictor_rf, and predictor_svm) are created by
calling Predictor$new(). For each observation of a specific predictor such as Sentinel-2
band 2 (X-value), the ALE plot calculates the average change in the target prediction
(land cover) within a local multidimensional window.
# Create predictor objects for each machine learning model
library(iml)
set.seed(27)
# Keep the predictor variables only (drop the response column "Class")
X = training[which(names(training) != "Class")]
# Each predictor object holds one fitted model together with the data
predictor_knn = Predictor$new(knnFit, data = X, y = training$Class)
predictor_rf = Predictor$new(rf_model, data = X, y = training$Class)
predictor_svm = Predictor$new(svm_model, data = X, y = training$Class)
For simplicity, we will compute feature effects and create ALE plots one band at a time
for all the models. To do that, we will create objects with the prefix "effect_" that
store the results of the feature effect computation for each model. Note that you can
also compute feature effects and create ALE plots for all bands at once.
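The pattern described above can be sketched with the iml package on simulated data. The
data frame toy, the stand-in logistic regression, and the object names predictor_toy,
effect_toy, and effects_all are hypothetical; Predictor, FeatureEffect, and FeatureEffects
are the iml classes. With the objects from this guide, the same calls would take
predictor_knn, predictor_rf, or predictor_svm and the name of a band column.

```r
library(iml)

set.seed(27)
n <- 300
# Simulated reflectance values for two hypothetical bands
toy <- data.frame(B2 = runif(n, 0, 0.5), B8 = runif(n, 0, 0.5))
# Simulated binary target: 1 = built-up, 0 = other
toy$built_up <- rbinom(n, 1, plogis(-3 + 12 * toy$B2))

# Stand-in model (logistic regression), not one of the guide's models
toy_model <- glm(built_up ~ B2 + B8, data = toy, family = binomial())

# Wrap the model and the data; type = "response" is passed on to predict()
# so that the predictions are probabilities
predictor_toy <- Predictor$new(
  toy_model,
  data = toy[c("B2", "B8")],
  y = toy$built_up,
  type = "response"
)

# ALE for a single band
effect_toy <- FeatureEffect$new(predictor_toy, feature = "B2", method = "ale")
plot(effect_toy)

# ALE for all bands at once
effects_all <- FeatureEffects$new(predictor_toy, method = "ale")
```

Computing one band at a time keeps each plot easy to compare across models, while the
all-at-once form is convenient for a first overview.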
Let’s start by computing feature effects and creating (ALE) plots for band 2 (blue band).
Figure 16 shows the ALE plots for the KNN, RF, and SVM models. The ALE plots are
centered at zero, which represents the mean model prediction across all values of the
spectral band. Values below zero indicate a negative effect on the predicted land cover
(a lower-than-average prediction), while values above zero indicate a positive effect
(a higher-than-average prediction). ALE values on the y-axis represent the main effect
of the feature at a specific spectral reflectance value (on the x-axis) relative to the
average prediction. For example, consider a point in Figure 16 with an ALE value
y = 0.3 at a spectral reflectance value x = 0.3. At that reflectance, the predicted
probability of the land cover class is 0.3 (30 percentage points) higher than the
average prediction.
Figure 16 shows that, for the bare areas class, all models have initial positive ALE
values associated with low spectral reflectance and constant or negative ALE values
related to high spectral reflectance. In contrast, all models show that band 2 has a
strong positive effect on the prediction of the built-up class. The RF model shows an
initial steep increase followed by a flat and constant average prediction. Concerning the
cropland class, all models show L-shaped ALE plots, with high positive ALE values
associated with low spectral reflectance and negative ALE values associated with
high spectral reflectance. For the grass/open area class, the effects differ between
models; the RF model, for instance, shows an initial increase followed by a flat and
constant average prediction. In general, band 2 (B2) has little effect on water and
woodland predictions. This contradicts the variable importance values, which show that
B2 had the highest contribution, and suggests model training problems. We observe that
the ALE effects for all land cover classes except the built-up class peak at spectral
reflectance values between 0 and 0.1, consistent with the spectral profile (see
Figure 2). However, the ALE effect of the built-up class for the SVM model peaks at a
spectral reflectance of about 0.4, which is not consistent with the spectral profile.
This ALE effect reflects an overestimation of the built-up areas due to the high class
imbalance (note that built-up areas constitute about 39% of the training pixels).
Figure 16. Accumulated local effects (ALE) plots for all models (band 2). Plots indicate
how the predictions of a given model change on average across local intervals of the
respective band. ALE values are centered around zero. Note that the vertical axis scales
differ between the machine learning models.
Next, let's compute feature effects and create (ALE) plots for band 3 (green band).
The ALE plots for all models show a complex and non-linear relationship between bare
areas and band 3 (Figure 17). For the KNN and SVM models, the average prediction
rises with increasing spectral reflectance but falls again above a spectral reflectance
of 0.1. However, the RF model shows an initial increase followed by a flat and constant
average prediction. In contrast, the KNN model shows a strong positive effect on the
prediction of the built-up class. The ALE effect and plot shape are different for the
SVM and RF models, indicating non-linear relationships.
Concerning the cropland class, the ALE effects differ between models. However, the ALE
plot shape and direction are similar for all models. For example, high positive ALE
values are associated with very low spectral reflectance, and constant ALE values are
associated with high spectral reflectance. For the grass/open area class, all
models show an L-shaped ALE plot. The high positive ALE values are related to low
spectral reflectance, while negative ALE values are associated with high spectral
reflectance. For the water and woodlands classes, all models fail to capture the effects.
Generally, we expect band 3 (green band) to have more impact on the woodlands
because the green band helps detect and assess peak vegetation. The small
training data set for the woodlands class was probably the leading cause of the negative
ALE effect. The predicted built-up class for the KNN model also peaks at a spectral
reflectance of about 0.4, which is not consistent with the spectral profile (see Figure 2).
Next, let's compute feature effects and create (ALE) plots for band 4 (red band).
The ALE plots for band 4 (red band) reveal a strong negative effect on bare areas
predictions (Figure 18) for the KNN and SVM models. However, the ALE plot for the RF
model shows little or no effect, suggesting model training problems. In contrast,
the ALE plots for the KNN and SVM models show a strong and positive association with
built-up area prediction. The average prediction of built-up areas increases with spectral
reflectance, which is the expected behavior. Generally, the red band helps discriminate
between artificial objects and vegetation.
this is followed by a decrease in average prediction for all models. For the water class,
all models fail to capture the effects. However, L-shaped ALE plots for the woodlands
class suggest poor prediction for all models.
Next, let's compute feature effects and create (ALE) plots for band 5 (vegetation red
edge 1 band).
The ALE plots for the KNN and SVM models show a non-linear effect on bare areas
predictions (Figure 19). Interestingly, band 5 had no predictive effect on bare areas for
the RF model. In contrast, the ALE plots reveal a positive impact on built-up areas for
the KNN and SVM models. However, there is no effect on the built-up class for the RF
model. Concerning the cropland class, the effects vary between models. Nonetheless, the
ALE effects peak at a spectral reflectance between 0.1 and 0.2. This is consistent with
the spectral profile (see Figure 2) because reflectance in the red-edge band is sensitive
to vegetation status. For the grass/open area class, all models show an initial steep
increase in average prediction. However, this is followed by a substantial decrease in
average prediction with an increase in spectral reflectance. As was observed before, all
models fail to capture the effects for the water class. Concerning the woodlands class,
the ALE plots exhibit a steep decrease in average prediction, characterized by an
L-shaped plot.
Next, let's compute feature effects and create (ALE) plots for band 7 (vegetation red
edge 3 band).
The ALE plots for band 7 (vegetation red edge 3 band) show a positive effect on bare
areas predictions (Figure 20) for the KNN model. In contrast, we observe a negative
effect on bare areas predictions for the RF and SVM models. All models reveal variable
effects on built-up predictions. However, the average prediction rises with increasing
spectral reflectance and then falls for the KNN model and flattens for the RF model. On
the other hand, the SVM model shows an initial decrease followed by a steady and sharp
increase in average prediction for built-up areas. For the KNN model, we observe a
linear and negative effect on cropland predictions. While we discern non-linear positive
and negative effects on cropland predictions for the SVM model, the RF model shows a
positive effect that saturates at a spectral reflectance of 0.2.
Concerning the grass/open area class, all models show complex and non-linear effects.
However, the ALE plot shape and patterns are similar for the KNN and SVM models.
The RF model fails to capture the effects for the water class, while the KNN and SVM
models show a decreasing and negative effect with an increase in spectral reflectance.
For the woodlands class, all models exhibit different ALE plots that do not provide
meaningful information.
Next, let's compute feature effects and create (ALE) plots for band 8 (NIR).
The ALE plots for band 8 (NIR) exhibit a positive non-linear effect on bare areas
predictions for all models (Figure 21). However, the ALE plots reveal a negative effect
on built-up predictions for all models. Concerning the cropland class, all models show
high ALE variability. However, we see positive effects that peak at a spectral reflectance
of 0.2 for the RF and SVM models. This ALE effect is consistent with the spectral
reflectance of the NIR band during the post-rainy season (see Figure 2). All models have
positive ALE values for the grass/open areas at a spectral reflectance of 0.2.
Nonetheless, we see constant and decreasing ALE values associated with high spectral
reflectance. For water, the ALE plots do not provide helpful information for the KNN and
RF models. However, for the SVM model, the water predictive probability drops rapidly
with increasing spectral reflectance and then levels off. Contrary to expectations, the
ALE plots for all models fail to capture the effects on woodlands. This is inconsistent
with the spectral profile (Figure 2) and environmental remote sensing principles
because spectral reflectance is high for vegetation in the NIR band. While there might
be many reasons for the failure to capture the effects on woodlands, the small number
of training sample areas for this class might be the leading cause.
Next, let's compute feature effects and create (ALE) plots for band 11 (SWIR 1).
Figure 22 shows that the average prediction of bare areas rises with increasing spectral
reflectance for the KNN and SVM models. However, the RF model shows a flat and
sinusoidal effect, indicating that the SWIR 1 band does not affect the average prediction
of bare areas. In contrast, the ALE plots reveal a negative effect on built-up predictions
for all models. Concerning the cropland class, we observe a positive effect on cropland
predictions for the KNN and RF models. However, the cropland predictive probability for
the SVM model first increases and then drops rapidly with an increase in spectral
reflectance. We expect this behavior since crops are still healthy in April and then
mature for harvest during June. Generally, reflectance in the SWIR 1 band is sensitive to
vegetation moisture content and is useful for drought analysis, suggesting that the KNN
and RF models failed to capture the effect for cropland. For all models, the grass/open
area predictive probability first increases and then decreases with increasing spectral
reflectance. Concerning water, the KNN and RF models fail to capture the effects.
However, for the SVM model, the water predictive probability drops rapidly with
increasing spectral reflectance and then levels off. For the woodlands, the effects
differ between models. However, all the models have high ALE peaks at low spectral
reflectance, consistent with the spectral profile (Figure 2).
Finally, let's compute feature effects and create (ALE) plots for band 12 (SWIR2).
Figure 23 shows that the ALE plots for band 12 (SWIR 2) exhibit a non-linear effect on
the prediction of bare areas. However, for the SVM model, the effect on bare areas
predictions is decreasing and negative, consistent with the spectral profile (Figure 2).
Similarly, we observe non-linear effects for the built-up predictions for the KNN and RF
models. However, the ALE plots reveal a strong positive effect on built-up predictions
for the SVM model, contrary to the spectral reflectance profile (see Figure 2).
Concerning the cropland class, a positive effect on cropland predictions is observed for
the KNN and RF models. However, the cropland predictive probability for the SVM model
decreases with an increase in spectral reflectance, which agrees with the spectral
profile (Figure 2). The KNN and RF models failed to capture the effect for cropland. For
all models, the grass/open area predictive probability first increases and then
decreases with increasing spectral reflectance, which is also expected during the
post-rainy season. Concerning water, all models fail to capture the effects. For the
woodlands, we see a negative effect that flattens out gradually for all models.
2.6.3 Summary
The ALE results revealed that the SVM model has the highest average prediction
followed by the KNN model, while the RF model has the lowest average prediction. In
general, the ALE plots highlighted essential insights. First, most ALE plots revealed
non-monotonic and non-linear associations between Sentinel-2 bands and the predicted
land cover. Second, all models exhibited positive ALE values associated with low
spectral reflectance and constant or negative ALE values related to high spectral
reflectance. Therefore, low spectral reflectance values are associated with a
higher-than-average prediction, while high spectral reflectance values are
associated with a lower-than-average prediction. Third, the results revealed that ALE
effects for bare areas, cropland, and grass/open areas peaked at spectral reflectance
values between 0 and 0.1, consistent with the spectral profile (see Figure 2). In
contrast, some ALE effects peaked at a spectral reflectance of about 0.4 for the SVM and
KNN models, suggesting an overestimation of the built-up areas.
Fourth, ALE plots highlighted the inherent problem of inadequate and imbalanced
training data. Class imbalance is a common problem (especially in remote sensing),
which has profound implications for tuning and training machine learning models. In
most cases, inadequate and imbalanced training data is the leading cause of biased
and inaccurate machine learning models. In this guide, the class imbalance leads to
overestimating built-up areas and underestimating water and woodland areas.
Therefore, we should not interpret the constant ALE values for the woodlands class as
evidence of the absence of an effect (strangely, the NIR band did not affect the
prediction of woodlands). Instead, the constant values result from a small training data
set for the woodlands class, which fails to capture the spectral variability of the
woodlands. Remote sensing researchers should therefore make more effort to collect
training data, especially for small land cover classes. In addition, remote sensing
researchers can use upsampling or the synthetic minority oversampling technique
(SMOTE) to minimize the class imbalance problem.
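As a minimal illustration of upsampling, the base R sketch below randomly resamples the
rows of the smaller classes until every class is as large as the largest one. The data
frame training_toy and its class sizes are hypothetical. This is plain random
oversampling, not SMOTE, which instead creates new synthetic samples.

```r
set.seed(27)
# Hypothetical imbalanced training table (class sizes are illustrative)
training_toy <- data.frame(
  B2 = runif(180, 0, 0.5),
  Class = factor(rep(c("Built-up", "Cropland", "Woodlands"),
                     times = c(100, 60, 20)))
)
print(table(training_toy$Class))

# Resample rows with replacement so that every class reaches the size
# of the largest class
target_n <- max(table(training_toy$Class))
rows_by_class <- split(seq_len(nrow(training_toy)), training_toy$Class)
row_ids <- unlist(lapply(rows_by_class, function(idx) {
  extra <- idx[sample.int(length(idx), size = target_n - length(idx),
                          replace = TRUE)]
  c(idx, extra)
}))
training_up <- training_toy[row_ids, ]
print(table(training_up$Class))
```

Upsampling should be applied to the training set only, after it has been separated from
the test set. Duplicated rows balance the class counts but do not add new spectral
variability, so collecting more training data remains preferable. The caret package also
provides an upSample() function for this task.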
Fifth, the ALE plots revealed that all the models had some uncertainty due to spectral
confusion between cropland and grass/open areas (see spectral profile in Figure 2). For
example, ALE effects are very weak for some classes such as cropland and bare areas,
especially for the RF model, which had high class errors for bare areas and cropland.
The ALE effects suggest that the RF model was affected by overfitting and
multicollinearity. Interestingly, the ALE plots indicated that the simple KNN model
performed better than the advanced RF model. Therefore, it is essential to start with
simple machine learning models before turning to shiny and advanced ones.
Last but not least, we should analyze the ALE plots with caution because most of the
Sentinel-2 bands were highly correlated. Although ALE plots account for predictor
collinearity, they can be misleading for highly correlated spectral bands. Generally, ALE
searches for small contributions of individual spectral bands in an isolated way. In
The ALE results revealed that model-agnostic approaches provide more insights than
model-specific ones. Furthermore, the ALE results highlighted machine learning model
limitations, which gives more opportunities for further improvement. Although the results
and discussions in this guide are far too limited to form the basis for firm conclusions,
they warrant further research, especially for land cover classification. However, this
guide only provided a basic workflow for building general machine learning models and
explainable machine learning models using ALE from the iml package. Therefore, there
is a need to explore other methods such as individual conditional expectation (ICE)
plots, Shapley values, and local interpretable model-agnostic explanations (LIME). In
addition, there are different packages in R or Python, such as DALEX, auditor,
ExplainPrediction, fastshap, and lime. Therefore, learning these methods and packages is
crucial to explain machine learning models for land cover classification.
References
Apley DW, Zhu J (2020) Visualizing the effects of predictor variables in black box
supervised learning models. J. Roy. Stat. Soc. B 82 (4), 1059-1086.
https://doi.org/10.1111/rssb.12377
Anchang JY, Prihodko L, Ji W, Kumar SS, Ross CW, Yu Q, Lind B, Sarr MA, Diouf AA
and Hanan NP (2020) Toward Operational Mapping of Woody Canopy Cover in
Tropical Savannas Using Google Earth Engine. Front. Environ. Sci. 8:4. DOI:
10.3389/fenvs.2020.00004
Biecek, P and Burzykowski, T (2020) Explanatory Model Analysis: Explore, Explain, and
Examine Predictive Models. With examples in R and Python.
https://ema.drwhy.ai/preface.html
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin
classifiers. Proceedings of the fifth annual workshop on Computational learning
theory. ACM, p.144-52. Available online: http://dl.acm.org/citation.cfm?id=130401
(accessed on 22 February 2014)
Appendix
Appendix 1. Resources
Several educational and training resources are available to learn about remote sensing,
general machine learning, R, and explainable and interpretable machine learning.
4. About Quick-R
https://www.statmethods.net/
Grass/open areas (Gr): Grass cover, open grass areas, golf courses, and parks.
Recent Books:
(1) Kamusoko, C. (in press). Optical and SAR Remote Sensing of Urban Areas: A
Practical Guide. Springer.
(2) Kamusoko, C. (2019). Remote Sensing Image Classification in R. Springer.