Article
Automated Wheat Diseases Classification Framework Using
Advanced Machine Learning Technique
Habib Khan 1 , Ijaz Ul Haq 1 , Muhammad Munsif 1 , Mustaqeem 2 , Shafi Ullah Khan 3 and Mi Young Lee 1, *
Abstract: Around the world, agriculture is one of the important sectors of human life in terms of food,
business, and employment opportunities. In the farming field, wheat is the most farmed crop but
every year, its ultimate production is badly influenced by various diseases. On the other hand, early
and precise recognition of wheat plant diseases can decrease damage, resulting in a greater yield.
Researchers have used conventional and Machine Learning (ML)-based techniques for crop disease
recognition and classification. However, these techniques are inaccurate and time-consuming due
to the unavailability of quality data, inefficient preprocessing techniques, and the existing selection
criteria of an efficient model. Therefore, a smart and intelligent system is needed which can accurately
identify crop diseases. In this paper, we proposed an efficient ML-based framework for various kinds
of wheat disease recognition and classification to automatically identify the brown- and yellow-rusted
diseases in wheat crops. Our method consists of multiple steps. Firstly, the dataset is collected from
different fields in Pakistan with consideration of the illumination and orientation parameters of the
capturing device. Secondly, to accurately preprocess the data, specific segmentation and resizing
methods are used to make differences between healthy and affected areas. In the end, ML models
are trained on the preprocessed data. Furthermore, for comparative analysis of models, various
performance metrics including overall accuracy, precision, recall, and F1-score are calculated. As a
result, it has been observed that the proposed framework has achieved 99.8% highest accuracy over
the existing ML techniques.
Keywords: artificial intelligence; computer vision; machine learning; precision agriculture; wheat
diseases
Citation: Khan, H.; Haq, I.U.; Munsif, M.; Mustaqeem; Khan, S.U.; Lee, M.Y. Automated Wheat Diseases
Classification Framework Using Advanced Machine Learning Technique. Agriculture 2022, 12, 1226.
https://doi.org/10.3390/agriculture12081226
Academic Editors: Maciej Zaborowicz and Hongbin Pu
Around the globe, wheat is the most important food ingredient; it is therefore the
most popular cereal cultivated by farmers around the world [6]. According
to the Food and Agriculture Organization (FAO) of the United Nations [7], in the years
2018 and 2019, wheat accounted for almost 28% of total global cereal production from an
estimated area of 215 million hectares. However, the demand for wheat is far higher than
its production, specifically in developing countries [2]. Different factors
contribute to the low production of wheat; the most important one is disease,
which can cause 15–20% losses in global wheat production per annum [8]. Common wheat
leaf diseases such as leaf rust and yellow rust are the most widespread diseases in
wheat plants and can cause huge food losses and economic decline if they
remain uncontrolled [9]. Further, most farmers, especially in developing countries, depend
on agriculture experts to identify and diagnose the disease [10]. A quick response to
disease is crucial to stop it from spreading to the entire plant and even the entire field,
particularly in wheat plants.
Wheat plant disease detection and identification have always been a challenge, because farmers
must look after the whole field and examine each plant themselves or through
an agriculture expert. Because of the density of wheat crops in the field, it is very time-
and resource-consuming to manually monitor the whole field [11]. Due to recent advancements
in computer technology such as human-computer interaction [12,13] and AI [14–19],
intelligent systems can help farmers in the field identify wheat leaf diseases using
automatic methods, such as Computer Vision (CV) and AI-based methods [20].
The development of a robust ML and CV-based system for wheat disease recognition
that is capable of working in diverse field conditions with accurate performance poses various
challenges, some of which are discussed below.
The first and very important challenge for plant disease recognition, especially for
wheat using CV and ML, is the collection of disease data and the creation of a challenging
database that contains varied images, such as images captured from various angles and with
occlusion [21–23]. Current freely available plant disease datasets include [24], which
contains images of wheat diseases, and [25], which contains images with
variations of angle and illumination. So, data acquisition from fields with wheat diseases
under different conditions is essential for accurate automatic disease identification.
The second challenge is the segmentation of leaves and regions of interest from busy
backgrounds, which has the potential to improve model accuracy [22,26,27]. Most of the
existing methods for wheat disease recognition rely on manual steps, such as manual cropping
of images by farmers after capture. Inexperienced farmers often crop out the important
region of an image and leave the unessential region of the captured image,
which badly affects the accuracy of the system. On the other hand, the segmentation of
regions of interest in leaves can improve the accuracy of the disease identification system.
The third one is the selection of proper methods for feature extraction and classification
of extracted features [28]; most of the existing techniques for wheat leaf disease detection
used only a few CV feature extractors and ML models without investigating the performance of
other descriptors and models. For example, [29] used only color and texture
descriptors with Random Forest (RF) classifiers for wheat disease classification. Further, [30]
used only the Maximum Margin Criterion (MMC) method for the severity and classification
of wheat leaf diseases. Thus, feature extractors and classifiers should be
selected based on comparative performance.
In this paper, we proposed an ML-based wheat disease recognition framework and
achieved the highest recognition accuracy after a comparative analysis of various feature
extraction and classification algorithms. The major contributions of our work are as follows:
• We proposed a machine learning-based framework that detects salient cues
of wheat diseases and accurately classifies them into yellow and brown rust. Our
model utilizes a masked-based segmentation technique that automatically removes
the background and noise, identifies healthy and unhealthy wheat crops, and determines
the affected and unaffected areas of the crop. The proposed framework is lightweight
and automatically identifies wheat crop diseases with a high recognition rate.
• A new dataset for wheat disease classification is introduced. The dataset is collected
from different wheat fields in various regions of Peshawar and Dir, Pakistan. We
focused on two categories of diseases, giving a total of three classes, i.e., brown rust,
yellow rust, and healthy leaves. The dataset will be publicly available
to the research community.
• A comparative analysis has been conducted among ML techniques for wheat disease
recognition. The proposed framework achieved 99.8% accuracy for the wheat diseases
classification. Due to good generalization and a high recognition rate, the system can
be employed in various real-time industrial applications.
The rest of the paper is arranged as follows. Section 2 discusses the related work in
detail, followed by Section 3, where the proposed system is comprehensively discussed
with implementation details. Section 4 illustrates the experimental results in detail. The
outcomes and possible future research directions are presented in Section 5.
2. Related Work
Around the globe, researchers are striving to develop significant guidance and insights
to help farmers make better decisions and take actions accordingly. Over the last two
decades, advances in technologies such as AI and computing have attracted the attention
of researchers. To produce a more effective system for actual disease diagnosis and to
categorize diseases with high accuracy, many alternative schemes with diverse
combinations may be used. These include conventional (statistical and image processing)
techniques as well as ML-based methods for plant and leaf diseases, specifically wheat
disease recognition and classification. Different researchers have contributed to different
aspects of precision agriculture [31]. Various advances in digital image processing and
ML methods have been used for crop leaf disease detection and recognition from leaf
images [28,32–34]. The literature can be categorized into two subsections. This section
discusses comparatively less intelligent approaches to precision agriculture, such as pure image
processing or CV-based disease classification, and more intelligent ones, such as ML-based
task handling.
four classes such as the healthy leaf, common rust, leaf spot, and leaf blight. The 80/20 ratio
was considered for training/testing. They obtained the most accurate classification using
SVM when color, texture, and morphology features were combined. As a result, their model
has achieved 95.63% accuracy. Hossain et al. [38] developed an automated SVM-based
ML model for the recognition and classification of tea leaf diseases in Bangladesh. Their
dataset consisted of three classes, two for diseases (brown blight disease and algal leaf
disease), including a healthy leaf class. They used statistical features for feature extraction
for the given model. The suggested technique classified the diseases with 93%
accuracy. However, their proposed framework has several limitations, which include the limited
number of samples in the dataset, the use of only pure statistical features, and
low accuracy, which could be improved by using more models and a larger dataset.
they used three main categories: texture, color, and shape. Their approach achieved up
to 93.33% accuracy. However, their accuracy can be improved further by using a dataset
that has variations (illumination and different other angles) and state-of-the-art models. In
continuation of rice plants diseases identification, Ahmed et al. [47] introduced a model
for rice leaf disease detection using ML techniques. This study focused on three of the
most widespread rice plant diseases which include brown spot, leaf smut, and bacterial
leaf blight. Their dataset consists of 480 images and the 90/10 ratio was considered for
training/testing. They compared four ML techniques (logistic regression, decision tree,
KNN, and Naïve Bayes) with each other, and found that the decision tree performed
best, with 97.91% accuracy. However, they used statistical feature extractors, which
are not very robust to physical changes (illumination structures); improving
the dataset could also improve the performance of the model. To develop a more robust
system, Panigrahi et al. [48] proposed a framework based on ML algorithms which include
SVM, RFC, DT, and KNN used for the detection of various maize crop diseases. The afore-
mentioned classification approaches are tested and compared to determine the best model
with the highest accuracy. In comparison to the other classification approaches, the RFC
algorithm showed good accuracy of 79.23%. The maize datasets are divided into 90% for
training and 10% for testing the whole dataset. The dataset contains 3823 images consisting
of four classes, namely healthy (1162 images), gray leaf spot (513 images), common rust
(1192 images), and northern leaf blight (985 images). However, they trained the models
on poorly captured images which are not sufficient because diseases can affect any part of
the plant. Waghmare et al. [49] proposed a multi-class SVM classification-based machine
learning technique for grape plant disease identification. As preprocessing steps, they used
segmentation to remove the background area. Their research focused on two major diseases
that commonly affect grape plants (black rot and downy mildew). Their dataset consists
of 450 samples (160 healthy leaves and 290 diseased leaves) of grape leaves. A special
texture-based feature is used to extract the segmented leaf texture. They used multi-class
SVM to classify the extracted texture patterns. Their model achieved 96.6% accuracy. How-
ever, the accuracy is low which can be improved. For wheat leaves diseases classification,
Zhao et al. [50] suggested an integral technique based on ML algorithms for leaf-scale
wheat powdery mildew. The proposed framework was evaluated and trained by a hyper-
spectral images-based dataset. Three diagnosis models were constructed which include
SVM, probabilistic neural network (PNN), and RFC. After a comparison of used models
based on accuracy, the best model which was SVM had 93.33% classification accuracy.
However, their accuracy is low and can be improved by using other state-of-the-art models
with various feature descriptors. To improve accuracy, GuanLin et al. [51] proposed a novel
approach to recognize two types of wheat rusts (stripe rust and wheat leaf rust) based
on SVM and multiple feature parameters of their dataset. As preprocessing steps, they
used image cutting, de-noising, and segmentation techniques. Furthermore, they extracted
color- and texture-related features from preprocessed images. As a classification model,
they utilized SVM. The authors achieved an accuracy of 96.67% by using the SVMs with
radial basis function (RBF) on the selected twenty-six features. However, they used only
one feature extractor, and the model was trained on a dataset without variation, which makes it
less robust when the physical appearance changes. To improve accuracy, Azadbakht et al. [52]
proposed an ML-based detection of one wheat disease. They developed their dataset based
on canopy scale and under different LAI levels. Their proposed framework identified the
severity level of wheat leaf rust at the canopy scale by using four ML techniques which
include Random Forests Regression (RFR), ν-support vector regression (ν-SVR), Gaussian
process regression (GPR), and boosted regression trees (BRT). They achieved high accuracy
up to 99% using ν-support vector regression (ν-SVR). However, the experiments were
performed only on one disease and their focus was only on the severity of one disease.
In Table 1, we summarize the above-discussed related work. Researchers have proposed different
ML techniques for leaf disease recognition in various crops,
including maize, rice, tea, vineyards, and our focus plant, wheat. They used
various methods for preprocessing, commonly resizing, de-noising, and cropping; some
of them used segmentation as a preprocessing step, but for plants other than wheat. For
feature extraction, they used statistical and CV-based techniques. Recognition has been
performed by algorithms that include pure statistical as well as ML approaches such as SVM
and RFC. However, we found deficiencies in the existing works on wheat leaf disease
recognition and classification using ML methods in terms of the unavailability of diverse
datasets, robust preprocessing techniques, and the accuracy of the framework. Therefore,
we bridge this gap by proposing a framework for common wheat disease recognition
that achieves comparatively high accuracy as a result of the collected dataset,
preprocessing steps such as masked-based segmentation, feature descriptors, and the
proposed fine-tuned RFC framework, specifically for common wheat disease recognition.
Table 1. Summary of the literature; the results of our proposed framework are given in the last row
for comparison.
| Article | Crop | Preprocessing | Features | Algorithms/Models | Accuracy |
|---|---|---|---|---|---|
| Xu et al., 2017 [35] | Wheat | Conversion of images to G single gray RGB model, background removal | Binary features point set | Flood filling algorithm | 92.3% |
| Islam et al., 2017 [36] | Potato | Color based segmentation | Statistical features | SVM | 95% |
| Alehegen et al., 2019 [37] | Maize | Segmentation | Texture and morphological | SVM | 95.63% |
| Hossain et al., 2018 [38] | Tea | Image resizing and cropping | Statistical features | SVM | 93% |
| Aurangzeb et al., 2020 [41] | Corn and Potato | Image resizing | LTP, HOG, SFTA | MSVM | 92.8% and 98.7% |
| Treboux et al., 2018 [42] | Vineyards | Morphological operation (opening and closing) | First order statistic, Tamura, Haralick | DTE | 94.275% |
| Rumpf et al., 2010 [43] | Sugar beet | Image resizing, clustering | Physiological parameters | SVM | 97% |
| Ramesh et al., 2018 [44] | Papaya | Image resizing and normalization | HOG | RFC | 70% |
| Phadikar et al., 2012 [45] | Rice | Enhancement via mean filters and segmentation | Colors descriptors | SVM, NB | 68.1% and 79.5% |
| Prajapati et al., 2017 [46] | Rice | Back removal, segmentation | Texture, color, and shape | SVM | 93.33% |
| Ahmed et al., 2019 [47] | Rice | Augmentation | Pure statistical features | DT | 97.91% |
| Panigrahi et al., 2020 [48] | Maize | Resizing, denoising, segmentation | Grayscale pixel values | NB, KNN, DT, SVM and RFC | 79.23% (highest with RFC) |
| Waghmare et al., 2016 [49] | Grapes | Back removal | Texture | SVM | 96.6% |
| Zhao et al., 2020 [50] | Wheat | Image smoothing via S-G filter and derivative function | Disease level of severity, and affected leaf spots | SVM, PNN, and RFC | 93.33% |
| Li et al., 2012 [51] | Wheat | Cropping, denoising | Colored and texture | SVM with RBF | 96.67% |
| Azadbakht et al., 2019 [52] | Wheat | Noise reduction | Disease severity level, leaf area index, and pixel values | ν-SVR and RFR | 99% and 79% |
| Proposed framework | Wheat | Resizing, masked based segmentation | Haralick texture, color histogram, and hue moments | Fine-tuned RFC | 99.8% |
Figure 1. Framework: for better understanding, the whole framework is divided into three sub-
sections which include: (1) data preprocessing and feature extraction. The images from wheat
fields are collected, and our dataset consists of three classes, including healthy, rusted, and yellow-
rusted leaves. This contains sub-steps which include preprocessing which preprocess the data by
using the resizing and segmentation techniques and feature extraction where extracted features
from preprocessed data using different feature descriptors are used. (2) The second one is model
training, in which 80% of data loads from the dataset for training purposes. Lastly, six different
ML models are trained. (3) The third step is testing, after training models, these are evaluated by
checking the performance of models on unseen data. As a testing load, the remaining 20% of the
data are used and passed from preprocessing and feature extraction steps, as same as applied in
the training phase, and the performance of the model is observed.
3.1. Real-Time Data Collection
For the integration of computer vision technology in agriculture and the facilitation of
plant disease diagnosis, researchers developed various open-access datasets, such as plant
village datasets containing over 50,000 images of different plant species with diseases an-
3.2.1. Preprocessing
Preprocessing is a very important step in ML. It helps remove unwanted data and
reduce the computation time during the training and testing of models. Our preprocessing
is shown in Figure 2 and includes two techniques. The first technique
is resizing, which adjusts the size of an image without cropping anything
out. To make image processing systems more accurate and faster, high-resolution
images are nearly always down-sampled. In this work, the INTER_AREA interpolation
method is used for the scaling of images, and each image is resized from 1026 × 768 to
250 × 250. The objective of the interpolation function is to take neighboring areas
of pixels and use them to expand or reduce the image's size. In general, it is considerably
better to reduce the image size using the interpolation method, because
the interpolation function only removes pixels from the image. INTER_AREA is the preferred
approach for image reduction because it produces moiré-free results. This step improved
our system performance in terms of computational complexity and accuracy.
The second technique is image segmentation, which is the process of partitioning an image into
clusters based on the similarity of the intensity values of the input image. For instance, the
pixel values in wheat leaves that are similar to the affected region belong to the affected
cluster, and the rest of the region is considered the healthy part. This process seems simple,
but it becomes very difficult when some pixel values lie on the boundary, such as
the boundary spots of brown rust. The brown-rusted pixels on the boundary are sometimes
very close to healthy region values, so it becomes difficult to decide whether to
treat them as part of the affected region or the healthy region, or to assign them to another
region. Fuzzy set-based ideas allow us to deal with these situations. For example, suppose a set
consists of N elements that need to be divided into C clusters. Each
element of the set has C membership values, one for each cluster, and
an element is assigned to the cluster for which it has the highest membership value.
We treat the intensity values of the wheat leaf images as a set of N pixel values and
apply the fuzzy C-means clustering algorithm, which follows the same concept: the
process is performed iteratively for each pixel value using the formula given
in Equation (1).
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\lVert x_i - c_j \right\rVert^{2} \tag{1}$$
where N is the number of image intensity values, C is the number of clusters, and m is any real
number greater than 1; $u_{ij}$ is the degree of membership of $x_i$ in cluster j, $x_i$ is the
current value in the image, and $c_j$ is the center of a specific cluster. In our case, m
is set to 2, and the highest performance is achieved when the number of clusters is 3. The
images were passed through the proposed algorithm and colors were assigned to various
parts of the leaves. Generally, diseased parts of the image have greater intensities than
healthy parts.
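The clustering objective in Equation (1) can be illustrated with a minimal pure-Python sketch of fuzzy c-means on one-dimensional intensity values. This is not the authors' implementation: the function names, the evenly spaced initial centers, and the stopping tolerance are assumptions made for illustration, while m = 2 and three clusters follow the values reported above.

```python
def fuzzy_c_means(values, n_clusters=3, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means on a list of 1-D intensity values.

    Returns (centers, u), where u[i][j] is the membership of values[i]
    in cluster j. Initial centers are spread evenly over the value range
    (an assumption, chosen so that the sketch is deterministic).
    """
    lo, hi = min(values), max(values)
    centers = [lo + (j + 0.5) * (hi - lo) / n_clusters for j in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        u = []
        for x in values:
            d = [abs(x - c) + 1e-12 for c in centers]  # epsilon avoids division by zero
            u.append([1.0 / sum((d[j] / dk) ** exponent for dk in d)
                      for j in range(n_clusters)])
        # Center update: c_j = sum_i u_ij^m * x_i / sum_i u_ij^m.
        new_centers = []
        for j in range(n_clusters):
            w = [row[j] ** m for row in u]
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


def objective(values, centers, u, m=2.0):
    """Value of J_m from Equation (1) for 1-D data."""
    return sum(u[i][j] ** m * (x - c) ** 2
               for i, x in enumerate(values)
               for j, c in enumerate(centers))


def hard_labels(u):
    """Assign each value to the cluster with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


# Hypothetical pixel intensities: dark (healthy), mid (boundary), bright (diseased).
pixels = [10, 12, 11, 120, 125, 118, 240, 245, 250]
centers, memberships = fuzzy_c_means(pixels)
labels = hard_labels(memberships)
```

In this sketch the cluster with the largest center would correspond to the brightest pixels, which matches the observation that diseased regions generally have greater intensities.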
Figure 2. In image segmentation, the first column shows the input images, and the second column
shows the segmented image of the corresponding input image.
Therefore, we assigned the highest intensity values to the diseased spots and made a
cluster from it, and the rest of the parts are divided into the boundary and healthy clusters.
As the result of segmentation, more suitable and highlighted images are obtained, as shown
in Figure 2 for further processing.
3.2.2. Feature Extraction
Feature extraction is one of the most important steps for ML-based model development.
The performance of ML algorithms depends on the extracted features. If extracted features
are relevant to the ROI, ML classifiers can differentiate among classes with high accuracies.
The basic idea of feature extraction is to extract only those features which have a high
weight in terms of the representation of an object and reduce computational complexity
by avoiding further processing of less meaningful features. Many descriptors are used by
researchers for varieties of features that include texture, shape, and color feature descriptors.
As shown in Figure 1, several relevant feature descriptors are used: Histogram
of Oriented Gradient (HOG), particularly designed for shape extraction; Local Binary
Pattern (LBP), which is mostly used for texture feature extraction; Hue Moment (HM),
which is a statistical descriptor used for shape feature extraction; Color Histogram (CH), which is
used for color feature extraction; and Haralick Texture (HT), which uses 14 features as
textures. Each of the descriptors is used separately, and three of them (HM, HT, and CH)
are combined for performance analysis. After detailed experiments, it is found that the
combination of these three feature descriptors is effective in terms of testing accuracy.
These three feature descriptors are explained below in detail.
where $x$ and $y$ represent the location of a pixel belonging to the object region, while $\bar{x}$ and
$\bar{y}$ represent the centroid of the shape of the object. The centroid is considered the center of
mass and is calculated using Equations (3) and (4), where $M_{00}$ is the area, while $M_{10}$ and
$M_{01}$ are the first-order moments that determine the coordinates of the center of the shape.
$$\bar{x} = \frac{M_{10}}{M_{00}} \tag{3}$$
$$\bar{y} = \frac{M_{01}}{M_{00}} \tag{4}$$
To equip the central moments with scale invariance, it is essential to normalize them.
The normalized version of the central moment is calculated by using
Equation (5).
$$\eta_{ij} = \frac{\mu_{ij}}{\mu_{00}^{(i+j)/2+1}} \tag{5}$$
Now, the central moment is translation- and scale-invariant; however, this is not enough
for robust shape matching. The moments must also be rotation- and reflection-invariant,
in addition to translation and scale invariance. So, the following seven moments are calculated
using Equations (6)–(12) and denoted $h_n$, where $n$ is the index of the moment.
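$$\begin{aligned} h_4 = {} & (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\ & + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \end{aligned} \tag{10}$$
$$h_5 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \tag{11}$$
$$\begin{aligned} h_6 = {} & (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\ & - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \end{aligned} \tag{12}$$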
These seven invariants of the moments describe the shape of objects as a 7D vector
which is used in this work as a feature with the concatenation of other features in the
training of the ML model.
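As a rough illustration of Equations (3)–(5), the following pure-Python sketch computes the centroid and the normalized central moments of a small binary image stored as a list of rows. The function names and the toy images are hypothetical, and the central moment is taken in its standard form, the sum of (x − x̄)^p (y − ȳ)^q weighted by the pixel value; it is a sketch under those assumptions rather than the implementation used in this work.

```python
def raw_moment(img, p, q):
    """Raw moment M_pq: sum of x^p * y^q * pixel value."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def centroid(img):
    """Centroid from Equations (3) and (4)."""
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00


def central_moment(img, p, q):
    """Central moment mu_pq, measured relative to the centroid."""
    cx, cy = centroid(img)
    return sum(((x - cx) ** p) * ((y - cy) ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def normalized_moment(img, p, q):
    """Normalized central moment eta_pq from Equation (5)."""
    mu00 = central_moment(img, 0, 0)
    return central_moment(img, p, q) / mu00 ** ((p + q) / 2 + 1)


# The same 2 x 2 square placed at two different positions (hypothetical data).
square_a = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
square_b = [[0, 0, 1, 1],
            [0, 0, 1, 1],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
eta20_a = normalized_moment(square_a, 2, 0)
eta20_b = normalized_moment(square_b, 2, 0)
```

Because the moments are measured relative to the centroid, the two squares give the same normalized value even though they sit at different positions, which is the translation invariance that the seven invariants build on.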
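$$\mathrm{Con} = \sum_{i=0}^{G_l-1} \sum_{j=0}^{G_l-1} (i - j)^{2}\, G(i, j) \tag{14}$$
$$\mathrm{Cor} = \sum_{i,j=0}^{G_l-1} \frac{1}{\sigma_i \sigma_j} \left[ \{ i \cdot j \times G(i, j) \} - \mu_i \mu_j \right] \tag{15}$$
$$\mathrm{Uni} = \sum_{i=0}^{G_l-1} \sum_{j=0}^{G_l-1} G(i, j)^{2} \tag{16}$$
$$\mathrm{Homo} = \sum_{i=0}^{G_l-1} \sum_{j=0}^{G_l-1} \{ 1 + (i - j)^{2} \}^{-1}\, G(i, j) \tag{17}$$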
where the texture of an image is stored in a matrix $G(i, j)$ called the gray-level co-occurrence
matrix (GLCM), and $G_l$ represents the total number of gray levels of the image. After extraction,
these features are normalized using the mean, skewness, and kurtosis, as defined in Equations (18) and (19).
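$$\mathrm{Skew} = \frac{E(x_i^{3}) - 3\mu\sigma^{2} - \mu^{3}}{\sigma^{3}} \tag{18}$$
$$\mathrm{Kur} = E\left[ \left( \frac{X_i - \mu}{\sigma} \right)^{4} \right] \tag{19}$$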
where Skew and Kur represent skewness and kurtosis, respectively. Further, in these
equations, $E$ denotes the expected value and $X_i$ the normalized scale matrix. After extraction,
the above features are fused into one matrix and passed to model training in the
next module.
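The co-occurrence features in Equations (14), (16), and (17) can be sketched in pure Python as follows. The helper names, the horizontal pixel offset, and the toy three-level image are assumptions made for illustration only; they are not taken from the implementation used in this work.

```python
def glcm(img, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix G(i, j).

    Counts how often gray level i has gray level j as its neighbor at
    offset (dx, dy), then divides by the total number of pairs.
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(img), len(img[0])
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                counts[img[y][x]][img[y2][x2]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def contrast(g):
    """Equation (14): sum of (i - j)^2 * G(i, j)."""
    return sum((i - j) ** 2 * v for i, row in enumerate(g) for j, v in enumerate(row))


def uniformity(g):
    """Equation (16): sum of G(i, j)^2."""
    return sum(v ** 2 for row in g for v in row)


def homogeneity(g):
    """Equation (17): sum of G(i, j) / (1 + (i - j)^2)."""
    return sum(v / (1 + (i - j) ** 2) for i, row in enumerate(g) for j, v in enumerate(row))


# Hypothetical 3 x 3 image quantized to three gray levels (0, 1, 2).
image = [[0, 0, 1],
         [0, 0, 1],
         [2, 2, 2]]
g = glcm(image, levels=3)
features = [contrast(g), uniformity(g), homogeneity(g)]
```

A full Haralick descriptor would typically average such features over several offsets and directions; the single horizontal offset here keeps the sketch short.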
plays a major role in the proposed lightweight and accurate framework. Compared with our
proposed fine-tuned RFC, other ML models such as neural networks can perform
well on classification tasks, but most of them are computationally expensive.
So, with this configuration of the RFC model, we achieve the high performance presented in the
results section.
4. Experimental Results
In this section, we discuss the experimental setting, collected dataset, evaluation
metrics, and evaluation of the performance of our proposed framework. Furthermore, we
elaborate on the performance of trained models comparatively.
Agriculture 2022, 12, x FOR PEER REVIEW 15
Agriculture 2022, 12, 1226 In this section, we discuss the experimental setting, collected dataset,
14 of evaluation
20
rics, and evaluation of the performance of our proposed framework. Furthermor
elaborate on the performance of trained models comparatively.
4.1. Experimental
4.1.Settings
Experimental Settings
All the experiments are carried out on a computer system with an Intel® Xeon® X5560
processor (2.80 GHz clock speed), 8.00 GB of installed memory (RAM), and a GeForce GTX
1070 GPU, running the Microsoft Windows operating system. The implementation of our
project uses Python 3.7 as the programming language, OpenCV version 3.4, a CV library,
for preprocessing, and the scikit-learn ML library version 0.24.2 for training and testing
the various ML models. Matplotlib, a Python-based visualization library, is used for the
visualization of images and results and for graph generation. Furthermore, the os and glob
libraries are utilized for reading different files from the hard drive.
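As an illustration of the file-reading step, the following is a minimal sketch of how image paths could be gathered with the os and glob modules. It assumes a hypothetical layout with one sub-folder per class; the function name `list_images_by_class`, the folder names, and the accepted extensions are illustrative assumptions, not details taken from our implementation.

```python
import glob
import os


def list_images_by_class(root_dir, class_names, extensions=(".jpg", ".jpeg", ".png")):
    """Return {class_name: sorted list of image paths} for each class sub-folder.

    Assumption: images are stored as root_dir/<class_name>/<file>.
    """
    files_by_class = {}
    for name in class_names:
        # Match every entry in the class folder, then keep image files only.
        pattern = os.path.join(root_dir, name, "*")
        paths = [
            path
            for path in glob.glob(pattern)
            if os.path.splitext(path)[1].lower() in extensions
        ]
        files_by_class[name] = sorted(paths)
    return files_by_class
```

A call such as `list_images_by_class("dataset", ["healthy", "rusted", "yellow_rusted"])` would return one list of paths per class, which can then be passed to the preprocessing stage.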
Figure 3. Samples of images. The first, second, and third rows show healthy, rusted, and yellow
rusted leaves of wheat, respectively.
which is 99.8% for our proposed framework, followed by DT with 99.2%. On the other hand,
the testing accuracy of LR, 89.6%, is lower than that of all the others.
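$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{21}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{22}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{23}$$

$$\mathrm{F1\text{-}score} = \frac{2 \times (\mathrm{Precision} \times \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}} \tag{24}$$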
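The four metrics can be sketched in a few lines of Python. This is a minimal illustration rather than our evaluation code: the function name `classification_metrics` is hypothetical, recall uses the standard definition TP / (TP + FN), and returning 0.0 for an empty denominator is an assumption made to keep the sketch safe to run.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from the four counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall divides by all actual positives, i.e. TP + FN.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = classification_metrics(tp=8, tn=10, fp=2, fn=0)
```

In a multi-class setting such as ours, these counts are obtained per class by treating that class as positive and the remaining classes as negative.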
4.4. Results
To evaluate the performance of our framework, different experiments on the training
dataset are conducted. All testing images were selected randomly. The testing dataset is
20% of our whole dataset and consists of 210 images for each category (healthy, rusted,
yellow-rusted). Based on the testing set, we report the confusion matrix, the overall
accuracy, and a performance evaluation graph that contains precision, recall, and F1-score.
All of these are discussed below in detail.
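The random 80/20 split can be sketched as follows. This is a hedged illustration: it assumes the test images are drawn separately within each class (which is consistent with an equal number of test images per category), and the function name `stratified_split`, the fixed seed, and the placeholder file names are hypothetical.

```python
import random


def stratified_split(paths_by_class, test_fraction=0.2, seed=0):
    """Randomly split each class into train and test parts of (path, label) pairs."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, paths in paths_by_class.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend((path, label) for path in shuffled[:n_test])
        train.extend((path, label) for path in shuffled[n_test:])
    return train, test


# Placeholder data: 1050 images per class is inferred from 210 test images
# being 20% of each class; the file names are made up for illustration.
classes = ("healthy", "rusted", "yellow_rusted")
data = {c: [f"{c}_{i}.jpg" for i in range(1050)] for c in classes}
train_set, test_set = stratified_split(data)
```

With these placeholder sizes, the test set holds 210 images per class, 630 in total.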
In these experiments, a comparative analysis of the performance of ML models is
conducted on the combined extracted features of colors, shape, and Haralick textures of
wheat leaves. Accuracy is calculated using Equation (21): we divide the number of correct
predictions of a trained model by the total number of testing samples. As mentioned above,
20% of the whole dataset is held out for testing. The accuracies of all the trained models
are shown in Figure 4, in which the proposed framework has the highest accuracy of 99.8%,
followed by DT (99.2%) and KNN (99.0%). On the other hand, the accuracy of LR is 89.6%,
which is lower than all the other models, while SVM and NB achieved accuracies of 94.0%
and 97.7%, respectively.
Figure 4. Accuracy graph of all six trained models: the proposed framework has the highest accuracy
followed by DT while LR has the lowest, and all others have a performance value between these two.
For better evaluation and verification of our experimental results, we calculated
various evaluation values from the confusion matrix. These values include precision, recall,
and F1-score, which are calculated using Equations (22)–(24), respectively. All these
values are depicted in Figure 5, where our proposed framework has achieved the highest
precision, recall, and F1-score of 99.8%. These results show that almost all testing samples
are classified correctly by our proposed framework, followed by DT with 99.2% for all
values. This means that only about 1% of the data of each class is misclassified. However,
LR has lower values of 91.0%, 90.0%, and 90.0%, respectively. The performance of the other
ML models lies between that of our proposed framework and LR. All models except the LR
model classified the class of yellow-rusted images correctly due to clear pixels in the
input images. However, other classes of images are misclassified because these classes have
high minimum differences due to light and dark disease spots on wheat leaves.
Figure 5. Performance evaluation based on precision, recall, and F1-score: the proposed framework,
followed by DT, has the highest performance, whereas LR has the lowest precision, recall, and F1-score.
The confusion matrices computed for each model show the correct and wrong predictions
for each class of testing samples. The confusion matrix consists of four values, TP, TN,
FP, and FN, in which TP and TN show the accurate predictions while the other two values
show the rate of false predictions against the ground truth. Confusion matrices for the
proposed framework, which has the highest accuracy, followed by DT, and for the LR model,
which has the lowest accuracy, are tabulated in Table 3. The confusion matrix of the
proposed framework verifies the best performance because, as shown, the proposed framework
has classified all testing samples correctly except for one rusted sample, which is
classified as healthy. On the other hand, DT has the second-highest accuracy, with only
5 testing samples being misclassified. Further, the performance of LR is lower because
10.4% of the testing samples are misclassified.
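The relation between a multi-class confusion matrix and the four counts can be sketched as below. The matrix is reconstructed only from the description in the text (210 testing samples per class, one rusted sample predicted as healthy) and is an illustration, not a copy of Table 3; the function names are hypothetical.

```python
def per_class_counts(matrix, k):
    """Return (TP, TN, FP, FN) for class k, treating class k as positive.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                         # class k predicted as another class
    fp = sum(matrix[i][k] for i in range(n)) - tp    # other classes predicted as k
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return tp, tn, fp, fn


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


labels = ["healthy", "rusted", "yellow-rusted"]
# Rows: true class; columns: predicted class (illustrative reconstruction).
proposed = [
    [210, 0, 0],
    [1, 209, 0],
    [0, 0, 210],
]

accuracy_percent = round(overall_accuracy(proposed) * 100, 1)  # 629 / 630 -> 99.8
rusted_counts = per_class_counts(proposed, labels.index("rusted"))
```

Under these assumptions, 629 of the 630 testing samples lie on the diagonal, which agrees with the reported accuracy of 99.8%; likewise, 5 misclassified samples out of 630 give about 99.2% for DT.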
The result of the suggested method has been compared against three state-of-the-art
approaches, and the findings are shown in Table 5. The data are arranged in the table
year-wise. These authors proposed methods for wheat disease classification using different
techniques. The authors of [52] used texture features, classified them with four ML models,
and achieved 99% accuracy. In [50], the authors used the ratio of the affected area to the
total area of the wheat leaf as a feature for finding a disease and classified it using three
ML models, which include SVM, PNN, and the proposed framework, achieving 93.33%
accuracy. Further, in [30], the authors used color, texture, and their combination as features
and classified them using the EMMC matrix, achieving 94.16% accuracy. On the other
hand, our proposed framework achieved 99.8% accuracy in a comparative analysis of
different ML models with the proposed RFC model. We achieved this accuracy by
concatenating the three descriptor features, which include Haralick-texture,
Color-histogram, and Hue-moment. The use of these techniques has increased in recent years,
as researchers have utilized them in high-level research for forecasting [54], age
estimation [53], and time-series analysis [55].
Table 5. Comparison with SOTA papers where our work achieved the highest accuracy, using the
proposed framework on the extracted features of three descriptors.
Author Contributions: Conceptualization, H.K.; data curation, M.M.; formal analysis, I.U.H. and
S.U.K.; funding acquisition, M.Y.L.; methodology, H.K., M.M. and M.; project administration, M.Y.L.;
software, H.K.; supervision, M.Y.L.; validation, H.K., I.U.H. and S.U.K.; visualization, M.; writing—
review and editing, I.U.H., M. and M.Y.L. All authors have read and agreed to the published version
of the manuscript.
Funding: This research was supported by Basic Science Research Program through the National
Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2021R1I1A1A01055652).
Institutional Review Board Statement: Not available.
Data Availability Statement: Not available.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Aker, J.C. Dial “A” for agriculture: A review of information and communication technologies for agricultural extension in
developing countries. Agric. Econ. 2011, 42, 631–647. [CrossRef]
2. Barretto, R.; Buenavista, R.M.; Rivera, J.L.; Wang, S.; Prasad, P.V.; Siliveru, K. Teff (Eragrostis tef) processing, utilization and future
opportunities: A review. Int. J. Food Sci. Technol. 2020, 56, 3125–3137. [CrossRef]
3. Chakraborty, S.; Newton, A.C. Climate change, plant diseases and food security: An overview. Plant Pathol. 2011, 60, 2–14.
[CrossRef]
4. Nicholls, E.; Ely, A.; Birkin, L.; Basu, P.; Goulson, D. The contribution of small-scale food production in urban areas to the
sustainable development goals: A review and case study. Sustain. Sci. 2020, 15, 1585–1599. [CrossRef]
5. WHO. Available online: https://www.who.int/news-room/fact-sheets/detail/food-safety (accessed on 26 April 2022).
6. Mottaleb, K.A.; Singh, P.K.; Sonder, K.; Kruseman, G.; Tiwari, T.P.; Barma, N.C.; Malaker, P.K.; Braun, H.-J.; Erenstein, O. Threat of
wheat blast to South Asia’s food security: An ex-ante analysis. PLoS ONE 2018, 13, e0197555. [CrossRef] [PubMed]
7. Food and Agriculture Organization of the United Nations (FAO). Supply and Demand Brief; Food and Agriculture Organization of
the United Nations (FAO): Rome, Italy, 2020.
8. Figueroa, M.; Hammond-Kosack, K.E.; Solomon, P.S. A review of wheat diseases—A field perspective. Mol. Plant Pathol. 2018, 19,
1523–1536. [CrossRef]
9. Huerta-Espino, J.; Singh, R.; German, S.; McCallum, B.; Park, R.; Chen, W.Q.; Bhardwaj, S.; Goyeau, H. Global status of wheat leaf
rust caused by Puccinia triticina. Euphytica 2011, 179, 143–160. [CrossRef]
10. Sankaran, S.; Mishra, A.; Ehsani, R.; Davis, C. A review of advanced techniques for detecting plant diseases. Comput. Electron.
Agric. 2010, 72, 1–13. [CrossRef]
11. Jha, K.; Doshi, A.; Patel, P.; Shah, M. A comprehensive review on automation in agriculture using artificial intelligence. Artif.
Intell. Agric. 2019, 2, 1–12. [CrossRef]
12. Khan, N.; Muhammad, K.; Hussain, T.; Nasir, M.; Munsif, M.; Imran, A.S.; Sajjad, M. An adaptive game-based learning strategy
for children road safety education and practice in virtual space. Sensors 2021, 21, 3661. [CrossRef] [PubMed]
13. Haroon, U.; Ullah, A.; Hussain, T.; Ullah, W.; Sajjad, M.; Muhammad, K.; Lee, M.Y.; Baik, S.W. A Multi-Stream Sequence Learning
Framework for Human Interaction Recognition. IEEE Trans. Hum.-Mach. Syst. 2022, 52, 435–444. [CrossRef]
14. Khan, S.U.; Haq, I.U.; Khan, N.; Muhammad, K.; Hijji, M.; Baik, S.W. Learning to rank: An intelligent system for person
reidentification. Int. J. Intell. Syst. 2022, 37, 5924–5948. [CrossRef]
15. He, J.; Baxter, S.L.; Xu, J.; Xu, J.; Zhou, X.; Zhang, K. The practical implementation of artificial intelligence technologies in
medicine. Nat. Med. 2019, 25, 30–36. [CrossRef] [PubMed]
16. Khan, S.U.; Hussain, T.; Ullah, A.; Baik, S.W. Deep-ReID: Deep features and autoencoder assisted image patching strategy for
person re-identification in smart cities surveillance. Multimed. Tools Appl. 2021, 1–22. [CrossRef]
17. Ullah, W.; Ullah, A.; Hussain, T.; Muhammad, K.; Heidari, A.A.; Del Ser, J.; Baik, S.W.; De Albuquerque, V.H.C. Artificial
Intelligence of Things-assisted two-stream neural network for anomaly detection in surveillance Big Video Data. Future Gener.
Comput. Syst. 2022, 129, 286–297. [CrossRef]
18. Ullah, W.; Ullah, A.; Haq, I.U.; Muhammad, K.; Sajjad, M.; Baik, S.W. CNN features with bi-directional LSTM for real-time
anomaly detection in surveillance networks. Multimed. Tools Appl. 2021, 80, 16979–16995. [CrossRef]
19. Yar, H.; Hussain, T.; Khan, Z.A.; Koundal, D.; Lee, M.Y.; Baik, S.W. Vision sensor-based real-time fire detection in resource-
constrained IoT environments. Comput. Intell. Neurosci. 2021, 2021, 5195508. [CrossRef] [PubMed]
20. Patrício, D.I.; Rieder, R. Computer vision and artificial intelligence in precision agriculture for grain crops: A systematic review.
Comput. Electron. Agric. 2018, 153, 69–81. [CrossRef]
21. Barbedo, J.G.A. A review on the main challenges in automatic plant disease identification based on visible range images. Biosyst.
Eng. 2016, 144, 52–60. [CrossRef]
22. Barbedo, J.G. Factors influencing the use of deep learning for plant disease recognition. Biosyst. Eng. 2018, 172, 84–91. [CrossRef]
23. Barbedo, J.G.A. Automatic image-based detection and recognition of plant diseases—A critical view. In Proceedings of the XI
Congresso Brasileiro de Agroinformática, Sao Paulo, Brazil, 2–6 October 2017.
24. Hughes, D.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease
diagnostics. arXiv 2015, arXiv:1511.08060.
25. Barbedo, J.G.A.; Koenigkan, L.V.; Halfeld-Vieira, B.A.; Costa, R.V.; Nechet, K.L.; Godoy, C.V.; Junior, M.L.; Patricio, F.R.A.;
Talamini, V.; Chitarra, L.G. Annotated plant pathology databases for image-based detection and recognition of diseases. IEEE Lat.
Am. Trans. 2018, 16, 1749–1757. [CrossRef]
26. Johannes, A.; Picon, A.; Alvarez-Gila, A.; Echazarra, J.; Rodriguez-Vaamonde, S.; Navajas, A.D.; Ortiz-Barredo, A. Automatic
plant disease diagnosis using mobile capture devices, applied on a wheat use case. Comput. Electron. Agric. 2017, 138, 200–209.
[CrossRef]
27. Barbedo, J.G.A. Plant disease identification from individual lesions and spots using deep learning. Biosyst. Eng. 2019, 180, 96–107.
[CrossRef]
28. Ngugi, L.C.; Abelwahab, M.; Abo-Zahhad, M. Recent advances in image processing techniques for automated leaf pest and
disease recognition—A review. Inf. Process. Agric. 2021, 8, 27–51. [CrossRef]
29. Wójtowicz, A.; Piekarczyk, J.; Czernecki, B.; Ratajkiewicz, H. A random forest model for the classification of wheat and rye leaf
rust symptoms based on pure spectra at leaf scale. J. Photochem. Photobiol. B Biol. 2021, 223, 112278. [CrossRef] [PubMed]
30. Bao, W.; Zhao, J.; Hu, G.; Zhang, D.; Huang, L.; Liang, D. Identification of wheat leaf diseases and their severity based on
elliptical-maximum margin criterion metric learning. Sustain. Comput. Inform. Syst. 2021, 30, 100526. [CrossRef]
31. Paul, A.; Ghosh, S.; Das, A.K.; Goswami, S.; Choudhury, S.D.; Sen, S. A review on agricultural advancement based on computer
vision and machine learning. In Emerging Technology in Modelling and Graphics; Springer: Cham, Switzerland, 2020; pp. 567–581.
32. Kumar, M.; Hazra, T.; Tripathy, S.S. Wheat leaf disease detection using image processing. Int. J. Latest Technol. Eng. Manag. Appl.
Sci. (IJLTEMAS) 2017, 6, 73–76.
33. Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine learning in agriculture: A review. Sensors 2018, 18, 2674.
[CrossRef] [PubMed]
34. Dixit, A.; Nema, S. Wheat Leaf Disease Detection Using Machine Learning Method—A Review. Int. J. Comput. Sci. Mob. Comput.
2018, 7, 124–129.
35. Xu, P.; Wu, G.; Guo, Y.; Yang, H.; Zhang, R. Automatic wheat leaf rust detection and grading diagnosis via embedded image
processing system. Procedia Comput. Sci. 2017, 107, 836–841. [CrossRef]
36. Islam, M.; Dinh, A.; Wahid, K.; Bhowmik, P. Detection of potato diseases using image segmentation and multiclass support vector
machine. In Proceedings of the IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), Windsor, ON,
Canada, 30 April–3 May 2017; pp. 1–4.
37. Alehegn, E. Ethiopian maize diseases recognition and classification using support vector machine. Int. J. Comput. Vis. Robot. 2019,
9, 90–109. [CrossRef]
38. Hossain, S.; Mou, R.M.; Hasan, M.M.; Chakraborty, S.; Razzak, M.A. Recognition and detection of tea leaf’s diseases using
support vector machine. In Proceedings of the IEEE 14th International Colloquium on Signal Processing & Its Applications
(CSPA), Penang, Malaysia, 9–10 March 2018; pp. 150–154.
39. Ullah, W.; Muhammad, K.; Ul Haq, I.; Ullah, A.; Ullah Khattak, S.; Sajjad, M. Splicing sites prediction of human genome using
machine learning techniques. Multimed. Tools Appl. 2021, 80, 30439–30460. [CrossRef]
40. Ahmad, F.; Ikram, S.; Ahmad, J.; Ullah, W.; Hassan, F.; Khattak, S.U.; Rehman, I.U. GASPIDs Versus Non-GASPIDs-Differentiation
Based on Machine Learning Approach. Curr. Bioinform. 2020, 15, 1056–1064. [CrossRef]
41. Aurangzeb, K.; Akmal, F.; Khan, M.A.; Sharif, M.; Javed, M.Y. Advanced machine learning algorithm based system for crops leaf
diseases recognition. In Proceedings of the IEEE 6th Conference on Data Science and Machine Learning Applications (CDMA),
Riyadh, Saudi Arabia, 4–5 March 2020; pp. 146–151.
42. Treboux, J.; Genoud, D. Improved machine learning methodology for high precision agriculture. In Proceedings of the IEEE
Global Internet of Things Summit (GIoTS), Bilbao, Spain, 4–7 June 2018; pp. 1–6.
43. Rumpf, T.; Mahlein, A.-K.; Steiner, U.; Oerke, E.-C.; Dehne, H.-W.; Plümer, L. Early detection and classification of plant diseases
with support vector machines based on hyperspectral reflectance. Comput. Electron. Agric. 2010, 74, 91–99. [CrossRef]
44. Ramesh, S.; Hebbar, R.; Niveditha, M.; Pooja, R.; Shashank, N.; Vinod, P. Plant disease detection using machine learning. In
Proceedings of the IEEE International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C),
Bangalore, India, 25–28 April 2018; pp. 41–45.
45. Phadikar, S.; Sil, J.; Das, A.K. Classification of rice leaf diseases based on morphological changes. Int. J. Inf. Electron. Eng. 2012, 2,
460–463.
46. Prajapati, H.B.; Shah, J.P.; Dabhi, V.K. Detection and classification of rice plant diseases. Intell. Decis. Technol. 2017, 11, 357–373.
[CrossRef]
47. Ahmed, K.; Shahidi, T.R.; Alam, S.M.I.; Momen, S. Rice leaf disease detection using machine learning techniques. In Proceedings
of the IEEE International Conference on Sustainable Technologies for Industry 4.0 (STI), Dhaka, Bangladesh, 24–25 December
2019; pp. 1–5.
48. Panigrahi, K.P.; Das, H.; Sahoo, A.K.; Moharana, S.C. Maize leaf disease detection and classification using machine learning
algorithms. In Progress in Computing, Analytics and Networking; Springer: Cham, Switzerland, 2020; pp. 659–669.
49. Waghmare, H.; Kokare, R.; Dandawate, Y. Detection and classification of diseases of grape plant using opposite colour local
binary pattern feature and machine learning for automated decision support system. In Proceedings of the IEEE 3rd International
Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 11–12 February 2016; pp. 513–518.
50. Zhao, J.; Fang, Y.; Chu, G.; Yan, H.; Hu, L.; Huang, L. Identification of leaf-scale wheat powdery mildew (Blumeria graminis f. sp.
Tritici) combining hyperspectral imaging and an SVM classifier. Plants 2020, 9, 936. [CrossRef] [PubMed]
51. Li, G.; Ma, Z.; Wang, H. Image recognition of wheat stripe rust and wheat leaf rust based on support vector machine. J. China
Agric. Univ. 2012, 17, 72–79.
52. Azadbakht, M.; Ashourloo, D.; Aghighi, H.; Radiom, S.; Alimohammadi, A. Wheat leaf rust detection at canopy scale under
different LAI levels using machine learning techniques. Comput. Electron. Agric. 2019, 156, 119–128. [CrossRef]
53. Tursunov, A.; Choeh, J.Y.; Kwon, S. Age and gender recognition using a convolutional neural network with a specially designed
multi-attention module through speech spectrograms. Sensors 2021, 21, 5892. [CrossRef] [PubMed]
54. Mustaqeem; Ishaq, M.; Kwon, S. A CNN-Assisted deep echo state network using multiple Time-Scale dynamic learning reservoirs
for generating Short-Term solar energy forecasting. Sustain. Energy Technol. Assess. 2022, 52, 102275. [CrossRef]
55. Maji, B.; Swain, M.; Mustaqeem. Advanced Fusion-Based Speech Emotion Recognition System Using a Dual-Attention Mechanism
with Conv-Caps and Bi-GRU Features. Electronics 2022, 11, 1328. [CrossRef]