agriculture

Article
Automated Wheat Diseases Classification Framework Using
Advanced Machine Learning Technique
Habib Khan 1 , Ijaz Ul Haq 1 , Muhammad Munsif 1 , Mustaqeem 2 , Shafi Ullah Khan 3 and Mi Young Lee 1, *

1 Sejong University, Seoul 05006, Korea


2 Interaction Technology Laboratory, Department of Software Convergence, Sejong University,
Seoul 05006, Korea
3 Department of Electronics, Islamia College University, Peshawar 25000, Pakistan
* Correspondence: [email protected]

Abstract: Around the world, agriculture is one of the most important sectors of human life in terms
of food, business, and employment opportunities. Wheat is the most widely farmed crop, but every
year its production is badly affected by various diseases. Early and precise recognition of wheat
plant diseases can decrease damage, resulting in a greater yield. Researchers have used conventional
and Machine Learning (ML)-based techniques for crop disease recognition and classification. However,
these techniques are inaccurate and time-consuming due to the unavailability of quality data,
inefficient preprocessing techniques, and the existing criteria for selecting an efficient model.
Therefore, a smart and intelligent system is needed which can accurately identify crop diseases. In
this paper, we propose an efficient ML-based framework for wheat disease recognition and
classification that automatically identifies brown and yellow rust diseases in wheat crops. Our
method consists of multiple steps. Firstly, the dataset is collected from different fields in
Pakistan with consideration of the illumination and orientation parameters of the capturing device.
Secondly, to accurately preprocess the data, specific segmentation and resizing methods are used to
differentiate between healthy and affected areas. Finally, ML models are trained on the preprocessed
data. Furthermore, for comparative analysis of the models, various performance metrics including
overall accuracy, precision, recall, and F1-score are calculated. As a result, the proposed
framework achieved the highest accuracy of 99.8%, outperforming the existing ML techniques.

Keywords: artificial intelligence; computer vision; machine learning; precision agriculture; wheat
diseases

Citation: Khan, H.; Haq, I.U.; Munsif, M.; Mustaqeem; Khan, S.U.; Lee, M.Y. Automated Wheat Diseases
Classification Framework Using Advanced Machine Learning Technique. Agriculture 2022, 12, 1226.
https://doi.org/10.3390/agriculture12081226

Academic Editors: Maciej Zaborowicz and Hongbin Pu

Received: 10 June 2022
Accepted: 8 August 2022
Published: 15 August 2022

1. Introduction
Agriculture is the most important sector due to its economic influence on society, particularly in
developing countries [1]. Food demand is growing rapidly due to the increase in population and the
shortage of food ingredients. Crops such as wheat, maize, and rice are the main components of
foods [2]. However, one of the factors that most badly affects the quality of crop production is
crop disease. Crop diseases are the main source of crop yield losses in terms of quality and
quantity, affecting large as well as small-scale farmers [3]. Moreover, small-scale cultivators in
developing countries contribute up to 80% of the global production of crop yields; yet food losses
in these areas are much higher due to a lack of access to resources and the latest technology [4].
Apart from this, according to the World Health Organization (WHO) [5], more than a hundred diseases
are caused by contaminated food, and almost six hundred million people fall ill, of whom 0.4 million
die yearly. In addition, farmers have no quick way to obtain a timely diagnosis of a particular
disease. Therefore, precision agriculture can help improve the quality and quantity of crops.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

Agriculture 2022, 12, 1226. https://doi.org/10.3390/agriculture12081226 https://www.mdpi.com/journal/agriculture



Around the globe, wheat is the most important food ingredient; it is therefore the most popular
cereal cultivated by farmers around the world [6]. According to the Food and Agriculture
Organization (FAO) of the United Nations [7], in 2018 and 2019 wheat accounted for almost 28% of
total global cereal production from an estimated area of 215 million hectares. However, the demand
for wheat is far higher than its production, specifically in developing countries [2]. Several
factors contribute to the low production of wheat; the most important is disease, which can cause
15–20% losses in global wheat production per annum [8]. Common wheat leaf diseases such as leaf rust
and yellow rust are the most widespread diseases in wheat plants and can cause huge food losses and
a decline in economic activity if they remain uncontrolled [9]. Further, most farmers, especially in
developing countries, depend on agriculture experts to identify and diagnose the disease [10]. A
quick response to disease is crucial to stop it from spreading to the entire plant and even the
entire field, particularly in wheat plants.
Wheat plant disease detection and identification are always a challenge because farmers must look
after the whole field and examine each plant themselves or through an agriculture expert. Because of
the density of wheat crops in the field, manually monitoring the whole field is very time- and
resource-consuming [11]. Thanks to recent advancements in computer technology such as
human-computer interaction [12,13] and AI [14–19], intelligent systems can help farmers in the field
identify wheat leaf diseases using different automatic methods, such as Computer Vision (CV) and
AI-based methods [20]. The development of a robust ML- and CV-based system for wheat disease
recognition that is capable of working in diverse field conditions with accurate performance poses
various challenges, some of which are discussed below.
The first and very important challenge for plant disease recognition using CV and ML, especially
for wheat, is collecting disease data and creating a challenging database that contains varied
images, such as images captured from various angles and with occlusion [21–23]. Currently, freely
available plant disease datasets include [24], which contains images of wheat diseases, and the one
reported in [25], which has variations in angle and illumination. Thus, data acquisition from fields
with wheat diseases under different conditions is essential for accurate automatic disease
identification.
The second challenge is the segmentation of leaves and regions of interest from busy backgrounds,
which has the potential to improve model accuracy [22,26,27]. Most of the existing methods for wheat
disease recognition rely on manual steps, such as farmers manually cropping images after capture.
Inexperienced farmers often crop out the important region of an image and leave the unessential
region, which badly affects the accuracy of the system. In contrast, the segmentation of the regions
of interest in leaves can improve the accuracy of the disease identification system.
The third challenge is the selection of proper methods for feature extraction and for the
classification of the extracted features [28]; most of the existing techniques for wheat leaf
disease detection use only a few CV feature extractors and ML models without investigating the
performance of other descriptors and models. For example, [29] used only color and texture
descriptors with Random Forest (RF) classifiers for wheat disease classification. Further, [30]
used only the Maximum Margin Criterion (MMC) method for the severity estimation and classification
of wheat leaf diseases. Thus, feature extractors and classifiers should be selected based on
comparative performance.
In this paper, we propose an ML-based wheat disease recognition framework that achieves the highest
recognition accuracy after a comparative analysis of various feature extraction and classification
algorithms. The major contributions of our work are as follows:
• We propose a machine learning-based framework that detects salient cues of wheat diseases and
accurately classifies them into yellow and brown rust. Our model utilizes a masked-based
segmentation technique that automatically removes the background and noise, identifies healthy and
unhealthy wheat crops, and determines the affected and unaffected areas of the crop. The proposed
framework is lightweight and automatically identifies wheat crop diseases with a high recognition
rate.
• A new dataset for wheat disease classification is introduced. The dataset is collected from
different wheat fields in various regions of Peshawar and Dir, Pakistan. We focus on two categories
of disease with a total of three classes, i.e., brown rust, yellow rust, and healthy leaves. The
dataset will be made publicly available to the research community.
• A comparative analysis has been conducted among ML techniques for wheat disease recognition. The
proposed framework achieved 99.8% accuracy for wheat disease classification. Owing to its good
generalization and high recognition rate, the system can be employed in various real-time
industrial applications.
The rest of the paper is arranged as follows. Section 2 discusses the related work in
detail, followed by Section 3, where the proposed system is comprehensively discussed
with implementation details. Section 4 illustrates the experimental results in detail. The
outcomes and possible future research directions are presented in Section 5.

2. Related Work
Around the globe, researchers are striving to develop significant guidance and insights to help
farmers make better decisions and take action accordingly. Over the last two decades, advancements
in technologies such as AI and computing have attracted the attention of researchers. To produce a
more effective system for disease diagnosis and to categorize diseases with high accuracy, many
alternative schemes with diverse combinations may be used. These include conventional (statistical
and image processing) techniques as well as ML-based methods for plant and leaf diseases,
specifically wheat disease recognition and classification. Different researchers have contributed
to different aspects of precision agriculture [31]. Various advances in digital image processing
and ML methods have been used for crop leaf disease detection and recognition from leaf
images [28,32–34]. The literature can be divided into two categories, which are discussed in the
following subsections: comparatively less intelligent approaches to precision agriculture, such as
pure image processing or CV-based disease classification, and more intelligent ones, such as
ML-based approaches.

2.1. Statistical-Based Approaches


To investigate pure image processing-based approaches to plant leaf disease recognition,
Xu et al. [35] designed an image recognition-based embedded technique for wheat leaf rust
identification, in which high-resolution RGB images of wheat leaf rust were first converted into
single-channel gray-level images, and the Sobel operator was then used to detect vertical edges in
the gray image. The background was removed, and a binary feature point set of diseased spots was
extracted. They used a flood-filling algorithm to filter out the noisy points in the point set.
Their framework achieved 92.3% accuracy using pure image processing. However, they targeted only
one disease, and the proposed method is not robust when field conditions change. To develop a more
intelligent system, Islam et al. [36] proposed an approach that combines image processing and ML to
identify potato diseases from leaf images. They used color-based image segmentation as a
preprocessing step, followed by statistical feature extraction and multiclass SVM classification of
potato leaves. They used only 300 images covering three potato leaf classes (late blight, early
blight, and healthy). Their method achieved 95% accuracy. However, the dataset was very small and
was taken from the publicly available PlantVillage dataset. Furthermore, they used statistical
features instead of other available state-of-the-art feature descriptors. The accuracy could be
improved by using other ML techniques. Apart from this, Alehegn et al. [37] developed a hybrid ML
approach for maize leaf disease recognition and classification. They used preprocessing techniques
such as segmentation before feature extraction. Their dataset consists of 800 images and targets
four classes: healthy leaf, common rust, leaf spot, and leaf blight. An 80/20 ratio was used for
training/testing. They obtained the most accurate classification using SVM when color, texture, and
morphology features were combined, achieving 95.63% accuracy. Hossain et al. [38] developed an
automated SVM-based ML model for the recognition and classification of tea leaf diseases in
Bangladesh. Their dataset consisted of three classes: two diseases (brown blight and algal leaf
disease) and healthy leaves. They used statistical features for feature extraction. The suggested
technique classified the leaves with 93% accuracy. However, their framework has several
limitations, including a limited number of samples in the dataset, the use of purely statistical
features, and a comparatively low accuracy that could be improved by using more models and a larger
dataset.

2.2. Machine Learning-Based Approaches


ML-based techniques are widely used in various domains such as medicine [39,40], agriculture, and
image classification. To investigate other ML models and state-of-the-art feature extractors,
Akmal et al. [41] proposed a technique for the recognition of corn and potato leaf diseases, using
the PlantVillage dataset for classification and validation. They used three feature extractors:
local ternary pattern (LTP), histogram of oriented gradients (HOG), and segmented fractal texture
analysis (SFTA). On the chosen crop diseases, accuracies between 92.8% and 98.7% were obtained
using a multi-class SVM (MSVM) with a cubic kernel function, which was higher than existing
methods. Treboux et al. [42] presented an innovative ML approach to discriminate vineyards from
other agricultural objects. The dataset is composed of images taken by a drone over 5 vineyards in
Valais, Switzerland. The dataset was divided into training and testing portions with a 90/10 ratio.
The authors achieved a baseline accuracy of 89.6% using color analysis, which improved to 94.27%
with a decision tree ensemble (DTE). For early plant disease detection, Rumpf et al. [43] proposed
a framework for sugar beet diseases using an ML algorithm based on SVM and spectral vegetation
indices. Their framework achieved an accuracy of up to 97% in discriminating diseased from healthy
sugar beet leaves. Further, the accuracy for classifying healthy leaves and leaves with symptoms of
three diseases was above 86%. However, the accuracy is lower when differentiating between more than
one disease, which could be improved by using robust descriptors and ML models. To develop a robust
system, Ramesh et al. [44] proposed an RFC-based ML technique for distinguishing healthy from
diseased papaya leaves. They used the histogram of oriented gradients (HOG) for feature extraction.
For training the ML model, they used only 160 images, which is too few to achieve high accuracy.
Their framework achieved 70% accuracy, which could be improved by using a larger number of images
and state-of-the-art feature descriptors. Phadikar et al. [45] proposed an ML technique for
classifying healthy rice leaves and diseased leaves affected by blast and brown spot. For training
and testing the ML model, they used their own dataset, collected from major rice-producing areas of
India such as the East Midnapur district of South Bengal. They performed classification in two
phases: (a) classification of healthy and diseased leaves, and (b) classification of the various
leaf diseases. Bayes and SVM classifiers were used and their performance was compared. They
achieved 68.1% and 79.5% accuracies for the SVM- and Bayes classifier-based systems, respectively.
For the same plant, Prajapati et al. [46] presented a technique to identify and categorize three
diseases of rice plants. For multi-class classification, they used a support vector machine (SVM).
They collected samples from a rice field in the village of Shertha near Gujrat, India, to produce
their dataset of leaf images. They tested several feasible image processing and background removal
methods. To segment the diseased part, they tested several segmentation algorithms; K-means
clustering with fed centroid values was used for disease segmentation. To extract different
features,
they used three main categories: texture, color, and shape. Their approach achieved up to 93.33%
accuracy. However, the accuracy could be improved further by using a dataset with more variation
(in illumination and capture angle) and state-of-the-art models. Continuing with rice plant disease
identification, Ahmed et al. [47] introduced a model for rice leaf disease detection using ML
techniques. This study focused on three of the most widespread rice plant diseases: brown spot,
leaf smut, and bacterial leaf blight. Their dataset consists of 480 images, and a 90/10 ratio was
used for training/testing. They compared four ML techniques (logistic regression, decision tree,
KNN, and Naïve Bayes) and found that the decision tree performed best, with 97.91% accuracy.
However, they used statistical feature extractors, which are not very robust to physical changes
(such as illumination), and the performance of the model could be improved by improving the
dataset. To develop a more robust system, Panigrahi et al. [48] proposed a framework based on ML
algorithms, including SVM, RFC, DT, and KNN, for the detection of various maize crop diseases.
These classification approaches were tested and compared to determine the model with the highest
accuracy. The RFC algorithm performed best, with an accuracy of 79.23%. The maize dataset was
divided into 90% for training and 10% for testing. The dataset contains 3823 images consisting of
four classes, namely healthy (1162 images), gray leaf spot (513 images), common rust (1192 images),
and northern leaf blight (985 images). However, they trained the models on poorly captured images,
which is not sufficient because diseases can affect any part of the plant. Waghmare et al. [49]
proposed a multi-class SVM-based ML technique for grape plant disease identification. As a
preprocessing step, they used segmentation to remove the background. Their research focused on two
major diseases that commonly affect grape plants (black rot and downy mildew). Their dataset
consists of 450 grape leaf samples (160 healthy and 290 diseased). A special texture-based feature
is used to describe the segmented leaf texture, and a multi-class SVM classifies the extracted
texture patterns. Their model achieved 96.6% accuracy, which still leaves room for improvement. For
wheat leaf disease classification, Zhao et al. [50] suggested an integrated technique based on ML
algorithms for leaf-scale wheat powdery mildew. The proposed framework was trained and evaluated on
a hyperspectral image dataset. Three diagnosis models were constructed: SVM, probabilistic neural
network (PNN), and RFC. The best model, SVM, reached 93.33% classification accuracy. However, this
accuracy is low and could be improved by using other state-of-the-art models with various feature
descriptors. To improve accuracy, GuanLin et al. [51] proposed a novel approach to recognize two
types of wheat rust (stripe rust and leaf rust) based on SVM and multiple feature parameters
extracted from their dataset. As preprocessing steps, they used image cropping, de-noising, and
segmentation. They then extracted color- and texture-related features from the preprocessed images
and used SVM as the classification model. The authors achieved an accuracy of 96.67% using SVM with
a radial basis function (RBF) kernel on twenty-six selected features. However, they used only one
feature extractor, and the model was trained on a dataset with little variation, which makes it
less robust when the physical appearance changes. Azadbakht et al. [52] proposed ML-based detection
of a single wheat disease. They developed their dataset at the canopy scale and under different
leaf area index (LAI) levels. Their framework estimated the severity level of wheat leaf rust at
the canopy scale using four ML techniques: random forest regression (RFR), ν-support vector
regression (ν-SVR), Gaussian process regression (GPR), and boosted regression trees (BRT). They
achieved high accuracy of up to 99% using ν-SVR. However, the experiments covered only one disease
and focused only on its severity.
Table 1 summarizes the related work discussed above. Researchers have proposed different ML
techniques for leaf disease recognition in various crops, including maize, rice, tea, vineyards,
and our focus plant, wheat. They used
various methods for preprocessing, commonly resizing, de-noising, and cropping; some used
segmentation as a preprocessing step, but for plants other than wheat. For feature extraction, they
used statistical and CV-based techniques. Recognition was performed by algorithms ranging from
purely statistical approaches to ML approaches such as SVM and RFC. However, we found deficiencies
in the existing work on wheat leaf disease recognition and classification using ML methods in terms
of the unavailability of diverse datasets, the lack of robust preprocessing techniques, and the
accuracy of the frameworks. Therefore, we bridge this gap by proposing a framework for the
recognition of common wheat diseases that achieves comparatively high accuracy as a result of the
collected dataset, preprocessing steps such as masked-based segmentation, feature descriptors, and
the proposed fine-tuned RFC.

Table 1. Summary of the literature. The results of our proposed framework are given in the last row
for comparison.

| Article | Crop | Preprocessing | Features | Algorithms/Models | Accuracy |
|---|---|---|---|---|---|
| Xu et al., 2017 [35] | Wheat | Conversion of images to G single gray RGB model, background removal | Binary feature point set | Flood-filling algorithm | 92.3% |
| Islam et al., 2017 [36] | Potato | Color-based segmentation | Statistical features | SVM | 95% |
| Alehegn et al., 2019 [37] | Maize | Segmentation | Texture and morphological | SVM | 95.63% |
| Hossain et al., 2018 [38] | Tea | Image resizing and cropping | Statistical features | SVM | 93% |
| Aurangzeb et al., 2020 [41] | Corn and potato | Image resizing | LTP, HOG, SFTA | MSVM | 92.8% and 98.7% |
| Treboux et al., 2018 [42] | Vineyards | Morphological operations (opening and closing) | First-order statistics, Tamura, Haralick | DTE | 94.275% |
| Rumpf et al., 2010 [43] | Sugar beet | Image resizing, clustering | Physiological parameters | SVM | 97% |
| Ramesh et al., 2018 [44] | Papaya | Image resizing and normalization | HOG | RFC | 70% |
| Phadikar et al., 2012 [45] | Rice | Enhancement via mean filters and segmentation | Color descriptors | SVM, NB | 68.1% and 79.5% |
| Prajapati et al., 2017 [46] | Rice | Background removal, segmentation | Texture, color, and shape | SVM | 93.33% |
| Ahmed et al., 2019 [47] | Rice | Augmentation | Pure statistical features | DT | 97.91% |
| Panigrahi et al., 2020 [48] | Maize | Resizing, denoising, segmentation | Grayscale pixel values | NB, KNN, DT, SVM, and RFC | 79.23% (highest with RFC) |
| Waghmare et al., 2016 [49] | Grapes | Background removal | Texture | SVM | 96.6% |
| Zhao et al., 2020 [50] | Wheat | Image smoothing via S-G filter and derivative function | Disease severity level and affected leaf spots | SVM, PNN, and RFC | 93.33% |
| Li et al., 2012 [51] | Wheat | Cropping, denoising | Color and texture | SVM with RBF | 96.67% |
| Azadbakht et al., 2019 [52] | Wheat | Noise reduction | Disease severity level, leaf area index, and pixel values | ν-SVR and RFR | 99% and 79% |
| Proposed framework | Wheat | Resizing, masked-based segmentation | Haralick texture, color histogram, and hue moments | Fine-tuned RFC | 99.8% |

3. Materials and Methods


In this section, the proposed approach for wheat disease recognition and classification is
discussed. For better understanding, this section is divided into three subsections: Section 3.1:
Real-Time Data Collection, Section 3.2: Data Preprocessing and Features Extraction, and
Section 3.3: Proposed Fine-Tuned Framework. These steps are comprehensively discussed below in the
relevant subsections. The pictorial representation of the overall work is given in Figure 1. The
first step is data collection: data are collected from different wheat fields, and a new database
is built from them. In the second step, the collected data are divided into three classes: healthy,
rusted, and yellow-rusted leaves. Furthermore, the data of each class are divided into training and
testing sets with a ratio of 8:2, respectively. In the next step, different ML models are trained,
which are discussed below in detail. Finally, model performance is evaluated on the unseen test
data using evaluation metrics such as accuracy, confusion matrix, precision, recall, and F1 score.
In this step, the remaining 20% of the data, allocated for the testing phase, is loaded and passed
through the same preprocessing and feature extraction steps as in the training phase, and the
predictions of the model are observed.
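
As a minimal sketch of the 8:2 partition described above, the following Python snippet splits a list of labels class by class so that every class keeps the same ratio in both parts. The function name `stratified_split`, the fixed random seed, the label strings, and the choice of a per-class (stratified) split are assumptions made for illustration; the text states only the 8:2 ratio and, in Section 3.1, that each of the three classes contains 1050 images.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], train_fraction: float = 0.8,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting each class separately."""
    # Group sample indices by their class label.
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])   # first 80% of the shuffled class
        test.extend(indices[cut:])    # remaining 20% of the shuffled class
    return train, test


# Hypothetical label list: three balanced classes of 1050 images each.
labels = ["healthy"] * 1050 + ["rusted"] * 1050 + ["yellow_rusted"] * 1050
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With 1050 images per class, an 8:2 split gives 840 training and 210 test images per class, that is, 2520 training and 630 test images in total.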

Figure 1. Framework: for better understanding, the whole framework is divided into three
sub-sections: (1) Data preprocessing and feature extraction. The images are collected from wheat
fields, and our dataset consists of three classes: healthy, rusted, and yellow-rusted leaves. This
step contains two sub-steps: preprocessing, which prepares the data using resizing and segmentation
techniques, and feature extraction, in which features are extracted from the preprocessed data
using different feature descriptors. (2) Model training, in which 80% of the data are loaded from
the dataset for training purposes and six different ML models are trained. (3) Testing, in which
the trained models are evaluated by checking their performance on unseen data. The remaining 20% of
the data are used for testing and passed through the same preprocessing and feature extraction
steps as in the training phase, and the performance of the model is observed.
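
The evaluation metrics used in the testing step (accuracy, confusion matrix, precision, recall, and F1 score) can be computed from true and predicted labels as sketched below. This is an illustrative pure-Python version, not the authors' implementation: the helper names `confusion_matrix` and `per_class_report`, the class label strings, and the toy predictions are all hypothetical.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def per_class_report(matrix: List[List[int]],
                     classes: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall, and F1 per class, plus overall accuracy."""
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report: Dict[str, Dict[str, float]] = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp  # predicted i, truly other
        fn = sum(matrix[i]) - tp                       # truly i, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    report["overall"] = {"accuracy": correct / total if total else 0.0}
    return report


# Toy example with made-up predictions for six test images.
classes = ["healthy", "rusted", "yellow_rusted"]
y_true = ["healthy", "healthy", "rusted", "rusted", "yellow_rusted", "yellow_rusted"]
y_pred = ["healthy", "rusted", "rusted", "rusted", "yellow_rusted", "healthy"]
matrix = confusion_matrix(y_true, y_pred, classes)
report = per_class_report(matrix, classes)
print(matrix)
print(report["overall"]["accuracy"])
```

Reporting per-class values alongside overall accuracy shows which class is confused with which, something a single accuracy figure hides.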

3.1. Real-Time Data Collection


To integrate computer vision technology into agriculture and facilitate plant disease diagnosis,
researchers have developed various open-access datasets, such as the PlantVillage dataset, which
contains over 50,000 images of different plant species with diseases annotated by field experts.
However, it does not contain sufficient wheat disease data. To the best of our knowledge, there is
no available and appropriate open-access dataset of wheat leaves. Thus, over 3000 images for three
different and important classes (healthy, rusted, and yellow-rusted) of wheat leaves were collected
from actual fields in two geographically and environmentally different places in Pakistan: Peshawar
city, located east of the Khyber Pass, which mostly has very warm weather of up to 40 °C in summer,
and Dir, located near the Lowari Pass in Khyber Pakhtunkhwa, which mostly has moderate summer
temperatures of up to 32 °C. The dataset was collected with smartphone cameras at a resolution of
1024 × 768. The images are then resized using the inter-area interpolation technique in a
preprocessing step to reduce computation time, which is discussed in the next section. In addition,
the data are equally distributed across the three classes (healthy, rusted, and yellow-rusted
leaves), with each class containing 1050 images of wheat leaves.
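
The idea behind area-based down-sampling can be illustrated with a small pure-Python sketch that averages each block of pixels into one output pixel. This is an illustration under simplifying assumptions, not the authors' code: the helper `downsample_area` is hypothetical and handles only grayscale images and integer shrink factors, whereas the resize from 1024 × 768 to 250 × 250 used in this work involves non-integer factors. In OpenCV, the corresponding operation is `cv2.resize` called with `interpolation=cv2.INTER_AREA`.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def downsample_area(image: Image, factor: int) -> Image:
    """Shrink an image by an integer factor by averaging factor x factor blocks."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by factor in this sketch")

    out: Image = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Collect every source pixel covered by this output pixel.
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A 4 x 4 image of alternating dark and bright columns (a fine stripe pattern).
stripes = [[0, 255, 0, 255] for _ in range(4)]
small = downsample_area(stripes, 2)
print(small)
```

Because every source pixel contributes to the average, the fine stripe pattern collapses to a uniform mid-gray instead of producing aliasing artifacts, which is why area-based methods are favored for image reduction.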

3.2. Data Preprocessing and Features Extraction


For comprehensive understanding, this section is divided into two subsections which
include preprocessing and feature extraction. The details of each subsection are given in
the following.

3.2.1. Preprocessing
Preprocessing is a very important step in ML. It helps remove unwanted data and
reduce the computation time during the training and testing of models. Our preprocessing
is shown in Figure 2 and includes two techniques. The first technique is resizing, which
adjusts the size of an image without cropping any of its content. To make image processing
systems more accurate and faster, high-resolution images are nearly always down-sampled.
In this work, the INTER_AREA interpolation method is used for scaling, and each image is
resized from 1024 × 768 to 250 × 250. An interpolation function uses neighboring areas of
pixels to enlarge or reduce an image. In general, it is considerably better to reduce the
image size with an interpolation method, because reducing an image removes pixels and the
interpolation determines how the remaining pixels represent the removed ones. Inter-area
interpolation is the preferred approach for image reduction because it produces
moiré-free results. This step improved our system performance in terms of computational
complexity and accuracy.

The second technique is image segmentation, which is the process of dividing an image into
clusters based on the similarity of the intensity values of the input image. For instance,
the pixel values in wheat leaves that are similar to the affected region belong to the
affected cluster, and the rest of the region is considered the healthy part. This process
seems simple, but it becomes very difficult when some pixel values lie on a boundary, such
as the boundary spots of brown rust. The brown-rusted pixels on the boundary are sometimes
very close to the values of the healthy region, so it is difficult to decide whether to
treat them as part of the affected region, part of the healthy region, or a separate region.
Fuzzy set-based ideas allow us to deal with these situations. For example, suppose a set
consists of N elements that need to be divided into C clusters. Each element of the set has
C membership values, one per cluster, and the element is assigned to the cluster for which
it has the highest membership value. Here, the intensity values of the wheat leaf images are
treated as a set of N pixel values, and the fuzzy C-means clustering algorithm is used. It
works on the same concept, and the process is performed iteratively for each pixel value by
minimizing the objective function given in Equation (1).
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\lVert x_i - c_j \right\rVert^{2} \tag{1}$$

where N is the number of image intensity values, C is the number of clusters, and m is any
real number greater than 1. Further, $u_{ij}$ is the degree of membership of $x_i$ in
cluster j, $x_i$ is the current value in the image, and $c_j$ is the center of cluster j. In
our case, m is set to 2, and the highest performance is achieved when the number of clusters
is 3. The images were passed through the proposed algorithm, and colors were assigned to the
various parts of the leaves. Generally, diseased parts of the image have greater intensities
than healthy parts.

Figure 2. In image segmentation, the first column shows the input images, and the second column
shows the segmented image of the corresponding input image.

Therefore, we assigned the highest intensity values to the diseased spots and made a
cluster from them, and the rest of the parts are divided into the boundary and healthy clusters.
As the result of segmentation, more suitable and highlighted images are obtained, as shown
in Figure 2, for further processing.
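
The clustering step can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes the standard fuzzy C-means update rules for the objective in Equation (1), works on a flat list of intensities instead of a full image, and uses a hypothetical function name (`fuzzy_c_means`), a fixed iteration count, and evenly spread initial centers.

```python
def fuzzy_c_means(values, n_clusters=3, m=2.0, n_iter=50):
    """Cluster 1-D intensity values with fuzzy C-means.

    Returns (centers, memberships), where memberships[i][j] is u_ij.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centers over the value range.
    centers = [lo + (hi - lo) * (j + 0.5) / n_clusters
               for j in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A value sitting exactly on a center belongs fully to it.
                hit = dists.index(0.0)
                u = [1.0 if j == hit else 0.0 for j in range(n_clusters)]
            else:
                u = [1.0 / sum((d_j / d_k) ** power for d_k in dists)
                     for d_j in dists]
            memberships.append(u)
        # Update each center as the membership-weighted mean of the values.
        centers = [
            sum((u[j] ** m) * x for u, x in zip(memberships, values))
            / sum(u[j] ** m for u in memberships)
            for j in range(n_clusters)
        ]
    return centers, memberships


# Hypothetical intensities forming three groups (m = 2, three clusters).
values = [10, 12, 11, 120, 125, 118, 240, 250, 245]
centers, memberships = fuzzy_c_means(values)
# Each value is assigned to the cluster with the highest membership.
labels = [max(range(3), key=u.__getitem__) for u in memberships]
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Boundary pixels are the ones whose membership values are spread over several clusters instead of being close to 1 for a single cluster.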

3.2.2. Feature Extraction
Feature extraction is one of the most important steps in ML-based model development.
The performance of ML algorithms depends on the extracted features. If the extracted features
are relevant to the ROI, ML classifiers can differentiate among classes with high accuracy.
The basic idea of feature extraction is to extract only those features which carry a high
weight in the representation of an object and to reduce computational complexity
by avoiding further processing of less meaningful features. Researchers use many descriptors
for different kinds of features, including texture, shape, and color feature descriptors.
In this work, the following feature descriptors are examined: Histogram of Oriented
Gradients (HOG), particularly designed for shape extraction; Local Binary Pattern (LBP),
which is mostly used for texture feature extraction; Hu Moments (HM), a statistical
descriptor used for shape feature extraction; Color Histogram (CH), used for color feature
extraction; and Haralick Texture (HT), which describes texture with 14 features. Each of
the descriptors is used separately, and three of them (HM, HT, and CH), shown in Figure 1,
are combined for performance analysis. After detailed experiments, it is found that the
combination of these three feature descriptors is effective in terms of testing accuracy.
These three feature descriptors are explained below in detail.

Hu Moments (HM)


Hu moments are descriptors used for shape feature extraction; they accept grayscale and
binary input images. In our case, after image segmentation, the healthy and diseased
areas become distinguishable, and the shape of the segmented clusters can help in
identifying the diseases. For that purpose, simple translation-invariant shape features
(central moments) are calculated using Equation (2).

$$\mu_{ij} = \sum_{x} \sum_{y} (x - \bar{x})^{i} (y - \bar{y})^{j} I(x, y) \tag{2}$$

where x and y represent the location of a pixel in the object region, while $\bar{x}$ and
$\bar{y}$ represent the centroid of the shape of the object. The centroid is the center of
mass, which is calculated using Equations (3) and (4), where $M_{00}$ is the area, while
$M_{10}$ and $M_{01}$ are the first-order raw moments that, divided by $M_{00}$, give the
coordinates of the center of the shape.

$$\bar{x} = \frac{M_{10}}{M_{00}} \tag{3}$$

$$\bar{y} = \frac{M_{01}}{M_{00}} \tag{4}$$

To equip the central moments with scale invariance, it is essential to normalize them.
The normalized central moment is calculated using Equation (5).
$$\eta_{ij} = \frac{\mu_{ij}}{\mu_{00}^{(i+j)/2+1}} \tag{5}$$

Now, the central moment is translation- and scale-invariant; however, this is not enough
for robust shape matching. The descriptor must also be rotation- and reflection-invariant,
in addition to translation and scale invariance. So, the following seven moments are
calculated using Equations (6)–(12) and denoted $h_n$, where n is the index of the moment.

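$$h_0 = \eta_{20} + \eta_{02} \tag{6}$$

$$h_1 = (\eta_{20} - \eta_{02})^{2} + 4\eta_{11}^{2} \tag{7}$$

$$h_2 = (\eta_{30} - 3\eta_{12})^{2} + (3\eta_{21} - \eta_{03})^{2} \tag{8}$$

$$h_3 = (\eta_{30} + \eta_{12})^{2} + (\eta_{21} + \eta_{03})^{2} \tag{9}$$

$$h_4 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^{2} - 3(\eta_{21} + \eta_{03})^{2}\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^{2} - (\eta_{21} + \eta_{03})^{2}\right] \tag{10}$$

$$h_5 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^{2} - (\eta_{21} + \eta_{03})^{2}\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \tag{11}$$

$$h_6 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^{2} - 3(\eta_{21} + \eta_{03})^{2}\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^{2} - (\eta_{21} + \eta_{03})^{2}\right] \tag{12}$$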

These seven invariants of the moments describe the shape of objects as a 7D vector
which is used in this work as a feature with the concatenation of other features in the
training of the ML model.
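
As an illustration of the moment pipeline described above (raw moments, centroid, central moments, normalization, and the seven invariants), the following minimal sketch computes the 7D vector for a small image stored as nested lists. It assumes the standard form of Hu's seven invariants and is not the code used in this work; OpenCV offers the same computation through `cv2.moments` and `cv2.HuMoments`. The function name `hu_moments` and the test shape are hypothetical.

```python
def hu_moments(image):
    """Return the seven Hu invariants of a grayscale/binary image (list of rows)."""
    def raw(p, q):
        # Raw moment M_pq: sum of x^p * y^q * I(x, y).
        return sum((x ** p) * (y ** q) * value
                   for y, row in enumerate(image)
                   for x, value in enumerate(row))

    m00 = raw(0, 0)
    cx, cy = raw(1, 0) / m00, raw(0, 1) / m00      # centroid

    def eta(p, q):
        # Central moment about the centroid, then scale normalization.
        mu = sum(((x - cx) ** p) * ((y - cy) ** q) * value
                 for y, row in enumerate(image)
                 for x, value in enumerate(row))
        return mu / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]


# Hypothetical binary blob, the same blob shifted right, and rotated by 90 degrees.
shape = [[0, 0, 0, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 0, 0, 0, 0]]
shifted = [[0] + row[:-1] for row in shape]
rotated = [list(r) for r in zip(*shape[::-1])]
same = all(abs(p - q) < 1e-9
           for other in (shifted, rotated)
           for p, q in zip(hu_moments(shape), hu_moments(other)))
print(same)  # True
```

The check at the end shows the property that motivates the descriptor: the vector does not change when the segmented region is translated or rotated.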
Color Histogram (CH)


Color plays a very important role in image recognition because, in our case, the affected
area of a leaf is yellow or brown, whereas healthy areas are green. For this reason, color
features are considered along with the other features to attain better performance. Our
collected dataset is in RGB, in which each channel carries different information: the red
channel carries more information, and the blue channel carries less. To separate
illumination from color, the images are converted from the RGB to the HSV color model
before the histogram is built. A histogram is a statistical representation of an image that
counts the occurrences of each intensity level. It transforms image intensities into
frequencies without considering the coordinates of the pixels, and due to this property, a
color histogram is rotation-invariant. The HSV image histogram is calculated using
Equation (13): the histogram is calculated for each channel by summing the occurrences of
the intensity levels in that channel, and the per-channel histograms are then concatenated.

$$\{H, S, V\} = \left[\, \sum_{i=0}^{L} i_H \quad \sum_{i=0}^{L} i_S \quad \sum_{i=0}^{L} i_V \,\right] \tag{13}$$

where L denotes the number of intensity levels, and $i_H$, $i_S$, and $i_V$ represent the
frequencies of the levels in the H, S, and V channels, respectively. After the histogram is
calculated, it is converted into a vector and considered a color-based feature of the input image.
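
The color feature described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the implementation used in this work: the function name `hsv_histogram`, the number of bins, and the sample pixels are assumptions, and the RGB-to-HSV conversion uses `colorsys.rgb_to_hsv`, which works on values scaled to [0, 1].

```python
import colorsys


def hsv_histogram(pixels, bins=8):
    """Concatenate per-channel H, S, V histograms of RGB pixels (0-255)."""
    hist = [[0] * bins for _ in range(3)]   # one histogram per channel
    for r, g, b in pixels:
        hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for channel, value in enumerate(hsv):
            # Values lie in [0, 1]; clamp 1.0 into the last bin.
            index = min(int(value * bins), bins - 1)
            hist[channel][index] += 1
    return hist[0] + hist[1] + hist[2]


# Hypothetical pixels: two green (healthy-looking) and one yellow.
pixels = [(0, 255, 0), (0, 255, 0), (255, 255, 0)]
features = hsv_histogram(pixels, bins=4)
print(features)  # [1, 2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 3]
```

Because the histogram only counts values and ignores pixel coordinates, reordering the pixels (for example by rotating the image) yields the same feature vector.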

Haralick Texture (HT)


Texture features are also considered very important in CV. They describe the properties of
a surface: whether it is smooth or rough, and how a pattern is laid out on it. In automatic
plant health analysis, texture features are very important; in our case, healthy and yellow
leaf surfaces are mostly smooth, whereas brown-rusted ones are very rough. To make our
system aware of the texture of leaves, texture features are calculated using the HT
descriptor. HT calculates a total of 14 features; however, we selected only four of them
(contrast, correlation, uniformity, and homogeneity) due to their robustness. Contrast
measures the variation in intensity between adjacent pixels of the input image; if the image
has no variation, the contrast is zero. Correlation is a property of a grayscale image that
measures how strongly adjacent pixels are related to each other. Its value is 1 or −1 for a
perfectly positively or negatively correlated image and NaN for a constant image. Uniformity
shows how uniform the surface of the image is; it is calculated by summing the squared
entries of the gray level co-occurrence matrix (GLCM). Homogeneity measures the closeness
between adjacent pixels. Finally, these features are collected into a vector, which is
considered the texture feature vector. Contrast, correlation, uniformity, and homogeneity
are calculated using Equations (14)–(17), respectively.

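$$\mathrm{Con} = \sum_{i=0}^{G_l-1} \sum_{j=0}^{G_l-1} \lvert i - j \rvert^{2}\, G(i, j) \tag{14}$$

$$\mathrm{Cor} = \frac{1}{\sigma_i \sigma_j} \left[ \sum_{i,j=0}^{G_l-1} \{ i\,j \times G(i, j) \} - \mu_i \mu_j \right] \tag{15}$$

$$\mathrm{Uni} = \sum_{i=0}^{G_l-1} \sum_{j=0}^{G_l-1} G(i, j)^{2} \tag{16}$$

$$\mathrm{Homo} = \sum_{i=0}^{G_l-1} \sum_{j=0}^{G_l-1} \{ 1 + (i - j)^{2} \}^{-1}\, G(i, j) \tag{17}$$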

where the texture of an image is stored in a matrix G(i, j) called the GLCM, $G_l$ represents
the total number of gray levels of the image, and $\mu_i$, $\mu_j$ and $\sigma_i$, $\sigma_j$
are the means and standard deviations of the row and column sums of G. After extraction,
these features are normalized using the mean, skewness, and kurtosis, as defined in
Equations (18) and (19).

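$$\mathrm{Skew} = \frac{E(X_i^{3}) - 3\mu\sigma^{2} - \mu^{3}}{\sigma^{3}} \tag{18}$$

$$\mathrm{Kur} = E\left[ \left( \frac{X_i - \mu}{\sigma} \right)^{4} \right] \tag{19}$$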

where Skew and Kur represent skewness and kurtosis, respectively. Further, in these
equations, E denotes the expected value, $X_i$ is the normalized scale matrix, and μ and σ
are its mean and standard deviation. After the above features are extracted, they are fused
into one matrix and passed to model training in the next module.
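
The four selected texture features of Equations (14)–(17) can be illustrated with a small pure-Python sketch. It is not the implementation used in this work: the co-occurrence matrix is built only for horizontally adjacent pixels at distance 1 and is made symmetric and normalized, which are assumptions, and the function names `glcm` and `texture_features` are hypothetical.

```python
def glcm(image, levels):
    """Normalized co-occurrence matrix for horizontally adjacent pixels."""
    counts = [[0.0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
            counts[right][left] += 1      # make the matrix symmetric
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def texture_features(g):
    """Contrast, correlation, uniformity, homogeneity of a GLCM g."""
    n = len(g)
    cells = [(i, j, g[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * p for i, j, p in cells)
    uniformity = sum(p ** 2 for _, _, p in cells)
    homogeneity = sum(p / (1 + (i - j) ** 2) for i, j, p in cells)
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    sd_i = sum((i - mu_i) ** 2 * p for i, _, p in cells) ** 0.5
    sd_j = sum((j - mu_j) ** 2 * p for _, j, p in cells) ** 0.5
    if sd_i == 0 or sd_j == 0:
        correlation = float("nan")        # constant image: undefined
    else:
        correlation = (sum(i * j * p for i, j, p in cells)
                       - mu_i * mu_j) / (sd_i * sd_j)
    return contrast, correlation, uniformity, homogeneity


# Hypothetical 2-level image with alternating pixels.
stripes = [[0, 1, 0, 1],
           [1, 0, 1, 0]]
print(texture_features(glcm(stripes, levels=2)))  # (1.0, -1.0, 0.5, 0.5)
```

For a constant image, the same functions return zero contrast, uniformity and homogeneity of 1, and an undefined (NaN) correlation.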

3.3. Proposed Fine-Tuned Framework


The major goal of our study is to develop an efficient recognition system that can
automatically identify and classify wheat diseases. Machine vision is a technology that can
substitute for human inspectors to achieve automatic evaluation and diagnosis of wheat
diseases, thus providing objective inspection results. To achieve the goal of an efficient
recognition framework, a comprehensive baseline study was conducted, as discussed in
Section 3.4. In the experimental evaluations, RFC achieved the best result compared with
the other baseline models, and it is therefore selected for fine-tuning in order to achieve
the desired performance. RFC is mostly used to solve classification problems. Its basic
workflow is to draw random samples from the provided dataset, construct a decision tree on
each sample, let every tree vote on the class of an input during prediction, and choose the
class with the most votes as the final prediction. Furthermore, RFC does not require much
memory and can be parallelized across the cores of a computer during training, which speeds
up the training process. RFC mainly consists of DTs, each of which randomly selects several
features from the input features and processes them to decide a class for the input sample.
A tree in the RFC consists of three components: the root node, where the tree starts;
decision nodes, which result from splitting the root node; and leaf nodes, which indicate
the end of a tree.
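
The voting step of the workflow described above can be sketched in a few lines. The "trees" here are hypothetical one-rule stand-ins operating on made-up feature names; they are not trained decision trees and not the feature vector of this work, and only serve to show how the majority vote is taken.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Return the class that receives the most votes from the trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained trees: each is a one-rule stump on a
# made-up feature dictionary.
trees = [
    lambda s: "rusted" if s["brown_ratio"] > 0.2 else "healthy",
    lambda s: "yellow-rusted" if s["yellow_ratio"] > 0.3 else "healthy",
    lambda s: "rusted" if s["roughness"] > 0.5 else "healthy",
]
sample = {"brown_ratio": 0.4, "yellow_ratio": 0.1, "roughness": 0.7}
print(forest_predict(trees, sample))  # rusted
```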
$$GI = 1 - \sum_{j=1}^{n} p_j^{2} \tag{20}$$

In addition, the RFC performance depends on various hyperparameters, which include the
number of trees, the criterion used to select features for splitting, the tree depth,
complexity handling, etc. After performing the experiments, and to balance the trade-off
between model computational complexity and accuracy, we choose only 100 trees for the final
model. For the splitting criterion, we tried different techniques, which include Gini,
entropy, and log loss. However, due to its simplicity and lower computational complexity,
we choose Gini, as presented in Equation (20), where GI represents the Gini index and
$p_j$ is the probability of class j among the n classes. For the depth of the trees, nodes
are expanded until all leaves are pure. Another important attribute of the RFC is bootstrap,
which uses only a sampled part of the data to construct each tree. As mentioned above, the
wheat disease classes appear very similar due to color similarities, such as brown and
yellow rust. Thus, we set bootstrap to false so that all input data are used to construct
the trees. Besides this, a cost-complexity pruning parameter called CCP alpha is used to
keep only those subtrees which have minimum cost complexity.
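
The Gini index of Equation (20) is straightforward to compute, as the following sketch shows. The function names and the label lists are hypothetical; in scikit-learn, the settings discussed above correspond to the `n_estimators`, `criterion`, `max_depth`, `bootstrap`, and `ccp_alpha` parameters of `RandomForestClassifier`, but the sketch below is independent of that library.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a node: 1 - sum_j p_j^2 over the class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2
                     for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini index of the two children of a candidate split."""
    total = len(left) + len(right)
    return (len(left) * gini_index(left)
            + len(right) * gini_index(right)) / total


pure = ["healthy"] * 4
mixed = ["healthy", "rusted", "yellow-rusted"]
print(gini_index(pure))             # 0.0
print(round(gini_index(mixed), 3))  # 0.667
```

A pure leaf has a Gini index of zero, which is the stopping condition implied by expanding nodes until all leaves are pure; a candidate split is better the lower its weighted index is.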
Furthermore, the key contributions of the proposed fine-tuned RFC are the self-collected
data and its preprocessing, in which a segmentation technique is applied to extract the
region of interest in order to reduce the computational complexity. Additionally, our
selected feature descriptors, which are the most relevant ones, perform well in terms of
accuracy and play a major role in making the proposed framework lightweight and accurate.
Compared with our proposed fine-tuned RFC, other ML models such as neural networks can also
perform well on classification tasks, but most of them are computationally expensive.
So, with this configuration of the RFC model, we achieve the high performance presented in
the results section.

3.4. Comparative Analysis of Baseline Models


Baseline model analysis can show the effectiveness of the proposed fine-tuned RFC. In
our study, it is important to evaluate the performance of our model and to comparatively
analyze the results on the collected data, as we are also proposing a new dataset. Therefore,
apart from the proposed fine-tuned RFC, we consider five other ML models as baselines.
The risk of adding unnecessary model complexity is checked and reduced accordingly after
the detailed comparative analysis of these baseline models. Hence, several experiments are
done on the selected models, including Logistic Regression (LR), KNN, Decision Tree (DT),
NB, and SVM, as given in Table 2. The LR model is used to assign observations to a discrete
set of classes. The logistic sigmoid function transforms the output of LR into a probability
value, which can then be mapped to the three classes. It is used to solve our multi-class
classification problem through the “one vs. all” technique. LR achieved 89.6% accuracy on
the given dataset. KNN is also used by many researchers for solving crop classification
problems, as mentioned in Table 1. This algorithm works on “feature similarity”, which means
that a data point is assigned a class based on how closely it resembles the points in the
training set. It reached 99.0% accuracy, the third-highest value in Table 2. The DT
technique is also included in our experiments. It starts with the root node and grows into
a tree-like structure with more branches. The purpose of this technique is to develop a
model that predicts the value of the target classes, and the decision tree solves the
problem by using the tree representation. Decision nodes are used for making decisions and
have several branches, whereas leaf nodes are the results of those decisions and do not
contain any further branches. DT achieved 99.2% accuracy and ranked as the runner-up after
our proposed fine-tuned RFC. Furthermore, NB algorithms are also used to compare the
results. NB applies Bayes' theorem to create a set of classification algorithms known as
Naïve Bayes classifiers. It is a family of algorithms that share a common principle, i.e.,
every pair of features is treated as independent of each other. It is a probabilistic
technique, which makes predictions based on the probabilities of an object. Additionally,
SVM is used for comparing the results, as it is one of the most widely used supervised ML
algorithms and can be applied to both classification and regression problems. According to
the literature, SVM is mostly used for classification problems in machine learning. The
objective of the SVM algorithm is to find a hyperplane in N-dimensional space (where N is
the number of features) that distinctly classifies the data points. Several hyperplanes are
selected to split the three classes of data points, and it achieved 94.6% accuracy. The
comparative analysis in Table 2 shows that the proposed fine-tuned RFC model achieved 99.8%
accuracy on the given three-class dataset.
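
To make the “feature similarity” idea behind the KNN baseline concrete, the following minimal sketch classifies a query point by a majority vote among its nearest training points. It is an illustration only: the two-dimensional feature vectors, the labels, the value of k, and the function name `knn_predict` are hypothetical and do not reproduce the configuration used in the experiments.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs.
    """
    def squared_distance(vector):
        return sum((a - b) ** 2 for a, b in zip(vector, query))

    nearest = sorted(train, key=lambda item: squared_distance(item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical 2-D feature vectors with their class labels.
train = [
    ((0.1, 0.2), "healthy"),
    ((0.2, 0.1), "healthy"),
    ((0.9, 0.8), "rusted"),
    ((0.8, 0.9), "rusted"),
    ((0.5, 0.95), "yellow-rusted"),
]
print(knn_predict(train, (0.85, 0.85)))  # rusted
```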

Table 2. Baseline Models Experimental Analysis.

Model           LR     SVM    NB     KNN    DT     Proposed Framework
Accuracy (%)    89.6   94.4   97.7   99.0   99.2   99.8

4. Experimental Results
In this section, we discuss the experimental setting, collected dataset, evaluation
metrics, and evaluation of the performance of our proposed framework. Furthermore, we
elaborate on the performance of trained models comparatively.

4.1. Experimental Settings
All the experiments are carried out on a computer system with an Intel® Xeon® X5560
processor with a 2.80 GHz clock speed, 8.00 GB of installed memory (RAM), and a GeForce
GTX 1070 GPU. In addition, the Microsoft Windows operating system is used. Apart from this,
different libraries are utilized for the implementation of our project: Python 3.7 as the
programming language; OpenCV version 3.4, a CV library, for preprocessing; and the
scikit-learn ML library, version 0.24.2, for training and testing the various ML models.
Matplotlib, a Python-based visualization library, is used for visualizing various images
and results and for generating graphs. Furthermore, the os and glob modules are utilized
for reading different files from the hard drive.

4.2. Dataset

After collecting the data, the images are resized using the inter-area interpolation
technique in a preprocessing step to reduce computation time, which is discussed in the
previous section. In addition, the wheat leaf dataset consists of three classes (healthy,
rusted, and yellow-rusted leaves), and each class contains 1050 images of wheat leaves.
Furthermore, Figure 3 shows image samples from the different classes: the first row
contains healthy samples, the second row contains rusted samples, and the last row shows
the yellow rust class of the wheat diseases.

Figure 3. Samples of images. The first, second, and third rows show healthy, rusted, and yellow-
rusted leaves of wheat, respectively.

4.3. Evaluation Metrics

When the training of all models is completed, as shown in Figure 1, the remaining unseen
20% of the whole dataset is passed through the preprocessing and feature extraction phases,
and the extracted features are then classified using the different trained models discussed
above in detail. The models are evaluated based on accuracy, precision, recall, and F1
score. All these performance evaluation values are calculated from the confusion matrix
using Equations (21)–(24), respectively. These equations contain different values which
were acquired from the confusion matrix of each model. These values include True Positive
(TP), which shows how many times our model classified a positive testing sample as positive;
True Negative (TN), which shows how many times our model classified the negative class
correctly; and False Positive (FP) and False Negative (FN), which count the negative samples
misclassified as positive and the positive samples misclassified as negative, respectively.
As a result, we found the highest accuracy, 99.8%, for our proposed framework, followed by
DT with 99.2%. On the other hand, the testing accuracy of LR, 89.6%, is lower than that of
all the others.

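$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{21}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{22}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{23}$$

$$\mathrm{F1\text{-}score} = \frac{2 \times (\mathrm{Precision} \times \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}} \tag{24}$$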

4.4. Results
To evaluate the performance of our framework, different experiments are conducted on the
dataset. All testing images were selected randomly. The testing set is 20% of our whole
dataset, which amounts to 210 images per category (healthy, rusted, and yellow-rusted).
Based on the testing set, different evaluation outputs are produced, including the
confusion matrix, the overall accuracy, and a performance evaluation graph that contains
precision, recall, and F1-score. All of these are discussed below in detail.
In these experiments, a comparative analysis of the performance of the ML models is
conducted on the combined extracted features of color, shape, and Haralick texture of
wheat leaves. Accuracy is calculated using Equation (21): the total number of correct
predictions of a trained model is divided by the total number of samples in the testing
set. As mentioned above, 20% of the whole dataset is split off for testing. The calculated
accuracies of all the trained models are shown in Figure 4, in which the proposed framework
has the highest accuracy of 99.8%, followed by DT (99.2%) and KNN (99.0%). On the other
hand, the accuracy of LR is 89.6%, which is lower than that of all the other models. SVM
and NB achieved accuracies of 94.0% and 97.7%, respectively.

Figure 4. Accuracy graph of all six trained models: the proposed framework has the highest accuracy
followed by DT while LR has the lowest, and all others have a performance value between these two.

For better evaluation and verification of our experimental results, we calculated
various evaluation values from the confusion matrix. These values include precision, recall,
and F1-score. These metrics are calculated using Equations (22)–(24), respectively. All
these values are depicted in Figure 5, where our proposed framework has achieved the
highest values of 99.8% for precision, recall, and F1 score. These results show that almost
all testing samples are classified correctly by our proposed framework, which is followed
by DT with 99.2% for all values. This means that only about 1% of the data of each class is
misclassified. However, LR has lower values of 91.0%, 90.0%, and 90.0%, respectively. The
performance of the other ML models lies between that of our proposed framework and LR. All
models except the LR model classified the class of yellow images correctly due to clear
pixels in the input images. However, images of the other classes are misclassified because
the differences between these classes are minimal, owing to light and dark disease spots
on wheat leaves.

Figure 5. Performance evaluation based on precision, recall, and F1-score: the proposed framework
followed by DT has the highest performance; however, LR has the lowest precision, recall,
and F1 score.

The confusion matrices computed for each model show the correct and wrong predictions for
each class of testing samples. A confusion matrix consists of four kinds of values, TP, TN,
FP, and FN, in which TP and TN count the correct predictions while the other two count the
false predictions against the ground truth. The confusion matrices of the proposed
framework, which has the highest accuracy, of DT, which follows it, and of LR, which has
the lowest accuracy, are tabulated in Table 3. The confusion matrix of the proposed
framework verifies its best performance because, as shown, it classified all testing
samples correctly except for one rusted sample, which is classified as healthy. DT has the
second-highest accuracy, with only 5 testing samples misclassified. Further, the
performance of LR is lower because 10.4% of the testing samples are assigned to the wrong
classes.

Table 3. Confusion matrices of the trained models.

The Proposed Framework
Predicted Classes ↓ / Actual Classes →    Healthy    Rusted    Yellow Rusted
Healthy                                   215        1         0
Rusted                                    0          201       0
Yellow Rusted                             0          0         212
Overall Accuracy (%)                      99.8

DT
Predicted Classes ↓ / Actual Classes →    Healthy    Rusted    Yellow Rusted
Healthy                                   213        1         1
Rusted                                    0          205       1
Yellow Rusted                             1          1         207
Overall Accuracy (%)                      99.2

LR
Predicted Classes ↓ / Actual Classes →    Healthy    Rusted    Yellow Rusted
Healthy                                   167        48        0
Rusted                                    2          204       0
Yellow Rusted                             13         2         194
Overall Accuracy (%)                      89.6
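
As a worked example of Equations (21)–(24), the sketch below derives the overall accuracy and the one-vs-rest precision, recall, and F1 score of each class from the DT counts printed in Table 3. It is an illustration with hypothetical function names, and it rests on two stated assumptions: rows are treated as one axis of the table and columns as the other (the overall accuracy does not depend on this choice), and recall is computed with its standard definition, TP / (TP + FN).

```python
def per_class_metrics(matrix, labels):
    """One-vs-rest precision, recall and F1 for each class.

    matrix[a][p] is assumed to count samples of class a (row) that were
    assigned to class p (column).
    """
    n = len(labels)
    report = {}
    for c, label in enumerate(labels):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                          # rest of the row
        fp = sum(matrix[r][c] for r in range(n)) - tp     # rest of the column
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        report[label] = (precision, recall, f1)
    return report


def overall_accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


# DT counts as printed in Table 3.
dt = [[213, 1, 1],
      [0, 205, 1],
      [1, 1, 207]]
report = per_class_metrics(dt, ["Healthy", "Rusted", "Yellow Rusted"])
print(round(100 * overall_accuracy(dt), 1))  # 99.2
```

The printed value agrees with the overall accuracy reported for DT: 625 of the 630 tabulated samples lie on the diagonal.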

Additionally, for a fair evaluation, the proposed framework is tested on independent
data collected from online resources. As mentioned, a suitable dataset is not available
for wheat leaves; therefore, we collected around 50 images of the three mentioned classes
from Google Images and evaluated the model on them. The results of this evaluation are
given in Table 4, where the model achieved around 88% for all classes. Here, the accuracy
is lower because of the low and inconsistent quality of the images.

Table 4. Performance of the proposed framework on independent data.

Class      Total Testing Images   Accurate Prediction
Healthy    16                     13
Yellow     18                     12
Rusted     16                     11

The results of the proposed method are compared against three state-of-the-art
approaches, and the findings are shown in Table 5, arranged by year.
These works address wheat disease classification with different
techniques. The authors of [52] used texture features, classified them with four ML models, and achieved
99% accuracy. In [50], the authors used the ratio of the affected area to the total area of the
wheat leaf as a feature and classified it using three
ML models, SVM, PNN, and RFC, achieving 93.33%
accuracy. Further, in [30], the authors used color, texture, and their combination as features
and classified them using E-MMC metric learning, achieving 94.16% accuracy. In contrast,
our proposed framework achieved 99.8% accuracy, the highest in a comparative
analysis of different ML models against the proposed RFC model. We achieved this accuracy
by concatenating the features of three descriptors: Haralick-texture, Color-histogram,
and Hue-moment. The use of these techniques has increased in recent years,
as researchers have utilized them in high-level research for forecasting [54], age estimation [53],
and time-series analysis [55].
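
The idea of concatenating descriptor outputs into a single feature vector can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, "Hue-moment" is read here as Hu's invariant moments (an assumption), only the first two of the seven Hu moments are computed, and the Haralick texture descriptor is omitted for brevity. In practice, an image-processing library would normally supply these descriptors.

```python
def color_histogram(pixels, bins=4):
    """Normalized per-channel histogram of (r, g, b) pixels with values 0-255."""
    hist = [0.0] * (3 * bins)
    width = 256 / bins
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            index = min(int(value // width), bins - 1)
            hist[channel * bins + index] += 1
    n = len(pixels) or 1
    return [count / n for count in hist]


def hu_moments_first_two(gray):
    """First two Hu invariant moments of a grayscale image (list of rows)."""
    m00 = sum(sum(row) for row in gray)
    if m00 == 0:
        return [0.0, 0.0]
    # Intensity-weighted centroid.
    cx = sum(x * v for row in gray for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(gray) for v in row) / m00

    def eta(p, q):
        # Normalized central moment of order (p, q).
        mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v
                 for y, row in enumerate(gray) for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    eta20, eta02, eta11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return [eta20 + eta02, (eta20 - eta02) ** 2 + 4 * eta11 ** 2]


def build_feature_vector(rgb_image):
    """Concatenate the color and shape descriptors into one flat vector."""
    pixels = [pixel for row in rgb_image for pixel in row]
    gray = [[(r + g + b) / 3 for (r, g, b) in row] for row in rgb_image]
    return color_histogram(pixels) + hu_moments_first_two(gray)


# Tiny synthetic 2x2 image, used only to show the shape of the output.
image = [[(255, 0, 0), (255, 0, 0)], [(0, 255, 0), (0, 0, 255)]]
features = build_feature_vector(image)
print(len(features))
```

Each image is reduced to one fixed-length vector (12 histogram bins plus 2 moments in this sketch), so any classifier that accepts tabular input, such as a random forest, can be trained on the stacked vectors.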

Table 5. Comparison with SOTA papers where our work achieved the highest accuracy, using the
proposed framework on the extracted features of three descriptors.

Authors                 Year   Features                                                   Classifiers                                                   Accuracy (%)
Azadbakht et al. [52]   2019   Texture                                                    SVR, RFR, GRR, BRT                                            99
Zhao et al. [50]        2020   Diseased area/total area of leaf                           SVM, PNN, RFC                                                 93.33
Bao et al. [30]         2021   Color, texture, and combination of these two               Elliptical-Maximum Margin Criterion (E-MMC) metric learning   94.16
Proposed method         2022   Haralick-texture, Color-histogram, Hue-moment, LBP, HOG    Fine-Tuned RFC                                                99.8

5. Conclusions and Future Work


This paper presented an ML-based framework for wheat disease classification.
We collected high-quality images of wheat leaf diseases, including brown-rusted
and yellow-rusted leaves together with a healthy leaf class, from different fields of
Pakistan to evaluate the proposed system. For accurate preprocessing, segmentation and
resizing techniques are used. Shape, color, and texture features are
extracted using different feature descriptors and combined. Six ML models are trained on
the combined features extracted from the segmented images for a comparative analysis of the
performance of SOTA ML models. In this comparison, the proposed fine-tuned RFC
model achieves the best accuracy. The performance of the proposed approach is
evaluated using unseen data and a variety of evaluation metrics, including accuracy,
precision, recall, and F1-score. For further evaluation, our approach is compared with
existing SOTA approaches and is observed to be more
accurate in the recognition and classification of wheat diseases.
In the future, our objective is to extend the developed dataset to more wheat disease
classes and to suggest treatments for the
detected diseases. Furthermore, we aim to deploy the model on resource-constrained
devices such as handheld devices (Android, iPhone, or Jetson Nano) so that it can
help farmers promptly in the field without wasting resources
and time.

Author Contributions: Conceptualization, H.K.; data curation, M.M.; formal analysis, I.U.H. and
S.U.K.; funding acquisition, M.Y.L.; methodology, H.K., M.M. and M.; project administration, M.Y.L.;
software, H.K.; supervision, M.Y.L.; validation, H.K., I.U.H. and S.U.K.; visualization, M.; writing—
review and editing, I.U.H., M. and M.Y.L. All authors have read and agreed to the published version
of the manuscript.
Funding: This research was supported by the Basic Science Research Program through the National
Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2021R1I1A1A01055652).
Institutional Review Board Statement: Not available.
Data Availability Statement: Not available.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Aker, J.C. Dial “A” for agriculture: A review of information and communication technologies for agricultural extension in
developing countries. Agric. Econ. 2011, 42, 631–647. [CrossRef]
2. Barretto, R.; Buenavista, R.M.; Rivera, J.L.; Wang, S.; Prasad, P.V.; Siliveru, K. Teff (Eragrostis tef) processing, utilization and future
opportunities: A review. Int. J. Food Sci. Technol. 2020, 56, 3125–3137. [CrossRef]
3. Chakraborty, S.; Newton, A.C. Climate change, plant diseases and food security: An overview. Plant Pathol. 2011, 60, 2–14.
[CrossRef]
4. Nicholls, E.; Ely, A.; Birkin, L.; Basu, P.; Goulson, D. The contribution of small-scale food production in urban areas to the
sustainable development goals: A review and case study. Sustain. Sci. 2020, 15, 1585–1599. [CrossRef]
5. WHO. Available online: https://www.who.int/news-room/fact-sheets/detail/food-safety (accessed on 26 April 2022).
6. Mottaleb, K.A.; Singh, P.K.; Sonder, K.; Kruseman, G.; Tiwari, T.P.; Barma, N.C.; Malaker, P.K.; Braun, H.-J.; Erenstein, O. Threat of
wheat blast to South Asia’s food security: An ex-ante analysis. PLoS ONE 2018, 13, e0197555. [CrossRef] [PubMed]
7. Food and Agriculture Organization of the United Nations (FAO). Supply and Demand Brief; Food and Agriculture Organization of
the United Nations (FAO): Rome, Italy, 2020.
8. Figueroa, M.; Hammond-Kosack, K.E.; Solomon, P.S. A review of wheat diseases—A field perspective. Mol. Plant Pathol. 2018, 19,
1523–1536. [CrossRef]
9. Huerta-Espino, J.; Singh, R.; German, S.; McCallum, B.; Park, R.; Chen, W.Q.; Bhardwaj, S.; Goyeau, H. Global status of wheat leaf
rust caused by Puccinia triticina. Euphytica 2011, 179, 143–160. [CrossRef]
10. Sankaran, S.; Mishra, A.; Ehsani, R.; Davis, C. A review of advanced techniques for detecting plant diseases. Comput. Electron.
Agric. 2010, 72, 1–13. [CrossRef]
11. Jha, K.; Doshi, A.; Patel, P.; Shah, M. A comprehensive review on automation in agriculture using artificial intelligence. Artif.
Intell. Agric. 2019, 2, 1–12. [CrossRef]
12. Khan, N.; Muhammad, K.; Hussain, T.; Nasir, M.; Munsif, M.; Imran, A.S.; Sajjad, M. An adaptive game-based learning strategy
for children road safety education and practice in virtual space. Sensors 2021, 21, 3661. [CrossRef] [PubMed]
13. Haroon, U.; Ullah, A.; Hussain, T.; Ullah, W.; Sajjad, M.; Muhammad, K.; Lee, M.Y.; Baik, S.W. A Multi-Stream Sequence Learning
Framework for Human Interaction Recognition. IEEE Trans. Hum.-Mach. Syst. 2022, 52, 435–444. [CrossRef]
14. Khan, S.U.; Haq, I.U.; Khan, N.; Muhammad, K.; Hijji, M.; Baik, S.W. Learning to rank: An intelligent system for person
reidentification. Int. J. Intell. Syst. 2022, 37, 5924–5948. [CrossRef]
15. He, J.; Baxter, S.L.; Xu, J.; Xu, J.; Zhou, X.; Zhang, K. The practical implementation of artificial intelligence technologies in
medicine. Nat. Med. 2019, 25, 30–36. [CrossRef] [PubMed]
16. Khan, S.U.; Hussain, T.; Ullah, A.; Baik, S.W. Deep-ReID: Deep features and autoencoder assisted image patching strategy for
person re-identification in smart cities surveillance. Multimed. Tools Appl. 2021, 1–22. [CrossRef]
17. Ullah, W.; Ullah, A.; Hussain, T.; Muhammad, K.; Heidari, A.A.; Del Ser, J.; Baik, S.W.; De Albuquerque, V.H.C. Artificial
Intelligence of Things-assisted two-stream neural network for anomaly detection in surveillance Big Video Data. Future Gener.
Comput. Syst. 2022, 129, 286–297. [CrossRef]
18. Ullah, W.; Ullah, A.; Haq, I.U.; Muhammad, K.; Sajjad, M.; Baik, S.W. CNN features with bi-directional LSTM for real-time
anomaly detection in surveillance networks. Multimed. Tools Appl. 2021, 80, 16979–16995. [CrossRef]
19. Yar, H.; Hussain, T.; Khan, Z.A.; Koundal, D.; Lee, M.Y.; Baik, S.W. Vision sensor-based real-time fire detection in resource-
constrained IoT environments. Comput. Intell. Neurosci. 2021, 2021, 5195508. [CrossRef] [PubMed]
20. Patrício, D.I.; Rieder, R. Computer vision and artificial intelligence in precision agriculture for grain crops: A systematic review.
Comput. Electron. Agric. 2018, 153, 69–81. [CrossRef]
21. Barbedo, J.G.A. A review on the main challenges in automatic plant disease identification based on visible range images. Biosyst.
Eng. 2016, 144, 52–60. [CrossRef]
22. Barbedo, J.G. Factors influencing the use of deep learning for plant disease recognition. Biosyst. Eng. 2018, 172, 84–91. [CrossRef]
23. Barbedo, J.G.A. Automatic image-based detection and recognition of plant diseases—A critical view. In Proceedings of the XI
Congresso Brasileiro de Agroinformática, Sao Paulo, Brazil, 2–6 October 2017.
24. Hughes, D.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease
diagnostics. arXiv 2015, arXiv:1511.08060.
25. Barbedo, J.G.A.; Koenigkan, L.V.; Halfeld-Vieira, B.A.; Costa, R.V.; Nechet, K.L.; Godoy, C.V.; Junior, M.L.; Patricio, F.R.A.;
Talamini, V.; Chitarra, L.G. Annotated plant pathology databases for image-based detection and recognition of diseases. IEEE Lat.
Am. Trans. 2018, 16, 1749–1757. [CrossRef]
26. Johannes, A.; Picon, A.; Alvarez-Gila, A.; Echazarra, J.; Rodriguez-Vaamonde, S.; Navajas, A.D.; Ortiz-Barredo, A. Automatic
plant disease diagnosis using mobile capture devices, applied on a wheat use case. Comput. Electron. Agric. 2017, 138, 200–209.
[CrossRef]
27. Barbedo, J.G.A. Plant disease identification from individual lesions and spots using deep learning. Biosyst. Eng. 2019, 180, 96–107.
[CrossRef]
28. Ngugi, L.C.; Abelwahab, M.; Abo-Zahhad, M. Recent advances in image processing techniques for automated leaf pest and
disease recognition—A review. Inf. Process. Agric. 2021, 8, 27–51. [CrossRef]
29. Wójtowicz, A.; Piekarczyk, J.; Czernecki, B.; Ratajkiewicz, H. A random forest model for the classification of wheat and rye leaf
rust symptoms based on pure spectra at leaf scale. J. Photochem. Photobiol. B Biol. 2021, 223, 112278. [CrossRef] [PubMed]
30. Bao, W.; Zhao, J.; Hu, G.; Zhang, D.; Huang, L.; Liang, D. Identification of wheat leaf diseases and their severity based on
elliptical-maximum margin criterion metric learning. Sustain. Comput. Inform. Syst. 2021, 30, 100526. [CrossRef]
31. Paul, A.; Ghosh, S.; Das, A.K.; Goswami, S.; Choudhury, S.D.; Sen, S. A review on agricultural advancement based on computer
vision and machine learning. In Emerging Technology in Modelling and Graphics; Springer: Cham, Switzerland, 2020; pp. 567–581.
32. Kumar, M.; Hazra, T.; Tripathy, S.S. Wheat leaf disease detection using image processing. Int. J. Latest Technol. Eng. Manag. Appl.
Sci. (IJLTEMAS) 2017, 6, 73–76.
33. Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine learning in agriculture: A review. Sensors 2018, 18, 2674.
[CrossRef] [PubMed]
34. Dixit, A.; Nema, S. Wheat Leaf Disease Detection Using Machine Learning Method—A Review. Int. J. Comput. Sci. Mob. Comput.
2018, 7, 124–129.
35. Xu, P.; Wu, G.; Guo, Y.; Yang, H.; Zhang, R. Automatic wheat leaf rust detection and grading diagnosis via embedded image
processing system. Procedia Comput. Sci. 2017, 107, 836–841. [CrossRef]
36. Islam, M.; Dinh, A.; Wahid, K.; Bhowmik, P. Detection of potato diseases using image segmentation and multiclass support vector
machine. In Proceedings of the IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), Windsor, ON,
Canada, 30 April–3 May 2017; pp. 1–4.
37. Alehegn, E. Ethiopian maize diseases recognition and classification using support vector machine. Int. J. Comput. Vis. Robot. 2019,
9, 90–109. [CrossRef]
38. Hossain, S.; Mou, R.M.; Hasan, M.M.; Chakraborty, S.; Razzak, M.A. Recognition and detection of tea leaf’s diseases using
support vector machine. In Proceedings of the IEEE 14th International Colloquium on Signal Processing & Its Applications
(CSPA), Penang, Malaysia, 9–10 March 2018; pp. 150–154.
39. Ullah, W.; Muhammad, K.; Ul Haq, I.; Ullah, A.; Ullah Khattak, S.; Sajjad, M. Splicing sites prediction of human genome using
machine learning techniques. Multimed. Tools Appl. 2021, 80, 30439–30460. [CrossRef]
40. Ahmad, F.; Ikram, S.; Ahmad, J.; Ullah, W.; Hassan, F.; Khattak, S.U.; Rehman, I.U. GASPIDs Versus Non-GASPIDs-Differentiation
Based on Machine Learning Approach. Curr. Bioinform. 2020, 15, 1056–1064. [CrossRef]
41. Aurangzeb, K.; Akmal, F.; Khan, M.A.; Sharif, M.; Javed, M.Y. Advanced machine learning algorithm based system for crops leaf
diseases recognition. In Proceedings of the IEEE 6th Conference on Data Science and Machine Learning Applications (CDMA),
Riyadh, Saudi Arabia, 4–5 March 2020; pp. 146–151.
42. Treboux, J.; Genoud, D. Improved machine learning methodology for high precision agriculture. In Proceedings of the IEEE
Global Internet of Things Summit (GIoTS), Bilbao, Spain, 4–7 June 2018; pp. 1–6.
43. Rumpf, T.; Mahlein, A.-K.; Steiner, U.; Oerke, E.-C.; Dehne, H.-W.; Plümer, L. Early detection and classification of plant diseases
with support vector machines based on hyperspectral reflectance. Comput. Electron. Agric. 2010, 74, 91–99. [CrossRef]
44. Ramesh, S.; Hebbar, R.; Niveditha, M.; Pooja, R.; Shashank, N.; Vinod, P. Plant disease detection using machine learning. In
Proceedings of the IEEE International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C),
Bangalore, India, 25–28 April 2018; pp. 41–45.
45. Phadikar, S.; Sil, J.; Das, A.K. Classification of rice leaf diseases based on morphological changes. Int. J. Inf. Electron. Eng. 2012, 2,
460–463.
46. Prajapati, H.B.; Shah, J.P.; Dabhi, V.K. Detection and classification of rice plant diseases. Intell. Decis. Technol. 2017, 11, 357–373.
[CrossRef]
47. Ahmed, K.; Shahidi, T.R.; Alam, S.M.I.; Momen, S. Rice leaf disease detection using machine learning techniques. In Proceedings
of the IEEE International Conference on Sustainable Technologies for Industry 4.0 (STI), Dhaka, Bangladesh, 24–25 December
2019; pp. 1–5.
48. Panigrahi, K.P.; Das, H.; Sahoo, A.K.; Moharana, S.C. Maize leaf disease detection and classification using machine learning
algorithms. In Progress in Computing, Analytics and Networking; Springer: Cham, Switzerland, 2020; pp. 659–669.
49. Waghmare, H.; Kokare, R.; Dandawate, Y. Detection and classification of diseases of grape plant using opposite colour local
binary pattern feature and machine learning for automated decision support system. In Proceedings of the IEEE 3rd International
Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 11–12 February 2016; pp. 513–518.
50. Zhao, J.; Fang, Y.; Chu, G.; Yan, H.; Hu, L.; Huang, L. Identification of leaf-scale wheat powdery mildew (Blumeria graminis f. sp.
Tritici) combining hyperspectral imaging and an SVM classifier. Plants 2020, 9, 936. [CrossRef] [PubMed]
51. Li, G.; Ma, Z.; Wang, H. Image recognition of wheat stripe rust and wheat leaf rust based on support vector machine. J. China
Agric. Univ. 2012, 17, 72–79.
52. Azadbakht, M.; Ashourloo, D.; Aghighi, H.; Radiom, S.; Alimohammadi, A. Wheat leaf rust detection at canopy scale under
different LAI levels using machine learning techniques. Comput. Electron. Agric. 2019, 156, 119–128. [CrossRef]
53. Tursunov, A.; Choeh, J.Y.; Kwon, S. Age and gender recognition using a convolutional neural network with a specially designed
multi-attention module through speech spectrograms. Sensors 2021, 21, 5892. [CrossRef] [PubMed]
54. Mustaqeem; Ishaq, M.; Kwon, S. A CNN-Assisted deep echo state network using multiple Time-Scale dynamic learning reservoirs
for generating Short-Term solar energy forecasting. Sustain. Energy Technol. Assess. 2022, 52, 102275. [CrossRef]
55. Maji, B.; Swain, M.; Mustaqeem. Advanced Fusion-Based Speech Emotion Recognition System Using a Dual-Attention Mechanism
with Conv-Caps and Bi-GRU Features. Electronics 2022, 11, 1328. [CrossRef]