Tomato Leaf Disease
PII: S1568-4946(19)30714-8
DOI: https://doi.org/10.1016/j.asoc.2019.105933
Reference: ASOC 105933
Please cite this article as: Karthik R., Hariharan M., S. Anand et al., Attention embedded residual
CNN for disease detection in tomato leaves, Applied Soft Computing Journal (2019), doi:
https://doi.org/10.1016/j.asoc.2019.105933.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the
addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive
version of record. This version will undergo additional copyediting, typesetting and review before it
is published in its final form, but we are providing this version to give early visibility of the article.
Please note that, during the production process, errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.
Karthik R¹, Hariharan M², Sundar Anand³, Priyanka Mathikshara⁴, Annie Johnson⁵, Menaka R⁶
¹,³⁻⁶ School of Electronics Engineering, Vellore Institute of Technology, Chennai.
² School of Computing Sciences and Engineering, Vellore Institute of Technology, Chennai.
Abstract
Automated plant disease detection and diagnosis is a challenging research area that has gained
significant attention in the agricultural sector. Traditional disease detection methods rely on
extracting handcrafted features from the acquired images to identify the type of infection, so their
performance depends solely on the nature of the handcrafted features selected. This limitation can
be addressed by learning the features automatically with Convolutional Neural Networks
(CNNs). This research presents two deep architectures for detecting the type of infection in
tomato leaves. The first architecture applies residual learning to learn significant features for
classification. The second architecture applies an attention mechanism on top of the residual deep network.
Experiments were conducted using the Plant Village Dataset, comprising three diseases: early
blight, late blight, and leaf mold. The proposed work exploited the features learned by the CNN at various
levels of the processing hierarchy using the attention mechanism and achieved an overall accuracy of 98% on the
validation sets in 5-fold cross-validation.
1. Introduction
Tomato holds an indispensable place in the economy of Indian agriculture. India ranks third in the
production of tomatoes, with a yield of 53,00,000 tons harvested from around 3,50,000 hectares of
land. The harvest index of tomato in India is lower than in other countries. One of the major
reasons for the reduced yield is the diseases that frequently occur on the leaves of the plant.
Tomato crops are highly affected by diseases like bacterial spot, early blight, late blight, and leaf mold.
Journal Pre-proof
harmful chemicals. In this type of farming, mobile robotics, remote sensor networks and drones are used
to deliver controlled and measured amounts of medicine to the infected areas of the plants.
The main challenge of precision farming lies in collecting data, processing it, and drawing expert
inferences from it. Therefore, precision farming incorporated image processing and
computer vision techniques to process the information gathered in the cultivation field. Image processing was
quite successful in solving the problems of disease detection, weed detection, understanding the
symptoms of a disease and, more recently, grading the yield output. As machine learning algorithms
continued to advance, the accuracy of image processing techniques grew consistently.
However, these algorithms demand handpicked features to detect diseases, which made deep learning
techniques pertinent. This research attempts to apply deep learning architectures
to the detection of diseases in tomato leaves.
2. Related Works
Several research works on disease detection in different crops have been presented in the last two
decades. Image processing techniques were applied to extract features, which were given as input to
machine learning algorithms for precise classification. These approaches can be broadly
classified into (1) machine learning methods and (2) deep learning methods.
Akhtar et al. presented an automated approach for plant disease detection using Gray Level
Co-occurrence Matrix (GLCM) and wavelet-based features [7]. The features were used to train different
machine learning algorithms, namely K-Nearest Neighbor (KNN), Naïve Bayes classifier, Support
Vector Machine (SVM), Decision Tree and Recurrent Neural Networks. An automated
tomato grading system was presented by Semary et al. [8]. This approach utilized color and texture
features, which were classified using SVM. Prasad et al. developed an automated approach for leaf disease
diagnosis using Gabor Wavelet Features (GWF) and GLCM features. These multi-resolution features
An approach to detect the severity of the disease in leaves was proposed in [11]. Statistical features in
the RGB and HSV color space were utilized for determining the severity level. H. Sabrol et al. presented
an approach for leaf disease detection in tomatoes by combining Otsu’s segmentation with decision trees
for classification. This approach considered color, shape and texture features for learning the
characteristics of the leaf diseases [12].
Padol et al. presented an approach to detect leaf diseases using color and texture features. The
infected region was initially segmented using K-means clustering. Then, features were extracted from
the required region of interest and trained using SVM for classification [13]. Another approach using the
K-means algorithm was proposed for leaf disease detection and classification [14]. T. Mehra et al. employed
K-means clustering to identify the presence of fungal infections on leaves [15]. One of the major
challenges in applying the above clustering algorithms is determining the precise number of clusters
and fixing the parameters that differentiate each cluster.
In the past few years, Scale invariant feature transforms were explored for many image processing
problems [16-18]. An approach using Scale Invariant Feature Transform (SIFT) for detection of leaf
disease was presented by Dandawate et al. [19]. In this work, SIFT features were trained using SVM for
detecting the presence of disease. SIFT based features were combined with Johnson SB distribution for
effective classification of diseases in tomatoes [20].
All the above methods for disease detection were based on hand-engineered features extracted
from the leaf portion of the image. The accuracy of these works depends solely on the nature of the
handcrafted features selected. Also, the performance of these works still needs to be
validated against a wide range of datasets. These drawbacks can be addressed by using deep learning
techniques.
2.2 Deep learning based methods
Unlike classical machine learning algorithms, deep learning algorithms can be applied directly to the
input data and do not require handcrafted features. The computing power delivered
by High-Performance Computing (HPC) and Graphics Processing Units (GPUs) allows efficient,
parallel training of deep models. A number of deep
learning models have been proposed that are trained on leaf images to perform disease detection.
Most of this research applied existing deep learning architectures such as VGG16,
AlexNet, ResNet and GoogleNet to the detection of infection in tomato leaves. Jia Shijie et al. presented
an approach to detect diseases in tomato leaves using the VGG16 architecture [21]. Suryawati et al. presented
another deep CNN using the VGG16 architecture to detect infection in tomato leaves [22]. Aravind et al.
compared the performance of VGG16 with the AlexNet architecture for disease detection in tomato leaves.
It was inferred that the model trained with the AlexNet architecture was more accurate than VGG16 by a small
margin [23]. Jayme Garcia et al. utilized a pre-trained GoogleNet CNN architecture for disease detection
in leaves [24]. Zhang et al. presented a transfer learning approach using AlexNet, GoogleNet, and
ResNet architectures for disease detection in tomato leaves [25]. Liang et al. presented an approach
involving the use of Resnet50, Wideresnet50 and DPN92 neural networks for classification of plant diseases
[26]. A deep architecture based on LeNet was proposed to detect the type of disease in tomato leaves in
[27].
Q. H. Cap et al. used two super resolution (SR) models based on super resolution
convolutional neural networks (SRCNN) and enhanced super resolution generative adversarial networks
(ESRGAN). The SRCNN model is used to identify prominent disease features, whereas the ESRGAN
model focuses on high-frequency details to obtain a more accurate prediction [28]. Another deep learning
architecture named ‘PD2SENET’ was proposed to detect the disease and indicate its severity [29]. In
this architecture, the shallow layers take raw pixel values of plant images as input and
progressive feature maps are generated with the help of residual learning. Srdjan Sladojevic et al.
presented a CaffeNet based architecture for detection of leaf diseases. This architecture had eight layers
for learning the characteristics of the disease patterns and utilized around 30K samples for training the
model [30]. Alvaro Fuentes et al. presented deep learning meta-architectures for disease detection by
combining Faster Region-based Convolutional Neural Network (Faster R-CNN), Region-based Fully
Convolutional Network (R-FCN), and Single Shot Multibox Detector (SSD) with ResNet and VGG
architectures. It was inferred that R-FCN combined with ResNet outperformed the other two methods
[31].
In addition to the application of existing CNN architectures, several custom architectures were
proposed for disease detection in tomato leaves. Ferdouse et al. presented one such CNN to identify
diseases in tomato leaves [32]. This architecture consists of 15 layers to extract a wide range of features
for classification. Ruedeeniraman et al. presented a VegeCare tool that made use of a Deep Neural Network
(DNN) to classify six tomato diseases [33]. Fuentes et al. presented another deep architecture to identify
diseases in tomatoes [34]. Melike Sardogan et al. presented a CNN model to identify the type of disease
in tomato plants [35]. This method considered only 400 images for training, which is relatively few for a
deep learning model. Pardede et al. presented an unsupervised convolutional auto-encoder for automatic
detection of plant diseases [36].
In contrast to the above standalone deep learning applications, a few CNN models were also
presented as mobile applications focusing on disease detection in tomato leaves. A. Elhassouny et al.
presented a MobileNet CNN model that uses depth-wise separable convolution operations to reduce the
computational burden of a traditional CNN in real-time applications [37]. Another mobile application
was developed by H. Durmus et al., which used SqueezeNet for classification of tomato leaf diseases
[38].
Though several approaches were presented for detection of diseases in tomato leaves, there
automated way.
2. The performance of the machine learning based models depends solely on the nature of
the manually selected handcrafted features. Hence, feature extraction has to be made
automatic to select and learn an optimal set of features for classification.
3. Most of the deep learning models give equal weightage to all features derived across
different levels. To make the model more sensitive for classification, feature weighting
has to be done at each stage. By doing so, significant features can be learnt and passed to
deeper levels of the network for precise classification.
4. Some of the deep learning models utilize generic and proven architectures like VGG16 and
GoogleNet, and hence use millions of parameters for classification. For real-time
To address the above research gaps, the proposed research employed two deep
architectures for disease detection in tomato leaves. The following are the major contributions of the
proposed work.
1. Two deep learning architectures were proposed in this research. The first
architecture employed residual learning to learn a hierarchy of features for better
classification. The second architecture employed the attention mechanism to specifically
learn distinctive feature maps and improve the performance of the residual CNN.
2. To the best of our knowledge, this is the first attempt to develop an attention based residual
deep network for disease detection in tomato leaves. The attention mechanism is employed to
learn and weight significant features across different levels. Hence, the significant features
are given more weightage with the help of the learnt attention coefficients and passed to
deeper levels for precise classification.
3. The proposed architecture was trained with a large collection of samples: 95999 images
were used for training the model and 24001 images for validation.
3. Proposed Work
This research proposes a novel CNN framework that specializes in the task of infestation
detection in the tomato plant. The objective of this work is to design a computationally inexpensive and
accurate learning model for disease detection. Two deep architectures are proposed in this
work to detect disease infestation in tomato leaves. The first architecture integrates residual learning on
top of a feed-forward CNN. The second architecture integrates the strengths of the attention mechanism and
residual learning in a CNN.
The learning pattern of a CNN is generally based on the aggregation of feature maps derived at
multiple levels. As a consequence of this aggregation occurring in the deep layers of the CNN, it tends
to lose the significance of the fine granular details learnt by the initial layers. The traditional CNN based
methods for tomato leaf infestation detection focus on learning the features in an orderly fashion,
starting from basic image-level features like edges and moving towards complex texture-based differences.
As a result, a few significant details are not passed to the deeper layers of the network. Hence, in this
method, residual connections are employed to pass the significant features extracted in the initial layers
to the deeper layers of the network. This supports effective aggregation of feature maps for precise
classification.
The architecture of the proposed residual CNN is presented in Fig. 1. It
consists of a sequence of three Residual Progressive Feature Extraction (RPFE) blocks, each set to learn
progressive features. The number of channels increases from 32 to 128 along the depth of the network.
The first RPFE block has a convolution receptive field of size 7x7, followed by a 5x5 kernel in the second
block and finally a 3x3 filter in the third block. Average pooling is then applied over the feature
map. This enables the classifier to model a reduced set of features without much loss of context and also
reduces the risk of overfitting. A sequence of 1x1 convolutional layers follows
the last RPFE block.
Fig. 1. The architecture of the Proposed Residual CNN
for the first block, 5x5 for the second, 3x3 for the third and so on). This is followed by a Rectified Linear
Unit (ReLU) activation layer that rectifies the convolved image, zeroing out negative values. ReLU was
used because it sustains a steady gradient even for larger activations, thus stabilizing the learning.
The output of the convolutional layers, ‘x’, in the first and second RPFE blocks is directed along two
functional paths, F(x) and G(x). F(x) denotes the set of operations (max pooling and batch normalization)
that carry ‘x’ in a simple feed-forward manner to the next block. G(x) denotes the set of
operations that skip ‘x’ to the next block using convolutional and max-pooling layers. Finally, the
response of the RPFE block, Y(x), is generated by summing the individual responses of F(x) and G(x).
Fig. 2 presents a visual representation of the skip functions, generic to any RPFE block in the
described Residual CNN.
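The two-path combination described above can be illustrated with a minimal Python sketch. It assumes a single-channel feature map stored as nested lists; the helper names `max_pool_2x2`, `f_path`, `g_path` and `rpfe_output` are hypothetical, batch normalization is omitted, and the convolution on the skip path is replaced by a fixed scalar weight, so the sketch shows only the element-wise sum of the two responses and not the trained network.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2D list (even height and width assumed)."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]


def f_path(x):
    # Feed-forward path: max pooling (batch normalization omitted in this sketch).
    return max_pool_2x2(x)


def g_path(x, weight=0.5):
    # Skip path: a stand-in 1x1 "convolution" (scalar weight, an assumption),
    # followed by max pooling so both paths have matching dimensions.
    scaled = [[weight * value for value in row] for row in x]
    return max_pool_2x2(scaled)


def rpfe_output(x):
    # Block response: element-wise sum of the two path responses.
    f, g = f_path(x), g_path(x)
    return [[a + b for a, b in zip(f_row, g_row)] for f_row, g_row in zip(f, g)]


x = [[1.0, 2.0, 0.0, 1.0],
     [3.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 2.0],
     [1.0, 0.0, 2.0, 6.0]]
y = rpfe_output(x)
```

Both paths halve the spatial size, which is what allows the two responses to be added position by position.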
Fig. 2. A representation of the paths F(x) and G(x) through the RPFE block that are combined
to generate the output Y(x) of the residual block.
The proposed network was designed end-to-end with only 2D convolution, pooling and batch normalization layers,
with no dense layers. The bottleneck observed when the last layers are fully connected is that
the model fails to run entirely on the GPU. With full convolution, the system can generate
a spatial map whose correspondences can be traced to different parts of the input image. This essentially
translates to sliding a classifier over the input image, making predictions at each window, regardless of
the input size. This approach towards identifying the infection makes it possible to,
(b) exploit spatial locality (when used with a stride less than the filter size)
(c) feature the different parts of the image (matched with the receptive fields of the input layers’
Convolutional Layer
The convolutional layer defines a set of filters that perform the convolution operation over the entire
image. It involves a series of convolution operations between an input volume ‘I’ and a set of ‘n’ convolutional
filters ‘FE’, followed by a non-linear activation. This yields an output volume ‘O’ as presented in
Eq. 2.
$O_m(i, j) = a\left( \sum_{d} \sum_{u} \sum_{v} F_m^{d}(u, v)\, I^{d}(i - u,\, j - v) + b_m \right)$ (2)
where,
The activation maps produced by the above relation are an encoding of the input ‘I’ in a low-
dimensional space, determined by the parameters used to build every feature map ‘Om’. After ‘Om’ is
calculated, it is subjected to a max-pooling operation to down-sample it. Intuitively, each convolutional
layer in this architecture learns the various attributes that capture discriminatory patterns to differentiate
the type of infection in the tomato leaf.
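As a hedged illustration of a convolutional layer followed by a non-linear activation, the sketch below computes one output feature map for a single-channel input. The function name `conv2d_relu` and the toy values are assumptions. The loop uses the sliding-window (cross-correlation) form that deep learning frameworks typically compute, which differs from a strict convolution only by a flip of the kernel, and it uses no padding.

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Slide one kernel over a 2D image (no padding, stride 1), add a bias, apply ReLU."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for u in range(kernel_h):
                for v in range(kernel_w):
                    acc += kernel[u][v] * image[i + u][j + v]
            row.append(max(0.0, acc))  # non-linear activation a(.)
        output.append(row)
    return output


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0],
          [0.0, -1.0]]   # responds to the difference along the main diagonal
feature_map = conv2d_relu(image, kernel, bias=5.0)
```

For a multi-channel input, the same accumulation would additionally run over the channel index, with one such kernel stack per output feature map.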
ReLU Activation
The Rectified Linear Unit (ReLU) is an activation function adopted in the design of most neural
networks, particularly CNNs. It is the identity function, f(x) = x, for all positive values of the input ‘x’ and
outputs zero for negative values. ReLU is sparsely activated, which helps to mimic the inactivity of the
biological neuron to certain impulses.
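A short sketch of this behaviour follows; the input values are arbitrary and chosen only to show that non-positive inputs map to zero, which is the source of the sparse activation mentioned above.

```python
def relu(x):
    """Identity for positive inputs, zero otherwise."""
    return x if x > 0 else 0.0


inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
activations = [relu(value) for value in inputs]

# Fraction of units that are inactive (exactly zero) for this toy input.
sparsity = activations.count(0.0) / len(activations)
```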
Max Pooling Layer
This pooling layer retains only the maximally activated neuron within each window of the feature map. It is used with a
stride factor of ‘2’ on a ‘2-by-2’ window, across all the RPFE blocks. This effectively reduces the width
and height of the feature maps while preserving the number of channels.
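The effect on tensor dimensions can be checked with the usual output-size formula, floor((n - window) / stride) + 1, applied to the height and width. The helper `pooled_shape` is a hypothetical name, and the channel count is held fixed here purely for illustration; in the network itself the convolutions change it between blocks.

```python
def pooled_shape(shape, window=2, stride=2):
    """Output (height, width, channels) of a max-pooling layer without padding."""
    height, width, channels = shape
    return ((height - window) // stride + 1,
            (width - window) // stride + 1,
            channels)


shape = (256, 256, 32)
shapes = [shape]
for _ in range(3):          # three successive 2x2, stride-2 pooling steps
    shape = pooled_shape(shape)
    shapes.append(shape)
```

Each pooling step halves the height and width and leaves the last dimension unchanged.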
In deep neural networks, each layer sees different feature information from the previous layer after every
gradient update on a batch of data. The data distribution of this input feature map varies
considerably, as the parameters of the previous layers are updated during the training phase. This significantly
slows the training pace and also calls for various heuristics to decide upon the parameter initialization.
Batch Normalization (BN) is a popular technique used to curtail this problem of Internal Covariate Shift, and the
output of the BN layer for a batch ‘x’ is given by Eq. 3.
$y = \beta + \varphi \, \frac{x - m}{\sqrt{s^{2} + \varepsilon}}$ (3)
where ‘m’ and ‘s’ are respectively the mean and standard deviation of the batch ‘x’. ‘β’ and ‘φ’ are trainable
parameters that are updated at each iteration. ‘ε’ is a small constant, introduced to increase the
variance and to prevent the denominator from becoming zero. Batch Normalization mitigates the
vanishing/exploding gradient problem by normalizing the values so that they mostly fall between -3 and 3, fitting a
maximum likelihood estimate (along the channel activations, across a batch) of a normal
distribution.
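A minimal sketch of the normalization step for a one-dimensional batch is given below. The function name `batch_norm` and the neutral parameter names `scale` and `shift` (standing for the two trainable parameters) are assumptions, as is the value of the small constant `eps`; the running statistics that frameworks keep for inference are left out.

```python
import math


def batch_norm(batch, scale=1.0, shift=0.0, eps=1e-5):
    """Normalize a list of activations to zero mean and (almost) unit variance,
    then apply a trainable scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    variance = sum((value - mean) ** 2 for value in batch) / n
    return [scale * (value - mean) / math.sqrt(variance + eps) + shift
            for value in batch]


normalized = batch_norm([2.0, 4.0, 6.0, 8.0])

# Statistics of the normalized batch, for inspection.
new_mean = sum(normalized) / len(normalized)
new_variance = sum((value - new_mean) ** 2 for value in normalized) / len(normalized)
```

With the default scale and shift, the normalized batch has a mean of approximately zero and a variance slightly below one, the small deficit being caused by the constant added under the square root.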
The details of the tensor at each layer of this architecture are tabulated in Table 1.
Table 1: A tabulation of the connections between the layers and the dimensions of the output
tensors at each layer, for the entire Residual CNN.
Layer (type) | Output Shape | No. of Parameters | Connected to the previous layer
input_1 (InputLayer) | (None, 256, 256, 3) | 0 |
conv2d_1 (Conv2D) | (None, 256, 256, 32) | 4736 | input_1(0,0)
conv2d_2 (Conv2D) | (None, 128, 128, 32) | 9248 | conv2d_1(0,0)
max_pooling2d_1 (MaxPooling2D) | (None, 128, 128, 32) | 0 | conv2d_1(0,0)
max_pooling2d_2 (MaxPooling2D) | (None, 128, 128, 32) | 0 | conv2d_39(0,0)
batch_normalization_1 (BatchNorm) | (None, 128, 128, 32) | 128 | max_pooling2d_1(0,0)
add_1 (Add) | (None, 128, 128, 32) | 0 | max_pooling2d_2(0,0), batch_normalization_1(0,0)
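The parameter counts listed for the first layers can be cross-checked with simple arithmetic. In the sketch below, the 7x7 kernel of the first convolution follows the description of the first RPFE block, while the 3x3 kernel assumed for conv2d_2 is an inference from its parameter count and is not stated in the text. The batch normalization count assumes four values per channel (two trainable, two running statistics), as reported by common framework summaries.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def batch_norm_params(channels):
    """Scale, shift, running mean and running variance for each channel."""
    return 4 * channels


conv2d_1 = conv2d_params(7, 7, 3, 32)    # RGB input, 32 filters of size 7x7
conv2d_2 = conv2d_params(3, 3, 32, 32)   # assumed 3x3 kernel on 32 channels
batch_normalization_1 = batch_norm_params(32)
```

Under these assumptions the three values agree with the 4736, 9248 and 128 parameters reported for the corresponding layers.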
The attention model works on top of the RPFE CNN by retaining the context-relevant features.
The previous RPFE based model combines the features extracted in each block with the features derived
from its preceding layer. In this way, equal importance is given to all features collected from the earlier
RPFE blocks. For precise feature learning, significant features from the previous blocks need to be
weighted higher than other features. Hence, an attention mechanism was introduced on top of the
RPFE architecture to learn and select prominent features from the previous RPFE blocks. This model
learns an attention mask that weighs the relative importance of spatial features in the feature map. In this
way, it learns attention coefficients for each pixel in the feature map to capture the properties of the
infestation effectively. The architecture of the proposed attention based CNN, built on top
of the residual architecture described in section 2.1, is presented in Fig. 3.
Fig. 3. An overview of the architecture employed to integrate attention within the residual net
framework.
3.2.1 Attention embedded Residual Progressive Feature Extraction (ARPFE) block
This architecture uses the attention mechanism across blocks to learn a weighted function for
modeling the activations from the preceding blocks. The skip connections from the previous blocks are
now weighted across the depth axis for each pixel in the spatial expanse of that layer.
The output of the convolutional layers, ‘x’, in the first and second ARPFE blocks is directed along
two functional paths, F(x) and G’(x). F(x) denotes the set of operations (max pooling and batch
normalization) that carry ‘x’ in a simple feed-forward manner to the next block. G’(x)
denotes the attention-aided weighted set of operations that skip ‘x’ through convolutional and
max-pooling layers. As discussed in 3.1.1, the weighted summation is used to generate the output Y(x) from
where ‘α’ is the attention weight matrix whose dimensions are the same as the spatial dimensions of G(x).
The attention weight matrix ‘α’ is point-wise multiplied (broadcast along the depth) across the
corresponding cross-section of G(x). So, ‘α’ acts as a weighting function on G(x), and G’(x) is derived from ‘α’
as given by Eq. 4. This process is illustrated in Fig. 4.
The method for learning these weights is shown in Fig. 5. The residual feature map G(x) is passed
through a dense layer (with ReLU activation) that learns a parameter ‘α_ij’ for each pixel cross-section
volume G(x)(i,j).
The dense matrix is flattened out to form a feature vector. The activation values from the feature
vector are passed through a Softmax layer. The weights ‘α_ij’ are thus computed as a Softmax
probability distribution, such that ∑ α_ij = 1.
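The weighting scheme can be sketched as follows for a small feature map stored as nested lists of shape height x width x channels. The function names `attention_weights` and `apply_attention`, the shared dense weights `w` and the bias `b` are hypothetical; in the trained network these parameters would be learnt, whereas here they are fixed by hand.

```python
import math


def attention_weights(g, w, b=0.0):
    """Return one softmax weight per spatial position (i, j) of g."""
    height, width = len(g), len(g[0])
    # Dense unit with ReLU applied to each pixel cross-section G(x)(i, j).
    scores = [
        max(0.0, sum(wc * gc for wc, gc in zip(w, g[i][j])) + b)
        for i in range(height) for j in range(width)
    ]
    # Softmax over the flattened scores (shifted by the maximum for stability).
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    flat = [value / total for value in exps]
    return [flat[i * width:(i + 1) * width] for i in range(height)]


def apply_attention(g, alpha):
    """Multiply every channel at position (i, j) by alpha[i][j] (broadcast along depth)."""
    return [
        [[alpha[i][j] * value for value in g[i][j]] for j in range(len(g[0]))]
        for i in range(len(g))
    ]


g = [[[1.0, 0.0], [0.0, 1.0]],
     [[2.0, 2.0], [0.0, 0.0]]]      # toy 2 x 2 x 2 feature map
alpha = attention_weights(g, w=[1.0, 1.0])
g_prime = apply_attention(g, alpha)
```

In this toy example the position with the strongest response receives the largest weight, and the weights over all positions sum to one.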
Fig. 4. A visual representation for generating the output Y(x) from an ARPFE block using
attention based weights.
Fig. 5. A generic scheme for learning the attention function α_ij. (a): Feature map G(x). (b): A
ReLU-activated dense layer matrix with one unit for each cross-section (i,j) of G(x). (c): A softmax
activation layer following the dense layer from (b) to learn a probability distribution of weights for
each α_ij. The weighted sum of the α_ij over G(x) yields G’(x). (d): Output G’(x), computed as
∑ α_ij * G(x)_ij.
Softmax Classifier
The proposed system uses a k-way softmax classifier to classify the image into one of k
categories. The cross-entropy loss is given by Eq. 5.
$CE = -\sum_{i} t_i \log\left(f(s)_i\right)$ (5)
where f(s)i is the output conditional probability P( y = ŷi | si ) for some training example ‘si’ with predicted
value ŷi. This probability function for softmax activation is given in Eq. 6.
$f(s)_i = \frac{e^{s_i}}{\sum_{j} e^{s_j}}$ (6)
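The classifier output and its loss can be sketched in a few lines. The scores below are arbitrary values for a four-class problem (one healthy and three diseased classes), and the one-hot target vector is an assumption made for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    peak = max(scores)                      # subtract the maximum for stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy(targets, probabilities):
    """Negative log-likelihood of the target classes under the predictions."""
    return -sum(t * math.log(p) for t, p in zip(targets, probabilities) if t > 0)


scores = [2.0, 1.0, 0.1, -1.0]              # one raw score per class
probabilities = softmax(scores)
loss = cross_entropy([1, 0, 0, 0], probabilities)
```

With a one-hot target, the loss reduces to the negative logarithm of the probability assigned to the correct class, so it approaches zero as that probability approaches one.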
The details of the tensor at each layer of this architecture are tabulated in Table 2.
Table 2: A tabulation of the connections between the layers and the dimensions of the output
tensors at each layer, for the entire Residual CNN.
max_pooling2d_4(0,0)
max_pooling2d_5 (MaxPooling2D) | (None, 32, 32, 64) | 0 | conv2d_4(0,0)
batch_normalization_2 (BatchNorm) | (None, 32, 32, 64) | 256 | max_pooling2d_5(0,0)
add_2 (Add) | (None, 32, 32, 64) | 0 | multiply_1(0,0), batch_normalization_2(0,0)
The proposed system was trained with the augmented collection of the benchmark Plant Village
Dataset. The source code was written in the TensorFlow deep learning framework and
run on an NVIDIA Tesla P100 GPU. The model was evaluated using 5-fold cross-validation
(on 120K samples), with each fold stratified into roughly equal numbers for each class (by random
sampling with replacement). The loss function was minimized using the Adaptive Moment Estimation
(Adam) optimizer. This optimization algorithm uses running averages of both the gradient and its
second moment.
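The update rule of the optimizer can be sketched for a single scalar parameter. The function name `adam_step` is hypothetical, the decay rates and the small constant are the commonly used default values rather than settings reported in this work, and the quadratic objective is a toy problem chosen only to show the behaviour.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running averages of the gradient and its square."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)            # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem: minimize f(p) = (p - 3)^2, whose gradient is 2 * (p - 3).
p, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (p - 3)
    p, m, v = adam_step(p, grad, m, v, t, lr=0.05)
```

Because the step is the ratio of the two running averages, its size is close to the learning rate at the start of training regardless of the gradient scale, and the parameter then settles near the minimizer at 3.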
Three different experiments were conducted. The first experiment applied a baseline model for
disease detection and classification in tomato. The second experiment used the residual connections
across the Progressive Feature Extraction blocks. The third experiment integrated both attention and
residual connections in CNN.
4.1 Dataset
The proposed model for disease detection in tomato was developed using the Plant Village
Disease Classification Challenge dataset, and data augmentation techniques were applied to
increase the size of the dataset. Table 3 presents the distribution of augmented samples for each fold in the
cross-validation process. The dataset used in our experiment includes one healthy class and three diseased
classes. Table 4 shows samples for each disease class and the effects of the data augmentation
techniques on them.
Data augmentation techniques were applied to enlarge the data set, thereby reducing
overfitting. Central zoom was performed to produce images that contain only the leaf and not
the background information, random crop and zoom was performed to focus on specific parts of the leaf,
and various contrast levels were used to make the dataset robust to different lighting conditions.
Stratified 5-fold cross-validation was used to evaluate the proposed model, and this ensured balance between
the classes in each of the 5 folds due to random sampling.
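One way to build class-balanced folds is sketched below. The function name `stratified_folds`, the toy label list and the random seed are assumptions, and the sketch assigns every sample to exactly one fold, which is the usual convention and may differ from the sampling procedure actually used in the experiments.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)                 # random order within each class
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Toy data set: 20 samples for each of the four classes.
labels = (["healthy"] * 20 + ["early_blight"] * 20 +
          ["late_blight"] * 20 + ["leaf_mold"] * 20)
folds = stratified_folds(labels)
```

In each round of cross-validation, one fold would serve as the validation set and the remaining four as the training set.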
Table 3. Details of distribution of samples in cross-validation process
[Table 4 (image table): sample images for the categories Healthy, Early Blight, Late Blight and Leaf Mould, shown as Original Image, Contrast, Central Zoom, and Random Zoom and Crop; images not reproduced here.]
4.2 Experiment 1: Application of the baseline model
The baseline model was a simple feed-forward CNN with no cross-connections or any
learning-aided mechanisms. Training the model on the GPU took around 1.5 days and resulted in an
accuracy of 84%.
Building on experiment 1, the residual model includes skip connections from one block to the
next. The skip connections take the feature map from the ReLU-activated convolutional layer in RPFE
block b to the convolutional layer in RPFE block b+1, as described in section 2.1. The dimensions
at both ends of the skip connection are matched by filtering the admitted feature maps with
convolutional layers and trimming them with max pooling.
Training the model took around 10 hours and approximately 150 epochs to reach
convergence. The proposed residual network was subjected to 5-fold cross-validation, and
the resultant observations are presented in Table 5.
[Table 5: per-fold results of the residual CNN, Fold 1 through Fold 5; figure content not recoverable.]
It could be observed that the Residual CNN was able to detect the type of disease in tomatoes
with an accuracy of 90-95%. Also, the loss of the network decays appreciably during the training phase,
leading to precise classification.
Building on the second experiment, the attention based residual model adds a weighting scheme
to the output feature map G(x) from the skip connections. The weighting scheme computes an attention
matrix ‘α’ that is point-wise multiplied (broadcast along the depth) across the corresponding cross-section
of G(x)ij, as described in section 2.2. These attention weights ‘α’ are learnt dynamically upon seeing new
training batches.
Training the model took around 10 hours and approximately 150 epochs to reach
convergence. The proposed residual based network is subjected to 5-fold cross-validation process and
[Per-fold results of the attention based residual CNN, Fold 1 through Fold 5; figure content not recoverable.]
It was evident that the proposed attention based residual CNN was able to converge better than the residual
CNN.
In this research, three deep architectures were analyzed to evaluate the performance of
disease detection. The first method applied a baseline model for disease detection in tomato leaves. The
second approach was based on residual connections across the Progressive Feature Extraction blocks.
The third approach integrated both the attention mechanism and residual connections in a CNN. The
observations of these experiments are tabulated in Table 7. It can be observed that the proposed
attention based residual CNN performed best in detecting the type of infection, with an accuracy of
98%.
S. No | Method | Accuracy (in %)
1 | Baseline CNN model | 84
2 | Residual CNN model | 95
The performance of the proposed attention based residual CNN is compared against the existing
methods reported in the literature and the resultant observations sorted according to accuracy obtained
are highlighted in Table 8.
Table 8. Performance comparison of proposed work with other existing works.
S. No | Source | Type of features | Method | Size of dataset | Accuracy (in %)
1 | Ferdouse et al. [32] | Automatic | CNN | 3000 | 76
2 | Chit Su Hlaing et al. [20] | Hand-Crafted features | Quadratic SVM | 3535 | 83.5
3 | Melike Sardogan et al. [35] | Automatic | CNN with LVQ | 500 | 86
4 | P. B. Padol et al. [13] | Hand-Crafted features | SVM classifier | 137 | 88.89
5 | J. Shijie et al. [21] | Automatic | VGG16 based CNN | 7040 | 89
6 | Azeddine Elhassouny et al. [37] | Automatic | CNN | 7176 | 90.3
7 | Semary et al. [8] | Hand-Crafted features | SVM | 708 | 92
8 | P. Tm et al. [27] | Automatic | LeNet based CNN | 54306 | 95
9 | Suryawati et al. [22] | Automatic | VGGNet based CNN | 18160 | 95.24
10 | Jayme Garcia et al. [24] | Automatic | GoogleNet based CNN | 40409 | 96
11 | Sladojevic et al. [30] | Automatic | CNN | 30880 | 96.3
It can be observed that an accuracy of 83 to 97% was obtained by machine learning methods that
employed handcrafted features for disease detection [8,12,13,20]. Also, these models were trained with fewer than
4k samples, which is too few to generalize over all feature patterns. Recent deep learning research employed
well-established architectures like VGG16, ResNet and GoogleNet for disease detection in tomato leaves
[21,22,24,25,27]. In addition to these works, certain deep-layered CNN architectures were also proposed for
infestation detection in tomato leaves [30,34,35,37]. Though these works yield appreciable results, their
accuracy was in the range of 76 to 97%. As the proposed model employed an attention mechanism
to learn and weight significant features, it was able to achieve an accuracy of 98%, which is a significant
improvement when compared to other works.
5. Conclusion
This research presents an efficient mechanism to detect the type of infestation in tomato leaves.
To the best of our knowledge, this is the first attempt to employ the attention gating mechanism in a
residual CNN for disease detection in tomatoes. The main contribution of this work is the integration of
an attention mechanism on top of the residual network for effective feature learning. It helps to
selectively weigh the features from different layers at the input of a single layer. Hence, the receptive
field at a layer is extended to look at feature maps from different levels of the processing hierarchy.
The current layer can now process its input with more contextual information. Learning at the layers
preceding the current layer is in turn aided by the features perceived at the current layer, because
gradients are backpropagated along the skip connections.
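The idea of weighting features from different layers before they enter a given layer can be illustrated with a minimal sketch. It is not the architecture proposed in this work: feature maps are reduced to short vectors, each gate is a single scalar, and the weights `w_skip` and `w_curr` are hypothetical values chosen for illustration, whereas a real attention gate would typically act on multi-channel feature maps with learned weights.

```python
import math


def sigmoid(x):
    """Squash a real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_gate(skip, current, w_skip, w_curr, bias=0.0):
    """Scalar gate in (0, 1) scoring how relevant `skip` is given `current`."""
    return sigmoid(dot(w_skip, skip) + dot(w_curr, current) + bias)


def gated_residual(current, skips, w_skip, w_curr):
    """Add each earlier-layer feature vector to `current`, scaled by its gate."""
    out = list(current)
    gates = []
    for skip in skips:
        g = attention_gate(skip, current, w_skip, w_curr)
        gates.append(g)
        out = [o + g * s for o, s in zip(out, skip)]
    return out, gates


# Toy inputs (assumed values): one current-layer vector and two earlier-layer vectors.
current = [0.5, -1.0, 2.0]
skips = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
w_skip = [0.3, -0.2, 0.8]   # hypothetical attention weights
w_curr = [0.1, 0.1, 0.1]    # hypothetical attention weights

out, gates = gated_residual(current, skips, w_skip, w_curr)
print("gates:", gates)
print("output:", out)
```

Because the output depends on every earlier-layer vector through its gated skip path, an error signal at the output reaches those earlier layers directly, which is the effect described in the paragraph above.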
The proposed network has around 600K learnable parameters, which is fewer than the existing deep
learning approaches reported in the literature. Experimental results indicate that the proposed
attention based residual network was able to detect the type of infection with an accuracy of 98%. It
could also be noted that the ARPFE blocks make the design of the proposed system extensible to any
input size.
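One general reason a convolutional design can have a small parameter count and accept different input sizes is that convolution parameters do not depend on the spatial size of the input. The sketch below illustrates this with a hypothetical stack of three 3x3 convolutions followed by global average pooling and a classifier. The channel widths, strides and number of classes are assumptions for illustration only and do not reproduce the ARPFE blocks or the 600K figure.

```python
def conv2d_params(kernel, c_in, c_out, bias=True):
    """Learnable parameters of a 2D convolution (independent of image height/width)."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def dense_params(n_in, n_out, bias=True):
    """Learnable parameters of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical configuration (not the paper's): channel widths and class count.
CHANNELS = [3, 32, 64, 128]
NUM_CLASSES = 10


def total_params(input_size):
    """Return (parameter count, final feature-map size) for a square input."""
    total = 0
    size = input_size
    for c_in, c_out in zip(CHANNELS, CHANNELS[1:]):
        total += conv2d_params(3, c_in, c_out)
        size = conv_output_size(size, kernel=3, stride=2, padding=1)
    # Global average pooling collapses each feature map to one value, so the
    # classifier always sees CHANNELS[-1] inputs, whatever `size` is.
    total += dense_params(CHANNELS[-1], NUM_CLASSES)
    return total, size


for side in (64, 256):
    params, final_size = total_params(side)
    print(f"input {side}x{side}: {params} parameters, final map {final_size}x{final_size}")
```

In this toy configuration the parameter count is identical for both input sizes; only the size of the intermediate feature maps changes.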
References
1. R. Anand, S. Veni and J. Aravinth, "An application of image processing techniques for detection of
diseases on brinjal leaves using k-means clustering method," 2016 International Conference on Recent
Trends in Information Technology (ICRTIT), pp. 1-6, 2016.
2. K. Thangadurai and K. Padmavathi, "Computer Vision image Enhancement for Plant Leaves Disease
Detection," 2014 World Congress on Computing and Communication Technologies, pp. 173-175, 2014.
3. C. Mattihalli, E. Gedefaye, F. Endalamaw and A. Necho, "Real Time Automation of Agriculture Land, by
automatically Detecting Plant Leaf Diseases and Auto Medicine," 2018 32nd International Conference on
Advanced Information Networking and Applications Workshops (WAINA), pp. 325-330, 2018.
4. Y. Liu, S. Zhou and J. Sun, "Detection of Ginseng leaf cicatrices base on K-means clustering
algorithm," 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering
and Informatics (CISP-BMEI), pp. 1-5, 2017.
5. V. Singh, Varsha and A. K. Misra, "Detection of unhealthy region of plant leaves using image processing
and genetic algorithm," 2015 International Conference on Advances in Computer Engineering and
Applications, pp. 1028-1032, 2015.
6. P. Lottes, J. Behley, A. Milioto and C. Stachniss, "Fully Convolutional Networks With Sequential
Information for Robust Crop and Weed Detection in Precision Farming," in IEEE Robotics and
Automation Letters, vol. 3, no. 4, pp. 2870-2877, Oct. 2018.
7. Akhtar, A., A. Khanum, S. A. Khan, and A. Shaukat. Automated Plant Disease Analysis (APDA):
Performance comparison of machine learning techniques. Proceedings of the 11th International
Conference on Frontiers of Information Technology, 60–65, 2013.
8. Semary, N. A., Tharwat, A., Elhariri, E., & Hassanien, A. E. (2015). Fruit-Based Tomato Grading System
Using Features Fusion and Support Vector Machine. Intelligent Systems'2014, 401–410.
9. Prasad, S., Peddoju, S. K., & Ghosh, D. (2015). Multi-resolution mobile vision system for plant leaf
disease diagnosis. Signal, Image and Video Processing, 10(2), 379–388.
10. D. Ashourloo, H. Aghighi, A. A. Matkan, M. R. Mobasheri and A. M. Rad, "An Investigation Into
Machine Learning Regression Techniques for the Leaf Rust Disease Detection Using Hyperspectral
Measurement," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing,
vol. 9, no. 9, pp. 4344-4351, 2016.
11. Parikh, M. S. Raval, C. Parmar and S. Chaudhary, "Disease Detection and Severity Estimation in Cotton
Plant from Unconstrained Images," 2016 IEEE International Conference on Data Science and Advanced
Analytics (DSAA), pp. 594-601, 2016.
12. H. Sabrol and K. Satish, "Tomato plant disease classification in digital images using classification
tree," 2016 International Conference on Communication and Signal Processing (ICCSP), 2016, pp. 1242-1246.
13. P. B. Padol and A. A. Yadav, "SVM classifier based grape leaf disease detection," Proceedings of the
Conference on Advances in Signal Processing (CASP), 2016, pp. 175-179.
14. S. Kaur, S. Pandey and S. Goel, "Semi-automatic leaf disease detection and classification system for
soybean culture," in IET Image Processing, vol. 12, no. 6, pp. 1038-1048, 2018.
15. T. Mehra, V. Kumar and P. Gupta, "Maturity and disease detection in tomato using computer vision," 2016
Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC), Waknaghat,
2016, pp. 399-403.
16. Annis Fathima, R. Karthik, V. Vaidehi, Image stitching with combined moment invariants and SIFT
features, Elsevier Procedia Computer Science, Vol. 19, pp. 420 – 427, 2013.
17. R. Karthik, Annis Fathima, V. Vaidehi, Panoramic view creation using Invariant moments and SURF
features, Third IEEE International Conference on Recent trends in Information technology ICRTIT,
2013.
18. Menaka, R. and Karthik, R. ‘A novel feature extraction scheme for visualisation of 3D anatomical
structures’, Int. J. Biomedical Engineering and Technology, Vol. 21, No. 1, pp.49–66, 2016.
19. Dandawate, Y., and R. Kokare. An automated approach for classification of plant diseases towards
development of futuristic decision support system in Indian perspective. Proceedings of the International
Conference on Advances in Computing, Communications and Informatics (ICACCI), 794–99, 2015.
20. C. S. Hlaing and S. M. Maung Zaw, "Tomato Plant Diseases Classification Using Statistical Texture
Feature and Color Feature," 2018 IEEE/ACIS 17th International Conference on Computer and
Information Science (ICIS), Singapore, 2018, pp. 439-444.
21. J. Shijie, J. Peiyi, H. Siping and S. Haibo, "Automatic detection of tomato diseases and pests based on leaf
22. E. Suryawati, R. Sustika, R. S. Yuwana, A. Subekti and H. F. Pardede, "Deep Structured Convolutional
Neural Network for Tomato Diseases Detection," 2018 International Conference on Advanced Computer
Science and Information Systems (ICACSIS), pp. 385-390, 2018.
23. Aravind Krishnaswamy Rangarajan, Raja Purushothaman, Aniirudh Ramesh, Tomato crop disease
classification using pre-trained deep learning algorithm, Procedia Computer Science, Volume 133,
2018, Pages 1040-1047.
24. Jayme Garcia Arnal Barbedo, Plant disease identification from individual lesions and spots using deep
learning, Biosystems Engineering, Volume 180, 2019, Pages 96-107.
25. Keke Zhang, Qiufeng Wu, Anwang Liu, and Xiangyan Meng, “Can Deep Learning Identify Tomato Leaf
Disease?,” Advances in Multimedia, vol. 2018, Article ID 6710865, 10 pages, 2018.
26. Liang S., Zhang W. (2020) Accurate Image Recognition of Plant Diseases Based on Multiple Classifiers
Integration. In: Jia Y., Du J., Zhang W. (eds) Proceedings of 2019 Chinese Intelligent Systems Conference.
CISC 2019. Lecture Notes in Electrical Engineering, vol 594. Springer.
27. P. Tm, A. Pranathi, K. SaiAshritha, N. B. Chittaragi and S. G. Koolagudi, "Tomato Leaf Disease Detection
Using Convolutional Neural Networks," 2018 Eleventh International Conference on Contemporary
Computing (IC3), pp. 1-5, 2018.
28. Q. H. Cap, H. Tani, H. Uga, S. Kagiwada and H. Iyatomi, "Super-Resolution for Practical Automated
Plant Disease Diagnosis System," 2019 53rd Annual Conference on Information Sciences and Systems
(CISS), Baltimore, MD, USA, 2019, pp. 1-6.
29. Qiaokang Liang, Shao Xiang, Yucheng Hu, Gianmarc Coppola, Dan Zhang, Wei Sun, PD2SE-Net:
Computer-assisted plant disease diagnosis and severity estimation network, Computers and Electronics in
Agriculture, Volume 157, 2019, Pages 518-529.
30. Srdjan Sladojevic, Marko Arsenovic, Andras Anderla, Dubravko Culibrk, and Darko Stefanovic, Deep
Neural Networks Based Recognition of Plant Diseases by Leaf Image Classification, Computational
Intelligence and Neuroscience, Volume 2016, Article ID 3289801, 11 pages.
31. Alvaro Fuentes, Sook Yoon, Sang Cheol Kim and Dong Sun Park, A Robust Deep-Learning-Based
Detector for Real-Time Tomato Plant Diseases and Pests Recognition, Sensors 2017, 17, 2022, pp. 1-21.
32. Ferdouse Ahmed Foysal M., Shakirul Islam M., Abujar S., Akhter Hossain S. (2020) A Novel Approach
for Tomato Diseases Classification Based on Deep Convolutional Neural Networks. In: Uddin M., Bansal
33. Ruedeeniraman N., Ikeda M., Barolli L. (2020) Performance Evaluation of VegeCare Tool for Tomato
Disease Classification. In: Barolli L., Nishino H., Enokido T., Takizawa M. (eds) Advances in Networked-
based Information Systems. NBiS - 2019 2019. Advances in Intelligent Systems and Computing, vol 1036.
Springer.
34. Fuentes A., Im D.H., Yoon S., Park D.S. (2017) Spectral Analysis of CNN for Tomato Disease
Identification. In: Rutkowski L., Korytkowski M., Scherer R., Tadeusiewicz R., Zadeh L., Zurada J. (eds)
Artificial Intelligence and Soft Computing. ICAISC 2017. Lecture Notes in Computer Science, vol 10245.
Springer.
35. M. Sardogan, A. Tuncer and Y. Ozen, "Plant Leaf Disease Detection and Classification Based on CNN
with LVQ Algorithm," 2018 3rd International Conference on Computer Science and Engineering
(UBMK), pp. 382-385, 2018.
37. A. Elhassouny and F. Smarandache, "Smart mobile application to recognize tomato leaf diseases using
Convolutional Neural Networks," 2019 International Conference of Computer Science and Renewable
Energies (ICCSRE), Agadir, Morocco, 2019, pp. 1-4.
38. H. Durmuş, E. O. Güneş and M. Kırcı, "Disease detection on the leaves of the tomato plants by using deep
learning," 2017 6th International Conference on Agro-Geoinformatics, Fairfax, VA, 2017, pp. 1-5.
HIGHLIGHTS
An attention based deep residual network is proposed in this research to detect the type of infection in tomato leaves.
This enhanced deep learning architecture is the first of its kind developed for automatic detection of infection in tomato leaves.
95999 images were used for training the model and 24001 images were used for validation.
Experimental results indicate that the proposed attention based residual network was able to detect the type of infection with an accuracy of 98%.
Declaration of Interest Statement
Conflict of interest
None