Plant Disease Detection Using the PlantVillage Dataset

Mainak Patra, Manish Kumar, Ricky Mahto, Rishant Bhakta


Department of Information Science
Nitte Meenakshi Institute of Technology
Bangalore, India
{1NT22IS088, 1NT22IS089, 1NT22IS135, 1NT22IS136}

Abstract—Plant diseases have emerged as one of the leading causes of agricultural loss worldwide, significantly impacting both the quality and quantity of crops. Traditional methods of identifying plant diseases often rely on expert knowledge and manual inspection, which are not only time-consuming but also inaccessible to farmers in remote or resource-limited regions. In this study, we propose an automated system for plant disease detection using deep learning techniques applied to the publicly available PlantVillage dataset. Leveraging the power of transfer learning, a pre-trained MobileNetV2 model is fine-tuned to classify plant diseases across 38 different classes based on leaf images. Extensive preprocessing and data augmentation techniques are utilized to enhance the model’s generalization capabilities. The trained model achieves a validation accuracy of 95.77%, demonstrating high efficacy in identifying plant diseases. This system not only enhances the speed and accuracy of disease diagnosis but also paves the way for integration into mobile applications for real-time, on-field usage by farmers.

I. INTRODUCTION

Agricultural sustainability is a cornerstone of human survival and economic development. However, plant diseases have become a significant impediment to achieving optimal agricultural productivity. These diseases often spread rapidly and can decimate entire crop yields if not identified and managed promptly. The early detection of plant diseases is thus crucial for implementing effective remedial measures.

Traditionally, plant disease identification has relied on the visual inspection of crops by experienced pathologists or agricultural experts. While effective to some extent, this method is inherently limited due to the scarcity of experts, the subjectivity of human judgment, and the logistical challenges of reaching remote farming communities. The emergence of machine learning, particularly deep learning, offers an innovative alternative by enabling automated, accurate, and scalable solutions.

Convolutional Neural Networks (CNNs) have demonstrated exceptional performance in image classification tasks and are well-suited for the problem of disease recognition from plant leaf images. In this project, we make use of the PlantVillage dataset, a comprehensive repository of labeled leaf images capturing both healthy and diseased specimens. The goal is to develop an image classification model capable of detecting plant diseases using a lightweight and efficient deep learning architecture.

Our work specifically employs MobileNetV2, a deep CNN optimized for mobile and embedded devices, which ensures that the final model remains resource-efficient without compromising on accuracy. The proposed system is intended for real-time deployment in agricultural settings, empowering farmers with timely insights to prevent disease outbreaks and reduce crop loss.

II. RELATED WORK

Over the past decade, several researchers have explored the potential of machine learning and computer vision for automated plant disease detection. Early studies employed traditional machine learning classifiers such as Support Vector Machines (SVM), Decision Trees, and k-Nearest Neighbors (kNN), typically relying on handcrafted features like color histograms, texture metrics, and edge detection. While these methods provided some success, they lacked robustness across varied conditions and plant types.

With the advent of deep learning, particularly CNNs, the field has witnessed significant advancements. Ferentinos (2018) conducted extensive experiments using CNNs trained on multiple plant datasets, achieving classification accuracies of over 99%. Similarly, Mohanty et al. (2016) demonstrated the feasibility of using deep architectures such as AlexNet and GoogLeNet on the PlantVillage dataset with promising results. These studies laid the foundation for high-performance plant disease classifiers.

More recently, researchers have explored enhancements to CNN architectures to further boost performance. Sladojevic et al. (2016) applied deep neural networks to classify a range of plant diseases with high accuracy. Zhang et al. (2019) introduced attention-based mechanisms that enable the network to focus on relevant leaf areas, improving classification confidence. Despite these successes, many of these models are computationally intensive and not easily deployable in field conditions.

Our approach aims to bridge this gap by employing MobileNetV2, which offers a balance between model complexity and performance. By leveraging transfer learning, we achieve high classification accuracy while maintaining computational efficiency, making the model suitable for mobile deployment.

III. DATASET AND PREPROCESSING

The dataset used in this study is the PlantVillage dataset, which is widely recognized in the domain of agricultural machine learning. It consists of 54,305 high-quality images of plant leaves categorized into 38 classes, covering a range
of diseases and healthy conditions. The images are in RGB format and were captured under consistent lighting and background conditions, which ensures clarity and uniformity.

Prior to training the model, the dataset undergoes several preprocessing steps to ensure compatibility with the chosen deep learning architecture and to enhance model performance. First, all images are resized to 224x224 pixels to match the input shape required by MobileNetV2. This resizing operation ensures consistency across the dataset and reduces the computational overhead.

To prevent overfitting and improve the generalization of the model, data augmentation techniques are employed. These include random horizontal flipping, zooming, and rotation. Such transformations simulate real-world variations in leaf orientation and scale, thus making the model more robust.

The dataset is divided into training and validation subsets with an 80:20 ratio. This stratified split ensures that each class is well-represented in both subsets, providing a reliable basis for training and evaluating the model.

IV. METHODOLOGY

The core of our methodology lies in the application of transfer learning using a pre-trained MobileNetV2 model to perform multi-class classification of plant diseases. Transfer learning significantly reduces the computational burden and training time by leveraging features learned from large-scale datasets like ImageNet. In our context, it enables us to use a well-optimized feature extractor for image data and fine-tune it for the task of identifying specific plant diseases.

A. Architectural Design

MobileNetV2 is specifically designed for mobile and embedded vision applications. It is characterized by its use of inverted residual structures and linear bottlenecks, which are instrumental in reducing the number of parameters without a significant loss in accuracy. This architecture is ideal for on-device inference where computational resources are limited, such as on smartphones and edge devices used in agricultural fields.

The MobileNetV2 model serves as the base architecture, with the top classification layer removed (include_top=False). This enables the replacement of the ImageNet classifier with a task-specific head designed for plant disease detection. During the initial training phase, the base model’s layers are frozen to preserve the pre-trained features, which capture general image properties such as edges, textures, and colors.

B. Custom Classification Head

To adapt the MobileNetV2 base model to the PlantVillage dataset, a new classification head is appended. This head consists of a Global Average Pooling layer, which reduces each feature map to a single number and effectively lowers the total number of parameters. This is followed by a Dense (fully connected) layer with 256 neurons using the ReLU activation function, allowing the model to learn complex non-linear relationships between the extracted features. Dropout regularization is applied at a rate of 0.3 to reduce overfitting. Finally, the output layer is a Dense layer with 38 neurons (equal to the number of classes), utilizing a softmax activation function to predict the probability distribution over the classes.

C. Compilation and Training Strategy

The model is compiled with the Adam optimizer, a widely used optimization algorithm that combines the advantages of the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp). An initial learning rate of 0.0001 is chosen to allow for stable convergence. The categorical cross-entropy loss function is used, as it is well-suited for multi-class classification tasks with one-hot encoded target labels.

Accuracy is monitored as the primary performance metric. In addition to dropout, early stopping is utilized during training to halt the process when validation performance plateaus, further mitigating the risk of overfitting. The model is trained using a mini-batch size of 32 over 10 epochs. Data is fed using a generator that applies real-time data augmentation including zoom, horizontal flips, and rotations, which enhances the model’s ability to generalize to new data.

D. Advantages of the Methodology

This methodology combines a lightweight architecture with high accuracy, making it not only effective but also practical for deployment in real-world farming conditions. The modularity of the architecture means it can be easily adapted to other similar classification problems, and the use of transfer learning significantly reduces the need for extensive training data and computational power. Overall, this approach delivers a balanced trade-off between performance, resource efficiency, and ease of deployment, aligning well with the practical needs of precision agriculture.

V. IMPLEMENTATION

The model training and evaluation are implemented using the Python, TensorFlow, and Keras libraries in the Google Colab environment with GPU acceleration. These tools provide an efficient framework for building, training, and deploying deep learning models.

The training process spans 10 epochs, with performance evaluated after each epoch using the validation dataset. Throughout the training, metrics such as accuracy and loss are recorded to monitor the learning curve. The final model achieves a validation accuracy of 95.77%, demonstrating its capability to generalize well to unseen data.

Post-training, the model is saved in HDF5 format for future use. Class labels and mappings are stored in a JSON file to facilitate easy interpretation of predictions. An inference pipeline is developed that allows the user to input a leaf image, preprocess it, and obtain a disease prediction with the corresponding class name.

The model’s compact size and high accuracy make it well-suited for integration into mobile applications and IoT devices. This real-time inference capability can be invaluable for farmers seeking quick and reliable disease diagnosis in the field.
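The final step of the inference pipeline, turning the model's softmax output into a class name through the stored JSON mapping, can be illustrated with a minimal sketch. The layout of the JSON file (string indices mapped to class names), the three example labels, the probability vector, and the function name decode_prediction are all assumptions made for illustration; the actual file would hold 38 entries and the probabilities would come from the trained model.

```python
import json


def decode_prediction(probabilities, index_to_label):
    """Return (class name, confidence) for the most probable class."""
    # Index of the largest probability, i.e. the argmax of the softmax output.
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    # JSON object keys are always strings, so the index is converted before lookup.
    return index_to_label[str(best)], probabilities[best]


# Hypothetical three-class excerpt of the label file (assumed layout).
label_json = (
    '{"0": "Apple___Apple_scab", '
    '"1": "Apple___healthy", '
    '"2": "Tomato___Late_blight"}'
)
index_to_label = json.loads(label_json)

# Stand-in for the model's softmax output on one preprocessed leaf image.
probabilities = [0.05, 0.85, 0.10]

name, confidence = decode_prediction(probabilities, index_to_label)
print(f"{name} ({confidence:.1%})")
```

Reporting the confidence alongside the class name lets an application flag low-confidence predictions for manual review instead of presenting every prediction as certain.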
Fig. 1: Model Accuracy

Fig. 2: Model Loss

A. Limitations and Future Considerations

While the proposed model performs admirably on the PlantVillage dataset, it is important to recognize several limitations that may affect its performance and applicability in real-world scenarios. One of the primary concerns is the nature of the dataset itself. The images within the PlantVillage dataset were captured in controlled environments with consistent lighting and plain backgrounds. These ideal conditions help in the development of accurate models; however, they do not represent the complexity and diversity of field conditions where real-world plant images might include shadows, overlapping foliage, pest damage, weather effects, or varying lighting intensities. As a result, the model may exhibit reduced performance when deployed outside of lab-like settings.

Another significant limitation lies in the imbalance of class distribution. Certain plant diseases are well-represented in the dataset, while others have relatively fewer samples. This imbalance can introduce bias in the model’s predictions, causing it to favor more frequently occurring classes and misclassify rare ones. Although data augmentation techniques were applied to reduce this disparity, it is not a complete solution. To address this, future work should consider synthetic data generation, targeted data collection for minority classes, or the use of class weighting strategies during model training.

The current implementation also relies exclusively on RGB images, which limits the information available to the model. In many agricultural applications, multispectral and hyperspectral imaging provide rich data that can reveal subtle physiological traits of plants not visible in RGB. Integrating additional imaging modalities, such as near-infrared, thermal, or fluorescence imaging, could significantly enhance the model’s ability to detect early-stage diseases or differentiate between similar symptoms caused by different pathogens. Moreover, incorporating non-visual contextual data such as temperature, humidity, rainfall patterns, and soil conditions can improve the decision-making capabilities of the system.

From a practical deployment perspective, the model’s effectiveness would be further enhanced through robust real-world testing. Deploying the model on smartphones or edge devices like Raspberry Pi, and evaluating it under field conditions, would provide valuable insights into usability and performance. In addition, implementing active learning methods where the model can learn incrementally from new user-submitted images could continuously improve its accuracy.

For future enhancements, we aim to expand the dataset by collecting images from actual farm environments with varied crops, backgrounds, and lighting conditions. We are also exploring federated learning approaches, allowing multiple users to collaboratively improve a shared model without the need to transfer sensitive data to a central server, thus preserving data privacy. Furthermore, integrating the solution into multilingual mobile applications with voice and visual aids can make it accessible to a broader farming community across different regions and literacy levels. These developments would contribute toward making the system more inclusive, scalable, and truly impactful for sustainable agriculture.

VI. CONCLUSION AND FUTURE WORK

In this study, we developed a deep learning-based system for detecting plant diseases using the PlantVillage dataset and a MobileNetV2-based CNN model. The system demonstrated a validation accuracy of 95.77%, highlighting the effectiveness of transfer learning in agricultural applications. The methodology emphasizes computational efficiency, making the model suitable for real-time deployment on mobile and embedded platforms.

By providing accurate, fast, and scalable disease identification, this research supports the vision of smart agriculture. The tool has the potential to empower farmers with actionable insights, reduce reliance on expert diagnosis, and contribute to sustainable farming practices. With further enhancements, the system can be a vital component of digital agriculture ecosystems.
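The class weighting strategy suggested among the limitations was not part of the reported experiments. As a hedged illustration of what it could look like, the sketch below uses one common heuristic: each class c receives the weight N / (K * n_c), where N is the number of samples, K the number of classes, and n_c the number of samples in class c. The label list and the function name balanced_class_weights are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by N / (K * n_c), so rare classes get larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical label list: class 0 is common, class 2 is rare.
labels = [0] * 60 + [1] * 30 + [2] * 10
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

A dictionary of this form, mapping class indices to weights, is what the class_weight argument of the Keras fit method accepts, so misclassified samples from rare classes would contribute more to the loss.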
A. Future Work

Although the system performs well in controlled scenarios, several areas warrant further investigation:
• Incorporating attention mechanisms for contextual awareness
• Extending to multi-lingual disease annotation
• Deploying via API or IoT-based camera systems
• Exploring federated learning for distributed agriculture data privacy
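To make the federated learning direction more concrete, the following sketch shows one common aggregation rule, federated averaging: each client trains locally and sends only its parameters, and the server averages them weighted by the number of local samples. This is an illustration under stated assumptions, not part of the implemented system; the two-parameter vectors, the client sizes, and the function name federated_average are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical parameters from three farms; raw images never leave the clients.
client_weights = [[0.10, 0.50], [0.30, 0.10], [0.20, 0.30]]
client_sizes = [100, 300, 100]

global_weights = federated_average(client_weights, client_sizes)
print(global_weights)
```

Weighting by sample count lets clients with more local data influence the shared model proportionally, while only parameters, not images, are exchanged.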
ACKNOWLEDGMENT

We would like to extend our heartfelt gratitude to the esteemed faculty members of the Department of Information Science and Engineering at Nitte Meenakshi Institute of Technology for their consistent encouragement, expert insights, and unwavering guidance throughout the duration of this project. Their mentorship was instrumental in refining our ideas, shaping our methodology, and ensuring academic rigor in our research work. The theoretical foundations and practical skills imparted during the coursework were vital in successfully implementing the concepts explored in this project.

We also wish to acknowledge the immense contribution of the researchers and developers who curated and made the PlantVillage dataset publicly available. Their work provided a rich and diverse dataset without which the implementation of this system would not have been possible. Additionally, we are grateful to the Kaggle community for hosting the dataset and offering a collaborative platform that facilitates research and innovation. Access to such high-quality, open-source data and tools has enabled us to test, validate, and evaluate our models comprehensively.

Furthermore, we extend our appreciation to the open-source software community and contributors to tools such as TensorFlow, Keras, and Google Colab, which were critical in developing, training, and deploying our models. The synergy of academic support and open data resources played a foundational role in the realization of this research and its outcomes.
REFERENCES

[1] K. P. Ferentinos, “Deep learning models for plant disease detection and diagnosis,” Computers and Electronics in Agriculture, vol. 145, pp. 311–318, 2018.
[2] S. P. Mohanty, D. P. Hughes, and M. Salathé, “Using deep learning for image-based plant disease detection,” Frontiers in Plant Science, vol. 7, p. 1419, 2016.
[3] S. Sladojevic et al., “Deep Neural Networks Based Recognition of Plant Diseases by Leaf Image Classification,” Computational Intelligence and Neuroscience, 2016.
[4] S. Zhang et al., “Attention-based CNN for plant leaf classification,” Computers and Electronics in Agriculture, vol. 168, 2019.
[5] M. Sandler et al., “MobileNetV2: Inverted Residuals and Linear Bottlenecks,” in CVPR, 2018.
[6] TensorFlow Developers, “TensorFlow: An end-to-end open source machine learning platform,” Available at: [Link] 2024.
[7] Keras Team, “Keras Documentation,” Available at: [Link] 2024.
[8] Kaggle, “PlantVillage Dataset,” Available at: [Link] datasets/abdallahalidev/plantvillage-dataset, Accessed 2025.
[9] J. Deng et al., “ImageNet: A Large-Scale Hierarchical Image Database,” in CVPR, 2009.
