
J. Smart Sens. Comput., 2025, 1, 25205
Journal of Smart Sensors & Computing

Research Article | Open Access

A Deep Learning Framework for Smart Agriculture: Real-Time Weed Classification Using Convolutional Neural Network
Sushilkumar S. Salve,* Sourav S. Chakraborty, Sanskar Gandhewar and Shrutika S. Girhe

Department of Electronics and Telecommunication, Sinhgad Institute of Technology, Lonavala, 410401, Maharashtra, India
*Email: [email protected] (S. S. Salve)

Abstract
The agricultural sector, being the foundation of food supply and raw material production, contributes significantly to GDP growth and the value chain. Effective elimination of weeds is therefore essential in modern-day agriculture, as the current world scenario demands efficient and resourceful ways of crop cultivation and harvesting. The urgent need to eliminate weeds arises from their tendency to draw away the essential minerals and moisture that crops require for appropriate growth. The main objective of this study is to acquire a live video feed as input, classify it into the categories of crop, weed and none, and, upon detection of a weed, have the spraying mechanism release a pre-determined amount of herbicide onto the weed. A total of 5471 image samples were captured to train the CNN model. The prototype presented in this paper uses the Convolutional Neural Network (CNN) technique for feature extraction and Fully Connected Layers or Dense Layers (FCLs) for classification, with SoftMax as the output activation function; the ReLU activation function is used in the hidden layers to remove negative (less significant) values. A comprehensive comparison was also made between the CNN and YOLOv4 techniques, and the performance parameters of both were evaluated. The CNN technique achieved an accuracy of 95.50% whereas YOLOv4 achieved 91.00%; the F1 Scores were evaluated to be 96.25% and 91.96% respectively. Compared to existing models, our prototype demonstrated higher accuracy and real-time adaptability in field conditions, proving suitable for autonomous weed management systems. Unlike earlier systems that depended mostly on stored images or fixed datasets, our approach stands out by using a live video feed to identify weeds in real time. It is built on a mobile platform that can automatically spray herbicides, making precision farming possible without the need for constant human supervision.

Keywords: Computer vision; Convolutional Neural Network; Deep learning; SoftMax; Max pooling; Weed detection; Image
pre-processing.

Received: 05 April 2025; Revised: 15 May 2025; Accepted: 27 May 2025; Published Online: 30 May 2025.

DOI: https://doi.org/10.64189/ssc.25205
© The Author(s) 2025. This article is licensed under Creative Commons Attribution Non-Commercial 4.0 International (CC-BY-NC 4.0).

1. Introduction
In crop fields, weeds are naturally occurring plants that compete with crops for vital resources such as space, light, moisture and air, which may lower crop yield. Effective weed control is essential during cultivation because weeds impede crop growth.[1] Farmers may experience lower yields and financial losses as a result of weeds competing with cash crops for resources. The impact of weeds varies depending on the crop type and the farm's geographical location.[2] Weeds can reduce yield by up to 34% if they are not controlled, whereas animal pests and diseases cause yield losses of 18% and 16%, respectively. Weed infestations can result in crop losses of roughly 23% to 44% in typical crop fields.[1] At the same time, the agricultural sector is under pressure to achieve steadily rising yields as the demand for food production grows.[3] This emphasizes why precision farming and robotics are necessary to increase yield while lowering dependency on

conventional farming practices. Modern technology has made it possible for autonomous machines to carry out agricultural tasks effectively. High-quality crops can be produced with little human labor when robotics and intelligent machinery are integrated into agriculture.[3,4]

A weed detection system uses machine learning algorithms to identify unwanted plants in an agricultural field. With such a system, farmers can reduce their use of herbicides, which can be harmful to the environment and public health. Plans for targeted weed control can be created by utilizing the information on weed types that the detection system supplies.[5] Machine learning-based weed detection is a new technology with the potential to transform agriculture completely. The system's purpose is to locate and identify weeds in a field so that farmers can take specific action to remove them: gather live videos and photos of a field, apply machine learning techniques to them, and then determine the weeds. Numerous methods, such as object detection, feature extraction, segmentation and classification, can be used to complete this process. We decided to use a live-feed CNN technique to address this problem, analyzing the input dataset to find the weeds.[5,6]

The weeds within rows might not be accurately removed by conventional machinery. Sunil G C et al., introducing their study, emphasized that a herbicide sprayed uniformly across the field at a set pace, treating weeds and crops alike, is less feasible than site-specific herbicide application, as blanket applications can have a more negative impact on the ecosystem. As a result, applying a herbicide selectively to areas of concern may improve precision while lowering input costs and environmental problems. Umamaheswari S et al.[7] noted that robotic farming and precision agriculture need to advance in response to current problems: the lack of agricultural labour and resources, the emergence of new crop diseases, and weeds. The issues of climate change and sustainable agriculture are intimately tied to the challenge of effective weed classification and detection. According to the various resources and findings the study cites, existing species may be exposed to new and hybrid weeds as a result of climate change. Because weeds can hinder the growth of farm crops, it is crucial to create new technologies that aid in identifying them. Identifying weeds can also help remove them, which lowers the need for pesticides and offers effective substitutes when the crops are harvested.

O. M. Olaniyi et al.[8] discussed the various ways of weed elimination: as people have become more aware and knowledgeable about weeds, experts have been looking for ways to eradicate this infamous pest with the least harm to the plant. The three main strategies for controlling weeds are cultural, chemical and automated approaches. Bush fallowing, mulching, fire clearance, early flooding, hand weeding, shifting crops and maintaining a clean reaper are all components of the cultural approach to weed management. This approach has significant labour costs and drawbacks. Applying herbicides is considered a significant alternative to hand weeding. However, excessive herbicide use can result in harvest losses, harm to the environment, high production costs and the development of herbicide resistance. Some of these pesticides even end up on the soil and food crops without reaching the weeds. Since spraying food crops is viewed as a risk to the safety of the food being consumed, a thorough weed control method is necessary.

On the other hand, as specified by P. Kavitha Reddy et al.,[5] deep learning techniques, particularly those that use neural networks, have become increasingly popular in recent years. These methods use big datasets of tagged images to train intricate neural network models. The neural network automatically collects pertinent information and classifies the input photos using iterative learning procedures. The YOLO algorithm is a well-known implementation of the convolutional neural network (CNN), which is the foundation of deep learning techniques in computer vision (CV).

In this paper, a low-cost and robust live video-based weed detection and elimination system with automated spraying is presented, using a Convolutional Neural Network (CNN) as the main computing algorithm, SoftMax and ReLU as activation functions, and Fully Connected Layers (FCLs) for classification, along with a detailed comparison of YOLOv4 with the proposed method.

The major reason CNN was selected is its ability to focus on fine-grained feature learning, which is especially useful in identifying small or overlapping weed patterns. YOLOv4 was chosen for comparison due to its real-time detection speed. Other models such as Faster R-CNN or ViT were not used because their higher computational demands are unsuitable for edge deployment on a Raspberry Pi 4. The two activation functions, SoftMax and ReLU, were selected for their simplicity, speed and established use in CNN architectures. Alternatives such as Swish or Leaky ReLU can improve performance but require higher computational cost and tuning.

2. Materials and methods
This section describes the materials and design required for the successful development of the proposed system, giving a detailed overview of the components, the methodology utilized and other specifications. The system prototype integrates the Internet of Things (IoT) with image processing, feature extraction, a deep learning algorithm and identification, along with a precision spraying unit.

2.1 System overview
The proposed system is implemented using a Convolutional


Neural Network to develop a robust, scalable and versatile weed detection system that produces accurate results in real time using a live video feed via a webcam. The input data then goes through various processes and, at the end, the result is determined as one of three particular classes: i) weed, ii) crop, iii) none. The processes particularly include image acquisition, feature extraction, classification and training of the model. A generalized block diagram is presented as Fig. 1, which gives an idea of the actual flow of the components within the proposed system and their particular tasks. The proposed prototype contains various components mounted on a robust wooden platform, powered by a 12V DC adapter. The main microcontroller unit, a Raspberry Pi 4 Model B, is powered by a 5V USB-C charger. The output can be observed on a desktop monitor via an HDMI cable. Fig. 2 shows the stage-by-stage deployment and implementation of the CNN-based weed detection system using Max Pooling, ReLU, Dropout, Fully Connected Layers (FCLs) and SoftMax across the multiple stages of detection and processing of the input dataset.[9,10]

Fig. 1: Block diagram of proposed prototype.
Fig. 2: Schematic of the proposed system overview.

2.2 Working principle
2.2.1 Hardware
The proposed system uses a robust and sturdy navigable prototype mounted on a hard-bound wooden base with a four-wheel chassis. The two forward wheels are attached to two 12V DC geared motors of 300 rpm each, and the two rear wheels are attached as dummy wheels for support. 12V DC geared motors are used because they support a heavier load, in this case the wooden platform. These motors are connected to an L293D module, a motor driver widely used in embedded systems to control the direction of DC motors and stepper motors. The module is capable of driving two DC motors independently in both forward and reverse directions, which adds precision and control to the whole system and grants mobility across the field. Both the L293D module and the DC motors are powered by a 12V DC power supply. The L293D is also interfaced with the Raspberry Pi 4 Model B as the master control unit.

A Bluetooth module, the HC-04, is also interfaced with the microcontroller for controlling the directions provided by the motor driver module. This Bluetooth module supports V2.0+EDR (Enhanced Data Rate) modulation up to 3 Mbps, along with a 2.4 GHz radio transceiver and baseband. A Python program compiled and executed on the microcontroller enables the user to connect to the Bluetooth module using the "Serial Bluetooth Terminal" application, where the user can give commands in the form of numbers specifying movement in a specific direction (i.e., 1 = forward, 2 = reverse, 3 = left, 4 = right, 5 = terminate).

A single-channel relay module is also interfaced with the microcontroller in order to control the fluid pump inside the sprayer prototype. It is rated for switching up to 10 A at 250 V AC or 24 V DC, and is powered by the same 12V DC supply used for the motors and driver module. When the relay input (IN) is driven LOW, the relay coil energizes and switches from the normally closed (NC) contact to the normally open (NO) contact. This action effectively turns the connected device (i.e., the fluid pump) on or off at specified intervals upon weed detection.

The camera module used in the developed system is the Xiaomi Mi HD USB 2.0 Web-Cam. It can capture live video feeds up to a resolution of 1280×720p HD at a frame rate of 30 FPS. With up to a 90° wide-angle field of view and no driver requirements, it is directly compatible with the Raspberry Pi 4. The OpenCV library helps the prototype capture and process the live input feed for image pre-processing.
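As a minimal sketch of the movement-command loop described above (a sketch, not the authors' exact program: the GPIO pin numbers, the L293D wiring and the serial device path are illustrative assumptions):

import serial
import RPi.GPIO as GPIO

# Assumed BCM pins driving the L293D inputs for the left and right motors.
IN1, IN2, IN3, IN4 = 5, 6, 13, 19
GPIO.setmode(GPIO.BCM)
GPIO.setup([IN1, IN2, IN3, IN4], GPIO.OUT, initial=GPIO.LOW)

def drive(left_fwd, left_rev, right_fwd, right_rev):
    GPIO.output(IN1, left_fwd); GPIO.output(IN2, left_rev)
    GPIO.output(IN3, right_fwd); GPIO.output(IN4, right_rev)

# Command map used by the "Serial Bluetooth Terminal" app:
# 1 = forward, 2 = reverse, 3 = left, 4 = right, 5 = terminate.
actions = {
    "1": (1, 0, 1, 0),   # forward: both motors forward
    "2": (0, 1, 0, 1),   # reverse: both motors backward
    "3": (0, 0, 1, 0),   # left: run right motor only
    "4": (1, 0, 0, 0),   # right: run left motor only
}

bt = serial.Serial("/dev/rfcomm0", 9600, timeout=1)  # assumed bound HC-04 port
while True:
    cmd = bt.read(1).decode(errors="ignore")
    if cmd == "5":
        drive(0, 0, 0, 0)   # stop and terminate
        break
    if cmd in actions:
        drive(*actions[cmd])
GPIO.cleanup()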

The Raspberry Pi 4 Model B microcontroller acts as the heart and brain of the system. It is essentially a card-sized mini-computer that runs its own operating system and independently performs tasks an actual desktop can perform, including browsing, media playback and major IoT development. It has 4 GB of LPDDR4-3200 SDRAM and a microSD card slot that holds the actual controller software. It provides four USB ports, two with USB 3.0 capability and two with USB 2.0 support. Two micro-HDMI ports are provided for interfacing with external display peripherals at resolutions up to 4K 60 FPS. The power supply is provided via a 5V DC USB-C connector, and the ambient operating temperature range is 0 °C to 50 °C.

Fig. 3 gives a glimpse of the proposed system developed for the comprehensive study of both algorithms; the image gives a clear idea of all the components and their placement in the model.

Fig. 3: A schematic representation of the proposed prototype and its components.

2.3 Software
For the model proposed in this study, Debian GNU/Linux 10 (buster) has been installed on the Raspberry Pi 4 Model B as its operating system, with Python IDLE used to script and execute the Python code for implementation and training of the CNN and YOLOv4

models. Various open-source Python libraries such as OpenCV and TensorFlow have also been used to facilitate high-quality image processing and deep learning model implementations for accurate classification and detection of weeds in farms and agricultural fields.

The software used in this study is adequate to support the latest hardware components, such as camera modules and other peripherals, that are essential for the proper working of the system and its overall performance. A personalized dataset of varied images was curated, ensuring the model was trained on images of crops and weeds of various types with varied lighting, backgrounds and crop types. Augmentations such as flip, rotate, crop and brightness change were also used.

2.4 Implementation
2.4.1 Image Acquisition
The input data is first captured using a Xiaomi USB 2.0 HD webcam that supports capturing video up to 720p at a frame rate of 30 frames per second (fps). This input data then undergoes image pre-processing, where the pixel values, originally ranging from 0 to 255, are normalized to the range [0, 1]. Upon normalization, the performance of the CNN model improves, ensuring better numerical stability and faster convergence. The input data also undergoes grayscale conversion, as weed detection relies more upon shapes and textures than colour.[11] Fig. 4 helps us visualize how colour images are converted to grayscale for the model.

Fig. 4: Grayscale conversion of input dataset.

The system is made more efficient by resizing the data to 64×64 pixels, thus reducing the image size and lowering the computational cost.[12,13]
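A minimal sketch of this pre-processing stage, assuming OpenCV and NumPy (the target size and scaling follow the text above):

import cv2
import numpy as np

def preprocess_frame(frame_bgr):
    """Convert a captured BGR frame to the model's input format:
    grayscale, 64x64 pixels, pixel values scaled to [0, 1]."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)  # shapes/textures matter more than colour
    small = cv2.resize(gray, (64, 64))                  # shrink to cut computational cost
    scaled = small.astype(np.float32) / 255.0           # normalize 0..255 -> 0..1
    return scaled.reshape(1, 64, 64, 1)                 # add batch and channel axes for the CNN

# With the webcam: cap = cv2.VideoCapture(0); ok, frame = cap.read(); x = preprocess_frame(frame)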

2.5 Feature extraction
Features are extracted from the pre-processed image using 2D convolution, which draws out the important features and patterns such as edges.[14] The CNN model consists of 4 convolutional layers with filters of size 3×3, each followed by ReLU activation and 2×2 max-pooling; the first layer applies 32 filters and the second layer applies 64 filters of the same size. The input images are grayscale with a resolution of 64×64 pixels. The Rectified Linear Unit (ReLU) acts as the activation function, converting negative values to zero and thus introducing non-linearity.

The non-linearity introduced by the ReLU activation function allows the CNN to learn complex patterns and functions beyond linear relationships. It also makes the network computationally more efficient, as fewer neurons activate at once, improving generalization while acting as a simple threshold function. Compared to other functions such as sigmoid or tanh, it avoids expensive exponentials, facilitating faster convergence during training and helping the gradients remain significant during backpropagation. Fig. 5 shows how negative inputs are converted to zeros, introducing sparsity in the activations.[15] Equation (1) gives the mathematical representation of ReLU as an activation function:[16]

f(x) = max(0, x)   (1)

that is, if x > 0 then f(x) = x, otherwise f(x) = 0. This equation is common across most studies, being a very general formulation: the function converts negative inputs to zero and keeps positive ones unaffected.

Fig. 5: A demonstration of the rectified linear unit.
Fig. 6: Graphical representation of ReLU.

Fig. 6 demonstrates how the activation function looks when plotted. When its limitations are taken into account, some neurons may always output zero and never be activated (the "dying ReLU" problem). Nevertheless, the function has proven its efficiency and reliability even after consideration of its drawbacks.
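A minimal Keras sketch of this feature-extraction stack is given below; the layer count, 3×3 filters, ReLU and 2×2 pooling follow the text above, while the exact filter schedule (32/64/32/32) and "same" padding are assumptions for illustration:

import tensorflow as tf
from tensorflow.keras import layers, models

# Four Conv2D (3x3) + ReLU blocks, each followed by 2x2 max pooling,
# operating on 64x64 single-channel (grayscale) inputs.
feature_extractor = models.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),   # 64x64 -> 32x32
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),   # 32x32 -> 16x16
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),   # 16x16 -> 8x8
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),   # 8x8 -> 4x4
    layers.Dropout(0.25),          # first dropout stage (rate from the text below)
])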


Max Pooling and Dropout are also employed: Max Pooling reduces the spatial dimensions of the image while preserving the essential features, and Dropout reduces overfitting by randomly setting a fraction of the neurons (25% in the first dropout layer) to zero during the training procedure. A window of size 2×2 moves over the feature map, keeping only the maximum value from each window, which particularly contributes to reducing computational complexity.

Max Pooling in a CNN is basically a down-sampling technique that proves extremely beneficial in reducing the spatial features and dimensions of an input volume. It is non-linear in nature, which serves better efficiency and reduced computational demand. It operates independently on every depth slice of the input image and resizes it spatially, sliding a 2×2 window (kernel) across the input data and taking only the maximum value from each window. Fig. 7 shows this using a set of sample values.

Fig. 7: Max pooling in CNN.

These maximum values then constitute the single pixels of the newly pooled output. The 2×2 window moving over the input image follows a particular stride of a certain number of pixels. Repeated until the final output, this process produces an output image roughly half the original size in each dimension, effectively reducing the number of pixels by 75%.[15]

Now, while training a neural network, it might learn not only the general patterns but also the noise and specific ungeneralized details of the training data. Such overfitting might give higher accuracy while training the model on the dataset but will produce low accuracy in the testing procedures, thus leaving a large gap between the training and testing accuracy. The Dropout technique is effective in such cases: during the training process it randomly removes a small fraction of the neurons in the network, in our case 25%, 50% and 80% for different layers, so the dropout rates were set at 0.25, 0.5 and 0.8 respectively.

In mathematical terms,[17] a mask is applied to a set of neurons according to the dropout percentage during the training period. At each step a mask matrix is generated in which each entry is a binary variable, i.e., 0 or 1, indicating whether a neuron is dropped or not:

y = W · (M ⊙ x)   (2)

where x is the input to a layer, W is the weight matrix of that layer, M is the mask matrix and ⊙ denotes the element-wise product. With dropout, each element of M is 0 with probability p and 1 with probability 1 − p. During testing, dropout is switched off, but the weights are scaled by 1 − p to account for the neurons that were dropped during training.

The layers are employed so that each layer detects more and more complex patterns in the input images; these higher-level features include shapes, edges and textures. Pooling helps the model recognize objects regardless of their position in an image, making the model translation-invariant. The first Dropout layer introduces early regularization, preventing co-adaptation of neurons and encouraging more robust feature learning. The Flatten layer then converts the multi-dimensional feature maps into a 1-D vector for the transition into the dense layers (FCLs). The second Dropout layer again randomly drops units from the flattened output before its transition into the dense layers, giving additional regularization to layers that are prone to overfitting due to their large number of parameters.

Fig. 8: Dropout in CNN.


Fig. 9: Various dropout layers in CNN.
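To make the pooling window and the dropout mask of Equation (2) concrete, here is a small NumPy sketch (the values are illustrative only, and the weight multiplication W· is omitted for brevity):

import numpy as np

rng = np.random.default_rng(0)

# 2x2 max pooling with stride 2: keep the maximum of each window.
fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
# pooled == [[6., 8.], [3., 4.]]: 16 pixels -> 4, i.e. a 75% reduction.

# Dropout as the binary mask of Equation (2).
p = 0.25                                  # rate of the first dropout layer
x = rng.normal(size=8)                    # activations entering the layer
M = (rng.random(8) >= p).astype(float)    # each entry is 0 with probability p
train_out = M * x                         # element-wise masking during training
test_out = (1 - p) * x                    # at test time: no mask, scale by 1 - p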

2.6 Classification
After the successful extraction of features from the input images, the network flattens the feature maps into a 1D vector and feeds it to the Fully Connected Layer (FCL), as it accepts only one-dimensional input. For example, a MaxPooling2D output of shape (7, 7, 64) becomes an equivalent 1D vector of length 7×7×64 = 3136. The dense layer comprises 1024 neurons and acts as a hidden layer processing the features extracted by the previous CNN layers. In the output layer of the FCL, three neurons are taken, denoting the three possible classes. Here the SoftMax activation function is used, which converts the outputs into probabilities that sum to 1. It is a mathematical function used mainly in cases involving multiple classes, where a vector of real numbers (logits) is converted into a probability distribution with values in the range 0 to 1. In [18], Brahim Jabir et al. depicted and visualized how the hidden layers of a fully connected dense layer interact with one another; their CNN consisted of 3 convolutional layers with filter sizes (3×3), (3×3) and (5×5) respectively, followed by ReLU activations and MaxPooling. Mathematically, Eq. (3) shows the working of the SoftMax activation function for model prediction:[19]

P_i = e^{Z_i} / \sum_{j=1}^{n} e^{Z_j}   (3)

where e^{Z_i} is the exponential of input Z_i (the raw score) and \sum_{j=1}^{n} e^{Z_j} is the sum of exponentials over all inputs. Here, P_0 indicates the probability of crop, P_1 the probability of weed and P_2 the probability of none. The class with the highest probability is the model's prediction.
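A minimal Keras sketch of this classification head (the 1024-unit hidden layer and the three-way SoftMax output follow the text; the dropout placement is an assumption carried over from Section 2.5):

from tensorflow.keras import layers, models

classifier_head = models.Sequential([
    layers.Flatten(),                       # e.g. a (7, 7, 64) map -> 3136-vector
    layers.Dropout(0.5),                    # second dropout stage before the FCLs
    layers.Dense(1024, activation="relu"),  # hidden dense layer
    layers.Dense(3, activation="softmax"),  # outputs P0 (crop), P1 (weed), P2 (none)
])

# After prediction, the class with the highest probability wins:
# probs = model.predict(x)[0]; label = ["crop", "weed", "none"][probs.argmax()]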
In YOLOv4, by contrast, classification is included directly in the object detection process. That method is originally intended for real-time object detection, but in this paper we have proposed a different approach that utilizes a CNN for real-time detection and training. The YOLOv4 algorithm performs localization, by detecting the position of an object, and classification, by identifying the object type, in a single forward pass of the neural network. Of the three major components of the YOLOv4 network (backbone, neck, head), the head network is responsible for classification and final detection; it applies anchor boxes to the feature maps and generates the output with the per-class probabilities.[20,21]

The process starts with an input image of size 416×416, where multiple detection heads of different scales are used; the feature maps are of sizes 13×13, 26×26 and 52×52. If S is the grid size, B the number of anchor boxes per grid cell and C the number of classes, then the output tensor for each scale has shape

S × S × B × (5 + C)   (4)

where 5 = 4 bounding box coordinates (t_x, t_y, t_w, t_h) + 1 objectness score, and C is the number of class probabilities. Equation (4) gives the output tensor of YOLOv4. The bounding box offsets are predicted relative to the anchor boxes: t_x, t_y are the predicted offsets for the box center, and t_w, t_h are the predicted offsets for width and height. The actual box predictions are then

b_x = σ(t_x) + c_x   (5)
b_y = σ(t_y) + c_y   (6)
b_w = p_w · e^{t_w}   (7)
b_h = p_h · e^{t_h}   (8)

where σ is the sigmoid function, (c_x, c_y) is the top-left coordinate of the grid cell, and (p_w, p_h) are the width and height of the anchor box. For the probability distribution over the classes, the SoftMax activation of Equation (3) can be used here as well, applying to both the CNN and YOLOv4. When the sigmoid is used instead for independent per-class (binary) classification, the equations become

P(c_i | object) = σ(t_{c_i})   (9)

Confidence(c_i) = P_obj · P(c_i | object)   (10)

Equation (10) denotes the final confidence score for class c_i.[22]
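A small NumPy sketch of this box decoding (Equations 5 to 10), under the assumption of a single 13×13 scale and an illustrative anchor size:

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(t_x, t_y, t_w, t_h, t_obj, t_cls, cell_xy, anchor_wh):
    """Decode one anchor's raw outputs into a box and class confidences."""
    cx, cy = cell_xy          # top-left coordinate of the grid cell
    pw, ph = anchor_wh        # anchor (prior) width and height
    bx = sigmoid(t_x) + cx                  # Eq. (5)
    by = sigmoid(t_y) + cy                  # Eq. (6)
    bw = pw * np.exp(t_w)                   # Eq. (7)
    bh = ph * np.exp(t_h)                   # Eq. (8)
    p_obj = sigmoid(t_obj)                  # objectness score
    p_cls = sigmoid(np.asarray(t_cls))      # Eq. (9): per-class sigmoid
    confidence = p_obj * p_cls              # Eq. (10)
    return (bx, by, bw, bh), confidence

# Example: a cell at (4, 7) on the 13x13 grid, an assumed 3.2x2.1-cell anchor,
# and raw outputs for the classes (crop, weed, none):
box, conf = decode_box(0.2, -0.1, 0.4, 0.3, 1.5, [-2.0, 1.2, -3.0], (4, 7), (3.2, 2.1))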


2.7 Training the model
A large dataset of photos from agricultural fields is gathered and pre-processed in order to train the proposed prototype. These photos show different kinds of weeds and crops in a variety of backgrounds, lighting and environmental settings. In order to create labelled data for supervised learning, the photos are tagged to differentiate between weed and non-weed areas. To enhance model generalization, the dataset is then augmented using methods including flipping, rotation, scaling and colour changes. To guarantee balanced learning and assess performance at various phases, the pre-processed data is separated into training, validation and test sets.[23] Training was performed with a batch size of 32, 50 epochs, the Adam optimizer (lr = 0.001) and the categorical cross-entropy loss function.

After the dataset is ready, a deep learning model based on convolutional neural networks (CNNs) is trained to identify and categorize weeds. Using an optimizer such as Adam or SGD, the model minimizes a loss function, usually cross-entropy, during training to identify the patterns and characteristics that differentiate weeds from crops. The output layer predicts class probabilities using SoftMax activation. Metrics such as F1-score, recall, accuracy and precision are used to track the model's performance. The top-performing model is chosen after multiple epochs based on validation performance and then deployed on edge devices or mobile applications for real-time weed detection and control in the field.
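A minimal sketch of this training configuration in Keras, assembling the feature extractor and classification head sketched earlier (the data arrays are assumptions standing in for the prepared dataset):

import tensorflow as tf
from tensorflow.keras import models

# Full model: the feature extractor and classifier head sketched earlier.
model = models.Sequential([feature_extractor, classifier_head])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Adam, lr = 0.001
    loss="categorical_crossentropy",                          # three one-hot classes
    metrics=["accuracy"],
)

# x_train: (N, 64, 64, 1) float32 arrays in [0, 1]; y_train: (N, 3) one-hot labels,
# prepared from the augmented dataset and its train/validation/test split.
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=50, batch_size=32)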
2.8 Testing of model
After the model's training and validation, the testing phase commences. A separate test dataset with previously unseen photos is used to assess the trained model. This helps evaluate how well the model generalizes to fresh, real data. To determine performance metrics such as accuracy, precision, recall and F1-score, the model's predictions are compared against the actual labels. These measures reveal the model's ability to discriminate between weeds and crops, particularly under difficult circumstances such as changing lighting, occlusions or background noise. Any incorrect classifications are examined to find trends or particular instances where the model might be having trouble.[24]

The model is tested offline as well as in real time in the field using the Raspberry Pi 4 Model B. In this stage, the model is fed live video input, and the accuracy of the weed detection and localization is monitored. To make sure the system functions well in real-world situations, its response speed, effectiveness and dependability are tracked. The model is connected to an automated weed-removal sprayer, which performs reliably and accurately. Additionally, field testing offers insightful input for retraining or further model refinement to increase resilience.
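A sketch of this real-time loop on the Raspberry Pi, combining the pre-processing function above with the trained classifier and the relay-driven sprayer (the active-LOW relay input follows the hardware description, while the GPIO pin, model filename and spray duration are assumptions):

import time
import cv2
import RPi.GPIO as GPIO
from tensorflow.keras.models import load_model

RELAY_PIN = 17                        # assumed BCM pin wired to the relay IN line
GPIO.setmode(GPIO.BCM)
GPIO.setup(RELAY_PIN, GPIO.OUT, initial=GPIO.HIGH)  # HIGH = relay idle (active LOW)

model = load_model("weed_cnn.h5")     # assumed filename of the trained model
labels = ["crop", "weed", "none"]
cap = cv2.VideoCapture(0)

try:
    while True:
        ok, frame = cap.read()
        if not ok:
            continue
        x = preprocess_frame(frame)            # grayscale, 64x64, scaled to [0, 1]
        probs = model.predict(x, verbose=0)[0]
        if labels[int(probs.argmax())] == "weed":
            GPIO.output(RELAY_PIN, GPIO.LOW)   # energize relay: pump sprays
            time.sleep(1.0)                    # assumed pre-determined spray duration
            GPIO.output(RELAY_PIN, GPIO.HIGH)  # pump off
finally:
    cap.release()
    GPIO.cleanup()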


3. Results and analysis
3.1 Performance evaluation metrics
The proposed prototype is evaluated and judged on the basis of the following performance evaluation parameters. These parameters were determined after conducting a number of experiments and epochs, taking various factors and scenarios into consideration, to ensure an accurate overall analysis of the performance of the system.

3.1.1 Accuracy
Accuracy of a system is the ratio of correctly predicted results to the total number of observations. Equation (11) shows how accuracy is calculated: the numerator accounts for all the predictions the model got correct, and the denominator denotes all the predictions made.[25]

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (11)

where TP = True Positive, TN = True Negative, FP = False Positive and FN = False Negative. Here, a True Positive is the case when the object to be detected is actually a weed and the system classifies it as a weed, whereas a True Negative is the case when the object is not a weed and the prototype accurately classifies it as not a weed. For the false detections, a False Positive is the case when the model classifies an object as a weed when in actuality it is not, and a False Negative is the case when the model classifies the object as not a weed when in practice it belongs to the weed class.

3.1.2 Precision
Precision is a performance evaluation metric that evaluates the quality and correctness of the model's positive classifications.

Precision = TP / (TP + FP)   (12)

Equation (12) shows how the precision of a model is calculated from True Positives and False Positives. As we seek to determine the actual correctness of the model, this metric considers only the positive classification scenarios. Its major drawback is that it does not consider the negatives at all, which might cause the model to miss certain correct predictions (i.e., low recall).[26]

3.1.3 Recall
With this parameter, we check how many cases the model actually classified positively out of all the positive ones. It ranges from 0 to 1 and measures the model's real ability to capture all the relevant instances of the positive class.

Recall = TP / (TP + FN)   (13)

Equation (13) answers the question, "Out of all the actual weeds, how many did our model find?" If a model has high recall, we can safely say that it captures most of the positive class, hence the maximum number of weeds in the crop field are successfully detected. But if recall alone is pushed too high, the model may be classifying every object as a weed, making its recall 100% while reducing its precision, which amounts to a failure of the model's classification.[27]

3.1.4 F1 score
This parameter is based solely on the precision and recall of the model, as it is their harmonic mean. It ranges between 0 (worst) and 1 (best) and expresses the trade-off between precision and recall. Because the harmonic mean punishes extreme values more, it is preferred over the arithmetic mean: both precision and recall have to be high in order to achieve a reasonably high F1 score.

F1 Score = 2 × (Precision · Recall) / (Precision + Recall)   (14)

Equation (14) shows how the F1 score is calculated from the precision and recall metric values. It is especially useful where a model has an imbalanced dataset or needs a proper balance between precision and recall.[28]

3.2 Experimental analysis
3.2.1 Metric values
The proposed prototype is trained and developed using a standard self-developed dataset. The prototype was implemented using both the CNN and YOLOv4 deep learning algorithms, and after a successful testing phase the results were compiled according to the performance evaluation metrics defined above. The results of each technique have been thoroughly evaluated, ensuring untampered standards and an accurate real-world simulation. Table 1 summarizes the result metrics of the YOLOv4 technique, implemented on the very same setup for a thorough comparison.

Table 1: Results during field testing using YOLOv4.
Field Trial   TN    TP    FN    FP    %Error   %Success
1             8     9     3     0     15       85
2             6     11    3     0     15       85
3             3     16    0     1     5        95
4             10    10    0     0     0        100
5             7     11    2     0     10       90
6             11    9     0     0     0        100
7             8     11    1     0     5        95
8             6     9     3     2     25       75
9             14    6     0     0     0        100
10            6     11    0     3     15       85
Total         79    103   12    6     -        -
Average       -     -     -     -     9.0      91.0

Using Equation (11), the accuracy is calculated as follows:

Accuracy = (103 + 79) / (103 + 79 + 6 + 12) = 91.0%

Similarly, using Equations (12) and (13), the precision and recall are calculated:

Precision = 103 / (103 + 6) = 94.49%
Recall = 103 / (103 + 12) = 89.56%

Equation (14) is then used to calculate the F1 Score for this technique:

F1 Score = (2 × 0.9449 × 0.8956) / (0.9449 + 0.8956) = 91.96%

Fig. 10: Graphical representation of YOLOv4 results.
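These values can be reproduced with a few lines of Python from the confusion-count totals in Tables 1 and 2:

def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion counts (Eqs. 11-14)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(metrics(tp=103, tn=79, fp=6, fn=12))   # YOLOv4: ~(0.910, 0.945, 0.896, 0.920)
print(metrics(tp=116, tn=75, fp=4, fn=5))    # CNN:    ~(0.955, 0.967, 0.959, 0.962)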


Fig. 11: Graphical representation of CNN results.

Table 2 summarizes the result metrics of the CNN technique, implemented on the very same setup for a thorough comparison.

Table 2: Results during field testing using CNN.
Field Trial   TN    TP    FN    FP    %Error   %Success
1             6     11    2     1     15       85
2             6     12    2     0     10       90
3             6     12    0     2     10       90
4             3     16    0     1     5        95
5             7     12    1     0     5        95
6             11    9     0     0     0        100
7             8     12    0     0     0        100
8             4     16    0     0     0        100
9             14    6     0     0     0        100
10            10    10    0     0     0        100
Total         75    116   5     4     -        -
Average       -     -     -     -     4.5      95.5

Using Equation (11), the accuracy is calculated as follows:

Accuracy = (116 + 75) / (116 + 75 + 4 + 5) = 95.50%

Similarly, using Equations (12) and (13), the precision and recall are calculated:

Precision = 116 / (116 + 4) = 96.66%
Recall = 116 / (116 + 5) = 95.86%

Equation (14) is then used to calculate the F1 Score for this technique:

F1 Score = (2 × 0.9666 × 0.9586) / (0.9666 + 0.9586) = 96.25%

3.2.2 Confusion matrix
From the confusion matrices shown in Figs. 13 and 14, it can be clearly observed that the CNN technique outperformed the YOLOv4 algorithm, proving its proficiency in accurate object detection and recognition. In field testing, YOLOv4 produced fewer true positives (TP = 103 versus 116) along with higher false negative (FN = 12 versus 5) and false positive (FP = 6 versus 4) counts, which result in lower precision and recall compared to CNN.[29]

3.3 Discussion
The prototype in this paper is developed and implemented using the CNN classifier and the YOLOv4 supervised algorithm for a comparison-based study and

Fig. 12: Graphical representation of CNN results.


detailed analysis in the search for the best algorithm to be implemented. This step is particularly necessary for accurate classification of weeds and crops across different geographical locations and regions. The main objective of this study was to determine the optimal performance of various deep learning (DL) algorithms in the classification and precise elimination of weeds in the crop field.

In this study, the model achieved an F1 Score of 91.96% for YOLOv4 and 96.25% for the CNN technique. This was observed because the YOLOv4 technique is faster but cannot catch complex scenarios and smaller details of the object to be detected. It thus misses certain aspects of the weeds and fails to detect in certain scenarios, providing faster speeds but compromising accuracy on smaller or overlapping features of the object. CNN, by contrast, is less likely to miss a detection as it focuses more on the specific details of the object to be detected.

Fig. 13: Confusion matrix for YOLOv4 technique.
Fig. 14: Confusion matrix for CNN technique.

While the custom CNN-based classifier demonstrated higher accuracy in identifying weed presence, it does not localize the exact position of the weeds. This limits its practical application for precision spraying. In contrast, YOLOv4 is an object detector that not only identifies weeds but also provides spatial coordinates, enabling site-specific weed management. Therefore, the comparison is not entirely direct, as the two models serve complementary rather than identical purposes.

As observed in Table 3, a thorough comparison has been made between the accuracies of four other methods generally used for effective object detection and classification and the two root methods of this study. Jun Zhang et al.[12] reported higher accuracy for the original ViT model due to its stronger sequence modelling abilities and its unique capability to capture long-range dependencies. However, when both CNN and ViT are considered comprehensively, the CNN model, with its better balance of local and global features, results in overall better performance and improved classification.

Table 3: Comparison with other detection techniques.
Model Name    Accuracy (%)
VGG16         86.21
GoogleNet     79.23
AlexNet       80.09
ViT           89.09
YOLOv4        91.00
CNN           95.50

As weed detection has proved to be the most challenging task in the development of autonomous robotic weed detection and elimination systems, robust and precise computer-vision-based detection and sprayer systems that implement deep learning algorithms can overcome this particular challenge by accurately identifying the weeds among the crop fields and effectively eliminating the targeted weed. Regarding future scope and research possibilities in this area, more focus can be placed on developing and curating larger and more detailed datasets that provide deeper and richer classification opportunities for the algorithm and its hidden layers. Focus can also be placed on hardware with better computational capabilities and processing power, such as powerful GPUs and high-performance CPUs, as results will improve drastically due to more efficient processing of millions of parameters and simplified matrix operations.


Fig. 15: Graphical comparison between various techniques.

4. Conclusion
A robust weed detection and elimination system is needed in order to efficiently boost the agriculture sector for large-scale production of healthy crops and efficient utilization of limited agricultural resources. The system in this study proposes a unique way of developing a prototype using machine learning and deep learning algorithms that harnesses computer vision technology for accurate classification of weeds and crops without any involvement of human labour or assistance. The study suggests selection of an appropriate deep learning technique for the task that can achieve high-end and promising results in the particular field of application. The CNN algorithm proved to be more precise and accurate in doing so, with an accuracy of 95.50% and precision and recall of 96.66% and 95.86% respectively. This surpasses the scores of the YOLOv4 technique for weed detection; although it cannot beat the speed and agility of YOLOv4, when it comes to accurate classification and comprehensive detection the CNN proves its worth by securing an F1 Score of 96.25%. The CNN classifier model is suitable for general field assessment, such as identifying whether weeds are present in an image. However, for the practical application of targeted and precision spraying, the YOLOv4 object detector is essential due to its ability to localize weeds within the image. YOLOv4 achieved an average inference speed of 30 FPS (frames per second), making it suitable for real-time applications, whereas the custom CNN model averaged around 5 FPS, making it more suitable for offline analysis. Although the future research scope for this particular field of study is broad and insightful, this paper successfully highlights certain aspects that can significantly improve the performance of a large-scale weed detection and elimination system. Despite certain limitations encountered during the implementation of the study, such as artificial lighting conditions and shadow overlay, the authors have managed to prove the proficiency of the suggested method for implementations to come.

Conflict of Interest
There is no conflict of interest.

Supporting Information
Not applicable.

Use of artificial intelligence (AI)-assisted technology for manuscript preparation
The authors confirm that there was no use of artificial intelligence (AI)-assisted technology for assisting in the writing or editing of the manuscript and no images were manipulated using AI.

References
[1] C. S. G. Sunil, Y. Zhang, C. Koparan, M. R. Ahmed, K. Howatt, X. Sun, Weed and crop species classification using computer vision and deep learning technologies in greenhouse conditions, Journal of Agriculture and Food Research, 2022, 9, 100325, doi: 10.1016/j.jafr.2022.100325.
[2] B. Turan, I. Kadioglu, A. Basturk, B. Sin, A. Sadeghpour, Deep learning for image-based detection of weeds from emergence to maturity in wheat fields, Smart Agricultural Technology, 2024, 9, 100552, doi: 10.1016/j.atech.2024.100552.
[3] A. Upadhyay, G. C. Sunil, Y. Zhang, C. Koparan, X. Sun, Development and evaluation of a machine vision and deep learning-based smart sprayer system for site-specific weed management in row crops: An edge computing approach, Computers and Electronics in Agriculture, 2024, 216, 108495, doi: 10.1016/j.jafr.2024.101331.
[4] S. Zahoor, S. A. Sof, Weed identification in crop field using CNN, Journal of University of Shanghai for Science and Technology, 2021, 23, 15-21.
[5] P. K. Reddy, R. A. Reddy, M. A. Reddy, K. Sai Teja, K. Rohith, K. Rahul, Detection of weeds by using machine learning, Proceedings of the International Conference on Emerging Trends in Engineering and Technology, 2023, 882-892.
[6] W.-H. Su, Advanced machine learning in point spectroscopy, RGB- and hyperspectral-imaging for automatic discriminations of crops and weeds: a review, 2020, doi: 10.3390/smartcities3030039.


[7] S. Umamaheswari, A. R. Arjun, M. D. Meganathan, Weed detection in farm crops using parallel image processing, 2018 Conference on Information and Communication Technology (CICT), Jabalpur, India, 2018, 1-4, doi: 10.1109/INFOCOMTECH.2018.8722369.
[8] O. M. Olaniyi, E. Daniya, J. G. Kolo, J. A. Bala, A. E. Olanrewaju, A computer vision-based weed control system for low-land rice precision farming, International Journal of Advances in Applied Sciences, 2020, 9, 51-61, doi: 10.11591/ijaas.v9.i1.pp51-61.
[9] M. D. Bah, A. Hafiane, R. Canals, Deep learning with unsupervised data labeling for weed detection in line crops in UAV images, Remote Sensing, 2018, 10, 1690, doi: 10.3390/rs10111690.
[10] V. Partel, S. C. Kakaria, Y. Ampatzidis, Development and evaluation of a low-cost and smart technology for precision weed management utilizing artificial intelligence, Computers and Electronics in Agriculture, 2019, 157, 339-350, doi: 10.1016/j.compag.2018.12.048.
[11] Y. Wang, H. Liu, D. Wang, D. Liu, Image processing in fault identification for power equipment based on improved super green algorithm, Computers & Electrical Engineering, 2020, 87, 106753, doi: 10.1016/j.compeleceng.2020.106753.
[12] J. Zhang, Weed recognition method based on hybrid CNN-transformer model, Frontiers in Computing and Intelligent Systems, 2023, 4, 72-77, doi: 10.54097/fcis.v4i2.10209.
[13] L. Moldvai, P. Ákos Mesterházi, G. Teschner, A. Nyéki, Weed detection and classification with computer vision using a limited image dataset, 2024, doi: 10.3390/app14114839.
[14] H. Jiang, C. Zhang, Y. Qiao, Z. Zhang, W. Zhang, C. Song, CNN feature-based graph convolutional network for weed and crop recognition in smart farming, Computers and Electronics in Agriculture, 2020, 174, 105450, doi: 10.1016/j.compag.2020.105450.
[15] M. A. Haq, CNN based automated weed detection system using UAV imagery, Computer Systems Science and Engineering, 2022, 42, 837-849.
[16] P. K. Reddy, R. A. Reddy, M. A. Reddy, K. S. Teja, K. Rohith, K. Rahul, Detection of weeds by using machine learning, Proceedings of International Conference on Emerging Trends in Engineering, B. Raj et al., Eds., Springer, 2023, 882-892, doi: 10.2991/978-94-6463-252-1_89.
[17] L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, R. Fergus, Regularization of neural networks using DropConnect, Proceedings of the 30th International Conference on Machine Learning, 2013, 28, 1058-1066.
[18] B. Jabir, L. Rabhi, N. Falih, RNN- and CNN-based weed detection for crop improvement: An overview, Foods and Raw Materials, 2021, 9, 387-396, doi: 10.21603/2308-4057-2021-2-387-396.
[19] Y. Tang, Deep learning using linear support vector machines, arXiv preprint arXiv:1306.0239, 2015.
[20] A. Bochkovskiy, C.-Y. Wang, H.-Y. M. Liao, YOLOv4: Optimal speed and accuracy of object detection, arXiv preprint arXiv:2004.10934, 2020.
[21] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, Proceedings of CVPR'16, 2016, 779-788, doi: 10.48550/arXiv.1506.02640.
[22] J. Redmon, A. Farhadi, YOLOv3: An incremental improvement, arXiv preprint arXiv:1804.02767, 2018.
[23] O. L. Garcia-Navarrete, A. Correa-Guimaraes, Application of convolutional neural networks in weed detection and identification: A systematic review, Computers and Electronics in Agriculture, 2024, 216, 108520.
[24] M. Ofori, O. El-Gayar, An approach for weed detection using CNNs and transfer learning, Proceedings of the 54th Hawaii International Conference on System Sciences, 2021, 888-895.
[25] R. Sapkota, J. Stenger, M. Ostlie, P. Flores, Towards reducing chemical usage for weed control in agriculture using UAS imagery analysis and computer vision techniques, Scientific Reports, 2023, 13, 6548, doi: 10.1038/s41598-023-33042-0.
[26] B. B. Sapkota, C. Hu, M. V. Bagavathiannan, Evaluating cross-applicability of weed detection models across different crops in similar production environments, Frontiers in Plant Science, 2022, 13, doi: 10.3389/fpls.2022.837726.
[27] O. E. Apolo-Apolo, M. Fontanelli, C. Frasconi, M. Raffaelli, A. Peruzzi, M. P. Ruiz, Evaluation of YOLO object detectors for weed detection in different turfgrass scenarios, Applied Sciences, 2023, 13, 8502, doi: 10.3390/app13148502.
[28] M. A. Saqib, M. Aqib, M. N. Tahir, Y. Hafeez, Towards deep learning based smart farming for intelligent weeds management in crops, Frontiers in Plant Science, 2023, 14, doi: 10.3389/fpls.2023.1211235.
[29] V. S. Babu, N. Venkatram, Weed detection and localization in soybean crops using YOLOv4 deep learning model, Traitement du Signal, 2023, 41, 1019-1025, doi: 10.18280/ts.410242.

Publisher Note: The views, statements, and data in all publications solely belong to the authors and contributors. GR Scholastic is not responsible for any injury resulting from the ideas, methods, or products mentioned. GR Scholastic remains neutral regarding jurisdictional claims in published maps and institutional affiliations.

Open Access
This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which permits the non-commercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as appropriate credit to the original author(s) and the source is given by providing a link to the Creative Commons licence and any changes are indicated. The images or other third-party material in this article are included in the article's Creative Commons licence, unless


indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit: https://creativecommons.org/licenses/by-nc/4.0/

©The Author 2025
