Batch 1
OF
Dr. M. Sridhar
Associate Professor, Department of CSE (DS & IOT)
BALAJI INSTITUTE OF TECHNOLOGY &
SCIENCE
Accredited by NBA (UG-CE, ECE, ME, CSE Programs) & NAAC A+ Grade
(Affiliated to JNTU Hyderabad and Approved by the AICTE, New Delhi)
NARSAMPET, WARANGAL – 506331
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(DATA SCIENCE)
CERTIFICATE
External Examiner
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to our HoD & Guide, Dr. M. Sridhar,
whose knowledge and guidance have motivated us to achieve goals we never thought
possible. He has consistently been a source of motivation, encouragement and inspiration.
The time we have spent working under his supervision has truly been a pleasure.
We heartily thank our Principal, Dr. V. S. Hariharan, for giving us this great
opportunity and for his support in completing our project.
We thank all our senior faculty members for their effort, guidance and help during our
course. Thanks also to the programmers and non-teaching staff of the CSD Department of our college.
Finally, special thanks to our parents for their support and encouragement throughout
our lives and this course. Thanks to all our friends and well-wishers for their constant support.
[Link] (22C31A6705)
MUBASHIR HUSSAIN SHARIEF (22C31A6744)
[Link] (22C31A6736)
[Link] (22C31A6715)
ABSTRACT
The growing global population has intensified the demand for increased
agricultural productivity and sustainability. Traditional farming practices often rely on
general assumptions about crop selection and fertilizer application, which can lead to
inefficient resource utilization and reduced crop yields. In this context, technological
innovations such as machine learning offer immense potential to revolutionize agriculture
by enabling data-driven decision-making. This project, titled "Crop and Fertilizer
Recommendation System Using Machine Learning", is developed with the objective
of assisting farmers in making intelligent decisions regarding suitable crop cultivation
and fertilizer usage based on their specific soil and environmental conditions.
TABLE OF CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
1 Introduction
1.2) Objective
1.4) Motivation
2 Literature Survey
3 System Development
3.3) Approaches
3.4) Dataset
4 Performance Analysis
4.1) About The Data
4.4) Output
5 Conclusion
References
1 INTRODUCTION
Ever since humans began practicing agriculture, "Agriculture" has been one of the most
important human activities. In today's world, agriculture is not only a means of survival;
it also plays a major role in the economy of every country. Agriculture plays a vital role
in India's economy and in its future, and it provides a large share of employment in India.
As a result, the demand for production has grown exponentially over time. Thus, to
produce in mass quantities, people are using technology in an extremely wrong way.
With technology improving day by day, more and more hybrid varieties are being created.
Compared with naturally developed crops, these hybrid varieties do not provide the same
essential nutrients. Depending heavily on unnatural techniques may lead to soil
acidification and soil crusting, and these practices also contribute to environmental
pollution. Such unnatural practices are adopted to avoid or reduce losses. However, once
the farmer or producer has correct information about the crop yield, it will help them
avoid or reduce losses.
India is among the most populous countries in the world. Many people depend on
agriculture, but the sector lacks efficiency and technology, especially in our country. By
bridging the gap between traditional agriculture and data science, crops can be cultivated
more effectively. Good crop production is important, and crop yield is directly influenced
by factors such as soil type, soil composition, seed quality and the lack of technical
facilities.
In India, agriculture plays an important role in the economy and also in global
development. More than 60% of the country's land is used for agriculture to meet the
needs of 1.3 billion people, so adopting new technologies in agriculture is important and
will help our farmers make a profit. In most parts of India, farmers choose crops based on
what they grew previously or what is commonly grown in the surrounding region, because
they do not have sufficient information about the soil's nutrient content, such as nitrogen,
phosphorus and potassium. As a result, they grow the same crops over and over again
without trying new varieties, and they apply fertilizer randomly without knowing which
nutrients are missing and in what amount. This directly affects crop yield and acidifies the
soil, reducing its fertility.
We are designing a system that uses machine learning to help farmers with crop and
fertilizer prediction. It recommends the right crop for a specific soil while taking the
climatic conditions into account. The system also provides information about the
fertilizer and the seeds needed for planting.
With the help of our system farmers can try to grow or cultivate different varieties with right
1.2) Objective
• Recommend the crops that farmers should plant based on several criteria, helping them
make an informed decision before planting.
• Recommend the most suitable fertilizer based on the same criteria.
• In this project, we build a website that offers two applications: crop recommendation
and fertilizer recommendation.
• In the crop recommendation app, the user provides soil data and the app predicts which
crop the user should grow.
• In the fertilizer app, the user enters soil data and the type of crop being planted, and
the app predicts what the soil is lacking.
1.4) Motivation
Farming is a major Indian occupation. About 70% of small and medium enterprises are
based on agriculture. To improve farming, many farmers have started using new
technologies and methods. In this context, identifying crop suitability and yield from
various production factors can increase crop quality and yield, thereby increasing
economic growth and profitability.
For agriculture to continue to grow, many farmers have begun to use the latest technology
and methods. However, there is a huge gap in knowledge about crop production and how
it affects farm profitability.
Choosing which crop to plant is one of the biggest challenges farmers face, and several
factors are involved. By recommending the most suitable crops and the right fertilizer for
them, a crop recommendation system can help farmers choose the right crop and improve
its yield.
2 LITERATURE SURVEY
Recommendation systems for crops and fertilizers are already available in the market,
and many more are under development. They consider various factors such as the climatic
conditions at the time of planting, rainfall, humidity and soil content. Much research has
been done in this field; some of the relevant papers are summarized below.
The article "Prediction of crop yield and fertilizer recommendation using machine
learning algorithms" [1] concludes that predicting crop yield based on location, together
with a proper implementation of the algorithms, makes a higher crop yield achievable.
From this work we conclude that for soil classification, Random Forest performs well,
with an accuracy of 86.35% compared to Support Vector Machine. For crop yield
prediction, Support Vector Machine performs very well, with an accuracy of 99.47%
compared to the Random Forest algorithm. The work can be extended further to add the
following functionality: a mobile application that helps farmers by letting them upload
images of their farms; crop disease detection using image processing, in which the user
gets pesticide suggestions based on images of the disease; and a smart irrigation system
for farms to obtain a higher yield.
The paper [2] by Rakesh Kumar, M.P. Singh, Prabhat Kumar and J.P. Singh proposed the
use of seven machine learning techniques, i.e., ANN, SVM, KNN, Decision Tree, Random
Forest, GBDT and Regularized Greedy Forest, for crop selection. The system is designed to
retrieve all the crops planted and their growing time in a particular season. The yield rate
of each crop is obtained, and the crops giving better returns are selected. The system also
proposes a sequence of crops to be planted to obtain higher returns.
Leo Breiman [3] studies the accuracy, strength and correlation of the random forest
algorithm. The random forest algorithm builds decision trees on different data samples,
makes a prediction from each subset, and then combines the predictions by voting to give
a better answer. Random Forest uses the bagging method to train on the data. To improve
accuracy, the injected randomness has to minimize the correlation ρ between the trees
while maintaining their strength.
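Breiman's 2001 paper makes this correlation and strength trade-off precise with an upper bound on the forest's generalization error $PE^{*}$, where $\bar{\rho}$ is the mean correlation between trees and $s$ is the strength of the individual trees. The numbers in the worked example are illustrative only, not taken from the paper.

$$
PE^{*} \le \frac{\bar{\rho}\,(1 - s^{2})}{s^{2}}
$$

For example, with $\bar{\rho} = 0.3$ and $s = 0.6$ the bound is $0.3 \times 0.64 / 0.36 \approx 0.53$; halving the correlation to $\bar{\rho} = 0.15$ at the same strength halves the bound to about $0.27$. Lowering correlation helps only if strength does not drop, which is why the randomness has to be injected carefully.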
3 SYSTEM DEVELOPMENT
3.1) System architecture
3.2) Introduction to Machine Learning
Supervised
In supervised learning, the machine learning model is given a dataset containing inputs
as well as their correct outputs. In other words, labeled datasets are provided to the
algorithm for training (guided training). Applications of supervised learning include
speech recognition, spam detection and bioinformatics.
Unsupervised
In unsupervised learning, labeled datasets are not provided; the algorithm tries to find
patterns in the data on its own. This type of learning requires less human involvement
or supervision than supervised learning. It can handle unstructured and unlabeled data
more easily, which makes it easier to analyze and find patterns in complex data.
3.3) Approaches
As a field, machine learning is closely related to computational statistics, so having a
mathematical background helps you better understand and apply machine learning
techniques.
For those who have never studied statistics before, the definitions of correlation and
regression, the two most commonly used methods for assessing the relationship between
quantitative variables, are a good place to start. Correlation measures the degree of
association between two variables when neither is designated as dependent or
independent. Regression, at its basic level, examines the relationship between a single
dependent variable and an independent one. Because regression can be used to predict
the dependent variable when the independent variable is known, it provides predictive
capabilities.
3.4) Dataset
We have considered two datasets: the first is used to recommend crops, and the second is
used to predict or recommend fertilizer.
Dataset for crop recommendation
Good crop production, or good crop yield, depends on various factors. This dataset
provides several of the factors involved in crop production, and with its help a crop
recommendation model can be created.
d) Temperature: in Celsius
f) pH: indicates whether the soil is acidic or basic
g) Rainfall: in mm
c) K: ratio of potassium in the soil
d) pH
e) Soil moisture
f) Crop
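The fertilizer application is described as predicting what the soil is lacking. One simple way to do that, sketched here as an assumption rather than as the project's actual method, is to compare the N, P and K values the user enters with target values for the chosen crop and report the nutrient that deviates most. `IDEAL_NPK` and `nutrient_advice` are hypothetical names, and the target numbers are placeholders.

```python
# Hypothetical target N, P, K levels per crop. These numbers are placeholders for
# illustration only; a real system would read them from the fertilizer dataset.
IDEAL_NPK = {
    "rice": {"N": 80, "P": 40, "K": 40},
    "maize": {"N": 80, "P": 40, "K": 20},
}


def nutrient_advice(crop, soil):
    """Return (nutrient, status, gap) for the nutrient farthest from its target.

    gap = measured - target, so a negative gap means the soil is lacking it.
    """
    target = IDEAL_NPK[crop.lower()]
    gaps = {n: soil[n] - target[n] for n in ("N", "P", "K")}
    nutrient = max(gaps, key=lambda n: abs(gaps[n]))
    gap = gaps[nutrient]
    if gap < 0:
        status = "low"
    elif gap > 0:
        status = "high"
    else:
        status = "ok"
    return nutrient, status, gap


# Example: soil sample entered by the user for a rice field.
print(nutrient_advice("Rice", {"N": 50, "P": 42, "K": 35}))
```

A rule-based comparison like this is easy to explain to users; a learned classifier trained on the fertilizer dataset is the alternative when enough labeled examples are available.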
3.5) Data Preprocessing
Because data is collected from various sources, it may contain many missing values. The
raw data that is collected is therefore processed so that it can easily be used in different
tasks, such as training machine learning models and other data science tasks.
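A minimal preprocessing sketch with pandas, assuming hypothetical column names (`temperature`, `ph`, `rainfall`, `label`) modeled on the fields listed above: rows without a target label are dropped, and the remaining numeric gaps are filled with column medians. Median imputation is one common choice, not necessarily the one used in the project.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample; column names and values are assumptions for illustration.
raw = pd.DataFrame({
    "temperature": [20.9, np.nan, 23.0, 26.5],
    "ph": [6.5, 7.0, np.nan, 6.8],
    "rainfall": [202.9, 226.7, 264.6, np.nan],
    "label": ["rice", "rice", None, "maize"],
})


def preprocess(df):
    """Drop rows without a target label, then fill numeric gaps with column medians."""
    df = df.dropna(subset=["label"]).copy()
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    return df


clean = preprocess(raw)
print(clean)
```

The medians are computed after dropping unlabeled rows, so a row that is removed does not influence the values used to fill the others.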
Model Building
Model building is the process of creating a mathematical model that helps predict or
calculate future outcomes based on data collected in the past.
E.g.:
A retailer wants to know the default behavior of its credit card customers. It wants to
predict the probability of default for each customer in the next three months.
The probability of default lies between 0 and 1.
Initially, assume every customer has a 10% default rate, so the probability of default for
each customer in the next 3 months = 0.1.
The model then moves this probability towards one of the extremes based on attributes
from past information. A customer with volatile income is more likely to default
(probability closer to 1), while a customer with a healthy credit history over the last few
years has a low chance of default (probability closer to 0).
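The credit card example can be sketched with logistic regression, a common model for default probabilities (chosen here for illustration; the text does not name a model). The customer data is synthetic and generated from an assumed rule in which volatile income raises risk and a long clean credit history lowers it. The fitted model moves individual probabilities away from the overall base rate in the direction the example describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic, illustrative customers (not real data):
# income_volatility in [0, 1] and years of good credit history in [0, 10].
n = 1000
income_volatility = rng.uniform(0, 1, n)
good_credit_years = rng.uniform(0, 10, n)

# Assumed generating rule: volatility raises risk, a long clean history lowers it.
logit = -2.2 + 3.0 * income_volatility - 0.3 * good_credit_years
defaulted = rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([income_volatility, good_credit_years])
base_rate = defaulted.mean()  # the "same probability for everyone" starting point

model = LogisticRegression().fit(X, defaulted)
risky = model.predict_proba([[0.9, 0.5]])[0, 1]  # volatile income, short history
safe = model.predict_proba([[0.1, 9.0]])[0, 1]   # stable income, long clean history
print(f"base rate {base_rate:.2f}, risky {risky:.2f}, safe {safe:.2f}")
```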
Steps in Model Building
• Algorithm Selection
• Training Model
• Prediction / Scoring
Algorithm Selection
[Flowchart: Does the data have a dependent variable? Yes → Supervised Learning;
No → Unsupervised Learning. For supervised learning: Is the dependent variable
continuous? Yes → Regression; No → Classification.]
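The algorithm-selection decision can be written as a tiny function. The first question (whether the data has a dependent variable at all) is our reading of the flowchart, and `choose_task` is a hypothetical helper, not part of any library.

```python
def choose_task(has_dependent_variable, dependent_is_continuous=False):
    """Follow the algorithm-selection flowchart.

    No dependent variable -> unsupervised learning.
    Continuous dependent variable -> regression, otherwise classification.
    """
    if not has_dependent_variable:
        return "unsupervised learning"
    if dependent_is_continuous:
        return "supervised learning: regression"
    return "supervised learning: classification"


# Crop recommendation predicts a crop name (a category), so it is classification;
# predicting yield in tonnes per hectare would be regression.
print(choose_task(True, dependent_is_continuous=False))
print(choose_task(True, dependent_is_continuous=True))
print(choose_task(False))
```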
Training Model
Training is the process of learning the relationship (correlation) between the
independent and dependent variables, using the known dependent variable of the training
dataset.
• Train dataset: past data (known dependent variable), used to train the model.
• Test dataset: future data (unknown dependent variable), used to score.
We apply the trained model to the test dataset for prediction/estimation.
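A sketch of the train/test workflow with scikit-learn, using synthetic data from `make_classification` as a stand-in for the project's dataset. `train_test_split` holds out 20% of the rows as the test set; the model is fit on the training part only and scored on the held-out part.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic labeled data stands in for the project's dataset (an assumption).
X, y = make_classification(n_samples=500, n_features=7, n_informative=5, random_state=42)

# "Train" plays the role of past data with known labels; "test" is held out for scoring.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)  # training
y_pred = model.predict(X_test)                                         # prediction
accuracy = model.score(X_test, y_test)                                 # scoring
print(len(X_train), len(X_test), f"{accuracy:.3f}")
```

`stratify=y` keeps the class proportions the same in both parts, which matters when some classes (crops) are rare.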
Predictive Modelling
Predictive modelling uses past data to forecast the future.
E.g., past horror movies (known data) → future, unwatched horror movies (predicted).
Unsupervised learning:
It is a branch of machine learning that deals with unlabeled data. Unlike supervised
learning, where the data is labeled with a specific category or outcome, unsupervised
learning algorithms must find patterns and relationships within the data without any
prior knowledge of the data's meaning. Unsupervised machine learning algorithms find
hidden patterns in data without any human intervention, i.e., we don't give outputs to our
model. The model receives only input values and discovers the groups or patterns on its
own.
Clustering:
A clustering problem is where you want to discover the inherent groupings in the data,
such as grouping customers by purchasing behavior.
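A minimal clustering sketch with scikit-learn's `KMeans`. The six customers and their two features (visits per month, average spend) are hypothetical; no labels are supplied, and the algorithm is asked only for two groups.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers described by (visits per month, average spend).
# No labels are given; the algorithm has to find the groups itself.
X = np.array([
    [2, 15], [3, 18], [2, 20],       # occasional visitors, low spend
    [12, 95], [14, 110], [13, 100],  # frequent visitors, high spend
], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each customer
print(kmeans.cluster_centers_)  # mean of each discovered group
```

The cluster numbers themselves are arbitrary; what matters is which customers end up sharing a number.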
Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
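Association rules are usually scored by support (how often the items occur together) and confidence (how often the rule holds when its left side occurs). The sketch below computes both by brute force over item pairs for a handful of hypothetical baskets; it illustrates the two measures and is not an implementation of a full algorithm such as Apriori. The 0.4 minimum support is an assumed threshold.

```python
from itertools import combinations

# Hypothetical shopping baskets (illustrative only).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def confidence(x, y):
    """Estimated P(Y in basket | X in basket) for the rule X -> Y."""
    return support(x | y) / support(x)


items = sorted(set().union(*baskets))
rules = []
for a, b in combinations(items, 2):
    for x, y in ((a, b), (b, a)):
        s = support({x, y})
        if s >= 0.4:  # minimum support threshold (assumed)
            rules.append((x, y, s, confidence({x}, {y})))

for x, y, s, c in rules:
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

Support is the same in both directions of a pair, while confidence is not: here every jam buyer also bought bread, but only half the bread buyers bought jam.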
• Naive Bayes
This algorithm assumes that the features in the dataset are independent of each other
given the class. It tends to perform better on larger datasets. Naive Bayes can be
represented as a DAG (directed acyclic graph) in which the class node is the parent of
every feature node.
• Random forest
Random Forest can analyze crop growth in relation to the current climatic conditions and
biophysical change. The algorithm builds decision trees on different data samples, makes a
prediction from each subset, and then combines the predictions by voting to give a better
solution. Random Forest uses the bagging method to train the trees, which increases the
accuracy of the result.
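A sketch of bagging and voting with scikit-learn's `RandomForestClassifier` on synthetic data (a stand-in for the crop dataset). Each tree is trained on its own bootstrap sample, and the code tallies the individual trees' votes for one sample. Note that scikit-learn actually combines trees by averaging their predicted class probabilities (soft voting), so the forest's answer can occasionally differ from a strict majority of hard votes.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption); 7 features echo the soil/climate columns.
X, y = make_classification(n_samples=400, n_features=7, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=1)

# bootstrap=True: each tree sees a different bootstrap sample (bagging);
# max_features="sqrt": each split considers a random subset of the features.
forest = RandomForestClassifier(n_estimators=25, bootstrap=True,
                                max_features="sqrt", random_state=1).fit(X, y)

sample = X[:1]
# Each fitted tree predicts an encoded class index; map it back to a class label.
votes = Counter(forest.classes_[int(t.predict(sample)[0])] for t in forest.estimators_)
print("tree votes:", dict(votes))
# The forest itself averages the trees' class probabilities (soft voting).
print("forest prediction:", forest.predict(sample)[0])
```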
Decision Tree
The decision tree is one of the most powerful and popular tools for classification and
prediction. A decision tree is a flowchart-like tree structure in which each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and each
leaf (terminal) node holds a class label.
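To see the flowchart-like structure concretely, the sketch fits a shallow `DecisionTreeClassifier` on the Iris toy dataset bundled with scikit-learn (used here only because it is readily available) and prints the learned attribute tests and leaf class labels with `export_text`.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth=2 keeps the printed tree small: at most two tests from root to leaf.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Internal nodes print as "feature <= threshold" tests; leaves print as "class: ...".
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```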
Figure 7: Support vector machine
Python is a high-level, general-purpose, interpreted programming language. Python's
developers aim to keep the language fun to use. Python's design offers some support for
functional programming in the Lisp tradition: it has the filter(), map() and reduce()
functions (reduce() lives in functools), as well as list comprehensions, dictionaries, sets
and generator expressions. Two modules in the standard library (itertools and functools)
implement functional tools borrowed from Haskell and Standard ML.
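The functional features listed above, shown on a hypothetical list of daily rainfall readings. In Python 3, `reduce` has to be imported from `functools`, and `accumulate` comes from `itertools`.

```python
import operator
from functools import reduce
from itertools import accumulate

rainfall_mm = [120, 0, 85, 40, 0, 210]  # hypothetical daily readings

wet_days = list(filter(lambda r: r > 0, rainfall_mm))   # filter
in_cm = list(map(lambda r: r / 10, wet_days))           # map
total = reduce(operator.add, rainfall_mm, 0)            # functools.reduce
running = list(accumulate(rainfall_mm))                 # itertools.accumulate (running sum)
squares = [r * r for r in wet_days if r < 100]          # list comprehension
by_day = {day: r for day, r in enumerate(rainfall_mm)}  # dict comprehension
levels = {r // 100 for r in rainfall_mm}                # set comprehension
peak = max(r for r in rainfall_mm)                      # generator expression

print(wet_days, in_cm, total, running, squares, levels, peak)
```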
NumPy library:
NumPy is a Python library that adds support for large, multi-dimensional arrays and
matrices, along with a large collection of high-level mathematical functions that operate
on these arrays.
Using NumPy in Python gives functionality comparable to MATLAB: both are interpreted,
and both allow users to write fast programs as long as most operations work on arrays or
matrices rather than scalars. Along with NumPy, the following libraries are also used:
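A short sketch of the array-at-a-time style described above: NumPy broadcasting subtracts a per-nutrient target vector from every row of a sample matrix without an explicit loop, and the explicit double loop is included only to show it gives the same result. The readings and target levels are hypothetical.

```python
import numpy as np

# Hypothetical N, P, K readings: rows are soil samples, columns are nutrients.
soil = np.array([[90, 42, 43],
                 [85, 58, 41],
                 [60, 55, 44]], dtype=float)
target = np.array([80, 40, 40], dtype=float)  # assumed target level per nutrient

# Broadcasting: the (3,) target is applied to every row of the (3, 3) matrix.
deficit = np.clip(target - soil, 0, None)  # keep only shortfalls, zero otherwise

# The same result with explicit Python loops, for comparison.
loop_deficit = np.zeros_like(soil)
for i in range(soil.shape[0]):
    for j in range(soil.shape[1]):
        loop_deficit[i, j] = max(target[j] - soil[i, j], 0.0)

# Column-wise standardization, again without an explicit loop.
scaled = (soil - soil.mean(axis=0)) / soil.std(axis=0)
print(deficit)
```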
Pandas library:
Pandas is a Python software library for data manipulation and analysis. It provides data
structures and operations for manipulating numerical tables and time series. It is free
software released under the three-clause BSD license. The name is derived from the term
"panel data", an econometrics term for data sets that include observations over multiple
time periods for the same individuals.
It is supported by a strong community that adds and modifies data engines, allowing
different applications to be integrated with data sets. It provides high-performance
merging and joining of data sets. Hierarchical indexing provides a way to work with
high-dimensional data in a lower-dimensional data structure.
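A small pandas sketch of grouping, hierarchical (MultiIndex) indexing and merging. The states, yields and prices are illustrative numbers, not project data.

```python
import pandas as pd

# Hypothetical yields by state, season and crop (illustrative numbers).
df = pd.DataFrame({
    "state": ["Telangana", "Telangana", "Telangana", "Punjab", "Punjab"],
    "season": ["Kharif", "Kharif", "Rabi", "Rabi", "Kharif"],
    "crop": ["rice", "cotton", "maize", "wheat", "rice"],
    "yield_t_per_ha": [3.1, 1.6, 4.2, 4.8, 4.0],
})

# Hierarchical (MultiIndex) rows: two grouping levels held in one 1-D Series.
mean_yield = df.groupby(["state", "season"])["yield_t_per_ha"].mean()
print(mean_yield)
print(mean_yield.loc["Telangana"])         # select by the outer level
print(mean_yield.loc[("Punjab", "Rabi")])  # select one (state, season) pair

# Merging/joining with another table on a shared key (hypothetical prices).
prices = pd.DataFrame({"crop": ["rice", "wheat", "maize", "cotton"],
                       "price_per_t": [20000, 22000, 18000, 60000]})
merged = df.merge(prices, on="crop", how="left")
print(merged)
```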
Matplotlib:
John Hunter and many others built matplotlib, a Python library for creating graphs,
charts and high-quality figures. The library can plot mathematical data with very little
code. Some of the key concepts and functions in matplotlib are:
Figure
The whole drawing is called a figure, and each figure contains one or more axes. A figure
can be thought of as a canvas on which multiple plots are drawn.
Plot
Data is the first thing needed to draw a graph. A dictionary with keys and values, such as
x and y values, can be declared. Next, scatter(), bar(), pie() and a host of other functions
can be used to create the plot.
Axis
The figure and its axes can be obtained using subplots(). The set() function is used to
adjust x-axis and y-axis properties.
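The figure, plot and axis concepts above in a few lines of matplotlib. The monthly rainfall values are hypothetical, and the non-interactive Agg backend is selected so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Data first: a dictionary of x and y values (hypothetical monthly rainfall in mm).
data = {"x": ["Jan", "Feb", "Mar", "Apr"], "y": [12, 8, 20, 35]}

# subplots() returns the Figure and, here, two Axes side by side.
fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))
ax_bar.bar(data["x"], data["y"])
ax_scatter.scatter(data["x"], data["y"])

# Axes.set() adjusts several x-axis and y-axis properties in one call.
ax_bar.set(title="Bar chart", xlabel="month", ylabel="rainfall (mm)", ylim=(0, 40))
ax_scatter.set(title="Scatter plot", xlabel="month", ylabel="rainfall (mm)")
fig.tight_layout()
# fig.savefig("rainfall.png") would write the figure to disk.
```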
Scikit learn:
Scikit-learn is one of the most widely used machine learning libraries for Python. The
sklearn library contains many practical machine learning tools and statistical modeling
methods, including classification, regression, clustering and dimensionality reduction,
and it provides ready-to-use implementations of these machine learning models.
Scikit-learn comes with many features. It is not meant for reading, manipulating or
summarizing data; other libraries are better suited to that. Some of its features are
listed below to show its breadth:
• Supervised learning algorithms: almost any supervised learning algorithm you may have
studied is likely to be part of scikit-learn. From generalized linear models to SVMs and
decision trees, they are all in the scikit-learn toolbox. This breadth of machine learning
algorithms is one of the main reasons for its widespread use. We started using
scikit-learn to solve supervised learning problems and would recommend it to newcomers
to machine learning.
• Unsupervised learning algorithms: there is also a wide variety of unsupervised
algorithms, ranging from clustering, factor analysis and principal component analysis to
unsupervised neural networks.
• Cross-validation: sklearn offers a variety of methods to check the accuracy of
supervised models on unseen data.
• Feature extraction: scikit-learn can extract features from images and text.
• Toy datasets: these are useful when learning the library. We had studied SAS using
various educational datasets, and having such datasets at hand helps a lot when learning
a new library.
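A minimal cross-validation sketch using one of scikit-learn's bundled toy datasets (Iris) and an SVM classifier. `cross_val_score` with `cv=5` trains on four folds and scores on the fifth, five times over; the classifier and dataset are chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy dataset bundled with scikit-learn: feature matrix X and class labels y.
X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: each fold is held out once and scored on unseen data.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print(scores, scores.mean())
```

Reporting the mean (and spread) of the fold scores gives a more stable accuracy estimate than a single train/test split.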
4 PERFORMANCE ANALYSIS
• Naive Bayes
Applied to the dataset, it gives an accuracy of 99.09%.
• Random Forest
Applied to the dataset, it gives an accuracy of 99.09%.
• Decision tree
Applied to the dataset, it gives an accuracy of 90%.
• SVM (Support Vector Machine)
Applied to the dataset, it gives an accuracy of 97.95%.
4.3) Accuracy Comparison of Algorithms
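The comparison figure itself is not reproduced here. A sketch of how such a comparison can be produced with scikit-learn: the four algorithms discussed above are trained on the same split and scored on the same held-out data. The data here is synthetic, so the printed accuracies will not match the percentages reported above; replacing `X` and `y` with the crop dataset's features and labels is the intended use. Standardizing inputs for the SVM is our choice, not something the text states.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data; swap in the crop dataset's features and labels.
X, y = make_classification(n_samples=1000, n_features=7, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y)

models = {
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=7),
    "Decision Tree": DecisionTreeClassifier(random_state=7),
    # SVMs are sensitive to feature scale, so standardize inside a pipeline.
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate on the shared test set.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {acc:.4f}")
```

Using one fixed split for every model keeps the comparison fair; cross-validating each model would make it less sensitive to that particular split.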
4.4) Output:
Crop Recommended
Fertilizer Recommended
5 CONCLUSIONS
In this project, we aim to obtain the best crop and fertilizer recommendations with the
help of machine learning. Several machine learning techniques were applied and their
accuracies calculated. Numerous algorithms were used on the datasets to get the best
output, which leads to the best crop and fertilizer recommendation for a particular soil in
a particular region.
This system will help farmers visualize crop yields based on climatic and subsistence
parameters. Using it, a farmer can decide whether to plant a crop or look for another crop
if the yield forecasts are unfavorable.
This tool can help farmers make the best decisions when it comes to growing a crop. It
may also predict the negative effects of the plant.
Currently, our farmers use outdated technology or do not use technology effectively, so
there is a chance of choosing the wrong crop, which reduces the profit from production.
To reduce these losses, we try to create a farmer-friendly system that predicts which crop
is best for a specific soil. The project also recommends the fertilizer needed by the soil for
cultivation, the seeds needed for cultivation, the expected yield and the market price. This
enables farmers to make the right choice of crop, so that the agricultural sector can
develop with new ideas.
• Plant disease detection: processing images of diseased plants so that the user gets
pesticide suggestions based on those images.
REFERENCES
1. Bondre, D. A., & Mahagaonkar, S. (2019). Prediction of crop yield and fertilizer
recommendation using machine learning algorithms. International Journal of Engineering
Applied Science and Technology, 04(05), 371–376. [Link]
4. Priya, P., Muthaiah, U., & Balamurugan, M. (2015). Predicting yield of the crop using
machine learning algorithm.