Dr. M. Sridhar
Associate Professor, Department of CSE (DS & IoT)
CERTIFICATE
External Examiner
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to our guide, Dr. M. Sridhar,
whose knowledge and guidance have motivated us to achieve goals we never thought
possible. He has consistently been a source of motivation, encouragement, and
inspiration. The time we have spent working under his supervision has truly been a
pleasure.
We thank our H.O.D., Dr. M. Sridhar, for his effort and guidance, and all senior
faculty members for their help during our course. Thanks also to the programmers and
non-teaching staff of the CSE Department of our college.
Finally, special thanks to our parents for their support and encouragement
throughout our lives and this course, and to all our friends and well-wishers for their
constant support.
A.DEVARAJ (22C31A6705)
MUBASHIR HUSSAIN SHARIEF (22C31A6744)
L.BALAJI (22C31A6736)
D.RAHUL (22C31A6715)
ABSTRACT
The growing global population has intensified the demand for increased agricultural
productivity and sustainability. Traditional farming practices often rely on general assumptions
about crop selection and fertilizer application, which can lead to inefficient resource utilization
and reduced crop yields. In this context, technological innovations such as machine learning
offer immense potential to revolutionize agriculture by enabling data-driven decision-making.
This project, titled "Crop and Fertilizer Recommendation System Using Machine
Learning", is developed with the objective of assisting farmers in making intelligent decisions
regarding suitable crop cultivation and fertilizer usage based on their specific soil and
environmental conditions.
TABLE OF CONTENTS
CERTIFICATE 1
ACKNOWLEDGEMENT 2
ABSTRACT 3
TABLE OF CONTENTS 4
Chapter-1 Introduction 6
1.1) Introduction 6
1.3) Objective 8
1.5) Motivation 9
3.3) Approaches 11
3.4) Dataset 11
4.2) Accuracy comparison between different algorithms 30
4.4) Output 36
Chapter-5 Conclusions 43
5.1) Conclusion 43
References 45
CHAPTER-1 INTRODUCTION
1.1) Introduction
Ever since humans began practicing agriculture, it has been one of the most important
human activities. In today's world, agriculture is not only a means of survival; it also
plays a huge role in the economy of any country. Agriculture plays a vital role in India's
economy and in humanity's future, and it provides a large share of employment for
Indians. As a result, the need for production has grown exponentially over time, and in
the push to produce in mass quantities, people often use technology in the wrong way.
With the improvement of technology, new hybrid varieties are created day by day. In
comparison with naturally grown crops, these hybrid varieties often do not provide the
same essential nutrients. Depending too heavily on such unnatural techniques may lead to
soil acidification and crusting, and these practices contribute to environmental pollution.
These unnatural practices are adopted to avoid or reduce losses; however, once the farmer
or producer has correct data on crop yield, that information itself can help avoid or
reduce the loss.
India is the second-largest country in the world in terms of population. Many people
depend on agriculture, but the sector lacks efficiency and technology, especially in our
country. By bridging the gap between traditional agriculture and data science, effective
crop cultivation can be achieved. It is important to have good crop production. Crop yield
is directly influenced by factors such as soil type, soil composition, seed quality, and the
lack of technical facilities.
In India, agriculture plays an important role in the economy and also contributes to
global development. More than 60% of the country's land is used for agriculture to meet
the needs of 1.3 billion people, so adopting new technologies for agriculture is essential
and can help the country's farmers make a profit. In most parts of India, crop and
fertilizer prediction is still done based on the farmer's experience. Most farmers simply
repeat previous or neighboring crops, or whatever is common in the surrounding region,
because they do not have sufficient information about soil contents such as phosphorus,
potassium, and nitrogen.
"An ML based website that recommends the best crop you can plant, the fertilizer you can
use."
In this project, we are launching a website with the following applications:
crop recommendation and fertilizer recommendation.
Most Indians have farming as their occupation. Farmers often plant the same crop over
and over again without trying new varieties, and fertilize at random without knowing
which nutrients are missing or in what amount. This directly affects crop yield and
acidifies the soil, reducing soil fertility.
We are designing a system that uses machine learning to help farmers with crop and
fertilizer prediction. The right crop will be recommended for a specific soil, keeping
climatic conditions in mind. The system also provides information about the fertilizer
and the seeds needed for planting.
With the help of our system, farmers can try to cultivate different varieties with the right
techniques, helping them maximize their profit.
1.3) Objective
• Recommend crops that should be planted by farmers based on several criteria and help
them make an informed decision before planting.
• In this project, we are launching a website that hosts the following applications:
• In the crop recommendation app, the user can provide soil data, and the app will
predict which crop the user should grow.
• With the fertilizer application, the user can enter soil data and the type of crop they
are planting, and the application will predict which nutrients the soil lacks or has in
excess and will recommend improvements.
In the system, we propose testing multiple algorithms; by reading the classification
reports, we compare the algorithms and select the best one.
The system should report accuracy, precision, and recall on the given datasets and the
test data when comparing algorithms.
1.5) Motivation
Farming is a major occupation in India. About 70% of small and medium enterprises are
based on agriculture. So, to improve farming, many farmers have started using new
technologies and methods. In this context, identifying crop suitability and yield based on
various production factors can increase crop quality and yield, thereby increasing
economic growth and profitability.
For agriculture to continue to grow, many farmers have begun to use the latest
technologies and methods. However, there is a huge gap in knowledge about crop
production and how it affects farm profitability.
Choosing which crop to plant is one of the biggest challenges farmers face in growing
crops. Several factors are involved. By recommending the most suitable crop and
promoting the right fertilizer for it, a crop recommendation system can help farmers
choose crops that yield well.
CHAPTER-2 LITERATURE SURVEY
Recommendation systems for crops and fertilizers are already on the market, and many
more are in development; they consider various factors such as climate conditions at the
time of planting, rainfall, humidity, and soil contents. Much research has been done in
this field, and the following are some of the studies and papers that have been carried out.
The article “Prediction of crop yield and fertilizer recommendation using machine
learning algorithms” [1] concludes that predicting crop yield based on location, together
with proper implementation of the algorithms, can achieve higher crop yields. From this
work, the authors conclude that for soil classification Random Forest performs well, with
an accuracy of 86.35%, compared to Support Vector Machine. For crop yield prediction,
Support Vector Machine performs very well, with an accuracy of 99.47%, compared to
the Random Forest algorithm. The work can be extended further with the following
functionality: a mobile application could be built to help farmers by uploading images of
their farms; crop disease detection using image processing, in which the user gets
pesticide suggestions based on disease images; and a smart irrigation system for farms to
obtain higher yields.
The paper introduced in [2] by Rakesh Kumar, M.P. Singh, Prabhat Kumar and J.P. Singh
proposed the use of seven machine learning techniques, i.e., ANN, SVM, KNN, Decision
Tree, Random Forest, GBDT and Regularized Gradient Forest, for crop selection. The
framework is designed to retrieve all of the crops planted and their time of growth in a
particular season. The yield rate of each crop is obtained, and the crops giving better
returns are selected. The framework also proposes a sequence of crops to be planted to
obtain higher returns.
Leo Breiman [3] specializes in the accuracy, strength, and correlation properties of the
random forest algorithm. The random forest algorithm builds decision trees on different
data samples, makes a prediction from each subset, and then combines the results by
voting to give a better answer for the system. Random forest uses the bagging technique
to train on the data. To boost accuracy, the injected randomness has to minimize the
correlation ρ between trees while maintaining strength.
CHAPTER-3 SYSTEM DEVELOPMENT
In machine learning, tasks are generally divided into broad categories. These categories
are based on how data is acquired and how the system responds to it.
Two of the most widely used machine learning methods are unsupervised learning,
which gives the algorithm unlabelled data so that it can find structure within its input,
and supervised learning, which trains algorithms on example input and output data that
has been labelled by humans. Let's take a deeper look at each of these methods.
Supervised
In this type of learning, the machine learning model is provided with a dataset containing
inputs along with their correct outputs. In other words, labelled datasets are given to the
algorithm for training (guided training). Applications of supervised learning include
speech recognition, spam detection, and bioinformatics.
Unsupervised
In this type of learning, labelled datasets are not provided. The algorithm tries to find
patterns among the data in the dataset. This type of learning requires less human
supervision than supervised learning. It can manage unstructured and unlabelled data
more easily, which makes it useful for analyzing and finding patterns in complex data.
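The contrast between the two settings can be sketched with scikit-learn on a toy, made-up dataset (the numbers here are illustrative, not from the project's data):

```python
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Supervised: inputs are paired with known labels during training.
X = [[1, 1], [1, 2], [8, 8], [9, 8]]
y = ["low", "low", "high", "high"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[9, 9]]))   # predicts a label learned from the examples

# Unsupervised: only the inputs are given; the model finds structure itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)              # cluster assignments discovered without labels
```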
3.3) Approaches
As a field, machine learning is closely associated with computational statistics, so having
a mathematical background helps you to better understand and apply machine learning
techniques.
For those who have not studied statistics before, the definitions of correlation and
regression, the two most commonly used methods for assessing the relationship between
quantitative variables, are a good place to start. Correlation measures the degree of
association between two variables. Regression, at its most basic level, is used to examine
the relationship between a single dependent variable and an independent variable.
Because regression statistics can be used to predict the dependent variable when the
independent variable is known, they provide predictive capabilities.
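Correlation and regression can be illustrated in a few lines of NumPy; the variables here (rainfall driving growth) are made up for illustration:

```python
import numpy as np

rainfall = np.array([50.0, 80.0, 110.0, 140.0, 170.0])
growth = 0.4 * rainfall + 3.0           # a perfectly linear relationship

# Correlation: degree of association between the two variables (-1 to 1).
r = np.corrcoef(rainfall, growth)[0, 1]
print(r)                                 # 1.0 for an exactly linear relation

# Regression: fit growth as a linear function of rainfall (slope, intercept).
slope, intercept = np.polyfit(rainfall, growth, 1)
print(slope, intercept)                  # recovers 0.4 and 3.0
```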
3.4) Dataset
We have considered two datasets: one helps with the recommendation of crops, and the
second helps with the prediction or recommendation of fertilizer.
As we all know, good crop production and yield depend on various factors, and this
dataset provides the various factors involved in the production of a crop. With the help of
this dataset, a crop recommendation model can be created.
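A tiny in-memory stand-in for the crop dataset is sketched below with pandas. The column names are assumptions based on the factors described above (soil nutrients, pH, climate, rainfall) and may differ from the actual file:

```python
import pandas as pd

# Two illustrative rows; real datasets contain thousands of samples.
df = pd.DataFrame({
    "N": [90, 20], "P": [42, 67], "K": [43, 19],
    "temperature": [20.8, 24.1], "humidity": [82.0, 65.3],
    "ph": [6.5, 7.0], "rainfall": [202.9, 80.4],
    "label": ["rice", "maize"],
})
X = df.drop(columns="label")   # input features for the model
y = df["label"]                # the crop to recommend
print(X.shape, list(y))
```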
f) pH: tells whether the soil is acidic or basic
g) Rainfall: in mm
Figure 2: Dataset for fertilizer prediction
3.5) Data Preprocessing
Data is collected from various sources and may therefore contain many missing values.
The raw data collected is processed so that it can be used easily in different tasks, such as
machine learning models and other data science tasks.
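One common preprocessing step is imputing missing values; a minimal sketch with pandas, filling missing numeric values with the column mean:

```python
import pandas as pd

# Illustrative frame with gaps (None marks a missing reading).
df = pd.DataFrame({"N": [90.0, None, 40.0], "ph": [6.5, 7.0, None]})
filled = df.fillna(df.mean())          # replace each gap with the column mean
print(filled.isnull().values.any())    # False: no missing values remain
```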
Model Building
Model building is a process to create a mathematical model which will help in predicting or
calculating the outcomes in future based on data collected in the past.
E.g., a retailer wants to know the default behavior of its credit card customers: they want
to predict the probability of default for each customer in the next three months.
A customer with a healthy credit history over the last few years has a low chance of
default (closer to 0).
Algorithm Selection
Training Model
Prediction / Scoring
Algorithm Selection
Algorithm selection flowchart (figure): if labelled data is available, supervised learning is
used; otherwise, unsupervised learning. Within supervised learning, if the dependent
variable is continuous, regression algorithms are used; if not, classification algorithms
such as Logistic Regression, Decision Tree, and Random Forest.
Training Model
We apply the trained model to the test dataset for prediction/estimation.
Predictive Modelling
Types:
Supervised Learning
Unsupervised Learning
Clustering:
A clustering problem is where you want to discover the inherent groupings in the data, such
as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules that
describe large portions of your data, such as people that buy X also tend to buy Y.
i. Problem definition
iii. Data Extraction/Collection
v. Predictive Modelling
• Logistic Regression
• Naive Bayes
This algorithm assumes that the dataset's features are all independent of each other.
It works better the larger the dataset is. Naive Bayes can be viewed as a simple Bayesian
network, in which dependencies between variables are represented as a directed acyclic
graph (DAG) used for classification.
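A minimal sketch of Naive Bayes classification with scikit-learn's GaussianNB on made-up data:

```python
from sklearn.naive_bayes import GaussianNB

# Each feature is modelled independently given the class (the "naive" assumption).
X = [[0, 0], [1, 1], [8, 8], [9, 9]]
y = [0, 0, 1, 1]
model = GaussianNB().fit(X, y)
print(model.predict([[0.5, 0.5], [8.5, 9.0]]))   # one point near each class
```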
• Random forest
Random Forest has the ability to analyze crop growth in relation to current climatic
conditions and biophysical change. The random forest algorithm creates decision trees on
different data samples, makes a prediction from each subset, and then gives a better
solution for the system by voting. Random Forest uses the bagging method to train the
data, which increases the accuracy of the result.
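The bagging-and-voting idea described above can be sketched with scikit-learn on toy data:

```python
from sklearn.ensemble import RandomForestClassifier

# Many trees, each trained on a bootstrap sample; the class is decided by vote.
X = [[0, 0], [1, 1], [8, 8], [9, 9]]
y = ["low", "low", "high", "high"]
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(rf.predict([[8.5, 8.5]]))   # majority vote of the 50 trees
```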
• Decision Tree
Decision tree is one of the most powerful and popular tools for classification and
prediction. A decision tree is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and each
leaf node (terminal node) holds a class label.
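The node/branch/leaf structure can be made visible with scikit-learn's `export_text`; the single feature name ("rainfall") is illustrative:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Internal nodes test an attribute; leaves hold the class label.
X = [[2.0], [4.0], [6.0], [8.0]]
y = ["no", "no", "yes", "yes"]
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=["rainfall"]))  # the learned flowchart
print(tree.predict([[7.0]]))
```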
Support Vector Machine (SVM)
Support Vector Machine is a relatively simple Supervised Machine Learning Algorithm used
for classification and/or regression. It is more preferred for classification but is sometimes
very useful for regression as well. Basically, SVM finds a hyper-plane that creates a boundary
between the types of data. In 2-dimensional space, this hyper-plane is nothing but a line.
In SVM, we plot each data item in the dataset in an N-dimensional space, where N is the
number of features/attributes in the data. Next, find the optimal hyperplane to separate the
data. So, by this, you must have understood that inherently, SVM can only perform binary
classification (i.e., choose between two classes).
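A minimal 2-dimensional SVM sketch with scikit-learn, where the separating hyperplane is a line:

```python
from sklearn.svm import SVC

# A linear-kernel SVM finds the separating boundary with the widest margin.
X = [[0, 0], [0, 1], [5, 5], [6, 5]]
y = [0, 0, 1, 1]
svm = SVC(kernel="linear").fit(X, y)
print(svm.predict([[5.5, 5.5]]))   # falls on the class-1 side of the line
```

For more than two classes, scikit-learn's SVC combines several binary SVMs internally (a one-vs-one scheme), which is how a binary method handles multi-class problems like crop selection.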
3.6) Tools and libraries used
Python:
For carrying out this project in the best possible manner, we decided to use the Python
language, which comes with several pre-built libraries (such as pandas, NumPy, and
SciPy) and is loaded with numerous features for implementing data science and machine
learning techniques, which allowed us to design the model in the most efficient manner
possible. For building this project we utilized numerous Python libraries for executing
different operations.
● Python - Python is a robust programming language with a wide range of capabilities.
Its broad feature set makes working with specialized programs (including meta-
programming and meta-objects) simple. Python uses dynamic typing along with a
combination of reference counting and garbage collection for memory management. It
also supports dynamic name resolution (late binding), which binds method and variable
names during execution.
Patches to less essential sections of CPython that would give a minor improvement in
performance at an obvious cost are rejected by Python's developers, who try to prevent
premature optimization. When speed is crucial, the Python programmer can move time-
sensitive jobs to extension modules written in C, or use PyPy, a just-in-time compiler.
Cython is also available; it transforms Python scripts into C and uses the C-level API to
call the Python interpreter directly. Python's creators attempt to make the language as fun
to use as possible. Python's design supports some functional programming in the Lisp
tradition: filter, map, and reduce functions, as well as list comprehensions, dictionaries,
sets, and generator expressions, are all included. Two modules (itertools and functools)
in the standard library implement functional tools borrowed from Haskell and
Standard ML.
Why use Python?
We're using Python because it works on a wide range of platforms; it is a cross-platform
language. Python is almost as simple as English: it has many libraries and a simple,
English-like syntax, whereas Java and C++ have more complicated code. Python
applications contain fewer lines than programs written in other languages. That is why
Python is a popular choice for machine learning, artificial intelligence, and dealing with
massive volumes of data. Python is an object-oriented programming language: classes,
objects, polymorphism, encapsulation, inheritance, and reflection are all concepts in
Python.
NumPy library:
NumPy is a Python program library, which adds support for large, multi-dimensional
collections and matrices, as well as a large collection of mathematical functions developed to
work in these components.
The use of NumPy in Python is basically the same as that of MATLAB as they both translate
and allow customers to create projects faster as long as multiple tasks are focused on clusters
or networks rather than scales. Along with these critical ones, there are several options:
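A small sketch of the array-at-a-time style NumPy encourages, using made-up nutrient readings:

```python
import numpy as np

# Vectorised operations act on whole arrays at once, MATLAB-style.
npk = np.array([[90, 42, 43],
                [20, 67, 19]])         # rows: samples; columns: N, P, K
print(npk.mean(axis=0))                # per-nutrient average across samples
print(npk * 2)                         # elementwise arithmetic, no loops
```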
Pandas library:
It is a software library in Python for data manipulation and analysis. It provides data
structures and functions to manage numerical tables and time series. It is free software
released under the three-clause BSD license. The name is derived from the term "panel
data", an econometrics term for datasets that include observations over multiple periods
for the same individuals.
A robust community adds and improves data routines, which allows different
applications to be integrated with datasets, with high-performance merging and joining
of data.
Hierarchical indexing provides an intuitive way of working with high-dimensional data
in a lower-dimensional data structure.
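Hierarchical indexing can be sketched with a MultiIndex; the crop/year data below is invented for illustration:

```python
import pandas as pd

# Two index levels (crop, year) on one small frame.
df = pd.DataFrame(
    {"yield": [30, 25, 40, 38]},
    index=pd.MultiIndex.from_tuples(
        [("rice", 2022), ("rice", 2023), ("maize", 2022), ("maize", 2023)],
        names=["crop", "year"],
    ),
)
print(df.loc["rice"])                    # all years for a single crop
print(df.groupby(level="crop").mean())   # aggregate over the outer level
```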
Matplotlib:
John Hunter and many others built the matplotlib Python library to create graphs, charts,
and high-quality figures. The library handles most of the low-level detail of producing a
plot, and it does this well. Some of the key concepts and objects in matplotlib are:
Figure
Every image is called a figure, and every figure contains one or more axes. A figure can
be thought of as a canvas on which multiple plots are drawn.
Plotting
Data is the first thing a graph needs. A dictionary with keys and values such as x and y
values can be declared. Next, scatter(), bar(), pie(), and a host of other functions can be
used to create the plot.
Axes
Adjustments are possible using the figure and axes obtained from subplots(). The set()
function adjusts x-axis and y-axis features.
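The figure/axes/set() pieces described above fit together as follows; the nutrient values are illustrative:

```python
import matplotlib
matplotlib.use("Agg")                  # render without a display
import matplotlib.pyplot as plt

data = {"N": 90, "P": 42, "K": 43}     # hypothetical soil nutrient values
fig, ax = plt.subplots()               # one figure containing one axes
ax.bar(data.keys(), data.values())     # one bar per nutrient
ax.set(xlabel="nutrient", ylabel="amount", title="Soil nutrient content")
fig.savefig("nutrients.png")           # write the figure to an image file
```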
Scikit-learn:
Scikit-learn is one of the best Python machine learning libraries. The sklearn library
contains many practical machine learning tools and statistical modeling methods,
including classification, regression, clustering, and dimensionality reduction. Scikit-learn
is intended for modeling data, not for reading, manipulating, or summarizing it.
Scikit-learn comes with many features. Some of them are listed here:
• Supervised learning algorithms: Nearly every supervised learning algorithm you may
have studied is part of scikit-learn. Starting from standard linear models, SVMs, and
decision trees, they are all in the scikit-learn toolbox. This proliferation of machine
learning algorithms is one of the main reasons for scikit-learn's wide use, and it makes
the library a good starting point for solving supervised learning problems.
• Unsupervised learning algorithms: There is also a wide variety of unsupervised
algorithms, ranging from clustering and factor analysis to principal component analysis
and unsupervised neural networks.
• Cross-validation: sklearn provides a variety of methods for checking the accuracy of
supervised models on unseen data.
• Toy datasets: scikit-learn ships with several small datasets, which is useful when
learning the library.
Chapter-4 PERFORMANCE ANALYSIS
The data used in this project was made by enlarging and consolidating India's publicly
available datasets, such as weather and soil data. The data is simple, covering relatively
few factors, but it is useful given the many complex factors that affect crop yields.
The data contains Nitrogen, Phosphorus, Potassium, and soil pH values. It also contains
the humidity, temperature, and rainfall required for a particular plant.
• Logistic regression
• Naive Bayes
Applying it to the dataset gives an accuracy of 99.09%.
• Random Forest
• Decision tree
• SVM: (Support vector machine)
4.2) Accuracy Comparison of Algorithms
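The kind of comparison reported in this chapter could be sketched as follows on one held-out split. The model choices mirror the algorithms above, but the dataset, split, and tuning are illustrative stand-ins, not the project's actual experiment:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# A bundled toy dataset stands in for the crop dataset here.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}
results = {}
for name, model in models.items():
    # Fit on the training split, score on the unseen test split.
    results[name] = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: {results[name]:.2%}")
```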
4.4) Output:
Crop Recommended
Fertilizer Recommended
Chapter-5 CONCLUSIONS
5.1) Conclusions
In this project, we try to obtain the best crop and fertilizer recommendations with the
help of machine learning. Many machine learning techniques were used to calculate
accuracy. Numerous algorithms were applied to the datasets to get the best output, which
leads to the best crop and fertilizer recommendation for the particular soil of a particular
region.
This system will help farmers to visualize crop yields under the given climatic and soil
constraints.
Using this, a farmer can decide whether to plant that crop or to look for another crop if
the yield forecast is poor.
This tool can help the farmer make the best decisions when choosing what to grow. It
may also predict negative effects on the plant.
Currently our farmers use outdated technology, or do not use technology effectively, so
there is a chance of choosing the wrong crops, which reduces the profit from production.
To reduce these kinds of losses, we try to create a farmer-friendly system that predicts
which crop is best for a specific soil, and this project will give recommendations about
the fertilizer needed by the soil for cultivation, the seeds needed for cultivation, the
expected yield, and the market price. Thus, this enables farmers to make the right choice
when choosing a crop, so that the agricultural sector can develop with new ideas.
5.2) Future Scope
For upcoming updates to this project, we can use deep learning techniques for plant
disease prediction with the help of images, and we can also implement IoT techniques for
getting soil contents directly from the fields.
• Current market conditions and analysis, for information on crop market rates,
production costs, and fertilizer prices.
• A mobile app can be developed to assist farmers with uploading farm photos.
• Plant disease detection using image processing, where the user gets pesticide
suggestions based on pictures of diseases.
REFERENCES
[4] Priya, P., Muthaiah, U., Balamurugan, M., "Predicting Yield of the Crop Using
Machine Learning Algorithm", 2015
[5] Mishra, S., Mishra, D., Santra, G. H., "Applications of machine learning techniques
in agricultural crop production", 2016
[6] Ramesh Medar, Vijay S, Shweta, "Crop Yield Prediction using Machine Learning
Techniques", 2019
[7] https://www.data.gov.in
[9] https://en.wikipedia.org/wiki/Agriculture