B.Tech Project: Airline Fare Prediction

The document presents a project report on 'Airline Fare Prediction Using Machine Learning' submitted by students of Bharat Institute of Engineering and Technology for their B.Tech degree. It discusses the factors influencing airline ticket prices and utilizes three datasets to apply seven different machine learning models for price prediction. The project aims to help customers understand fare fluctuations and save money when booking flights.

AIRLINE FARE PREDICTION USING MACHINE LEARNING
An industry oriented Major Project Report Submitted to

Jawaharlal Nehru Technological University Hyderabad

In partial fulfillment of the requirements for the award


of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted By

MALOTH GUNA SIDDARTH (21E11A0520)


ALAKUNTLA ANIL (21E11A0502)
ALETI AJAY KUMAR (22E15A0501)
SRINAGARAM VAMSHI (21E11A0531)

Under the Supervision of

G. RAGHAVENDER
Assistant Professor, CSE Department.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


BHARAT INSTITUTE OF ENGINEERING AND TECHNOLOGY
Accredited by NAAC, accredited by NBA (UG Programmes: CSE, ECE)
Approved by AICTE, Affiliated to JNTUH Hyderabad
Hyderabad-501 510, Telangana.

Jan 2025
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BHARAT INSTITUTE OF ENGINEERING AND TECHNOLOGY
Accredited by NAAC, accredited by NBA (UG Programmes: CSE, ECE) Approved
by AICTE, Affiliated to JNTUH Hyderabad
Hyderabad-501 510, Telangana.

Certificate
This is to certify that the project work entitled “AIRLINE FARE PREDICTION
USING MACHINE LEARNING” is the bonafide work done
by

MALOTH GUNA SIDDARTH (21E11A0521)
ALAKUNTLA ANIL (21E11A0502)
ALETI AJAY KUMAR (22E15A0501)
SRINAGARAM VAMSHI (21E11A0531)

in the Department of Computer Science and Engineering, BHARAT INSTITUTE OF
ENGINEERING AND TECHNOLOGY, Hyderabad, and is submitted to Jawaharlal Nehru
Technological University, Hyderabad in partial fulfillment of the requirements for the
award of the B. Tech degree in Computer Science and Engineering during 2024-2025.

Guide:
G. Raghavender
Assistant Professor,
Dept of CSE,
Bharat Institute of Engineering and Technology,
Hyderabad – 501 510

Head of the Department:
Dr. Deepak Kachave
Associate Professor & Academic I/C,
Dept of CSE,
Bharat Institute of Engineering and Technology,
Hyderabad – 501 510

Viva-Voce held on…………………………………………

Internal Examiner External Examiner


ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of a task would be
incomplete without the mention of the people who made it possible, whose constant
guidance and encouragement crown all the efforts with success.

We avail this opportunity to express our deep sense of gratitude and hearty thanks
to Shri CH. Venugopal Reddy, Secretary & Correspondent of BIET, for providing
a congenial atmosphere and encouragement.

We would like to thank Prof. G. Kumaraswamy Rao, Director, Former Director
& O.S. of DLRL Ministry of Defence, and Dr. V. Srinivas Rao, Dean CSE, for
having provided all the facilities and support.

We would like to thank our Academic Incharge Dr. Deepak Kachave, Associate
Professor of CSE, for his expert guidance and encouragement at various levels of the project.

We are thankful to our Project Supervisor Mr. G. Raghavender, Assistant
Professor, Computer Science and Engineering, for his support and cooperation
throughout the process of this project.

We are thankful to our Project Coordinator Dr. Rama Prakasha Reddy Ch, Assistant
Professor, Computer Science and Engineering, for his support and cooperation
throughout the process of this project.

We place on record our highest regards to our parents, friends, and well-wishers, who helped a lot in
making the report of this project.

DECLARATION
We hereby declare that this Industry Oriented Mini Project report titled “AIRLINE FARE
PREDICTION USING MACHINE LEARNING” is a genuine project work
carried out by us in the B. Tech (Computer Science and Engineering) degree course of Jawaharlal
Nehru Technological University Hyderabad, and has not been submitted to any other course or
university for the award of any degree.

Signature of the Student

MALOTH GUNA SIDDARTH


ALAKUNTLA ANIL
ALETI AJAY KUMAR
SRINAGARAM VAMSHI

ABSTRACT
This report discusses the problem of airfare prediction. A set of characteristics defining a typical flight is chosen
for this purpose, with the assumption that these characteristics influence the price of an airline
ticket. Flight ticket prices fluctuate depending on parameters such as flight schedule,
destination, and duration, as well as occasions such as vacations or the holiday season. As a
result, having a basic understanding of flight rates before booking a trip can
save many individuals money and time. Three datasets are analyzed to gain insight into airline fares,
and the features of the three datasets are applied to seven different machine learning (ML)
models, which are used to predict airline ticket prices; their performance is then compared. The goal
is to investigate the factors that determine the cost of a flight. The data can then be used to create
a system that predicts flight prices.

CONTENTS

i. List of Contents
ii. List of Figures
iii. List of Screenshots

1. INTRODUCTION
1.1 INTRODUCTION TO PROJECT
1.2 LITERATURE SURVEY
1.3 MODULES

2. SYSTEM ANALYSIS
2.1 EXISTING SYSTEM & ITS DISADVANTAGES
2.2 PROPOSED SYSTEM & ITS ADVANTAGES
2.3 SYSTEM REQUIREMENTS

3. SYSTEM STUDY
3.1 FEASIBILITY STUDY

4. SYSTEM DESIGN
4.1 ARCHITECTURE
4.2 UML DIAGRAMS
4.2.1 USE CASE DIAGRAM
4.2.2 CLASS DIAGRAM
4.2.3 SEQUENCE DIAGRAM
4.2.4 ACTIVITY DIAGRAM
4.2.5 DEPLOYMENT DIAGRAM

5. TECHNOLOGIES USED
5.1 WHAT IS PYTHON
5.1.1 ADVANTAGES & DISADVANTAGES OF PYTHON
5.1.2 HISTORY
5.2 WHAT IS MACHINE LEARNING?
5.2.1 CATEGORIES OF ML
5.2.2 NEED FOR ML
5.2.3 CHALLENGES IN ML
5.2.4 APPLICATIONS
5.2.5 HOW TO START LEARNING ML?
5.2.6 ADVANTAGES & DISADVANTAGES OF ML
5.3 PYTHON DEVELOPMENT STEPS
5.4 MODULES USED IN PYTHON
5.5 INSTALL PYTHON STEP BY STEP IN WINDOWS & MAC

6. IMPLEMENTATION
6.1 SOFTWARE ENVIRONMENT
6.1.1 PYTHON
6.1.2 SAMPLE CODE

7. SYSTEM TESTING
7.1 INTRODUCTION TO TESTING
7.2 TESTING STRATEGIES

8. SCREENSHOTS
9. CONCLUSION
10. REFERENCES

LIST OF FIGURES

Fig No    Name
4.1.1     System Architecture
4.2.1     Use Case diagram
4.2.2     Class diagram
4.2.3     Sequence diagram
4.2.4     Activity diagram
4.2.5     Deployment diagram

LIST OF SCREENSHOTS

Fig No    Name
8.1       Home Page
8.2       Admin Page
8.3       Register Page
8.4       User Page
8.5       Contact Information
8.6       About Page
8.7       Uploading Data
8.8       Data Formed
8.9       Predicted page
8.10      About Page Success
8.11      About page
8.12      Predicted page
8.13      Predicted page

1. INTRODUCTION
1.1 Introduction
In today's world, airlines attempt to control flight ticket costs in order to
maximize profits. Most people who fly regularly know the best times to buy cheap tickets.
However, many customers who are not experienced at booking tickets fall into the discount traps set
by the airlines, causing them to overspend. The main goal of airline companies is
to make a profit, while the customer is looking for the best deal. Customers frequently
aim to purchase tickets far in advance of the departure date in order to avoid price increases
as the departure date approaches. Because the fare models used by airlines are highly complex
and prices fluctuate constantly, it is very difficult for a customer to buy an airline ticket at a very low price.
Airlines may lower their ticket prices when they need to build a market, and may raise them
when tickets become harder to obtain, so as to earn the most profit possible. These tactics consider a number of
financial, marketing, commercial, and social factors that are all linked to the final flight
price. As a result, fares may be influenced by many factors.
Surveys of customers and airlines have grown steadily over the last two decades. From a
customer's point of view, the important question is how to identify a low price or a good time to
buy a ticket. In this report, we use data collected from three different sources to
build models using machine learning algorithms. Customers can save millions of rupees
by using the proposed method to get the information they need to book tickets at the right
moment.

1.2 Literature Survey

TITLE: "Robust Dynamic Pricing With Strategic Customers,"


ABSTRACT: We consider the canonical revenue management (RM) problem wherein a
seller must sell an inventory of some product over a finite horizon via an anonymous, posted
price mechanism. Unlike typical models in RM, we assume that customers are forward
looking. In particular, customers arrive randomly over time and strategize about their times of
purchases. The private valuations of these customers decay over time and the customers incur
monitoring costs; both the rates of decay and these monitoring costs are private information.
This setting has resisted the design of optimal dynamic mechanisms heretofore. Optimal
pricing schemes—an almost necessary mechanism format for practical RM considerations—
have been similarly elusive.

TITLE: "Airline ticket price and demand prediction: A survey"


ABSTRACT: Nowadays, airline ticket prices can vary dynamically and significantly for
the same flight, even for nearby seats within the same cabin. Customers are seeking to get the
lowest price while airlines are trying to keep their overall revenue as high as possible and
maximize their profit. Airlines use various kinds of computational techniques to increase their
revenue such as demand prediction and price discrimination. From the customer side, two
kinds of models are proposed by different researchers to save money for customers: models
that predict the optimal time to buy a ticket and models that predict the minimum ticket price.
In this paper, we present a review of customer side and airlines side prediction models. Our
review analysis shows that models on both sides rely on limited set of features such as
historical ticket price data, ticket purchase date and departure date. Features extracted from
external factors such as social media data and search engine query are not considered.
Therefore, we introduce and discuss the concept of using social media data for ticket/demand
prediction.

TITLE: "Data-driven Modeling of Airlines Pricing"
ABSTRACT: The popularity of travelling by airplane is constantly growing. Much of the
existing research describes the global flight market. At the same time, the Russian air market is
characterized by peculiarities that have to be identified to build proper models of airfare. The
objective of this study is to analyze the Russian air transportation market and compare the behavior
of prices on local and global flights. Using data collected from two independent ticket
price information aggregators (AviaSales and Sabre) for the period of spring-summer 2015, an
empirical data-driven model was built for air price prediction for different flight directions. We
found that the form of price dependency on purchase earliness differs dramatically between local
and international flights in the two largest Russian cities (Moscow and Saint-Petersburg).

TITLE: "Airfare prices prediction using machine learning techniques,"


ABSTRACT: This paper deals with the problem of airfare prices prediction. For this
purpose a set of features characterizing a typical flight is decided, supposing that these features
affect the price of an air ticket. The features are applied to eight state of the art machine
learning (ML) models, used to predict the air tickets prices, and the performance of the models
is compared to each other. Along with the prediction accuracy of each model, this paper
studies the dependency of the accuracy on the feature set used to represent an airfare. For the
experiments a novel dataset consisting of 1814 data flights of the Aegean Airlines for a
specific international destination (from Thessaloniki to Stuttgart) is constructed and used to
train each ML model. The derived experimental results reveal that the ML models are able to
handle this regression problem with almost 88% accuracy, for a certain type of flight features.

TITLE: "A Bayesian Approach for Flight Fare Prediction Based on Kalman Filter,"

ABSTRACT: Decision-making under uncertainty is one of the major issues faced by
recent computer-aided solutions and applications. Bayesian prediction techniques come handy
in such areas of research. In this paper, we have tried to predict flight fares using Kalman filter
which is a famous Bayesian estimation technique. This approach presents an algorithm based on
the linear model of the Kalman Filter. This model predicts the fare of a flight based on the input
provided from an observation of previous fares. The observed data is given as input in the form
of a matrix as required to the linear model, and an estimated fare for a specific upcoming flight
is calculated.

TITLE: "A regression model for predicting optimal purchase timing for
airline tickets,"

ABSTRACT: Optimal timing for airline ticket purchasing from the consumer’s perspective
is challenging principally because buyers have insufficient information for reasoning about
future price movements. This paper presents a model for computing expected future prices and
reasoning about the risk of price changes. The proposed model is used to predict the future
expected minimum price of all available flights on specific routes and dates based on a corpus of
historical price quotes. Also, we apply our model to predict prices of flights with specific
desirable properties such as flights from a specific airline, non-stop only flights, or multi-
segment flights. By comparing models with different target properties, buyers can determine the
likely cost of their preferences. We present the expected costs of various preferences for two
high-volume routes. Performance of the prediction models presented is achieved by including
instances of time-delayed features, by imposing a class hierarchy among the raw features based
on feature similarity, and by pruning the classes of features used in prediction based on in-situ
performance. Our results show that purchase policy guidance using these models can lower the
average cost of purchases in the 2 month period prior to a desired departure. The proposed
method compares favorably with a deployed commercial web site providing similar purchase
policy recommendations.

TITLE: "Credit Card Fraud Detection Using Machine Learning,"


ABSTRACT: Credit card frauds are easy and friendly targets. E-commerce and many
other online sites have increased the online payment modes, increasing the risk of online
fraud. With the increase in fraud rates, researchers started using different machine learning methods
to detect and analyze fraud in online transactions. The main aim of the paper is to design and
develop a novel fraud detection method for streaming transaction data, with the objective of
analyzing the past transaction details of the customers and extracting their behavioral patterns.
Cardholders are clustered into different groups based on their transaction amount. Then a
sliding window strategy [1] is used to aggregate the transactions made by the cardholders.

1.3 MODULES
Machine learning offers several techniques for predicting airline ticket prices.
The algorithms that we have used include:

• Linear Regression.

• K-Neighbor Regression.

• Support Vector Machine.

• Decision Tree.

• Random Forest.

These models have been implemented using the scikit-learn Python library. In order to verify
the performance of these models, metrics such as R-squared, MAE, MSE, and RMSE are
used.
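The four evaluation metrics named above can be sketched in a few lines of plain Python. This is only an illustration: the function name `regression_metrics` and the fare values are hypothetical, and in a scikit-learn workflow the same quantities are available from `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    rmse = math.sqrt(mse)                          # root mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical fares in rupees, for illustration only.
actual = [4000.0, 5500.0, 6200.0, 7100.0]
predicted = [4200.0, 5300.0, 6000.0, 7500.0]
metrics = regression_metrics(actual, predicted)
```

MAE and RMSE are expressed in the same unit as the fare, which makes them easier to interpret than MSE, while R-squared is unit-free and compares the model against always predicting the mean fare.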

KNN Regression
K-neighbor regression predicts the target for a new data point as the average of the targets of its
k nearest neighbors. Like SVM, this is a non-parametric approach: no assumptions are made about
the underlying data distribution. KNN is a supervised technique, originally used for classification,
that can also be used as a regressor. It calculates the distance between the new data point and each
training example, selects the K training examples that are nearest to the new data point, and
averages their target values. The value of K is chosen by trying a few candidate values and keeping
the one that gives the best result. The distance is calculated using the
Euclidean distance, the Manhattan distance or the Hamming distance.
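The procedure described above can be sketched without any library. The function names, the two features (duration and number of stops) and the fares are assumptions made for illustration; scikit-learn offers the same idea as `sklearn.neighbors.KNeighborsRegressor`.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Average the targets of the k training rows closest to the query."""
    # Distance from the query to every training example, smallest first.
    scored = sorted(
        (distance(row, query), target) for row, target in zip(train_X, train_y)
    )
    nearest = scored[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical rows: (duration in hours, number of stops) -> fare in rupees.
train_X = [(1.0, 0), (1.5, 0), (2.0, 1), (5.0, 1), (6.0, 2)]
train_y = [3000.0, 3200.0, 4000.0, 7000.0, 8000.0]
fare = knn_predict(train_X, train_y, query=(1.8, 0), k=3)
```

Because the prediction depends directly on distances, features measured on very different scales are usually normalized first so that no single feature dominates.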

Linear Regression
Linear regression is a supervised machine learning (ML) technique that performs regression tasks. It
is a linear model, assuming that there is a linear relationship between the input variables
(x) and a single output variable (y), so that y can be calculated as a linear combination of the input variables.
Because our data set contains many independent features that prices may
depend on, we will use multiple linear regression (MLR) to estimate the relationship between
two or more independent variables and a dependent variable.
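As a minimal sketch of multiple linear regression, the coefficients can be obtained by solving the least-squares normal equations. The helper names and the synthetic data (generated from fare = 2000 + 800 × duration + 500 × stops) are assumptions for illustration; in practice `sklearn.linear_model.LinearRegression` does the same job.

```python
def fit_linear_regression(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp via the normal equations."""
    rows = [[1.0] + list(r) for r in X]           # prepend an intercept column
    p = len(rows[0])
    # Augmented system (A^T A | A^T y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]          # [intercept, b1, ..., bp]

def predict(coefs, row):
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))

# Synthetic rows: (duration in hours, number of stops) -> fare in rupees.
X = [(1, 0), (2, 0), (2, 1), (3, 1), (4, 2), (5, 1)]
y = [2800.0, 3600.0, 4100.0, 4900.0, 6200.0, 6500.0]
coefs = fit_linear_regression(X, y)
```

Each coefficient can be read as the change in fare for a one-unit change in that feature while the other features are held fixed.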

Decision Tree Regression
A decision tree is a tree structure used to build regression or classification models. The
data set is repeatedly split into smaller and smaller subsets while the tree is built, which
produces decision nodes and leaf nodes. The decision tree selects independent variables from the dataset
as decision nodes for making a decision. When test data is entered into the model, the result
is determined by finding which segment (leaf) the data point belongs to, and the decision tree
outputs the average of all training data points in that segment.
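The "average of the segment" behaviour can be seen in the smallest possible regression tree, one with a single split. This sketch uses hypothetical names and data; a full tree, such as `sklearn.tree.DecisionTreeRegressor`, applies the same split search recursively to each segment.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared errors around the mean of the segment."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(xs, ys):
    """Find the threshold on one feature that minimises total squared error."""
    best = None
    candidates = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2                 # midpoint between neighbours
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Hypothetical data: days between booking and departure -> fare in rupees.
days = [1, 2, 3, 20, 25, 30]
fares = [9000.0, 8800.0, 9200.0, 4000.0, 4200.0, 3800.0]
stump = fit_stump(days, fares)
```

Here the decision node tests the booking lead time, and each of the two leaves predicts the mean fare of the training rows that fall into it.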

Random Forest Regression


The random forest algorithm combines less accurate models to create a more accurate model.
It combines a base model with other models to create a larger model. The features are
sampled and passed on to the trees without replacement in order to generate strongly
uncorrelated decision trees. Only a random subset of the features is considered when choosing
the best split, which keeps the correlation between trees low. The main principle that distinguishes the random forest from the
decision tree is the aggregation of uncorrelated trees. A random forest is an ensemble learning
technique in which many decision trees are trained and their outputs are then
combined to produce a final predicted result. For regression, each tree is trained on a random
sample of the data and a random subset of the features, and the forest averages the trees' predicted values, which
places it within the bagging family of ensemble learning.
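The bagging idea can be sketched with very small trees. The example below is a simplification under stated assumptions: each "tree" is a single-split stump, each stump sees one randomly chosen feature, and all names and data are hypothetical. A real implementation such as `sklearn.ensemble.RandomForestRegressor` grows deeper trees and samples features at every split.

```python
import random

def fit_stump(rows, targets, feature):
    """Best single split on one feature; each side predicts its mean target."""
    best = None
    values = sorted({r[feature] for r in rows})
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for r, t in zip(rows, targets) if r[feature] <= thr]
        right = [t for r, t in zip(rows, targets) if r[feature] > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        cost = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or cost < best[0]:
            best = (cost, thr, lm, rm)
    if best is None:  # the sample had a single distinct value: predict the mean
        m = sum(targets) / len(targets)
        return feature, float("inf"), m, m
    return feature, best[1], best[2], best[3]

def fit_forest(rows, targets, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of the rows
        feature = rng.randrange(n_features)          # random feature for this tree
        forest.append(
            fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feature)
        )
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees (bagging)."""
    preds = [lm if row[f] <= thr else rm for f, thr, lm, rm in forest]
    return sum(preds) / len(preds)

# Hypothetical rows: (days before departure, number of stops) -> fare in rupees.
rows = [(1, 2), (2, 1), (3, 2), (20, 0), (25, 1), (30, 0)]
fares = [9000.0, 8800.0, 9200.0, 4000.0, 4200.0, 3800.0]
forest = fit_forest(rows, fares)
late_booking = predict_forest(forest, (1, 2))
early_booking = predict_forest(forest, (30, 0))
```

Individual stumps are weak and noisy, but because each one sees a different sample and feature, their errors partly cancel when the predictions are averaged.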

Support Vector Machine


A support vector machine (SVM) is a supervised machine learning algorithm that classifies
data by finding an optimal line or hyperplane that maximizes the distance between the classes
in an N-dimensional space.

There are two approaches to calculating the margin, or the maximum distance between
classes: hard-margin classification, which allows no training point inside the margin, and
soft-margin classification, which tolerates some violations at a penalty.
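The difference between the two approaches can be illustrated for a fixed, hand-picked hyperplane. The sketch below does not train an SVM; it only checks the hard-margin constraint and evaluates the standard soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)). The points, weights and function names are assumptions for illustration.

```python
def margins(w, b, X, y):
    """Functional margin y_i * (w . x_i + b) for each labelled point."""
    return [
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        for xi, yi in zip(X, y)
    ]

def satisfies_hard_margin(w, b, X, y):
    # Hard margin: every point must lie on or outside the margin (margin >= 1).
    return all(m >= 1 for m in margins(w, b, X, y))

def soft_margin_objective(w, b, X, y, C=1.0):
    # Soft margin: points inside the margin are allowed but pay a slack penalty.
    slack = [max(0.0, 1.0 - m) for m in margins(w, b, X, y)]
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slack)

# Hypothetical 2-D points with labels +1 / -1 and a hand-picked hyperplane.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (1.0, 0.0)]
y = [1, 1, -1, -1]
w, b = (1.0, 1.0), -3.0

separable_ok = satisfies_hard_margin(w, b, X, y)

# Adding one point on the wrong side makes the hard-margin constraint fail.
X_noisy = X + [(2.5, 1.0)]
y_noisy = y + [-1]
noisy_ok = satisfies_hard_margin(w, b, X_noisy, y_noisy)
noisy_cost = soft_margin_objective(w, b, X_noisy, y_noisy, C=1.0)
```

A larger `C` penalizes violations more heavily and approaches hard-margin behaviour, while a smaller `C` accepts more violations in exchange for a wider margin.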

2. SYSTEM ANALYSIS

2.1 Existing System & Its Disadvantages:


Airlines may lower their ticket prices when they need to build a market, and may raise them
when tickets become harder to obtain, so as to earn the most profit possible. These tactics consider a number of financial, marketing, commercial, and social
factors that are all linked to the final flight price.

As a result, fares may be influenced by many factors. The price model used by airlines is so complex that prices fluctuate constantly, making it
very difficult for customers to buy tickets at very low prices. Surveys of customers and airlines
have grown steadily over the last two decades.

Regression machine learning models for airline ticket price prediction have been developed in
[4]. Data from 1814 flights on a single international route was used in the development of these
models, with features including departure and arrival times and the number of free baggage
allowances per flight. Eight different regression machine learning models were used:
Extreme Learning Machine (ELM), Multilayer Perceptron (MLP), Generalized Regression Neural
Network, Random Forest Regression Tree, Regression Tree, Linear Regression (LR), Regression
SVM (Polynomial and Linear), and Bagging Regression Tree. The reported
performance results were 87.42% accuracy for Bagging Regression Tree and 85.91% accuracy for
Random Forest Regression Tree.

DISADVANTAGES:

• Increased Dependency: Visually-impaired individuals may become more dependent on
others for assistance in identifying objects and navigating their environment, limiting
their independence and autonomy.
• Safety Risks: Without an object detection and recognition system, visually-impaired
individuals may be more prone to accidents and injuries due to obstacles and hazards
that they are unable to detect.

2.2 Proposed System & Its Advantages:
The proposed system aims to address the issue of airfare by analyzing a set of characteristics that
define a typical flight, assuming that these features significantly influence the price of an airline
ticket. The fluctuation in flight ticket prices is attributed to various parameters, including flight
schedule, destination, duration, and occasions such as vacations or holiday seasons.
Data Collection: Gather a dataset comprising historical flight information, including departure
and arrival locations, dates, times, airlines, ticket prices, and other relevant features. This dataset
should cover a wide range of routes, airlines, and time periods to capture diverse patterns.
Data Preprocessing: This involves cleaning the data, handling missing values, encoding categorical variables, and possibly
feature scaling or normalization.
Feature Engineering: Create new features or transform existing ones that might better
represent the relationships between the input variables and the target variable (fare). For example,
one might extract features such as day of the week, time of the day, distance between departure
and arrival locations, and any seasonal trends.
Model Selection and Training: Choose appropriate machine learning algorithms for regression tasks. Common choices include
linear regression, decision trees, random forests, gradient boosting methods (like XGBoost or
LightGBM), and neural networks. Split the dataset into training and testing sets. Train the
selected model(s) on the training data.
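The preparation steps described above (dropping records with missing values, deriving day-of-week and time-of-day features, encoding categorical variables, and splitting into training and testing sets) can be sketched in plain Python. The record layout, airline names, and helper names are all hypothetical; in the actual project these steps would more likely be done with Pandas and `sklearn.model_selection.train_test_split`.

```python
import random
from datetime import datetime

def extract_features(record):
    """Turn one raw flight record into model features."""
    departure = datetime.strptime(record["departure"], "%Y-%m-%d %H:%M")
    hour = departure.hour
    if hour < 6:
        part_of_day = "night"
    elif hour < 12:
        part_of_day = "morning"
    elif hour < 18:
        part_of_day = "afternoon"
    else:
        part_of_day = "evening"
    return {
        "airline": record["airline"],
        "day_of_week": departure.weekday(),   # Monday = 0 ... Sunday = 6
        "part_of_day": part_of_day,
        "duration_min": record["duration_min"],
    }

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded

def train_test_split(rows, targets, test_fraction=0.25, seed=0):
    """Shuffle the row indices and cut them into a training and a testing part."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [targets[i] for i in train], [targets[i] for i in test])

# Hypothetical raw records; one has a missing price and is dropped.
raw = [
    {"airline": "AirA", "departure": "2024-01-15 09:30", "duration_min": 120, "price": 4500.0},
    {"airline": "AirB", "departure": "2024-01-20 19:00", "duration_min": 95, "price": 5200.0},
    {"airline": "AirA", "departure": "2024-02-02 05:15", "duration_min": 130, "price": None},
    {"airline": "AirB", "departure": "2024-02-10 14:45", "duration_min": 110, "price": 3900.0},
    {"airline": "AirA", "departure": "2024-03-01 23:10", "duration_min": 125, "price": 6100.0},
]
clean = [r for r in raw if r["price"] is not None]
features = one_hot(one_hot([extract_features(r) for r in clean], "airline"), "part_of_day")
targets = [r["price"] for r in clean]
X_train, X_test, y_train, y_test = train_test_split(features, targets)
```

Keeping the test rows out of training is what makes the later accuracy figures meaningful: the model is scored only on flights it has not seen.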

ADVANTAGES:

Improved Accuracy: Machine learning models can analyze vast amounts of historical data and
complex patterns to make more accurate fare predictions compared to traditional methods. This can
help both airlines and travelers make better-informed decisions regarding ticket prices.
Dynamic Pricing: Airlines can leverage machine learning models to implement dynamic pricing
strategies, adjusting fares in real-time based on factors such as demand, time until departure,
competitor pricing, and seat availability. This flexibility can maximize revenue for airlines while
offering competitive prices to travelers.
Personalized Pricing: Machine learning algorithms can analyze individual traveler preferences,
booking history, and browsing behavior to offer personalized fare recommendations. This can
enhance customer satisfaction and increase loyalty by providing tailored pricing options.

2.3 SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

• System : i3 Processor 5th Gen.


• Hard Disk : 200 GB.
• RAM : 4GB.

SOFTWARE REQUIREMENTS:

• Operating System : Windows 10/11
• Development Software : Python 3.8
• Programming Language : Python
• Integrated Development Environment (IDE) : Visual Studio Code
• Front End Technologies : HTML5, CSS3, JavaScript
• Database Language : SQL
• Design/Modelling : Rational Rose
• Machine Learning Models : Scikit-learn
• Data Manipulation : Pandas and NumPy
• Web Application Framework : Django

3. SYSTEM STUDY

3.1 FEASIBILITY STUDY


1. TECHNICAL FEASIBILITY

2. OPERATIONAL FEASIBILITY

3. ECONOMIC FEASIBILITY

INTRODUCTION

A feasibility study assesses the operational, technical and economic merits of the proposed project.
The feasibility study is intended to be a preliminary review of the facts to see if it is worthy of
proceeding to the analysis phase. From the systems analyst perspective, the feasibility analysis
is the primary tool for recommending whether to proceed to the next phase or to discontinue the
project.

The feasibility study is a management-oriented activity. The objective of a feasibility study is
to find out if an information system project can be done and to suggest possible alternative
solutions.

Projects are initiated for two broad reasons:

1. Problems that lend themselves to systems solutions

2. Opportunities for improving through:

(a) upgrading systems

(b) altering systems

(c) installing new systems

TECHNICAL FEASIBILITY

A large part of determining resources has to do with assessing technical feasibility. It considers
the technical requirements of the proposed project. The technical requirements are then
compared to the technical capability of the organization. The systems project is considered
technically feasible if the internal technical capability is sufficient to support the project
requirements.
The analyst must find out whether current technical resources can be upgraded or added to in a
manner that fulfils the request under consideration. This is where the expertise of system analysts
is beneficial, since using their own experience and their contact with vendors they will be able to
answer the question of technical feasibility.

The essential questions that help in testing the technical feasibility of a system include the
following:

• Is the project feasible within the limits of current technology?

• Does the technology exist at all?

• Is it available within given resource constraints?

• Is it a practical proposition?

OPERATIONAL FEASIBILITY

Operational feasibility is dependent on human resources available for the project and involves
projecting whether the system will be used if it is developed and implemented.

Operational feasibility is a measure of how well a proposed system solves the problems, and takes
advantage of the opportunities identified during scope definition and how it satisfies the
requirements identified in the requirements analysis phase of system development.

The essential questions that help in testing the operational feasibility of a system include the
following:

• Does current mode of operation provide adequate throughput and response time?

• Does current mode provide end users and managers with timely, pertinent, accurate and
useful formatted information?

• Does current mode of operation provide cost-effective information services to the
business?

• Could there be a reduction in cost and/or an increase in benefits?

• Does current mode of operation offer effective controls to protect against fraud and to
guarantee accuracy and security of data and information?

• Does current mode of operation make maximum use of available resources, including
people, time, and flow of forms?

• Does current mode of operation provide reliable services?

• Are the services flexible and expandable?

• Are the current work practices and procedures adequate to support the new system?

• If the system is developed, will it be used?

ECONOMIC FEASIBILITY

Economic analysis could also be referred to as cost/benefit analysis. It is the most frequently used
method for evaluating the effectiveness of a new system. In economic analysis the procedure is to
determine the benefits and savings that are expected from a candidate system and compare them
with costs. If benefits outweigh costs, then the decision is made to design and implement the
system. An entrepreneur must accurately weigh the cost versus benefits before taking an action.

Possible questions raised in economic analysis are:

• Is the system cost effective?

• Do benefits outweigh costs?

• What is the cost of doing a full system study?

• What is the cost of business employee time?

• What is the estimated cost of hardware?

• What is the estimated cost of software/software development?

• Is the project possible, given the resource constraints?

• What are the savings that will result from the system?

• What is the cost of employees' time for the study?

• What is the cost of packaged software/software development?

4. SYSTEM DESIGN
4.1 DATA FLOW DIAGRAM:

1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of the input data to the system, the various processing carried out
on this data, and the output data generated by the system.

2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to
model the system components. These components are the system process, the data used by the
process, an external entity that interacts with the system and the information flows in the system.

4.1.1 System Architecture

4.2 UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling
language in the field of object-oriented software engineering. The standard is managed, and
was created by, the Object Management Group.

The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML comprises two major components: a meta-model
and a notation. In the future, some form of method or process may also be added to, or
associated with, UML.

4.2.1 USE CASE DIAGRAM:

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram
defined by and created from a Use-case analysis. Its purpose is to present a graphical overview of
the functionality provided by a system in terms of actors, their goals (represented as use cases),
and any dependencies between those use cases. The main purpose of a use case diagram is to show
what system functions are performed for which actor. Roles of the actors in the system can be
depicted.

Fig-4.2.1

4.2.2 CLASS DIAGRAM:

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of
static structure diagram that describes the structure of a system by showing the system's classes,
their attributes, operations (or methods), and the relationships among the classes. It explains which
class contains information.

Fig-4.2.2

4.2.3 SEQUENCE DIAGRAM:

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a Message
Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, and
timing diagrams.

Fig-4.2.3

4.2.4 ACTIVITY DIAGRAM:

Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.

Fig-4.2.4

4.2.5 DEPLOYMENT DIAGRAM:

A deployment diagram is a type of diagram that specifies the physical hardware on which the
software system will execute. It also determines how the software is deployed on the underlying
hardware. It maps the software pieces of a system to the devices that are going to execute them.

The deployment diagram maps the software architecture created in design to the physical system
architecture that executes it. In distributed systems, it models the distribution of the software
across the physical nodes.

The software systems are manifested using various artifacts, and then they are mapped to the
execution environment that is going to execute the software such as nodes. Many nodes are
involved in the deployment diagram; hence, the relation between them is represented using
communication paths.

Fig-4.2.5

There are two forms of a deployment diagram.

• Descriptor form

• It contains nodes, the relationship between nodes and artifacts.

• Instance form

• It contains node instance, the relationship between node instances and artifact instance.

• An underlined name represents node instances.

Purpose of a deployment diagram

Deployment diagrams are used with the sole purpose of describing how software is deployed into
the hardware system. It visualizes how software interacts with the hardware to execute the
complete functionality. It is used to describe software to hardware interaction and vice versa.

Deployment Diagram Symbol and notations

Deployment Diagram Notations

5. TECHNOLOGIES

5.1 WHAT IS PYTHON

Below are some facts about Python.

Python is one of the most widely used multi-purpose, high-level programming languages.

Python allows programming in Object-Oriented and Procedural paradigms. Python
programs are generally shorter than equivalent programs in languages like Java.

Programmers have to type relatively less, and the indentation requirement of the language
keeps programs readable.

Python is used by many large technology companies such as Google, Amazon,
Facebook, Instagram, Dropbox, and Uber.

5.1.1 ADVANTAGES & DISADVANTAGES OF PYTHON


1. Extensible

Python can be extended with other languages. You can write some of your
code in languages like C++ or C. This comes in handy, especially in performance-critical projects.

2. Embeddable

Complementary to extensibility, Python is embeddable as well. You can put your Python code
in the source code of a different language, like C++. This lets us add scripting capabilities to
our code in the other language.

3. Improved Productivity

The language’s simplicity and extensive libraries make programmers more
productive than languages like Java and C++ do. You also need to write less code to get
more things done.

4. IOT Opportunities

Since Python is widely used on platforms like Raspberry Pi, its future in the
Internet of Things looks bright. This is a way to connect the language with the real world.

5.1.2 HISTORY OF PYTHON


What do the alphabet and the programming language Python have in common? Right, both start
with ABC. If we are talking about ABC in the Python context, it's clear that the programming
language ABC is meant. ABC is a general-purpose programming language and programming
environment, which was developed in Amsterdam, the Netherlands, at the CWI (Centrum
Wiskunde & Informatica). The greatest achievement of ABC was to influence the design of Python.
Python was conceptualized in the late 1980s. At that time, Guido van Rossum worked at the
CWI on a project called Amoeba, a distributed operating system. In an interview with Bill Venners, Guido van
Rossum said: "In the early 1980s, I worked as an implementer on a team building a language called
ABC at Centrum Wiskunde en Informatica (CWI). I don't know how well people know ABC's
influence on Python. I try to mention ABC's influence because I'm indebted to everything I learned
during that project and to the people who worked on it." Later on in the same interview, Guido van
Rossum continued: "I remembered all my experience and some of my frustration with ABC. I
decided to try to design a simple scripting language that possessed some of ABC's better properties,
but without its problems. So I started typing. I created a simple virtual machine, a simple parser, and
a simple runtime. I made my own version of the various ABC parts that I liked. I created a basic
syntax, used indentation for statement grouping instead of curly braces or begin-end blocks, and
developed a small number of powerful data types: a hash table (or dictionary, as we call it), a list,
strings, and numbers."

5.2 WHAT IS MACHINE LEARNING
Before we take a look at the details of various machine learning methods, let's start by looking
at what machine learning is, and what it isn't. Machine learning is often categorized as a
subfield of artificial intelligence, but I find that categorization can often be misleading at first
brush. The study of machine learning certainly arose from research in this context, but in the
data science application of machine learning methods, it's more helpful to think of machine
learning as a means of building models of data.

Fundamentally, machine learning involves building mathematical models to help understand
data. "Learning" enters the fray when we give these models tunable parameters that can be
adapted to observed data; in this way the program can be considered to be "learning" from the
data. Once these models have been fit to previously seen data, they can be used to predict and
understand aspects of newly observed data. I'll leave to the reader the more philosophical
digression regarding the extent to which this type of mathematical, model-based "learning" is
similar to the "learning" exhibited by the human brain. Understanding the problem setting in
machine learning is essential to using these tools effectively, and so we will start with some
broad categorizations of the types of approaches we'll discuss here.
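
As a minimal sketch of what "tunable parameters adapted to observed data" can mean, the following pure-Python example fits a straight line by ordinary least squares and then uses it on a new input. The data (days before departure and fare) are invented for illustration, and the helper names `fit_line` and `predict` are hypothetical; this is not one of the models used in the project.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted parameters to a new, previously unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Invented observations: days before departure -> fare.
days_before = [1, 5, 10, 20, 30]
fares = [9000, 8200, 7200, 5200, 3200]

model = fit_line(days_before, fares)   # "learning": parameters are set from data
print("slope, intercept:", model)
print("predicted fare 15 days out:", predict(model, 15))
```

The two numbers returned by `fit_line` are the tunable parameters; fitting chooses them from the observed data, and prediction reuses them on data the model has not seen.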

5.2.1 Categories of Machine Learning

At the most fundamental level, machine learning can be categorized into two main types:
supervised learning and unsupervised learning.

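Supervised learning involves somehow modeling the relationship between measured features
of data and some label associated with the data; once this model is determined, it can be used
to apply labels to new, unknown data. This is further subdivided into classification tasks and
regression tasks. Unsupervised learning involves modeling the features of a dataset without
reference to any label.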

5.2.2 Need for Machine Learning


Human beings, at this moment, are the most intelligent and advanced species on earth because
they can think, evaluate and solve complex problems. On the other side, AI is still in its initial
stage and has not surpassed human intelligence in many aspects. The question, then, is
why we need to make machines learn. The most suitable reason for doing this is “to make
decisions, based on data, with efficiency and scale”.

Lately, organizations are investing heavily in newer technologies like Artificial Intelligence,
Machine Learning and Deep Learning to extract key information from data, perform several
real-world tasks and solve problems. We can call these data-driven decisions taken by machines,
particularly to automate processes. Such data-driven decisions can be used, instead of
programming logic, for problems that cannot be programmed explicitly. We
cannot do without human intelligence, but we also need to solve real-world
problems efficiently at a huge scale. That is why the need for machine learning arises.

Machine learning also faces several challenges:

Quality of data − Having good-quality data for ML algorithms is one of the biggest
challenges. Use of low-quality data leads to problems related to data preprocessing.

Time-consuming task − Another challenge faced by ML models is the consumption of time,
especially for data acquisition, feature extraction and retrieval.

Lack of specialist persons − As ML technology is still maturing, expert
resources are hard to find.

No clear objective for formulating business problems − Having no clear objective and
well-defined goal for business problems is another key challenge for ML because this technology
is not that mature yet.

5.2.3 Applications of Machine Learning :-


Machine Learning is one of the most rapidly growing technologies, and according to researchers we are
in the golden year of AI and ML. It is used to solve many real-world complex problems which
cannot be solved with a traditional approach. Following are some real-world applications of ML:

• Emotion analysis
• Sentiment analysis
• Error detection and prevention
• Weather forecasting and prediction
• Stock market analysis and forecasting
• Speech synthesis
• Speech recognition
• Customer segmentation

5.2.4 How to Start Learning Machine Learning?
Arthur Samuel coined the term “Machine Learning” in 1959 and defined it as a “Field of
study that gives computers the capability to learn without being explicitly
programmed”.
And that was the beginning of Machine Learning! In modern times, Machine Learning is one of
the most popular career choices. According to Indeed, Machine Learning
Engineer was the best job of 2019, with 344% growth and an average base salary of $146,085
per year.
But there is still a lot of doubt about what exactly Machine Learning is and how to start learning
it. So this section deals with the basics of Machine Learning and also the path you can follow
to eventually become a full-fledged Machine Learning Engineer.

5.2.5 How to start learning ML?

This is a rough roadmap you can follow on your way to becoming an insanely talented Machine
Learning Engineer. Of course, you can always modify the steps according to your needs to
reach your desired end-goal!

Step 1 – Understand the Prerequisites


If you are a genius, you could start ML directly, but normally there are some
prerequisites that you need to know, which include Linear Algebra, Multivariate Calculus,
Statistics, and Python. And if you don’t know these, never fear! You don’t need a Ph.D. degree in
these topics to get started, but you do need a basic understanding.

(a) Learn Linear Algebra and Multivariate Calculus

Both Linear Algebra and Multivariate Calculus are important in Machine Learning. However,
the extent to which you need them depends on your role as a data scientist. If you
are more focused on application-heavy machine learning, then you will not be that heavily
focused on maths, as there are many common libraries available. But if you want to focus on
R&D in Machine Learning, then mastery of Linear Algebra and Multivariate Calculus is very
important, as you will have to implement many ML algorithms from scratch.

5.2.6 ADVANTAGES & DISADVANTAGES OF ML

Advantages of Machine learning :-

1. Easily identifies trends and patterns -

Machine Learning can review large volumes of data and discover specific trends and patterns that
would not be apparent to humans. For instance, for an e-commerce website like Amazon, it serves
to understand the browsing behaviors and purchase histories of its users to help cater to the right
products, deals, and reminders relevant to them.

2. No human intervention needed (automation)


With ML, you don’t need to babysit your project every step of the way. Since it means giving
machines the ability to learn, it lets them make predictions and also improve the algorithms
on their own. A common example of this is anti-virus software, which learns to filter new threats
as they are recognized. ML is also good at recognizing spam.

3. Continuous Improvement
As ML algorithms gain experience, they keep improving in accuracy and efficiency. This
lets them make better decisions. Say you need to make a weather forecast model. As the amount
of data you have keeps growing, your algorithms learn to make more accurate predictions faster.

Disadvantages of Machine Learning :-

1. Data Acquisition

Machine Learning requires massive data sets to train on, and these should be inclusive/unbiased
and of good quality. There can also be times when you must wait for new data to be
generated.

2. Time and Resources


ML needs enough time to let the algorithms learn and develop enough to fulfill their purpose
with a considerable amount of accuracy and relevancy. It also needs massive resources to
function. This can mean additional requirements of computer power for you.

3. Interpretation of Results
Another major challenge is the ability to accurately interpret results generated by the algorithms.
You must also carefully choose the algorithms for your purpose.

5.3 PYTHON DEVELOPMENT STEPS

Guido van Rossum published the first version of Python code (version 0.9.0) at alt.sources
in February 1991. This release already included exception handling, functions, and the
core data types list, dict, str and others. It was also object oriented and had a module system.
Python version 1.0 was released in January 1994. The major new features included in this
release were the functional programming tools lambda, map, filter and reduce, which Guido
van Rossum never liked. Six and a half years later, in October 2000, Python 2.0 was introduced.
This release included list comprehensions and a full garbage collector, and it supported
Unicode. Python flourished for another 8 years in the versions 2.x before the next major release,
Python 3.0 (also known as "Python 3000" and "Py3K"), was released. Python 3 is not
backwards compatible with Python 2.x. The emphasis in Python 3 was on the removal
of duplicate programming constructs and modules, thus fulfilling or coming close to
fulfilling the 13th law of the Zen of Python: "There should be one -- and preferably only one
-- obvious way to do it." Some changes in Python 3.0:

• Print is now a function


• Views and iterators instead of lists
• The rules for ordering comparisons have been simplified. E.g. a heterogeneous list
cannot be sorted, because all the elements of a list must be comparable to each other.
• There is only one integer type left, i.e. int. long is int as well.
• The division of two integers returns a float instead of an integer. "//" can be used to have
the "old" behaviour.
• Text Vs. Data Instead Of Unicode Vs. 8-bit
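
The changes listed above can be observed directly in a Python 3 interpreter. The short script below is only an illustrative sketch; the variable names and the sample values (airport codes and fares) are made up.

```python
# print is a function, so it accepts keyword arguments such as sep.
print("fare", 4500, sep=": ")

# dict.keys() returns a view, not a list; the view reflects later changes.
prices = {"DEL": 4500}
keys_view = prices.keys()
prices["BOM"] = 5200
print(list(keys_view))

# Ordering comparisons between unrelated types raise TypeError,
# so a heterogeneous list cannot be sorted.
mixed_sort_error = None
try:
    sorted([1, "two", 3])
except TypeError as exc:
    mixed_sort_error = exc
    print("cannot sort:", exc)

# There is a single integer type, int, with arbitrary precision.
big = 2 ** 100
print(type(big).__name__)

# "/" is true division and returns a float; "//" is floor division.
print(7 / 2, 7 // 2)

# Text (str) and binary data (bytes) are distinct types.
text = "fare"
data = text.encode("utf-8")
print(type(text).__name__, type(data).__name__, data.decode("utf-8") == text)
```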

Purpose :-
We demonstrated that our approach enables successful segmentation of intra-retinal layers—
even with low-quality images containing speckle noise, low contrast, and different intensity
ranges throughout—with the assistance of the ANIS feature.

Python
Python is an interpreted high-level programming language for general-purpose programming.
Created by Guido van Rossum and first released in 1991, Python has a design philosophy
that emphasizes code readability, notably using significant whitespace.

5.4 MODULES USED IN PROJECT

TensorFlow
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library, and is also used for
machine learning applications such as neural networks. It is used for both research and
production at Google.

NumPy
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays.

It is the fundamental package for scientific computing with Python. It contains various
features including these important ones:

• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code

Pandas
Pandas is an open-source Python library providing high-performance data manipulation and
analysis tools using its powerful data structures. Before Pandas, Python was mostly used for data munging
and preparation and contributed very little to data analysis. Pandas solved this
problem. Using Pandas, we can accomplish five typical steps in the processing and analysis
of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze. Python
with Pandas is used in a wide range of academic and commercial domains,
including finance, economics, statistics, and analytics.

Matplotlib
Matplotlib is a Python 2D plotting library which produces publication quality figures in a
variety of hardcopy formats and interactive environments across platforms. Matplotlib can be
used in Python scripts, the Python and IPython shells, the Jupyter Notebook, web application
servers, and four graphical user interface toolkits. Matplotlib tries to make easy things easy
and hard things possible. You can generate plots, histograms, power spectra, bar charts, error
charts, scatter plots, etc., with just a few lines of code.

5.5 INSTALL PYTHON STEP-BY-STEP IN WINDOWS AND MAC
Python, a versatile programming language, may not come pre-installed on your
computer. Python was first released in 1991 and remains a very popular
high-level programming language. Its design philosophy emphasizes code readability with
its notable use of significant whitespace.
The object-oriented approach and language constructs provided by Python enable
programmers to write clear, logical code for projects.
First, download the latest version of Python from the download page.
Second, double-click the installer file to launch the setup wizard.
In the setup window, check the Add Python 3.8 to PATH checkbox and click Install Now
to begin the installation.

It’ll take a few minutes to complete the setup.

Once the setup completes, you’ll see the following window:

Verify the installation

To verify the installation, you open the Run window and type cmd and press Enter:

In the Command Prompt, type python command as follows:

If you see the output like the above screenshot, you’ve successfully installed Python on
your computer.

To exit the program, you type Ctrl-Z and press Enter.


If you see the following output from the Command Prompt after typing the python
command:

'python' is not recognized as an internal or external command,
operable program or batch file.

it is likely that you didn’t check the Add Python 3.8 to PATH checkbox when you installed Python.
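
Once an interpreter is available, the same checks can be made from inside Python using only the standard library. The script below is a small sketch rather than part of the official installation steps; it can be saved under any file name and run from the Command Prompt, or run from IDLE when the python command is not yet on PATH.

```python
import shutil
import sys

# Version of the interpreter that is running this script.
print("Python version:", sys.version.split()[0])

# Full path of the running interpreter.
print("Interpreter:", sys.executable)

# shutil.which returns None when a command cannot be found on PATH.
for command in ("python", "python3"):
    print(command, "->", shutil.which(command))
```

A result of None for python on Windows usually corresponds to the "not recognized" message shown above.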

Install Python on macOS
It’s recommended to install Python on macOS using an official installer. Here are the steps:
• First, download a Python release for macOS.
• Second, run the installer by double-clicking the installer file.
• Third, follow the instruction on the screen and click the Next button until the installer
completes.

Install Python on Linux


Before installing Python 3 on your Linux distribution, you check whether Python 3 was
already installed by running the following command from the terminal:

python3 --version
If you see a response with the version of Python, then your computer already has Python 3
installed. Otherwise, you can install Python 3 using a package management system.

For example, you can install Python 3.10 on Ubuntu using apt:
sudo apt install python3.10
To install a newer version, replace 3.10 with that version.

A quick introduction to the Visual Studio Code
Visual Studio Code is a lightweight source code editor. Visual Studio Code is often called
VS Code. VS Code runs on your desktop. It’s available for Windows, macOS, and Linux.
VS Code comes with many features such as IntelliSense, code editing, and extensions that
allow you to edit Python source code effectively. The best part is that VS Code is open-source
and free.
Besides the desktop version, VS Code also has a web version that you can use directly in your
web browser without installing it.
This tutorial teaches you how to set up Visual Studio Code for a Python environment so
that you can edit, run, and debug Python code.

Setting up Visual Studio Code


To set up the VS Code, you follow these steps:
First, navigate to the website and download the VS code based on your platform
(Windows, macOS, or Linux).

Second, launch the setup wizard and follow the steps.


Once the installation completes, you can launch the VS code application:


Install Python Extension


To make VS Code work with Python, you need to install the Python extension from
the Visual Studio Marketplace.
The following picture illustrates the steps:

• First, click the Extensions tab.


• Second, type the python extension pack keyword on the search input.
• Third, click the Python extension pack. It’ll show detailed information on the right
pane.
• Finally, click the Install button to install the Python extension.
Now, you’re ready to develop the first program in Python.

Creating a new Python project
First, create a new folder called hello world.
Second, launch the VS code and open the hello world folder.
Third, create a new app.py file and enter the following code and save the file:
print('Hello, World!')
The print() is a built-in function that displays a message on the screen. In this example, it’ll
show the message 'Hello, World!'.

What is a function
When you sum two numbers, that’s a function. And when you multiply two numbers, that’s
also a function.

Each function takes your inputs, applies some rules, and returns a result.
In the above example, the print() is a function. It accepts a string and shows it on the screen.
Python has many built-in functions like the print() function to use them out of the box in your
program.
In addition, Python allows you to define your own functions.
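
As a brief illustration of the idea that a function takes inputs, applies some rules, and returns a result, here is a minimal sketch of three user-defined functions. The names `add`, `multiply`, and `total_fare`, and the 5% tax rate, are hypothetical examples rather than code from this project.

```python
def add(a, b):
    """Return the sum of two numbers."""
    return a + b


def multiply(a, b):
    """Return the product of two numbers."""
    return a * b


def total_fare(base_fare, tax_rate):
    """Return the base fare plus tax, rounded to two decimal places."""
    return round(base_fare * (1 + tax_rate), 2)


print(add(2, 3))
print(multiply(2, 3))
print(total_fare(4000, 0.05))
```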

Executing the Python Hello World program


To execute the app.py file, you first launch the Command Prompt on Windows or Terminal
on macOS or Linux.

Then, navigate to the hello world folder.


After that, type the following command to execute the app.py file:
python app.py
If you use macOS or Linux, you use python3 command instead:
python3 app.py
If everything is fine, you’ll see the following message on the screen:
Hello, World!
If you use VS Code, you can also launch the Terminal within the VS code by:
• Accessing the menu Terminal > New Terminal
• Or using the keyboard shortcut Ctrl+Shift+`.
Typically, the backtick key (`) is located under the Esc key on the keyboard.
Python IDLE
Python IDLE is the Python Integrated Development Environment (IDE) that comes with
the Python distribution by default.
The Python IDLE also provides an interactive interpreter. It has many features such as:
• Code editing with syntax highlighting
• Smart indenting
• And auto-completion
In short, the Python IDLE helps you experiment with Python quickly in a trial-and-error
manner.
The following shows you step by step how to launch the Python IDLE and use it to execute
the Python code:

First, launch the Python IDLE program:

A new Python Shell window will display as follows:

Now, you can enter the Python code after the cursor >>> and press Enter to execute it.
For example, you can type the code print('Hello, World!') and press Enter, you’ll see the message Hello,
World! immediately on the screen:

Python Syntax
Whitespace and indentation
If you’ve been working in other programming languages such as Java, C#, or C/C++, you know
that these languages use semicolons (;) to separate the statements.
However, Python uses whitespace and indentation to construct the code structure.
The following shows a snippet of Python code:

# define main function to print out something
def main():
    i = 1
    max = 10
    while (i < max):
        print(i)
        i = i + 1


# call function main
main()
The meaning of the code isn’t important to you now. Please pay attention to the code structure
instead.

At the end of each line, you don’t see any semicolon to terminate the statement. And the code
uses indentation to format the code.
By using indentation and whitespace to organize the code, Python code gains the following
advantages:
• First, you’ll never miss the beginning or ending code of a block like in other programming
languages such as Java or C#.
• Second, the coding style is essentially uniform. If you have to maintain another
developer’s code, that code looks the same as yours.
• Third, the code is more readable and clearer in comparison with other programming
languages.

Comments
The comments are as important as the code because they describe why a piece of code was
written.
When the Python interpreter executes the code, it ignores the comments.
In Python, a single-line comment begins with a hash (#) symbol followed by the comment. For
example:
# This is a single line comment in Python

Continuation of statements
Python uses a newline character to separate statements. It places each statement on one line.
However, a long statement can span multiple lines by using the backslash (\) character.
The following example illustrates how to use the backslash (\) character to continue a statement
in the second line:

if (a == True) and (b == False) and \
   (c == True):
    print("Continuation of statements")

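
Besides the backslash, Python also continues a statement implicitly inside parentheses, brackets, and braces. The sketch below shows that alternative; the values of a, b, and c and the list of cities are placeholders chosen for illustration.

```python
a, b, c = True, False, True

# Parentheses allow the condition to span lines without a backslash.
if (a
        and not b
        and c):
    message = "Continuation of statements"
    print(message)

# The same applies inside brackets and braces.
cities = [
    "Chennai",
    "Delhi",
    "Kolkata",
]
print(len(cities))
```

Implicit continuation is often preferred because a stray space after a backslash is a syntax error, whereas brackets have no such pitfall.
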
Identifiers
Identifiers are names that identify variables, functions, modules, classes, and other objects in
Python.
The name of an identifier needs to begin with a letter or underscore (_). The following
characters can be alphanumeric or underscore.
Python identifiers are case-sensitive. For example, the counter and Counter are different
identifiers.
In addition, you cannot use Python keywords for naming identifiers.
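
These naming rules can be checked programmatically with the built-in str.isidentifier() method and the standard keyword module. The helper name `is_valid_name` and the candidate names below are hypothetical and serve only as an illustration.

```python
import keyword


def is_valid_name(name):
    """True when name is a syntactically valid identifier and not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["counter", "Counter", "_total", "2fast", "class"]:
    print(candidate, is_valid_name(candidate))

# Identifiers are case-sensitive, so these are two different names.
print("counter" == "Counter")
```
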
Keywords

Some words have special meanings in Python. They are called keywords.
The following shows the list of keywords in Python:

False     class      finally    is          return
None      continue   for        lambda      try
True      def        from       nonlocal    while
and       del        global     not         with
as        elif       if         or          yield
assert    else       import     pass
break     except     in         raise

Python is a growing and evolving language. So, its keywords will keep increasing and
changing.
Python provides a special module for listing its keywords called keyword.
To find the current keyword list, you use the following code:
import keyword

print(keyword.kwlist)

String literals

Python uses single quotes ('), double quotes ("), triple single quotes (''') and triple-double quotes
(""") to denote a string literal.
A string literal needs to be surrounded with the same type of quotes. For example, if you use
a single quote to start a string literal, you need to use the same single quote to end it.

The following shows some examples of string literals:


s = 'This is a string'
print(s)
s = "Another string using double quotes"
print(s)
s = ''' string can span
multiple line '''
print(s)

6 IMPLEMENTATIONS

6.1 SOFTWARE ENVIRONMENT

Python is a high-level, general-purpose, interpreted programming language.


1) High-level
Python is a high-level programming language that makes it easy to learn. Python doesn’t
require you to understand the details of the computer in order to develop programs efficiently.
2) General-purpose
Python is a general-purpose language. It means that you can use Python in various domains
including:
• Web applications
• Big data applications
• Testing
• Automation
• Data science, machine learning, and AI
• Desktop software
• Mobile apps
By contrast, a targeted language like SQL is used mainly for querying data from relational databases.
3) Interpreted
Python is an interpreted language. To develop a Python program, you write Python code into a
file called source code. The Python interpreter then reads and executes this source code when
the program runs.

6.1.1 PYTHON
Python increases your productivity. Python allows you to solve complex problems in less time
and fewer lines of code. It’s quick to make a prototype in Python.
Python is used as a solution in many areas across industries, from web applications to data
science and machine learning.
Python is quite easy to learn in comparison with other programming languages. Python syntax
is clear and beautiful.

Python has a large ecosystem that includes lots of libraries and frameworks.
Python is cross-platform. Python programs can run on Windows, Linux, and macOS.
Python has a huge community. Whenever you get stuck, you can get help from an active
community.
Python developers are in high demand.

6.1.2 SAMPLE CODE

from django.shortcuts import render, redirect
from mainapp.models import *
from django.contrib import messages
from userapp.models import *
from adminapp.models import *
import pandas as pd

# Create your views here.

def user_index(request):
    # Look up the logged-in user from the session.
    user_id = request.session['user_id']
    user = UserModel.objects.get(user_id=user_id)
    if request.method == 'POST':
        # Read the flight details submitted through the form.
        source = request.POST.get("source")
        to = request.POST.get('to')
        airline = request.POST.get("airline")
        dept_time = request.POST.get("dept_time")
        stops = request.POST.get('stops')
        arr_time = request.POST.get('arr_time')
        print(source, to, airline, dept_time, stops, arr_time)
        # Store the request, then redirect to the prediction view.
        obj = PredModel.objects.create(
            source=source, to=to, airline=airline,
            dept_time=dept_time, stops=stops, arr_time=arr_time)
        print(obj)
        return redirect("Predict", id=obj.id)
    return render(request, 'user/user-index.html')

def user_myprofile(request):
    # Look up the logged-in user from the session.
    user_id = request.session['user_id']
    user = UserModel.objects.get(user_id=user_id)
    if request.method == 'POST':
        # Read the updated profile fields from the form.
        username = request.POST.get("user_username")
        userppnum = request.POST.get('user_passportnumber')
        email = request.POST.get("user_email")
        contact = request.POST.get("user_contact")
        password = request.POST.get("user_password")
        address = request.POST.get('user_address')
        print(username, userppnum, email, contact, password, address)
        # Update the fields that are always submitted.
        user.user_username = username
        user.user_passportnumber = userppnum
        user.user_contact = contact
        user.user_email = email
        user.user_password = password
        user.user_address = address
        # Replace the profile image only when a new file was uploaded.
        if len(request.FILES) != 0:
            user.user_image = request.FILES["user_image"]
        user.save()
        messages.success(request, 'Updated Successfully')
        return redirect('user_myprofile')
    return render(request, 'user/user-myprofile.html', {'user': user})

def Predict(request, id):
    data = Dataset.objects.all().first()
    user_data = PredModel.objects.get(pk=id)
    # One-hot encode the source city: the matching flag is 1, all others are 0.
    if user_data.source == 'Chennai':
        Chennai = 1
        Delhi = 0
        Kolkata = 0
        Mumbai = 0
        Cochin = 0
        Hyderabad = 0
    elif user_data.source == 'Delhi':
        Chennai = 0
        Delhi = 1
        Kolkata = 0
        Mumbai = 0
        Cochin = 0
        Hyderabad = 0
    elif user_data.source == 'Kolkata':
        Chennai = 0
        Delhi = 0
        Kolkata = 1
        Mumbai = 0
        Cochin = 0
        Hyderabad = 0
    elif user_data.source == 'Mumbai':
        Chennai = 0
        Delhi = 0
        Kolkata = 0
        Mumbai = 1
        Cochin = 0
        Hyderabad = 0
    elif user_data.source == 'Cochin':
        Chennai = 0
        Delhi = 0
        Kolkata = 0
        Mumbai = 0
        Cochin = 1
        Hyderabad = 0
    elif user_data.source == 'Hyderabad':
        Chennai = 0
        Delhi = 0
        Kolkata = 0
        Mumbai = 0
        Cochin = 0
        Hyderabad = 1
    else:
        Chennai = 0
        Delhi = 0
        Kolkata = 0
        Mumbai = 0
        Cochin = 0
        Hyderabad = 0

    # One-hot encode the destination city.
    # NOTE: these assignments reuse the variable names of the source flags above,
    # so the source encoding is overwritten by the destination encoding.
    if user_data.to == 'Chennai':
        Chennai = 1
        Delhi = 0
        Kolkata = 0
        Mumbai = 0
        Cochin = 0
        Hyderabad = 0
    elif user_data.to == 'Delhi':
        Chennai = 0
        Delhi = 1
        Kolkata = 0
        Mumbai = 0
        Cochin = 0
        Hyderabad = 0
    elif user_data.to == 'Kolkata':
        Chennai = 0
        Delhi = 0
        Kolkata = 1
        Mumbai = 0
        Cochin = 0
        Hyderabad = 0
    elif user_data.to == 'Mumbai':
        Chennai = 0
        Delhi = 0
        Kolkata = 0
        Mumbai = 1
        Cochin = 0
        Hyderabad = 0
    elif user_data.to == 'Cochin':
        Chennai = 0
        Delhi = 0
        Kolkata = 0
        Mumbai = 0
        Cochin = 1
        Hyderabad = 0
    elif user_data.to == 'Hyderabad':
        Chennai = 0
        Delhi = 0
        Kolkata = 0
        Mumbai = 0
        Cochin = 0
        Hyderabad = 1
    else:
        Chennai = 0
        Delhi = 0
        Kolkata = 0
        Mumbai = 0
        Cochin = 0
        Hyderabad = 0

    # One-hot encode the airline.
    if user_data.airline == 'Air_India':
        Air_India = 1
        GoAir = 0
        IndiGo = 0
        Jet_Airways = 0
        Jet_Airways_Business = 0
        Multiple_carriers = 0
        Multiple_carriers_Premium_economy = 0
        SpiceJet = 0
        Trujet = 0
        Vistara = 0
        Vistara_Premium_economy = 0
    elif user_data.airline == 'GoAir':
        Air_India = 0
        GoAir = 1
        IndiGo = 0
        Jet_Airways = 0
        Jet_Airways_Business = 0
        Multiple_carriers = 0
        Multiple_carriers_Premium_economy = 0
        SpiceJet = 0
        Trujet = 0
        Vistara = 0
        Vistara_Premium_economy = 0
    elif user_data.airline == 'IndiGo':
        Air_India = 0
        GoAir = 0
        IndiGo = 1
        Jet_Airways = 0
        Jet_Airways_Business = 0
        Multiple_carriers = 0
        Multiple_carriers_Premium_economy = 0
        SpiceJet = 0
        Trujet = 0
        Vistara = 0
        Vistara_Premium_economy = 0
    elif user_data.airline == 'Vistara':
        Air_India = 0
        GoAir = 0
        IndiGo = 0
        Jet_Airways = 0
        Jet_Airways_Business = 0
        Multiple_carriers = 0
        Multiple_carriers_Premium_economy = 0
        SpiceJet = 0
        Trujet = 0
        Vistara = 1
        Vistara_Premium_economy = 0
    elif user_data.airline == 'Vistara_Premium_economy':
        Air_India = 0
        GoAir = 0
        IndiGo = 0
        Jet_Airways = 0
        Jet_Airways_Business = 0
        Multiple_carriers = 0
        Multiple_carriers_Premium_economy = 0
        SpiceJet = 0
        Trujet = 0
        Vistara = 0
        Vistara_Premium_economy = 1
    else:
        # This listing shows no branches for Jet_Airways, Jet_Airways_Business,
        # Multiple_carriers, Multiple_carriers_Premium_economy, SpiceJet and Trujet,
        # so those airlines fall through to this branch with all flags set to 0.
        Air_India = 0
        GoAir = 0
        IndiGo = 0  # added: without it, IndiGo is undefined when this branch runs
        Jet_Airways = 0
        Jet_Airways_Business = 0
        Multiple_carriers = 0
        Multiple_carriers_Premium_economy = 0
        SpiceJet = 0
        Trujet = 0
        Vistara = 0
        Vistara_Premium_economy = 0

    # Parse the departure and arrival timestamps once, then extract the fields.
    dep = pd.to_datetime(user_data.dept_time, format="%Y-%m-%dT%H:%M")
    arr = pd.to_datetime(user_data.arr_time, format="%Y-%m-%dT%H:%M")
    journey_day = int(dep.day)
    journey_month = int(dep.month)
    Dep_Time_hour = int(dep.hour)
    Dep_Time_min = int(dep.minute)
    Arrival_Time_hour = int(arr.hour)
    Arrival_Time_min = int(arr.minute)
    # Duration is taken as the absolute difference of the clock fields.
    dur_hour = abs(Arrival_Time_hour - Dep_Time_hour)
    dur_min = abs(Arrival_Time_min - Dep_Time_min)
    # Feature vector; the positions are used as indices when saving TestingModel.
    lp = [journey_day, Chennai, Hyderabad, Cochin, Mumbai, Air_India, Jet_Airways,
          Jet_Airways_Business, Multiple_carriers, Multiple_carriers_Premium_economy,
          IndiGo, Vistara_Premium_economy, Vistara, Trujet, SpiceJet, dur_hour, dur_min,
          int(user_data.stops), journey_month, Dep_Time_hour, Dep_Time_min,
          Arrival_Time_hour, Arrival_Time_min, Delhi, GoAir, Kolkata]

    # output=lp
    print(lp, 'lllllllllllllllllllllll')
    # print(Predict,'llllllllllll')
    # from sklearn.ensemble import RandomForestRegressor
    # reg_rf=RandomForestRegressor()
    # data = Dataset.objects.get(data_id = data)
    # # y=reg_rf.fit(output)
    # y_pred=reg_rf.predict(lp)
    # print(y_pred)
    # output1=round(y_pred,2)
    # # print(lp)
    # id = request.session['id']
    # user = PredictModel.objects.get(pk=id)
    # Save the encoded features, then redirect to the 'button' view with the new record's id.
    test = TestingModel.objects.create(
        Total_Stops=lp[17], Air_India=lp[5], Jet_Airways=lp[6], journey_day=lp[0],
        Chennai=lp[1], Hyderabad=lp[2], Cochin=lp[3], Mumbai=lp[4],
        Jet_Airways_Business=lp[7], Multiple_carriers=lp[8],
        Multiple_carriers_Premium_economy=lp[9],
        IndiGo=lp[10], Vistara_Premium_economy=lp[11], Vistara=lp[12], Trujet=lp[13],
        SpiceJet=lp[14], dur_hour=lp[15], dur_min=lp[16], journey_month=lp[18],
        Dep_Time_hour=lp[19], Dep_Time_min=lp[20],
        Arrival_Time_hour=lp[21], Arrival_Time_min=lp[22],
        Delhi=lp[23], GoAir=lp[24], Kolkata=lp[25])
    print(test, 'kkkkkkkkkkkkkkkkkk')
    print(test.id, 'jjjjjjjj')
    return redirect('button', id=test.id)
    # file = str(data.data_set)
    # df = pd.read_csv('./media/'+ file)
    # from sklearn.preprocessing import LabelEncoder
    # le = LabelEncoder()
    # for col in lp:
    #     if type(lp[col]) == 'str':
    #         lp[col] = le.transform(lp[col])

# return render(request,'user/user-index.html')
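
The if/elif chains in Predict amount to one-hot encoding, and the duration is computed from the
hour and minute fields alone. The sketch below shows a more compact way to express both steps.
The helper names one_hot and flight_duration are hypothetical and are not part of the project;
the city list and the time format are taken from the listing. Unlike the listing, the duration
helper subtracts full timestamps, which is an assumption about the intended behaviour.

```python
from datetime import datetime

# City names and time format as they appear in the listing.
CITIES = ["Chennai", "Delhi", "Kolkata", "Mumbai", "Cochin", "Hyderabad"]
TIME_FORMAT = "%Y-%m-%dT%H:%M"


def one_hot(value, categories):
    """Return a dict with 1 for the matching category and 0 for all others."""
    return {name: int(value == name) for name in categories}


def flight_duration(dept_time, arr_time):
    """Return (hours, minutes) between two timestamps given in TIME_FORMAT."""
    departure = datetime.strptime(dept_time, TIME_FORMAT)
    arrival = datetime.strptime(arr_time, TIME_FORMAT)
    total_minutes = int((arrival - departure).total_seconds() // 60)
    return divmod(total_minutes, 60)


# Example values are made up for illustration.
source_flags = one_hot("Delhi", CITIES)
duration = flight_duration("2024-03-01T22:30", "2024-03-02T01:15")
print(source_flags)
print(duration)
```

For a 22:30 departure and a 01:15 arrival on the next day, the abs() arithmetic in the listing
gives 21 hours and 15 minutes, whereas the timestamp difference gives 2 hours and 45 minutes.
The report does not state how durations were computed in the training data, so any change here
would have to match that.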

7. SYSTEM TESTING

7.1 INTRODUCTION TO TESTING

Software testing covers several types, such as Functional Testing, Non-Functional Testing,
Automation Testing, and Agile Testing, along with their sub-types.
Each type of testing has its own features, advantages, and disadvantages. This chapter covers
the types of software testing that are most commonly used in day-to-day testing practice.

Different Types of Software Testing

7.2 Testing Strategies:-

There are four main types of functional testing.

#1) Unit Testing


Unit testing is a type of software testing performed on an individual unit or
component to check its correctness. Typically, unit testing is done by the developer
during the application development phase. A unit can be a method, function,
procedure, or object. Developers often use test automation frameworks to run
unit tests.

For example, consider a simple calculator application. The developer can write a
unit test to check that the user can enter two numbers and get the correct sum from
the addition functionality.
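
The calculator example can be sketched with Python's built-in unittest module. The add function
and the test names are hypothetical and serve only to illustrate the idea; they are not part of
the project code.

```python
import unittest


def add(a, b):
    """Hypothetical calculator function under test."""
    return a + b


class AddTests(unittest.TestCase):
    def test_adds_two_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_adds_negative_number(self):
        self.assertEqual(add(-4, 1), -3)


# Run the test case programmatically and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method checks one behaviour of the unit in isolation, so a failure points directly at
the function that is wrong.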

a) White Box Testing


White box testing is a test technique in which the internal structure or code of an
application is visible and accessible to the tester. In this technique, it is easy to find
loopholes in the design of an application or faults in its business logic. Statement
coverage and decision coverage (branch coverage) are examples of white box test
techniques.
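
As a small illustration of decision (branch) coverage, the sketch below uses a hypothetical
function with a single decision. The function name and the 5000 threshold are assumptions made
for the example. One test input per outcome of the decision is enough to execute every statement
and both branches.

```python
def fare_category(price):
    """Hypothetical function with one decision and therefore two branches."""
    if price < 5000:
        return "budget"
    return "premium"


# One input per branch: together they cover every statement and both decision outcomes.
budget_result = fare_category(3000)   # takes the 'if' branch
premium_result = fare_category(9000)  # skips the 'if' branch
print(budget_result, premium_result)
```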

b) Gorilla Testing
Gorilla testing is a test technique in which the tester and/or developer tests one
module of the application thoroughly in all aspects. Gorilla testing is done to check
how robust the application is.

For example, the tester is testing a pet insurance company’s website, which
provides services such as buying an insurance policy, a tag for the pet, and lifetime
membership. The tester can focus on any one module, say the insurance
policy module, and test it thoroughly with positive and negative test scenarios.

#2) Integration Testing


Integration testing is a type of software testing where two or more modules of an
application are logically grouped together and tested as a whole. The focus of this
type of testing is to find defects in the interfaces, communication, and data flow
among modules. A top-down or bottom-up approach is used while integrating
modules into the whole system.

This type of testing is done when integrating modules of a system or between systems.
For example, a user is buying a flight ticket from an airline website. The user can
see flight details and payment information while buying a ticket, but flight details
and payment processing are two different systems. Integration testing should be
done when integrating the airline website with the payment processing system.
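
The flight ticket example can be sketched as follows. FlightCatalog, PaymentGateway, and
book_ticket are hypothetical stand-ins for the two systems and the code that connects them; the
point is that the check exercises the data flow between the modules rather than each module
alone.

```python
class FlightCatalog:
    """Hypothetical module that knows flight fares."""

    def __init__(self, fares):
        self._fares = fares

    def price_of(self, flight_no):
        return self._fares[flight_no]


class PaymentGateway:
    """Hypothetical stand-in for a separate payment processing system."""

    def __init__(self):
        self.charges = []

    def charge(self, card, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.charges.append((card, amount))
        return True


def book_ticket(catalog, gateway, flight_no, card):
    """Look up the fare in one module and pass it to the other."""
    amount = catalog.price_of(flight_no)
    paid = gateway.charge(card, amount)
    return {"flight": flight_no, "amount": amount, "paid": paid}


# Integration check: the fare returned by the catalog must be the amount charged.
catalog = FlightCatalog({"AI-101": 4200})
gateway = PaymentGateway()
booking = book_ticket(catalog, gateway, "AI-101", "test-card")
print(booking, gateway.charges)
```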

a) Gray box testing


As the name suggests, gray box testing is a combination of white-box testing and
black-box testing. Testers have partial knowledge of the internal structure or code
of an application.

#3) System Testing


System testing is a type of testing where the tester evaluates the whole system against
the specified requirements.

a) End to End Testing


It involves testing a complete application environment in a situation that mimics
real-world use, such as interacting with a database, using network
communications, or interacting with other hardware, applications, or systems if
appropriate.

For example, a tester is testing a pet insurance website. End to End testing involves
testing of buying an insurance policy, LPM, tag, adding another pet, updating credit
card information on users’ accounts, updating user address information, receiving
order confirmation emails and policy documents.

b) Black Box Testing


Black box testing is a software testing technique in which testing is performed
without knowing the internal structure, design, or code of the system under test.
Testers focus only on the inputs and outputs of the test objects.

c) Smoke Testing
Smoke testing is performed to verify that basic and critical functionality of the
system under test is working fine at a very high level.

Whenever a new build is provided by the development team, then the Software
Testing team validates the build and ensures that no major issue exists. The testing
team will ensure that the build is stable, and a detailed level of testing will be
carried out further.

#4) Acceptance Testing


Acceptance testing is a type of testing where the client, business, or customer tests the
software with real-time business scenarios.

The client accepts the software only when all the features and functionalities work
as expected. This is the last phase of testing, after which the software goes into
production. This is also called User Acceptance Testing (UAT).

a) Alpha Testing
Alpha testing is a type of acceptance testing performed by the team in an
organization to find as many defects as possible before releasing software to
customers.

For example, the pet insurance website is under UAT. The UAT team will run real-
time scenarios like buying an insurance policy, buying annual membership,
changing the address, and transferring ownership of the pet, in the same way a user
would use the real website. The team can use test credit card information to process
payment-related scenarios.

b) Beta Testing
Beta Testing is a type of software testing which is carried out by the
clients/customers. It is performed in the Real Environment before releasing the
product to the market for the actual end-users.

Beta Testing is carried out to ensure that there are no major failures in the software
or product, and that it satisfies the business requirements from an end-user perspective.
Beta Testing is successful when the customer accepts the software.

This testing is typically done by the end-users. It is the final testing done
before releasing the application for commercial purposes. Usually, the Beta version
of the software or product is released to a limited number of users in a
specific area.

The end-users use the software and share feedback with the company. The
company then takes the necessary action before releasing the software worldwide.

c) Operational acceptance testing (OAT)


Operational acceptance testing of the system is performed by operations or system
administration staff in the production environment. The purpose of operational
acceptance testing is to make sure that the system administrators can keep the
system working properly for the users in a real-time environment.

The focus of the OAT is on the following points:

• Testing of backup and restore.

• Installing, uninstalling, upgrading software.

• The recovery process in case of natural disaster.

• User management.

• Maintenance of the software.

Non-Functional Testing

The main types of non-functional testing covered here are security testing and performance
testing.

#1) Security Testing


Security testing is performed by a specialized team to determine whether any hacking
method can penetrate the system.

Security testing checks how well the software, application, or website is
secured against internal and/or external threats. It covers how well the
software is protected from malicious programs and viruses, and how secure and strong the
authorization and authentication processes are. It also checks how the software behaves
under a hacker’s attack or malicious programs.

a) Penetration Testing
Penetration Testing or Pen testing is the type of security testing performed as an
authorized cyberattack on the system to find out the weak points of the system in
terms of security.

Pen testing is performed by outside contractors, generally known as ethical
hackers, which is why it is also known as ethical hacking. Contractors perform
operations such as SQL injection, URL manipulation, privilege elevation, and
session expiry checks, and report the results to the organization.

Note: Do not perform pen testing on your laptop/computer. Always obtain
written permission before performing pen tests.
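
To illustrate the SQL injection check mentioned above, the sketch below uses an in-memory
SQLite database. The table, the two login functions, and the payload are hypothetical examples
and are not taken from the project. The first function builds the query by string formatting
and accepts the injected input; the second uses a parameterized query and rejects it.

```python
import sqlite3

# In-memory database with a single hypothetical user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")


def login_unsafe(name, password):
    # Vulnerable: user input is pasted directly into the SQL text.
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(name, password):
    # Parameterized query: input is passed as data, never as SQL text.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


payload = "' OR '1'='1"
unsafe_accepts = login_unsafe("alice", payload)
safe_accepts = login_safe("alice", payload)
print(unsafe_accepts, safe_accepts)
```

A security test for a login form would assert that such a payload is rejected, as it is by the
parameterized version.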

#2) Performance Testing


Performance testing checks an application’s stability and response time by
applying load.

Stability means the ability of the application to withstand load. Response time is
how quickly the application responds to users.
Performance testing is done with the help of tools such as Loader.IO, JMeter,
and LoadRunner.

a) Load testing
Load testing is testing of an application’s stability and response time by applying
load, which is equal to or less than the designed number of users for an application.

b) Stress Testing
Stress testing is testing an application’s stability and response time by applying
load, which is more than the designed number of users for an application.
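
The difference between load testing and stress testing can be sketched with the standard
library alone. In the example below, handle_request is a hypothetical stand-in for one user
request, and DESIGNED_USERS is an assumed design capacity; a real test would call the deployed
application, typically through one of the tools named above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(_):
    """Hypothetical stand-in for one request; returns its response time in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated work
    return time.perf_counter() - start


def apply_load(concurrent_users):
    """Run one request per simulated user in parallel; return (worst, average) time."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        timings = list(pool.map(handle_request, range(concurrent_users)))
    return max(timings), sum(timings) / len(timings)


DESIGNED_USERS = 20  # assumed design capacity, for illustration only

load_result = apply_load(DESIGNED_USERS)        # load test: at the designed number of users
stress_result = apply_load(DESIGNED_USERS * 3)  # stress test: above the designed number
print("load:", load_result)
print("stress:", stress_result)
```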

8. SCREENSHOTS
a. Home Page:-
Fig-8.1

b. Admin:-
Fig-8.2

c. Register Page:-
Fig-8.3

d. User Page:-
Fig-8.4

e. Contact Information:-
Fig-8.5

k. About page:-
Fig-8.11

l. Predicted Page:-
Fig-8.12
Fig-8.13

9. CONCLUSION
To estimate the dynamic fare of flights, three datasets from three different sources
were used, and visualizing them revealed many insights. Seven different
machine learning algorithms were used to build the models. Because the data was acquired
from websites that sell flight tickets, only limited information could be obtained. The
correctness of each model is judged by the evaluation metrics reported in Table I. The
Random Forest Regressor outperformed the other algorithms on these metrics, so it is a
suitable choice for predicting airline fares. If more data, such as actual seat
availability, could be obtained in the future, the predictions would be more accurate.
Prediction-based services are already employed in a variety of sectors, including stock
price prediction programs used by stockbrokers and services like Zestimate, which
estimates housing values. A comparable service in the aviation business would assist
customers in booking tickets. Numerous studies have been conducted on this topic using
various methodologies, and further research is required to increase prediction accuracy
using other algorithms. More accurate data with more features could also be used to
obtain more reliable results.
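
As an illustration of how regression models are commonly compared, the sketch below computes
three standard metrics in plain Python: mean absolute error (MAE), root mean squared error
(RMSE), and the R² score. Which metrics appear in Table I is not restated here, so this choice
is an assumption, and the fare values are made up for the example.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE and R^2 for two equally long lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up fares for illustration only.
actual = [4000, 5500, 7000, 6200]
predicted = [4200, 5300, 6900, 6500]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Lower MAE and RMSE and an R² closer to 1 indicate a better fit, which is the sense in which
one regressor can be said to outperform another.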

