B.Tech Project: Airline Fare Prediction
MACHINE LEARNING
An industry oriented Major Project Report Submitted to
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted By
G.RAGHAVENDER
Assistant Professor, CSE Department.
Jan 2025
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BHARAT INSTITUTE OF ENGINEERING AND TECHNOLOGY
Accredited by NAAC, Accredited by NBA (UG Programmes: CSE, ECE), Approved by AICTE,
Affiliated to JNTUH Hyderabad
Hyderabad-501 510, Telangana.
Certificate
This is to certify that the project work entitled “AIRLINE FARE PREDICTION
USING MACHINE LEARNING” is the bonafide work done
By
We avail this opportunity to express our deep sense of gratitude and hearty thanks
to Shri CH. Venugopal Reddy, Secretary & Correspondent of BIET, for providing a
congenial atmosphere and encouragement.
We would like to thank our Academic In-charge, Dr. Deepak Kachave, Associate
Professor of CSE, for their expert guidance and encouragement at various stages of the project.
We are thankful to our Project Coordinator, Dr. Rama Prakasha Reddy Ch, Assistant
Professor, Computer Science and Engineering, for his support and cooperation
throughout this project.
We express our highest regards to our parents, friends, and well-wishers, who helped a lot in
preparing this project report.
DECLARATION
We hereby declare that this Industry oriented Mini Project report is titled “AIRLINE FARE
ABSTRACT
This report addresses the problem of predicting airline fares. A set of characteristics that define a
typical flight is chosen for this purpose, on the assumption that these characteristics influence the
price of an airline ticket. Flight ticket prices fluctuate with parameters such as the flight schedule,
destination, and duration, and with occasions such as vacations or the holiday season. As a result,
a basic understanding of flight fares before booking a trip can save many people money and time.
Three datasets are analyzed to gain insights into airline fares, and their features are used to train
seven different machine learning (ML) models that predict airline ticket prices; the performance of
these models is then compared. The goal is to investigate the factors that determine the cost of a
flight. The data can then be used to build a system that predicts flight prices.
CONTENTS
1. INTRODUCTION
1.2 LITERATURE SURVEY
1.3 MODULES
2. SYSTEM ANALYSIS
2.1 EXISTING SYSTEM & ITS DISADVANTAGES
2.3 SYSTEM REQUIREMENTS
3. SYSTEM STUDY
3.1 FEASIBILITY STUDY
4. SYSTEM DESIGN
4.1 ARCHITECTURE
4.2 UML DIAGRAMS
4.2.1 USE CASE DIAGRAM
4.2.2 CLASS DIAGRAM
4.2.3 SEQUENCE DIAGRAM
4.2.4 ACTIVITY DIAGRAM
4.2.5 DEPLOYMENT DIAGRAM
5. TECHNOLOGIES USED
5.1 WHAT IS PYTHON
5.1.1 ADVANTAGES & DISADVANTAGES OF PYTHON
5.1.2 HISTORY
5.2 WHAT IS MACHINE LEARNING?
5.2.1 CATEGORIES OF ML
5.2.3 CHALLENGES IN ML
5.2.4 APPLICATIONS
6. IMPLEMENTATION
6.1 SOFTWARE ENVIRONMENT
6.1.1 PYTHON
7. SYSTEM TESTING
7.1 INTRODUCTION TO TESTING
7.2 TESTING STRATEGIES
8. SCREENSHOTS
9. CONCLUSION
10. REFERENCES
LIST OF FIGURES
LIST OF SCREENSHOTS
8.4 User Page
8.5 Contact Information
8.6 About Page
8.7 Uploading Data
8.8 Data Formed
8.9 Predicted Page
8.10 About Page Success
8.11 About Page
8.12 Predicted Page
8.13 Predicted Page
1. INTRODUCTION
1.1 Introduction: In today's world, airlines try to control flight ticket costs in order to
maximize profits. Most people who fly regularly know the best times to buy cheap tickets.
However, many customers who are less experienced at booking tickets fall into the discount
traps set by airlines and end up overspending. The main goal of an airline is to make a profit,
while the customer is looking for the best deal. Customers often try to purchase tickets well in
advance of the departure date to avoid the price increases that occur as the departure date
approaches. Because the fare models used by airlines are very complex, prices fluctuate
constantly, which makes it very difficult for a customer to buy a ticket at a low price. Airlines
may lower their ticket prices when they need to create a market and when tickets are harder
to obtain. These pricing strategies take into account a number of financial, marketing,
commercial, and social factors, all of which are linked to the final flight price, with the aim of
earning the maximum possible profit. As a result, fares are influenced by many different factors.
Research on airline pricing, from the perspective of both customers and airlines, has grown
steadily over the last two decades. From the customer's point of view, the key question is how to
find a low price or a good time to buy a ticket. In this project, we use data collected from three
different sources to build models with machine learning algorithms. By using the proposed
method to get the information they need, customers can save money by booking tickets at the
right moment.
1.2 Literature Survey
TITLE: "Data-driven Modeling of Airlines Pricing"
ABSTRACT: The popularity of air travel is constantly growing. Much of the existing research
describes the global flight market. At the same time, the Russian air market has peculiarities that
must be identified to build proper airfare models. The objective of this study is to analyze the
Russian air transportation market and compare the behavior of prices on local and global flights.
Using data collected from two independent ticket-price information aggregators (Avia Sales and
Sabre) for the period of spring-summer 2015, an empirical data-driven model was built for air price
prediction for different flight directions. We found that the form of the dependency of price on
purchase earliness differs dramatically between local and international flights in the two largest
Russian cities (Moscow and Saint-Petersburg).
TITLE: "A regression model for predicting optimal purchase timing for
airline tickets,"
ABSTRACT: Optimal timing for airline ticket purchasing from the consumer’s perspective
is challenging principally because buyers have insufficient information for reasoning about
future price movements. This paper presents a model for computing expected future prices and
reasoning about the risk of price changes. The proposed model is used to predict the future
expected minimum price of all available flights on specific routes and dates based on a corpus of
historical price quotes. Also, we apply our model to predict prices of flights with specific
desirable properties such as flights from a specific airline, non-stop only flights, or multi-
segment flights. By comparing models with different target properties, buyers can determine the
likely cost of their preferences. We present the expected costs of various preferences for two
high-volume routes. The performance of the presented prediction models is achieved by including
instances of time-delayed features, by imposing a class hierarchy among the raw features based
on feature similarity, and by pruning the classes of features used in prediction based on in-situ
performance. Our results show that purchase policy guidance using these models can lower the
average cost of purchases in the 2-month period prior to a desired departure. The proposed
method compares favorably with a deployed commercial web site providing similar purchase
policy recommendations.
1.3 MODULES
Machine learning offers several techniques for predicting airline ticket prices.
The algorithms that we have used include:
• Linear Regression.
• K-Neighbors Regression.
• Decision Tree.
• Random Forest.
These models have been implemented using the scikit-learn Python library. To verify the
performance of these models, metrics such as R-squared (R²), mean absolute error (MAE), mean
squared error (MSE), and root mean squared error (RMSE) are used.
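The evaluation described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: it assumes a numeric feature matrix X and a fare vector y, uses an 80/20 train/test split and mostly default hyperparameters chosen only for illustration, and generates synthetic data in place of the three real datasets. RMSE is computed as the square root of MSE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def evaluate_models(X, y, random_state=42):
    """Train the four regressors and return R2, MAE, MSE and RMSE for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    models = {
        "Linear Regression": LinearRegression(),
        "KNN Regression": KNeighborsRegressor(n_neighbors=5),
        "Decision Tree": DecisionTreeRegressor(random_state=random_state),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        results[name] = {
            "R2": r2_score(y_test, y_pred),
            "MAE": mean_absolute_error(y_test, y_pred),
            "MSE": mse,
            "RMSE": np.sqrt(mse),  # RMSE is the square root of MSE
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the project's datasets): two hypothetical
    # numeric features, e.g. flight duration and days left before departure.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(500, 2))
    y = 3000 + 400 * X[:, 0] - 150 * X[:, 1] + rng.normal(0, 100, size=500)
    for name, scores in evaluate_models(X, y).items():
        print(name, {k: round(float(v), 2) for k, v in scores.items()})
```

Fixing random_state keeps the split and the tree-based models reproducible, so the four models are compared on exactly the same test set.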
KNN Regression
K-nearest neighbors (KNN) regression predicts the value of a new data point as the average of
the values of its k nearest neighbors. KNN is a supervised learning technique that is best known
for classification but can also be used as a regressor. Because it makes no assumptions about the
underlying data distribution, it is a non-parametric approach. To make a prediction, it calculates
the distance between the new data point and each training example, selects the K training
examples that are nearest to the new point, and averages their target values. The distance is
typically calculated using the Euclidean distance, the Manhattan distance, or the Hamming
distance (for categorical features).
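The prediction rule described above can be written from scratch in a few lines of NumPy. This is an illustrative sketch rather than the scikit-learn implementation used in the project; it supports only the Euclidean and Manhattan distances (Hamming distance for categorical features is left out), and the toy values are made up.

```python
import numpy as np


def knn_regress(X_train, y_train, x_new, k=3, metric="euclidean"):
    """Predict a value for x_new as the mean target of its k nearest neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    diff = X_train - x_new  # broadcast x_new against every training row
    if metric == "euclidean":
        distances = np.sqrt((diff ** 2).sum(axis=1))
    elif metric == "manhattan":
        distances = np.abs(diff).sum(axis=1)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    nearest = np.argsort(distances)[:k]  # indices of the k closest examples
    return y_train[nearest].mean()       # average of their target values


if __name__ == "__main__":
    X_train = [[0], [1], [2], [10]]
    y_train = [0, 10, 20, 100]
    # Neighbours of 1.1 are the points 1, 2 and 0, so the prediction is (10 + 20 + 0) / 3
    print(knn_regress(X_train, y_train, [1.1], k=3))  # 10.0
```

Because every prediction scans all training examples, the cost grows with the size of the training set; this is the price paid for making no assumptions about the data.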
Linear Regression
Linear regression is a supervised machine learning (ML) technique that performs regression tasks.
It is a linear model that assumes a linear relationship between the input variables (x) and a
single output variable (y); that is, y can be calculated as a linear combination of the input
variables. Because our data set contains many independent features on which prices may
depend, we use multiple linear regression (MLR) to estimate the relationship between
two or more independent variables and a dependent variable.
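Multiple linear regression models the fare as y = b0 + b1·x1 + ... + bp·xp and chooses the coefficients that minimize the squared prediction error (ordinary least squares). The sketch below solves this with NumPy's least-squares solver instead of scikit-learn's LinearRegression, only so that the fitting step is visible; the features duration_hours and days_left and their values are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bp*xp.

    Returns the coefficient vector [b0, b1, ..., bp].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that b0 acts as the intercept
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply the fitted linear combination to new rows of X."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


if __name__ == "__main__":
    # Hypothetical features per flight: [duration_hours, days_left]
    X = [[1, 10], [2, 5], [3, 20], [4, 1], [5, 15]]
    # Fares generated exactly as 2000 + 300*duration - 50*days_left
    y = [2000 + 300 * d - 50 * t for d, t in X]
    coef = fit_mlr(X, y)
    print(np.round(coef, 6))              # recovers roughly [2000, 300, -50]
    print(predict_mlr(coef, [[2.5, 7]]))  # about 2400
```

Since the toy fares are an exact linear function of the features, least squares recovers the generating coefficients; on real fare data the fit is only an approximation.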
Decision Tree Regression
A decision tree is a tree structure used to build regression or classification models. It
repeatedly splits the data set into smaller and smaller subsets while the associated tree is built
incrementally, producing decision nodes and leaf nodes. The decision tree selects independent
variables from the dataset as decision nodes. When test data is entered into the model, the result
is determined by finding the segment (leaf) that the data point belongs to, and the decision tree
outputs the average of all training data points in that segment.
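The averaging behaviour described above can be checked with scikit-learn's DecisionTreeRegressor on a toy data set (made-up values, not the project's data). Limiting the tree to max_depth=1 is an illustrative choice that leaves a single split and two leaves, so the leaf averages are easy to inspect.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def leaf_means(tree, X, y):
    """Map each leaf id to the mean target of the training points in that leaf."""
    leaf_ids = tree.apply(X)  # index of the leaf each sample falls into
    return {int(leaf): float(y[leaf_ids == leaf].mean()) for leaf in np.unique(leaf_ids)}


# Toy data with two obvious groups of values
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])

# max_depth=1 keeps a single split, so there are exactly two segments
tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)

print(leaf_means(tree, X, y))         # leaf averages 2.0 and 11.0
print(tree.predict([[2.5], [11.5]]))  # each prediction equals its leaf's average
```

A deeper tree would create more, smaller segments; the prediction is still the average target of whichever leaf the new point lands in.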
For support vector machines (SVM), there are two approaches to handling the margin, that is, the
maximum distance between the classes: hard-margin classification and soft-margin classification.
2. SYSTEM ANALYSIS
2.1 Existing System & Its Disadvantages:
Airlines aim to make the most profit possible, so fares are influenced by many factors. The
pricing models used by airlines are so complex that prices fluctuate constantly, making it very
difficult for customers to buy tickets at low prices. Research on airline pricing from the
perspective of both customers and airlines has grown steadily over the last two decades.
Regression machine learning models for airline ticket price prediction were developed in [4].
Data from 1814 flights on a single international route were used to develop these models,
including departure and arrival times, bag allowance, and the number of free baggage
allowances per flight. The authors used eight different regression machine learning models:
Extreme Learning Machine (ELM), Multilayer Perceptron (MLP), Generalized Regression Neural
Network, Random Forest Regression Tree, Regression Tree, Linear Regression (LR), Regression
SVM (Polynomial and Linear), and Bagging Regression Tree. The reported accuracy was 87.42%
for the Bagging Regression Tree and 85.91% for the Random Forest Regression Tree.
DISADVANTAGES:
2.2 Proposed System & Its Advantages:
The proposed system aims to address the issue of airfare by analyzing a set of characteristics that
define a typical flight, assuming that these features significantly influence the price of an airline
ticket. The fluctuation in flight ticket prices is attributed to various parameters, including flight
schedule, destination, duration, and occasions such as vacations or holiday seasons.
Data Collection: Gather a dataset comprising historical flight information, including departure
and arrival locations, dates, times, airlines, ticket prices, and other relevant features. This dataset
should cover a wide range of routes, airlines, and time periods to capture diverse patterns.
Data Preprocessing: Prepare the data for modeling. This involves cleaning the data, handling
missing values, encoding categorical variables, and possibly feature scaling or normalization.
Feature Engineering: Create new features or transform existing ones that might better represent
the relationships between the input variables and the target variable (fare). For example, you
might extract features such as the day of the week, the time of day, the distance between the
departure and arrival locations, and any seasonal trends.
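The preprocessing and feature-engineering steps above might look like the following pandas sketch. The column names (airline, source, destination, departure_time, arrival_time, price) and the sample rows are hypothetical, since the three datasets' schemas are not given here. Dropping rows with a missing fare is just one possible missing-value strategy, and the distance feature is omitted because it would need airport coordinates.

```python
import pandas as pd


def engineer_features(df):
    """Clean the raw flight table and derive model-ready features.

    Assumed (hypothetical) columns: airline, source, destination,
    departure_time, arrival_time and price.
    """
    out = df.dropna(subset=["price"]).copy()  # drop rows with no target fare
    dep = pd.to_datetime(out["departure_time"])
    arr = pd.to_datetime(out["arrival_time"])
    out["day_of_week"] = dep.dt.dayofweek  # Monday=0 ... Sunday=6
    out["departure_hour"] = dep.dt.hour    # time of day
    out["month"] = dep.dt.month            # coarse proxy for seasonal trends
    out["duration_min"] = (arr - dep).dt.total_seconds() / 60
    out = out.drop(columns=["departure_time", "arrival_time"])
    # Encode categorical variables as 0/1 indicator columns
    return pd.get_dummies(out, columns=["airline", "source", "destination"])


if __name__ == "__main__":
    # Made-up sample rows for illustration only
    raw = pd.DataFrame({
        "airline": ["IndiGo", "Air India", "IndiGo"],
        "source": ["Delhi", "Delhi", "Mumbai"],
        "destination": ["Mumbai", "Kolkata", "Delhi"],
        "departure_time": ["2024-03-01 06:30", "2024-03-02 18:00", "2024-03-03 09:15"],
        "arrival_time": ["2024-03-01 08:45", "2024-03-02 20:10", "2024-03-03 11:20"],
        "price": [5200.0, 6100.0, None],  # third row has a missing fare
    })
    print(engineer_features(raw))
```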
Model Selection: Choose appropriate machine learning algorithms for regression tasks. Common
choices include linear regression, decision trees, random forests, gradient boosting methods (such
as XGBoost or LightGBM), and neural networks.
Model Training: Split the dataset into training and testing sets, and train the selected model(s)
on the training data.
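The selection and training steps can be combined into one scikit-learn Pipeline, so that the same encoding is applied at training and prediction time. This is a sketch under stated assumptions: it uses scikit-learn's own GradientBoostingRegressor as a stand-in for XGBoost or LightGBM (to avoid extra dependencies), and the columns airline, days_left and price, as well as the synthetic fares, are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def build_fare_pipeline(categorical_cols, numeric_cols):
    """One-hot encode categorical columns, pass numeric ones through, then fit a regressor."""
    preprocess = ColumnTransformer([
        ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("numeric", "passthrough", numeric_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        # scikit-learn's gradient boosting, standing in for XGBoost/LightGBM
        ("model", GradientBoostingRegressor(random_state=0)),
    ])


def train_and_evaluate(df, categorical_cols, numeric_cols, target="price", test_size=0.2):
    """Split into training and testing sets, train on the former, return (pipeline, test R^2)."""
    X = df[categorical_cols + numeric_cols]
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    pipeline = build_fare_pipeline(categorical_cols, numeric_cols)
    pipeline.fit(X_train, y_train)
    return pipeline, pipeline.score(X_test, y_test)  # score() is R^2 for regressors


if __name__ == "__main__":
    # Synthetic, hypothetical data: fare depends on airline and days left
    base = {"A": 3000, "B": 4000, "C": 5000}
    rows = []
    for i in range(300):
        airline = "ABC"[i % 3]
        days_left = 1 + (i // 3) % 60
        rows.append({"airline": airline, "days_left": days_left,
                     "price": base[airline] - 20 * days_left})
    data = pd.DataFrame(rows)
    model, r2 = train_and_evaluate(data, ["airline"], ["days_left"])
    print(f"Test R^2: {r2:.3f}")
```

Setting handle_unknown="ignore" lets the fitted pipeline score rows whose airline or city did not appear in the training split instead of raising an error.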
ADVANTAGES:
Improved Accuracy: Machine learning models can analyze vast amounts of historical data and
complex patterns to make more accurate fare predictions compared to traditional methods. This can
help both airlines and travelers make better-informed decisions regarding ticket prices.
Dynamic Pricing: Airlines can leverage machine learning models to implement dynamic pricing
strategies, adjusting fares in real-time based on factors such as demand, time until departure,
competitor pricing, and seat availability. This flexibility can maximize revenue for airlines while
offering competitive prices to travelers.
Personalized Pricing: Machine learning algorithms can analyze individual traveler preferences,
booking history, and browsing behavior to offer personalized fare recommendations. This can
enhance customer satisfaction and increase loyalty by providing tailored pricing options.
2.3 SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
SOFTWARE REQUIREMENTS:
3. SYSTEM STUDY
3.1 FEASIBILITY STUDY
1. TECHNICAL FEASIBILITY
2. OPERATIONAL FEASIBILITY
3. ECONOMIC FEASIBILITY
INTRODUCTION
A feasibility study assesses the operational, technical and economic merits of the proposed project.
The feasibility study is intended to be a preliminary review of the facts to determine whether the
project is worth taking to the analysis phase. From the systems analyst's perspective, the feasibility
analysis is the primary tool for recommending whether to proceed to the next phase or to
discontinue the project.
TECHNICAL FEASIBILITY
A large part of determining resources has to do with assessing technical feasibility. It considers
the technical requirements of the proposed project. The technical requirements are then
compared to the technical capability of the organization. The systems project is considered
technically feasible if the internal technical capability is sufficient to support the project
requirements.
The analyst must find out whether current technical resources can be upgraded or added to in a
manner that fulfils the request under consideration. This is where the expertise of system analysts
is beneficial: using their own experience and their contacts with vendors, they can answer the
question of technical feasibility.
The essential questions that help in testing the technical feasibility of a system include the
following:
• Is it a practical proposition?
OPERATIONAL FEASIBILITY
Operational feasibility is dependent on human resources available for the project and involves
projecting whether the system will be used if it is developed and implemented.
Operational feasibility is a measure of how well a proposed system solves the problems, and takes
advantage of the opportunities identified during scope definition and how it satisfies the
requirements identified in the requirements analysis phase of system development.
The essential questions that help in testing the operational feasibility of a system include the
following:
• Does the current mode of operation provide adequate throughput and response time?
• Does the current mode provide end users and managers with timely, pertinent, accurate and
useful formatted information?
• Does the current mode of operation offer effective controls to protect against fraud and to
guarantee the accuracy and security of data and information?
• Does the current mode of operation make maximum use of available resources, including
people, time, and flow of forms?
• Are the current work practices and procedures adequate to support the new system?
ECONOMIC FEASIBILITY
Economic analysis is also referred to as cost/benefit analysis. It is the most frequently used
method for evaluating the effectiveness of a new system. In economic analysis the procedure is to
determine the benefits and savings that are expected from a candidate system and compare them
with costs. If benefits outweigh costs, then the decision is made to design and implement the
system. An entrepreneur must accurately weigh the cost versus benefits before taking an action.
• What are the savings that will result from the system?
4. SYSTEM DESIGN
4.1 DATA FLOW DIAGRAM:
1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of the input data to the system, the various processing carried
out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to
model the system components. These components are the system process, the data used by the
process, the external entities that interact with the system, and the information flows in the system.
4.2 UML DIAGRAMS:
UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling
language in the field of object-oriented software engineering. The standard was created by, and
is managed by, the Object Management Group.
The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML consists of two major components: a meta-model
and a notation. In the future, some form of method or process may also be added to, or
associated with, UML.
4.2.1 USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram
defined by and created from a Use-case analysis. Its purpose is to present a graphical overview of
the functionality provided by a system in terms of actors, their goals (represented as use cases),
and any dependencies between those use cases. The main purpose of a use case diagram is to show
what system functions are performed for which actor. Roles of the actors in the system can be
depicted.
Fig-4.2.1
4.2.2 CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of
static structure diagram that describes the structure of a system by showing the system's classes,
their attributes, operations (or methods), and the relationships among the classes. It explains which
class contains information.
Fig-4.2.2
Fig-4.2.3
4.2.4 ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.
Fig-4.2.4
4.2.5 DEPLOYMENT DIAGRAM:
A deployment diagram is a type of diagram that specifies the physical hardware on which the
software system will execute. It also determines how the software is deployed on the underlying
hardware. It maps the software pieces of a system to the devices that execute them.
The deployment diagram maps the software architecture created in design to the physical system
architecture that executes it. In distributed systems, it models the distribution of the software
across the physical nodes.
The software systems are manifested using various artifacts, which are then mapped to the
execution environment, such as nodes, that executes the software. Many nodes are involved in
the deployment diagram; hence, the relations between them are represented using
communication paths.
Fig-4.2.5
There are two forms of a deployment diagram.
• Descriptor form
• Instance form: it contains node instances, the relationships between node instances, and
artifact instances.
Deployment diagrams are used with the sole purpose of describing how software is deployed into
the hardware system. It visualizes how software interacts with the hardware to execute the
complete functionality. It is used to describe software to hardware interaction and vice versa.
5. TECHNOLOGIES
5.1 WHAT IS PYTHON
Python is currently one of the most widely used multi-purpose, high-level programming languages.
Programmers have to type relatively little, and the language's indentation requirements keep
code readable.
Python is used by many tech-giant companies, such as Google, Amazon, Facebook, Instagram,
Dropbox, and Uber.
1. Extensible
Python can be extended with other languages. You can write some of your code in languages
like C++ or C. This comes in handy, especially in large projects.
2. Embeddable
Complementary to extensibility, Python is embeddable as well. You can put your Python code
in the source code of a different language, like C++. This lets us add scripting capabilities to
our code in the other language.
3. Improved Productivity
The language's simplicity and extensive libraries make programmers more productive than
languages like Java and C++ do. You also need to write less code to get more things done.
4. IOT Opportunities
Since Python forms the basis of new platforms like the Raspberry Pi, its future in the Internet of
Things looks bright. This is a way to connect the language with the real world.
5.2 WHAT IS MACHINE LEARNING
Before we take a look at the details of various machine learning methods, let's start by looking
at what machine learning is, and what it isn't. Machine learning is often categorized as a
subfield of artificial intelligence, but that categorization can often be misleading at first
glance. The study of machine learning certainly arose from research in this context, but in the
data science application of machine learning methods, it's more helpful to think of machine
learning as a means of building models of data.
At the most fundamental level, machine learning can be categorized into two main types:
supervised learning and unsupervised learning.
Supervised learning involves modeling the relationship between measured features
of data and some label associated with the data; once this model is determined, it can be used
to apply labels to new, unknown data. This is further subdivided into
Lately, organizations are investing heavily in newer technologies like Artificial Intelligence,
Machine Learning and Deep Learning to get the key information from data to perform several
real-world tasks and solve problems. These can be called data-driven decisions taken by machines,
particularly to automate processes. Such data-driven decisions can be used, instead of
programming logic, in problems that cannot be programmed explicitly. The fact is that we
cannot do without human intelligence, but the other aspect is that we all need to solve real-world
problems efficiently and at a huge scale. That is why the need for machine learning arises.
Quality of data − Having good-quality data for ML algorithms is one of the biggest
challenges. Using low-quality data leads to problems related to data preprocessing.
Lack of specialist persons − As ML technology is still in its infancy, expert resources are hard to
find.
No clear objective for formulating business problems − Having no clear objective and well-defined
goal for business problems is another key challenge for ML because this technology is not that
mature yet.
• Emotion analysis
• Sentiment analysis
• Error detection and prevention
• Weather forecasting and prediction
• Stock market analysis and forecasting
• Speech synthesis
• Speech recognition
• Customer segmentation
5.2.4 How to Start Learning Machine Learning?
Arthur Samuel coined the term “Machine Learning” in 1959 and defined it as a “Field of
study that gives computers the capability to learn without being explicitly
programmed”.
And that was the beginning of Machine Learning! In modern times, Machine Learning is one of
the most popular (if not the most!) career choices. According to Indeed, Machine Learning
Engineer Is The Best Job of 2019 with a 344% growth and an average base salary of $146,085
per year.
But there is still a lot of doubt about what exactly Machine Learning is and how to start learning
it. So this section deals with the basics of Machine Learning and also the path you can follow
to eventually become a full-fledged Machine Learning Engineer. Now let's get started.
This is a rough roadmap you can follow on your way to becoming a talented Machine Learning
Engineer. Of course, you can always modify the steps according to your needs to reach your
desired end goal.
Both Linear Algebra and Multivariate Calculus are important in Machine Learning. However,
the extent to which you need them depends on your role as a data scientist. If you
are more focused on application heavy machine learning, then you will not be that heavily
focused on maths as there are many common libraries available. But if you want to focus on
R&D in Machine Learning, then mastery of Linear Algebra and Multivariate Calculus is very
important as you will have to implement many ML algorithms from scratch.
5.2.6 ADVANTAGES & DISADVANTAGES OF ML
Machine Learning can review large volumes of data and discover specific trends and patterns that
would not be apparent to humans. For instance, an e-commerce website like Amazon uses it to
understand the browsing behaviors and purchase histories of its users in order to show them
relevant products, deals, and reminders.
2. Continuous Improvement
As ML algorithms gain experience, they keep improving in accuracy and efficiency. This
lets them make better decisions. Say you need to make a weather forecast model. As the amount
of data you have keeps growing, your algorithms learn to make more accurate predictions faster.
1. Data Acquisition
Machine Learning requires massive data sets to train on, and these should be inclusive/unbiased,
and of good quality. There can also be times where they must wait for new data to be
generated.
3. Interpretation of Results
Another major challenge is the ability to accurately interpret results generated by the algorithms.
You must also carefully choose the algorithms for your purpose.
5.3 PYTHON DEVELOPMENT STEPS
Guido van Rossum published the first version of Python code (version 0.9.0) at alt.sources in
February 1991. This release already included exception handling, functions, and the core data
types of list, dict, str and others. It was also object oriented and had a module system.
Python version 1.0 was released in January 1994. The major new features included in this
release were the functional programming tools lambda, map, filter and reduce, which Guido
van Rossum never liked. Six and a half years later, in October 2000, Python 2.0 was introduced.
This release included list comprehensions, a full garbage collector, and support for Unicode.
Python flourished for another 8 years in the 2.x versions before the next major release, Python
3.0 (also known as "Python 3000" and "Py3K"), was released. Python 3 is not backwards
compatible with Python 2.x. The emphasis in Python 3 had been on the removal of duplicate
programming constructs and modules, thus fulfilling or coming close to fulfilling the 13th law
of the Zen of Python: "There should be one-- and preferably only one --obvious way to do it."
Python
Python is an interpreted high-level programming language for general-purpose programming.
Created by Guido van Rossum and first released in 1991, Python has a design philosophy
that emphasizes code readability, notably using significant whitespace.
5.4 MODULES USED IN PROJECT
TensorFlow
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library, and is also used for
machine learning applications such as neural networks. It is used for both research and
production at Google.
NumPy
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays.
It is the fundamental package for scientific computing with Python. It contains various
features including these important ones:
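As a small illustration of the multidimensional array object and the tools for working with it, the sketch below stores hypothetical fares in a 2-D array and applies vectorised operations and broadcasting. The fare values are made up for illustration.

```python
import numpy as np

# Hypothetical fares (INR): rows are routes, columns are airlines
fares = np.array([[4500.0, 5200.0, 6100.0],
                  [3900.0, 4100.0, 4800.0]])

print(fares.shape)                 # (2, 3)
route_means = fares.mean(axis=1)   # one mean per row (route)
discounted = fares * 0.9           # vectorised: applied to every element without a loop
# Broadcasting: the per-column mean and std (shape (3,)) are stretched across both rows
standardised = (fares - fares.mean(axis=0)) / fares.std(axis=0)
print(route_means)
print(standardised)
```

Standardising each column this way is a common scaling step before distance-based models such as KNN, where features with large ranges would otherwise dominate the distance.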
Matplotlib
Matplotlib is a Python 2D plotting library which produces publication quality figures in a
variety of hardcopy formats and interactive environments across platforms. Matplotlib can be
used in Python scripts, the Python and IPython shells, the Jupyter Notebook, web application
servers, and four graphical user interface toolkits. Matplotlib tries to make easy things easy
and hard things possible. You can generate plots, histograms, power spectra, bar charts, error
charts, scatter plots, etc., with just a few lines of code.
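The "few lines of code" claim can be illustrated with a simple line plot of average fare against days left before departure. The numbers below are invented for illustration only and are not results from this project. The non-interactive Agg backend is selected so the script also runs without a display, and the figure is written to an in-memory PNG buffer instead of a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window or display needed
import matplotlib.pyplot as plt

days_left = [60, 45, 30, 15, 7, 3, 1]                 # hypothetical values
avg_fare = [4200, 4300, 4500, 5000, 5800, 6900, 8200]  # hypothetical values

fig, ax = plt.subplots()
ax.plot(days_left, avg_fare, marker="o")
ax.invert_xaxis()  # read left to right as departure approaches
ax.set_xlabel("Days left before departure")
ax.set_ylabel("Average fare (INR)")
ax.set_title("Illustrative fare trend")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # use a filename instead to save to disk
plt.close(fig)
```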
5.5 INSTALL PYTHON STEP-BY-STEP IN WINDOWS AND MAC
Python, a versatile programming language, does not always come pre-installed on your computer.
Python was first released in 1991, and it remains a very popular high-level programming
language today. Its design philosophy emphasizes code readability with its notable use of
significant whitespace.
The object-oriented approach and language constructs provided by Python enable
programmers to write clear and logical code for projects.
First, download the latest version of Python from the download page.
Second, double-click the installer file to launch the setup wizard.
In the setup window, check the Add Python 3.8 to PATH checkbox and click Install Now
to begin the installation.
Once the setup completes, you’ll see the following window:
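To verify the installation, open the Run window, type cmd, and press Enter. In the Command
Prompt that opens, type python and press Enter.
If you see output like the screenshot above, you've successfully installed Python on
your computer.
If instead you see the message 'python' is not recognized as an internal or external command,
you likely didn't check the Add Python 3.8 to PATH checkbox when you installed Python.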
Install Python on macOS
It’s recommended to install Python on macOS using an official installer. Here are the steps:
• First, download a Python release for macOS.
• Second, run the installer by double-clicking the installer file.
• Third, follow the instructions on the screen and click the Next button until the installer
completes.
Install Python on Linux
To check whether Python 3 is already installed, run the following command in a terminal:
python3 --version
If you see a response with the version of Python, then your computer already has Python 3
installed. Otherwise, you can install Python 3 using a package management system.
For example, you can install Python 3.10 on Ubuntu using apt:
sudo apt install python3.10
To install a newer version, replace 3.10 with that version number.
A quick introduction to Visual Studio Code
Visual Studio Code, often called VS Code, is a lightweight source code editor that runs on
your desktop. It's available for Windows, macOS, and Linux.
VS Code comes with many features, such as IntelliSense, code editing, and extensions, that
allow you to edit Python source code effectively. Best of all, VS Code is free, and its
source code is open source.
Besides the desktop version, VS Code also has a browser version (vscode.dev) that you can
use directly in your web browser without installing it.
This section shows how to set up Visual Studio Code for a Python environment so
that you can edit, run, and debug Python code.
Creating a new Python project
First, create a new folder called hello world.
Second, launch VS Code and open the hello world folder.
Third, create a new app.py file and enter the following code and save the file:
print('Hello, World!')
The print() function is a built-in function that displays a message on the screen. In this
example, it shows the message 'Hello, World!'.
What is a function?
When you add two numbers, you apply a function. When you multiply two numbers, you also
apply a function.
Each function takes your inputs, applies some rules, and returns a result.
In the above example, print() is a function. It accepts a string and shows it on the screen.
Python has many built-in functions, such as print(), that you can use out of the box in your
programs.
In addition, Python allows you to define your own functions.
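A minimal sketch of defining your own functions, mirroring the add and multiply examples above; the names add and multiply are chosen for illustration. The def keyword introduces a function, its parameters go in parentheses, and return sends the result back to the caller.

```python
def add(a, b):
    """Return the sum of two numbers."""
    return a + b


def multiply(a, b):
    """Return the product of two numbers."""
    return a * b


print(add(2, 3))       # 5
print(multiply(2, 3))  # 6
```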
If you start the interactive Python shell (for example, by typing python in the Command
Prompt), you can enter Python code after the >>> prompt and press Enter to execute it.
For example, if you type print('Hello, World!') and press Enter, you'll see the message
Hello, World! immediately on the screen.
Python Syntax
Whitespace and indentation
If you’ve been working in other programming languages such as Java, C#, or C/C++, you know
that these languages use semicolons (;) to separate the statements.
However, Python uses whitespace and indentation to construct the code structure.
The following shows a snippet of Python code:
Notice that no semicolon terminates the statement at the end of each line, and that
indentation determines which statements belong to each block.
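A small example of this style, written for illustration (the function name count_up_to is hypothetical): each block is introduced by a colon, and its body is marked only by indentation, with no braces or semicolons.

```python
def count_up_to(limit):
    # Everything indented one level under "def" is the function body
    numbers = []
    i = 1
    while i <= limit:
        # Indented under "while": runs on every iteration of the loop
        numbers.append(i)
        i += 1
    # Back at the body's indentation level: runs once, after the loop ends
    return numbers


print(count_up_to(3))  # [1, 2, 3]
```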
By using indentation and whitespace to organize the code, Python code gains the following
advantages:
• First, you'll never miss the beginning or end of a code block, as can happen with braces
in other programming languages such as Java or C#.
• Second, the coding style is essentially uniform. If you have to maintain another
developer’s code, that code looks the same as yours.
• Third, the code is more readable and clearer in comparison with other programming
languages.
Comments
Comments are as important as the code because they explain why a piece of code was
written.
The Python interpreter ignores comments when it executes the code.
In Python, a single-line comment begins with a hash (#) symbol followed by the comment. For
example:
# This is a single line comment in Python
Continuation of statements
Python uses a newline character to separate statements, so each statement is placed on its own line.
However, a long statement can span multiple lines by using the backslash (\) character.
The following example illustrates how to use the backslash (\) character to continue a statement
in the second line:
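A minimal sketch with made-up fare components (illustrative values, not project data). The backslash continues the statement onto the next line; as an alternative, an expression inside parentheses, brackets, or braces can span several lines without any backslash.

```python
base_fare = 4500
taxes = 820
convenience_fee = 250

# The backslash (\) tells Python that the statement continues on the next line
total = base_fare + \
        taxes + \
        convenience_fee

# Inside parentheses, the statement can span lines without a backslash
total_again = (base_fare
               + taxes
               + convenience_fee)

print(total)  # 5570
```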
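Keywords
Some words have special meanings in Python. They are called keywords, and they cannot be
used as names for variables, functions, or other identifiers.
The following shows the list of keywords in Python 3:
False, None, True, and, as, assert, async, await, break, class, continue, def, del, elif,
else, except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or,
pass, raise, return, try, while, with, yield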
Python is a growing and evolving language, so its list of keywords can change between
versions.
Python provides a standard-library module called keyword for listing its keywords.
To find the current keyword list, use the following code:
import keyword
print(keyword.kwlist)
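The same module can also check a single word. A short sketch using keyword.iskeyword(), which returns True only for reserved keywords; the word "fare" is just an arbitrary non-keyword for contrast.

```python
import keyword

# iskeyword() reports whether a string is a reserved keyword
print(keyword.iskeyword("for"))   # True
print(keyword.iskeyword("fare"))  # False

# The number of keywords depends on the Python version in use
print(len(keyword.kwlist))
```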
String literals
Python uses single quotes ('), double quotes ("), triple single quotes (''') and triple-double quotes
(""") to denote a string literal.
A string literal must begin and end with the same type of quotes. For example, if you use
a single quote to start a string literal, you need to use the same single quote to end it.
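A short sketch of the quoting rules above, with illustrative strings: single and double quotes produce the same kind of value, triple quotes allow a literal to span lines, and a quote of the other kind can appear inside a literal without escaping.

```python
single = 'Airline fare'
double = "Airline fare"

# Triple-quoted strings (''' or """) can span several lines
multi = """Source: Delhi
Destination: Cochin"""

# A quote of the other kind can appear inside a literal without escaping
message = "It's a direct flight"

print(single == double)    # True: both quote styles create the same str value
print(multi.splitlines())  # ['Source: Delhi', 'Destination: Cochin']
```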
6 IMPLEMENTATIONS
6.1.1 PYTHON
Python increases your productivity. Python allows you to solve complex problems in less time
and fewer lines of code. It’s quick to make a prototype in Python.
Python is used in many areas across industries, from web applications to data
science and machine learning.
Python is quite easy to learn compared with other programming languages, and its syntax
is clear and concise.
Python has a large ecosystem that includes lots of libraries and frameworks.
Python is cross-platform. Python programs can run on Windows, Linux, and macOS.
Python has a huge community. Whenever you get stuck, you can get help from an active
community.
Python developers are in high demand.
6.1.2 SAMPLE CODE
from django.contrib import messages
from django.shortcuts import redirect, render
import pandas as pd

# Models are assumed to be defined in this app's models.py
from .models import PredModel, TestingModel, UserModel

# Category names used for one-hot encoding (same names as the TestingModel fields)
CITIES = ['Chennai', 'Delhi', 'Kolkata', 'Mumbai', 'Cochin', 'Hyderabad']
AIRLINES = ['Air_India', 'GoAir', 'IndiGo', 'Jet_Airways', 'Jet_Airways_Business',
            'Multiple_carriers', 'Multiple_carriers_Premium_economy', 'SpiceJet',
            'Trujet', 'Vistara', 'Vistara_Premium_economy']
TIME_FORMAT = '%Y-%m-%dT%H:%M'


def one_hot(value, categories):
    # 1 for the matching category, 0 for all others (all zeros for unknown values)
    return {name: int(name == value) for name in categories}


def user_index(request):
    # Fails if the session does not belong to an existing user
    UserModel.objects.get(user_id=request.session['user_id'])
    if request.method == 'POST':
        # Store the flight details entered on the search form
        obj = PredModel.objects.create(
            source=request.POST.get('source'),
            to=request.POST.get('to'),
            airline=request.POST.get('airline'),
            dept_time=request.POST.get('dept_time'),
            stops=request.POST.get('stops'),
            arr_time=request.POST.get('arr_time'),
        )
        return redirect('Predict', id=obj.id)
    return render(request, 'user/user-index.html')


def user_myprofile(request):
    user = UserModel.objects.get(user_id=request.session['user_id'])
    if request.method == 'POST':
        user.user_username = request.POST.get('user_username')
        user.user_passportnumber = request.POST.get('user_passportnumber')
        user.user_email = request.POST.get('user_email')
        user.user_contact = request.POST.get('user_contact')
        user.user_password = request.POST.get('user_password')  # stored as plain text
        user.user_address = request.POST.get('user_address')
        # Replace the profile image only when a new one was uploaded
        if 'user_image' in request.FILES:
            user.user_image = request.FILES['user_image']
        user.save()
        messages.success(request, 'Updated Successfully')
        return redirect('user_myprofile')
    return render(request, 'user/user-myprofile.html', {'user': user})


def Predict(request, id):
    user_data = PredModel.objects.get(pk=id)

    # TestingModel has a single set of city columns. Source and destination both
    # map onto it, and the destination (encoded last) determines the stored values.
    cities = one_hot(user_data.to, CITIES)
    airlines = one_hot(user_data.airline, AIRLINES)

    # Parse each timestamp once
    dep = pd.to_datetime(user_data.dept_time, format=TIME_FORMAT)
    arr = pd.to_datetime(user_data.arr_time, format=TIME_FORMAT)

    # Duration from the full timestamps, so flights arriving after midnight
    # are handled correctly
    total_minutes = int((arr - dep).total_seconds() // 60)
    dur_hour, dur_min = divmod(total_minutes, 60)

    # Store the encoded feature row; the 'button' view (not shown) receives its id
    test = TestingModel.objects.create(
        Total_Stops=int(user_data.stops),
        journey_day=dep.day,
        journey_month=dep.month,
        Dep_Time_hour=dep.hour,
        Dep_Time_min=dep.minute,
        Arrival_Time_hour=arr.hour,
        Arrival_Time_min=arr.minute,
        dur_hour=dur_hour,
        dur_min=dur_min,
        **cities,
        **airlines,
    )
    return redirect('button', id=test.id)
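The encoding and duration logic in the Predict view can be isolated into plain functions that do not depend on Django, which makes them easy to unit test. This is a sketch of such a refactoring, not the project's own code: the names one_hot and time_features are hypothetical, and it uses the standard datetime module instead of pandas. It computes the duration from the full timestamps: an overnight flight from 22:30 to 01:10 takes 2 h 40 min, whereas subtracting the clock hours and minutes separately (abs(1 - 22), abs(10 - 30)) would give 21 h 20 min.

```python
from datetime import datetime

# Category names as used in the view; the lists are assumed to match the model fields
CITIES = ["Chennai", "Delhi", "Kolkata", "Mumbai", "Cochin", "Hyderabad"]
AIRLINES = [
    "Air_India", "GoAir", "IndiGo", "Jet_Airways", "Jet_Airways_Business",
    "Multiple_carriers", "Multiple_carriers_Premium_economy", "SpiceJet",
    "Trujet", "Vistara", "Vistara_Premium_economy",
]
# Timestamp format used by the view's pd.to_datetime calls
TIME_FORMAT = "%Y-%m-%dT%H:%M"


def one_hot(value, categories):
    """Map each category to 1 if it equals value, else 0 (all zeros for unknown values)."""
    return {name: int(name == value) for name in categories}


def time_features(dept_time, arr_time):
    """Date, clock-time and duration features from departure/arrival timestamps."""
    dep = datetime.strptime(dept_time, TIME_FORMAT)
    arr = datetime.strptime(arr_time, TIME_FORMAT)
    # Subtracting full timestamps handles flights that arrive after midnight
    total_minutes = int((arr - dep).total_seconds() // 60)
    dur_hour, dur_min = divmod(total_minutes, 60)
    return {
        "journey_day": dep.day,
        "journey_month": dep.month,
        "Dep_Time_hour": dep.hour,
        "Dep_Time_min": dep.minute,
        "Arrival_Time_hour": arr.hour,
        "Arrival_Time_min": arr.minute,
        "dur_hour": dur_hour,
        "dur_min": dur_min,
    }


# Example: an overnight flight departing 22:30 and arriving 01:10 the next day
features = {
    **one_hot("Cochin", CITIES),
    **one_hot("IndiGo", AIRLINES),
    **time_features("2025-01-15T22:30", "2025-01-16T01:10"),
}
print(features["dur_hour"], features["dur_min"])  # 2 40
```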
7. SYSTEM TESTING
For example, consider a simple calculator application. The developer can write a unit
test to check that the user can enter two numbers and get the correct sum from the
addition functionality.
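A minimal sketch of such a unit test with Python's built-in unittest module. The add function stands in for the hypothetical calculator's addition feature; each test method checks one case, and assertAlmostEqual is used for floats because 0.1 + 0.2 is not exactly 0.3 in binary floating point.

```python
import unittest


def add(a, b):
    """Hypothetical calculator addition function under test."""
    return a + b


class TestCalculatorAddition(unittest.TestCase):
    def test_adds_two_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_adds_negative_numbers(self):
        self.assertEqual(add(-4, 1), -3)

    def test_adds_floats(self):
        # assertAlmostEqual avoids false failures caused by floating-point rounding
        self.assertAlmostEqual(add(0.1, 0.2), 0.3)
```

Saved as test_calculator.py, the tests can be run with python -m unittest test_calculator.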
b) Gorilla Testing
Gorilla testing is a test technique in which the tester and/or developer test the
module of the application thoroughly in all aspects. Gorilla testing is done to check
how robust your application is.
For example, a tester is testing a pet insurance company's website, which offers an
insurance policy, a tag for the pet, and a lifetime membership. The tester can focus on
one module, say the insurance policy module, and test it thoroughly with positive and
negative test scenarios.
Integration testing is performed on the integrated modules of a system or between systems.
For example, a user is buying a flight ticket from an airline website. Users can
see flight details and payment information while buying a ticket, but flight details
and payment processing are two different systems. Integration testing should be
done when the airline website is integrated with the payment processing system.
For example, a tester is testing a pet insurance website. End-to-end testing involves
testing the purchase of an insurance policy, an LPM, and a tag, adding another pet,
updating credit card information on the user's account, updating the user's address,
and receiving order confirmation emails and policy documents.
c) Smoke Testing
Smoke testing is performed to verify that basic and critical functionality of the
system under test is working fine at a very high level.
Whenever the development team provides a new build, the software testing team validates
the build and ensures that no major issue exists. Once the build is confirmed to be
stable, more detailed testing is carried out.
The client accepts the software only when all the features and functionalities work
as expected. This is the last phase of testing, after which the software goes into
production. This is also called User Acceptance Testing (UAT).
a) Alpha Testing
Alpha testing is a type of acceptance testing performed by the team in an
organization to find as many defects as possible before releasing software to
customers.
For example, the pet insurance website is under UAT. The UAT team runs real-world
scenarios such as buying an insurance policy, buying an annual membership, changing
the address, and transferring ownership of the pet, in the same way a user would use the
real website. The team can use test credit card information to process payment-related
scenarios.
b) Beta Testing
Beta testing is a type of software testing carried out by the clients or customers. It is
performed in the real environment before the product is released to the market for the
actual end-users.
Beta testing is carried out to ensure that there are no major failures in the software
or product and that it satisfies the business requirements from an end-user perspective.
Beta testing is successful when the customer accepts the software.
This testing is usually done by the end-users and is the final testing before the
application is released for commercial purposes. The beta version of the software or
product is usually released to a limited number of users in a specific area.
The end-users use the software and share feedback with the company, which then takes
the necessary action before releasing the software worldwide.
• User management.
Non-Functional Testing
Security testing checks whether the software is secure from malicious programs and viruses,
and how secure and strong its authorization and authentication processes are. It also checks
how the software behaves under hacker attacks and malicious programs.
a) Penetration Testing
Penetration Testing or Pen testing is the type of security testing performed as an
authorized cyberattack on the system to find out the weak points of the system in
terms of security.
Note: Do not perform penetration testing on your own laptop or computer, and always obtain
written permission before carrying out a penetration test.
Stability is the application's ability to keep working correctly under load. Response time
is how quickly the application responds to users.
Performance testing is done with the help of tools; Loader.io, JMeter, and LoadRunner are
popular tools available in the market.
a) Load testing
Load testing is testing of an application’s stability and response time by applying
load, which is equal to or less than the designed number of users for an application.
b) Stress Testing
Stress testing is testing an application’s stability and response time by applying
load, which is more than the designed number of users for an application.
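The difference between load and stress testing can be sketched in a few lines with the standard library. This is a toy illustration, not a replacement for the tools named above: handle_request is a hypothetical stand-in that only sleeps for about 10 ms, and DESIGNED_USERS is an assumed capacity. Because the stand-in only sleeps, both runs report similar times; pointed at a real application (for example, through an HTTP request), the stress run is where response times would typically start to grow.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Hypothetical stand-in for the application: pretends each request takes ~10 ms."""
    time.sleep(0.01)
    return request_id


def timed_request(request_id):
    """Send one request and return its response time in seconds."""
    start = time.perf_counter()
    handle_request(request_id)
    return time.perf_counter() - start


def run_load(users, requests_per_user):
    """Send requests from `users` concurrent worker threads and collect response times."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(timed_request, range(users * requests_per_user)))


DESIGNED_USERS = 20  # hypothetical designed capacity of the application

# Load test: apply load up to the designed number of users
load_times = run_load(DESIGNED_USERS, requests_per_user=5)
# Stress test: apply more load than the application was designed for
stress_times = run_load(DESIGNED_USERS * 2, requests_per_user=5)

for name, times in [("load", load_times), ("stress", stress_times)]:
    print(f"{name}: {len(times)} requests, "
          f"mean {statistics.mean(times) * 1000:.1f} ms, "
          f"max {max(times) * 1000:.1f} ms")
```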
8. SCREENSHOTS
a. Home Page:-
Fig-8.1
b. Admin :-
Fig-8.2
c. Register Page
Fig-8.3
d. User Page:-
Fig-8.4
e. Contact Information:-
Fig-8.5
k. About page:-
Fig-8.11
l. Predicted Page:-
Fig-8.12
Fig-8.13
9. CONCLUSION
To estimate dynamic flight fares, three datasets from three different sources have been
used. Visualizing the datasets yielded many insights. Seven different machine learning
algorithms have been used to build the model. Only limited information is available
because the data is acquired from websites that sell flight tickets. The model's accuracy
is assessed using the evaluation metric values reported in Table I. The Random Forest
Regressor outperformed the other algorithms and achieved good accuracy, so it is well
suited to predicting airline fares. If more data, such as actual seat availability, can be
obtained in the future, the predictions are likely to be more accurate. Prediction-based
services are already used in a variety of sectors, including stock price prediction
programs used by stockbrokers and services like Zestimate, which estimates housing values.
Similarly, a service like this is needed in the aviation business to help customers book
tickets. Numerous studies have been conducted on this topic using various methodologies,
and further research is needed to improve prediction accuracy with other algorithms. More
accurate data with more features could be used to obtain more reliable findings.
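Table I's exact metrics are not restated in this section. As an assumption, the sketch below computes three metrics commonly used to evaluate regression models such as the Random Forest Regressor: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination R² = 1 - SS_res / SS_tot. The fares and predictions are made-up values for illustration only.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, r2


# Made-up actual fares and model predictions, for illustration only
actual = [4000, 6000, 8000, 10000]
predicted = [4200, 5800, 8100, 9900]
mae, rmse, r2 = regression_metrics(actual, predicted)
print(f"MAE={mae:.1f}, RMSE={rmse:.1f}, R2={r2:.3f}")
```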