Project Name:
FLIGHT FARE PREDICTION
NAME: RISHABH GIRI (1906191)
NAME : RITIK SINGH (1906193)
NAME : SEJAL BARNWAL (1906207)
NAME : AKRITI SINHA (1906010)
ABOUT PROJECT
This model predicts the price of a flight from parameters such as total
stops, journey day, journey month, airline (for example Air India or IndiGo),
source, and destination. We trained the model with a random forest
regressor and then fine-tuned it, a step also known as hyperparameter
tuning. We then saved the model and deployed this Flight Fare Prediction
model with a Flask application on localhost.
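The tune-then-save step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, and the search space, `n_iter`, and `cv` values are assumptions chosen only to keep the example small.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the prepared features and the Price column (assumption).
rng = np.random.default_rng(0)
X = rng.random((120, 4))
y = 3000 + 5000 * X[:, 0] + 1000 * X[:, 1] + rng.normal(0, 50, size=120)

# Illustrative search space; the values used in the project are not stated.
param_distributions = {
    "n_estimators": [20, 50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}

# Randomly sample a few parameter combinations and score each by cross-validation.
search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
best_model = search.best_estimator_

# Serialize the tuned model so that a web application can load it later.
model_bytes = pickle.dumps(best_model)
restored = pickle.loads(model_bytes)
```

Writing `model_bytes` to a file gives the saved model that the deployment step loads.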
OVERVIEW
We have 2 datasets here — training set and test set.
The training set contains the features, along with the prices of the flights. It
contains 10683 records, 10 input features and 1 output column — ‘Price’.
The test set contains 2671 records and 10 input features. The output ‘Price’
column needs to be predicted in this set. We will use Regression techniques here,
since the predicted output will be a continuous value.
The following features are available in the dataset: Airline, Date_of_Journey,
Source, Destination, Route, Dep_Time, Arrival_Time, Duration, Total_Stops,
Additional_Info, Price.
Technology Used In Project
01 Python: language used
02 Jupyter Notebook: platform used
03 Machine Learning: algorithm
04 Flask: Python web framework
Different Types of Process
1) Install Jupyter Notebook: the IDE we used to write the code.
2) Install libraries: install all the required Python libraries.
Tools:
Pandas: used for data analysis.
NumPy: used for mathematical calculations.
Seaborn/Matplotlib: used for data visualization.
Scikit-learn: used to train, validate, and test our ML model.
XGBoost: used for supervised learning (regression and
classification problems).
3) Dataset: download the dataset from the Kaggle website.
4) Clean Dataset: remove unnecessary data from the dataset, checking:
1. Missing values in the dataset
2. All the numerical variables and their distributions
3. Categorical variables
4. Outliers
5. Relationship between the independent features and the dependent feature (Price)
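The checks listed above can be sketched with pandas as follows. The sample rows are made up for illustration, and the 1.5 * IQR rule for outliers is an assumption; the slides do not say which outlier method was used.

```python
import pandas as pd

# Tiny hand-made sample using the dataset's column names; values are illustrative.
df = pd.DataFrame({
    "Airline": ["IndiGo", "Air India", "IndiGo", None, "SpiceJet", "Air India"],
    "Total_Stops": ["non-stop", "1 stop", None, "2 stops", "non-stop", "1 stop"],
    "Price": [3900, 7600, 4200, 13900, 3800, 80000],
})

# 1. Count missing values per column, then drop the incomplete rows.
missing_counts = df.isnull().sum()
clean = df.dropna().reset_index(drop=True)

# 2-3. Separate numerical and categorical variables by dtype.
numerical_cols = clean.select_dtypes(include="number").columns.tolist()
categorical_cols = clean.select_dtypes(exclude="number").columns.tolist()

# 4. Flag outliers in Price with the 1.5 * IQR rule (assumed method).
q1, q3 = clean["Price"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (clean["Price"] < q1 - 1.5 * iqr) | (clean["Price"] > q3 + 1.5 * iqr)
outliers = clean[is_outlier]

# 5. Relationship between a categorical feature and the target.
mean_price_by_airline = clean.groupby("Airline")["Price"].mean()
```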
5) Perform EDA
From the dataset description we can see that Date_of_Journey has the object
data type.
Therefore, we have to convert it to a datetime type so that
the column can be used properly for prediction.
For this we use pandas to_datetime, which converts the object data
type to the datetime dtype.
.dt.day extracts only the day of the date
.dt.month extracts only the month of the date
After converting, the journey day and journey month are available as separate numeric columns.
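A minimal sketch of this conversion is shown here. The sample dates and the day/month/year format string are assumptions, as are the new column names `Journey_day` and `Journey_month`.

```python
import pandas as pd

# Illustrative sample; dates are assumed to be day/month/year strings.
df = pd.DataFrame({"Date_of_Journey": ["24/03/2019", "01/05/2019", "09/06/2019"]})

# Convert the object (string) column to the datetime dtype.
journey_date = pd.to_datetime(df["Date_of_Journey"], format="%d/%m/%Y")

# Extract the day and the month into their own numeric columns.
df["Journey_day"] = journey_date.dt.day
df["Journey_month"] = journey_date.dt.month

# The original text column is no longer needed once the parts are extracted.
df = df.drop(columns=["Date_of_Journey"])
```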
6) Feature Engineering: we add, delete, and combine features of the
dataset for better performance.
The goal is to prepare input data that is compatible with the ML
algorithm.
List of Feature Engineering Techniques:-
Encoding
Grouping Operations
Feature Split
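Two of these techniques, encoding and feature split, can be sketched as follows. The sample values, the "2h 50m" duration format, the ordinal mapping for stops, and the new column names are assumptions made for illustration.

```python
import pandas as pd

# Illustrative rows; the value formats are assumed.
df = pd.DataFrame({
    "Airline": ["IndiGo", "Air India", "IndiGo"],
    "Total_Stops": ["non-stop", "2 stops", "1 stop"],
    "Duration": ["2h 50m", "19h", "45m"],
})

# Feature split: pull hours and minutes out of the Duration text.
# A missing part (no "h" or no "m") is treated as zero.
hours = df["Duration"].str.extract(r"(\d+)h", expand=False)
mins = df["Duration"].str.extract(r"(\d+)m", expand=False)
df["Duration_hours"] = hours.fillna("0").astype(int)
df["Duration_mins"] = mins.fillna("0").astype(int)

# Ordinal encoding: the number of stops has a natural order.
stops_order = {"non-stop": 0, "1 stop": 1, "2 stops": 2, "3 stops": 3, "4 stops": 4}
df["Total_Stops"] = df["Total_Stops"].map(stops_order)

# One-hot encoding: airlines have no order, so each gets its own 0/1 column.
df = pd.get_dummies(df.drop(columns=["Duration"]), columns=["Airline"], dtype=int)
```

One-hot encoding is what produces per-airline inputs such as Air India and IndiGo.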
7) Feature Selection: we find the correlation values between features using a heat map.
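A correlation heat map of this kind can be sketched as follows. The sample values, the helper names `correlation_matrix` and `plot_heatmap`, and the colour map are assumptions; the plotting libraries are imported inside the plotting helper so that the correlation step also works without them.

```python
import pandas as pd


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlation of the numeric columns."""
    return df.select_dtypes(include="number").corr()


def plot_heatmap(corr: pd.DataFrame):
    """Draw the correlation matrix as an annotated heat map."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, annot=True, cmap="RdYlGn", ax=ax)
    return fig


# Illustrative numeric features and target.
df = pd.DataFrame({
    "Total_Stops": [0, 1, 2, 1, 0],
    "Duration_hours": [2, 7, 13, 6, 3],
    "Price": [3900, 7600, 13900, 7200, 4100],
})

corr = correlation_matrix(df)

# Rank the features by the strength of their correlation with Price.
ranked = corr["Price"].drop("Price").abs().sort_values(ascending=False)
```

Calling `plot_heatmap(corr)` in a notebook displays the figure.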
Fitting the model using Random Forest
1. Split the dataset into train and test sets so that
predictions can be made on X_test
2. Scale the data if needed
(scaling is not needed for Random Forest, since tree splits do not depend on feature scale)
3. Import the model
4. Fit the data
5. Predict on X_test
6. For regression, check the RMSE score
7. Plot a graph
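The steps above can be sketched as follows. The data is synthetic, and the split ratio, `n_estimators`, and random seeds are assumptions; the plotting step is left out to keep the example short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features and the Price column.
rng = np.random.default_rng(42)
X = rng.random((200, 3))
y = 3000 + 6000 * X[:, 0] + 2000 * X[:, 1] + rng.normal(0, 100, size=200)

# 1. Split into train and test sets (80/20 is an assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. No scaling step: tree-based models do not need it.
# 3-4. Create the model and fit it on the training data.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 5. Predict on the held-out features.
y_pred = model.predict(X_test)

# 6. RMSE is the square root of the mean squared error.
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
```

For step 7, a scatter plot of `y_test` against `y_pred` is one common choice: points close to the diagonal indicate good predictions.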
Checking accuracy of the model:
Evaluating model accuracy is an essential part of
building a machine learning model, because it describes
how well the model performs in its predictions. In regression
analysis, MAE, MSE, and RMSE are the main metrics used to
evaluate prediction error and model performance.
• MAE (Mean absolute error)
• MSE (Mean Squared Error)
• RMSE (Root Mean Squared Error)
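The three metrics can be computed with scikit-learn as sketched here. The true and predicted prices are made-up numbers chosen so the results are easy to check by hand.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up true and predicted prices; the errors are -200, 300, 0, -500.
y_true = np.array([3000, 5000, 7000, 9000])
y_pred = np.array([3200, 4700, 7000, 9500])

# MAE: the average of the absolute errors.
mae = mean_absolute_error(y_true, y_pred)

# MSE: the average of the squared errors, which penalizes large misses more.
mse = mean_squared_error(y_true, y_pred)

# RMSE: the square root of MSE, in the same units as the price.
rmse = float(np.sqrt(mse))
```

Here MAE is 250 and MSE is 95000, so RMSE is about 308; RMSE is larger than MAE because the single 500 error is weighted more heavily.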
Model Deployment
Model Deployment is one of the last stages of any machine learning project. Here, we
design a user interface. We used Flask to serve an HTML page for flight price prediction. The page
takes an input value for each feature and calculates the price of a flight, as shown in the image
below.
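A Flask application of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: the form fields, the `FEATURES` list, the route names, and the `create_app` factory are hypothetical, and the real application takes more inputs (airline, source, destination, and so on).

```python
from flask import Flask, render_template_string, request

# Hypothetical, reduced set of inputs; the real form has one field per feature.
FEATURES = ["total_stops", "journey_day", "journey_month"]

FORM = """
<form method="post" action="/predict">
  Total stops: <input name="total_stops" type="number" value="0">
  Journey day: <input name="journey_day" type="number" value="1">
  Journey month: <input name="journey_month" type="number" value="1">
  <button type="submit">Predict</button>
</form>
{% if price is not none %}<p>Predicted fare: {{ price }}</p>{% endif %}
"""


def create_app(model):
    """Build the app around any object that has a scikit-learn style predict()."""
    app = Flask(__name__)

    @app.route("/")
    def home():
        # Show the empty form.
        return render_template_string(FORM, price=None)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Read one value per feature from the form, in a fixed order.
        row = [[float(request.form[name]) for name in FEATURES]]
        price = round(float(model.predict(row)[0]), 2)
        return render_template_string(FORM, price=price)

    return app
```

To serve it on localhost, pass the model loaded from its saved file to `create_app` and call `.run()` on the returned app.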
THANK YOU