Flight Price Prediction
Saurabh Yadav
ABSTRACT
Flight ticket prices are hard to guess: the price we see today for a flight may be quite
different when we check the same flight tomorrow. Travellers often say that flight ticket
prices are unpredictable.
Anyone who has booked a flight ticket knows how unexpectedly the prices vary. Airlines use
sophisticated quasi-academic tactics which they call "revenue management" or "yield
management". The cheapest available ticket on a given flight becomes more or less expensive
over time. This usually happens as an attempt to maximize revenue based on:
1. Time-of-purchase patterns (making sure last-minute purchases are expensive)
2. Keeping the flight as full as they want it (raising prices on a flight which is filling up in order to
reduce sales and hold back inventory for those expensive last-minute purchases)
OBJECTIVE
To scrape flight data from a website (I scraped the data from yatra.com) and
then build a machine learning model to predict the price of the flights.
DATA COLLECTION
Data collection is one of the most important aspects of this project. There are
various sources of airfare data on the Web that could be used to train the models.
A multitude of consumer travel sites supply fare information for multiple routes,
times, and airlines.
I tried various sources, ranging from APIs to scraping consumer travel
websites such as yatra.com.
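As a rough illustration of the scraping step, the sketch below extracts airline and price fields from an HTML fragment using only the Python standard library. The markup, class names and values in SAMPLE_HTML are hypothetical and made up for the example; the real page structure of a travel site will differ, and fetching the page is left out.

```python
from html.parser import HTMLParser

# Hypothetical markup and values, for illustration only.
SAMPLE_HTML = """
<div class="flight"><span class="airline">IndiGo</span><span class="price">4,500</span></div>
<div class="flight"><span class="airline">Vistara</span><span class="price">7,200</span></div>
"""


class FlightParser(HTMLParser):
    """Collects one dict per <div class="flight"> block."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "flight":
            self.rows.append({})
        elif tag == "span" and css_class in ("airline", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = FlightParser()
parser.feed(SAMPLE_HTML)

# Turn the price strings into integers so they can serve as a numeric target.
flights = [
    {"airline": row["airline"], "price": int(row["price"].replace(",", ""))}
    for row in parser.rows
]
```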
DATA CLEANING
There were no missing values in the dataset.
All the features are of object data type, so outlier and skewness checks do not
apply to them.
There are a few duplicate rows, but their number is within an acceptable range.
The target is tightly distributed.
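The checks described above could be run along these lines with pandas. This is a minimal sketch on a tiny made-up table: the column names (Airline, Total_Stops, Price) and all values are assumptions for illustration, not the scraped dataset.

```python
import pandas as pd

# Hypothetical sample rows; the third row deliberately repeats the first.
df = pd.DataFrame({
    "Airline": ["IndiGo", "SpiceJet", "IndiGo", "Vistara"],
    "Total_Stops": ["non-stop", "1 stop", "non-stop", "1 stop"],
    "Price": [4500, 3900, 4500, 7200],
})

# Missing values per column, and in total.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())

# Rows that exactly repeat an earlier row.
n_duplicates = int(df.duplicated().sum())

# Non-numeric (text) features: outlier and skewness checks do not apply to these.
text_columns = df.select_dtypes(exclude="number").columns.tolist()

# Skewness is only meaningful for the numeric target.
price_skew = df["Price"].skew()
```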
EXPLORATORY DATA ANALYSIS
[Figures: distribution plot; violin plot; count plots (Total Stops, Additional Info, Airlines); pie plot; bar plot; heat map]
MODEL BUILDING
I have applied 7 algorithms here:
Ridge Regression
Elastic Net
K Neighbours Regression
SGD Regression
Decision Tree Regression
Random Forest Regression
Gradient Boosting Regression
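A sketch of how these seven regressors might be fitted and compared with scikit-learn. It assumes synthetic data from make_regression in place of the encoded flight features, default hyperparameters, and feature scaling for the K Neighbours and SGD models; none of these choices are taken from the project itself.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Ridge, SGDRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded flight features and prices (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Ridge Regression": Ridge(),
    "Elastic Net": ElasticNet(),
    # Distance- and gradient-based models are sensitive to feature scale.
    "K Neighbours Regression": make_pipeline(StandardScaler(), KNeighborsRegressor()),
    "SGD Regression": make_pipeline(StandardScaler(), SGDRegressor(random_state=0)),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting Regression": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and record its R^2 on the held-out split.
test_r2 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_r2[name] = r2_score(y_test, model.predict(X_test))
```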
CROSS VALIDATION SCORE
The difference between the r2_score and the cross-validation score is smallest for the
Random Forest Regressor, so that model is used.
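The selection rule can be sketched as follows: for each candidate, compute the held-out r2_score and the mean cross-validation score, then pick the model with the smallest absolute gap. The data is synthetic, only two candidates are shown for brevity, and the use of 5 folds is an assumption.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

candidates = {
    "Ridge Regression": Ridge(),
    "Random Forest Regression": RandomForestRegressor(n_estimators=50, random_state=0),
}

gaps = {}
for name, model in candidates.items():
    # Mean R^2 over 5 folds; cross_val_score fits clones, not the model itself.
    cv_mean = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    model.fit(X_train, y_train)
    holdout_r2 = r2_score(y_test, model.predict(X_test))
    gaps[name] = abs(holdout_r2 - cv_mean)

# Model whose held-out score is closest to its cross-validation score.
selected = min(gaps, key=gaps.get)
```

A small gap indicates that the held-out score is consistent with the cross-validation score; it is usually read together with the scores themselves.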
HYPER PARAMETER TUNING
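A minimal sketch of how tuning the selected Random Forest Regressor might look, assuming a grid search with scikit-learn's GridSearchCV. The parameter grid, the 3-fold setting and the synthetic data are hypothetical, not the values used in this project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

# Hypothetical grid; a real search would be guided by the dataset size.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X, y)

best_params = search.best_params_  # best combination found in the grid
best_score = search.best_score_    # mean cross-validated R^2 for that combination
```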
CONCLUSION
Jet Airways flights are costlier than others, whereas SpiceJet and IndiGo
are quite affordable. Flights from metro cities are more numerous, so their
prices range from budget fares to very expensive ones. The expensive flights
usually come with a layover (long or short), a free meal and some other
additional facilities. Flight prices vary across months and around holidays.
There are two groups of airlines: the economical group and the luxury group.
SpiceJet, AirAsia, IndiGo and Go Air are in the economical group, whereas
Jet Airways and Air India are in the other. Vistara's prices are more spread out.