Shinkansen Travel Experience - Hackathon
[Image: Shinkansen Bullet Train - Japan]
I am glad to share that I recently participated in a hackathon organized by Great
Learning in collaboration with the McCombs School of Business and the Great Lakes
Institute of Management, as part of my PGP - DSBA course.
The goal of the problem is to predict whether a passenger was delighted with his or
her overall experience of traveling on the Shinkansen (bullet train).
We are given four datasets: two training sets and two test sets. One train/test
pair contains travel data, and the other pair contains survey data.
I performed exploratory data analysis (EDA) to understand the data. It is a binary
classification problem on the satisfaction of customers who traveled on the bullet
train. The data was collected on various parameters, but the ultimate goal is to
predict overall customer satisfaction. I used various classification models for
prediction, such as:
1) A Classification and Regression Tree (CART) is a predictive model that explains
how an outcome variable's values can be predicted from other variables. A CART
output is a decision tree in which each fork is a split on a predictor variable
and each leaf node contains a prediction for the outcome variable.
2) Random Forest is a supervised learning algorithm that uses an ensemble of
decision trees. Ensemble learning combines predictions from multiple machine
learning models to make a more accurate prediction than any single model; here it
is used for classification.
3) Boosting, in machine learning, is an ensemble meta-algorithm for reducing bias
and variance in supervised learning, and a family of machine learning algorithms
that convert weak learners into strong ones.
4) Bagging, also known as bootstrap aggregation, is an ensemble learning method
that is commonly used to reduce variance within a noisy dataset. In bagging, a
random sample of data in a training set is selected with replacement—meaning that
the individual data points can be chosen more than once.
5) Naïve Bayes is one of the simplest and most effective classification
algorithms, helping build fast machine learning models that can make quick
predictions. It is a probabilistic classifier: it predicts the most probable class
for an object based on feature probabilities.
6) Logistic regression is a supervised learning classification algorithm used to
predict the probability of a target variable. The target or dependent variable is
dichotomous, meaning there are only two possible classes. In simple words, the
dependent variable is binary, with data coded as either 1 (success/yes) or 0
(failure/no). Mathematically, a logistic regression model predicts P(Y=1) as a
function of X.
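To make the comparison above concrete, here is a minimal sketch that fits each of these model families with scikit-learn. The dataset, parameter values, and accuracy numbers are illustrative (generated synthetically), not the actual hackathon data or settings:

```python
# Illustrative comparison of the six model families on synthetic binary data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (
    RandomForestClassifier, AdaBoostClassifier, BaggingClassifier,
)
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the hackathon's travel + survey data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "CART": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Boosting (AdaBoost)": AdaBoostClassifier(random_state=42),
    "Bagging": BaggingClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model and report test-set accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.4f}")
```

In practice I compared the models on the hold-out data like this before settling on the final approach.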
Adaptive Boosting (AdaBoost) with a Random Forest base estimator worked well for
me: I achieved 95.39% accuracy in my prediction. For a while (14 hours) I was at
the top of the leaderboard. I participated to win and to learn as much as
possible; I learned a lot and finished in the top 5. Looking forward to more such
participations. It was a wonderful learning experience, and I would like to apply
these useful Data Science techniques at my workplace too.
Thank You #greatlearning for this experience.
#machinelearning #datascience #greatlearning #hackathon #hackofalltrades
Article Link
https://www.linkedin.com/pulse/shinkansen-travel-experience-hackathon-nishant-rai-sethia