Capstone PPT Final
By: Shaurya Gupta, Avik Kumar Debnath, Vishal Arora, Sourabh Sabharwal
Mentor: Mr. Vishal Gupta
Agenda
• Several factors contribute to the industry’s pricing challenge. Shifting market conditions, globalization,
increased competition, cost pressure, and volatility are changing the market landscape.
– Analytics is one area that can deliver a significant competitive advantage.
• In our study, we use analytics techniques such as regression and sentiment analysis to predict:
– The retail pricing of various cars being launched in the USA market
– How different brands have been perceived by customers over the years
• The study uses a data set of American cars that contains the Retail Price (Sticker Price / MSRP) of
vehicles from various manufacturers over a span of 30 years.
(Data for the new cars has been taken from the reddit.com site)
Data Overview
• The car dataset had details of cars launched in the USA from 1990 to 2019, without any segmentation.
As a result, the time frame 2010 to 2019 was selected for the study, with 8741 rows and 174
columns.
• Private vehicles were divided into categories: Hatchback, Sedan, Coupe & Convertible, and SUV.
• The commercial segment was divided into categories: Minivan/Van, Pickup Trucks, and Specialty
Vehicles.
• Private vehicles: the Retail Price varied widely. Hence, the dataset was further divided
into 3 subcategories based on retail price range.
[Figure: two bar charts of vehicle counts. "Private and Commercial Vehicles" (bars: Private Vehicles, Commercial Vehicles) and "Car Segments" (bars: Economy Segment, Premium Segment, Super Premium Segment, Commercial Vehicles). Data labels shown: 6063, 3699, 2678, 1942, 422.]
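The segmentation described above can be sketched as a small rule-based function. This is only an illustration under stated assumptions: the slides give the commercial body styles and say the Super Premium range starts at $80,000, but they do not state the Economy/Premium boundary, so `PREMIUM_MIN_USD` below is a hypothetical placeholder, and the function and constant names are invented for this sketch.

```python
# Hypothetical boundary between Economy and Premium; NOT stated in the slides.
PREMIUM_MIN_USD = 40_000
# Stated in the appendix: the Super Premium price range starts at $80,000.
SUPER_PREMIUM_MIN_USD = 80_000

# Commercial categories as listed on the Data Overview slide.
COMMERCIAL_BODY_STYLES = {"Minivan/Van", "Pickup Truck", "Specialty Vehicle"}


def assign_segment(body_style: str, msrp: float) -> str:
    """Return the modelling segment for one vehicle row."""
    if body_style in COMMERCIAL_BODY_STYLES:
        return "Commercial"
    # Private vehicles are split into three groups by retail price.
    if msrp >= SUPER_PREMIUM_MIN_USD:
        return "Super Premium"
    if msrp >= PREMIUM_MIN_USD:
        return "Premium"
    return "Economy"
```

With real data, the Economy/Premium boundary would have to be replaced by the value actually used in the report.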
Problem Statement
To build a pricing model for automobile companies that can predict the
best market value of a car that is awaiting launch in the American
market
– What are the important variables that decide the retail pricing?
Note: The Customer_Score is the sum of the scores calculated for a brand over the period 2010-2019
Final Model Results
Datasets | R2 Value or Variance Explained (%) | RMSE (Test)
Actual v/s Predicted MSRP (Test), Economy Segment: Actual and Predicted MSRP closely related (R2 ~90%)
Actual v/s Predicted MSRP (Test), Commercial Segment: Actual and Predicted MSRP closely related (R2 ~88%)
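The two metrics reported above, R2 and RMSE on the test set, can be computed from actual and predicted MSRP values as in the following minimal sketch. The function names are chosen for illustration and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

RMSE is in the same unit as MSRP (USD), while R2 is unitless and is usually quoted as a percentage.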
Conclusion
Objective 1:
• A pricing model was built for automobile companies that can predict the best
market value in each of the 4 segments, viz. Economy, Premium, Super
Premium and Commercial Car segments for the American market.
• The main predictors which govern the price of the automobile in the above 4
segments were identified, the coefficients for each of these predictors
were estimated, and the model was built.
Snapshot of the Relevant Predictors
Predictors | Description | Economy | Premium | Super Premium | Commercial
Model Name | Brand of the vehicle (e.g., Acura, Audi, Honda, etc.)
Gross Axle Weight Rating Front | Max distributed weight that can be supported by an axle of a vehicle
Legend: Important Variable / Not Important Variable
Note: This is not the complete list of predictors. For the complete list, kindly refer to the report.
Conclusion (Contd.)
Objective 2:
• Our study revealed that Brand Value had a significant impact on the MSRP or
the list price.
• In all the segments we found that there were certain brands which were the
key predictors. It can thus be concluded that these Brands had a positive
effect on determining the MSRP in each of the segments.
Future Scope of Study
Our study has not considered the following and their effects on MSRP:
– Advertisement expenditure
There is also scope for future work on how a vehicle can be automatically assigned
to a segment given its predictors/features
THANK YOU
APPENDIX
Super Premium Segment
• A super premium vehicle provides increased levels of comfort, equipment, amenities, quality, performance, and status relative to
regular cars for an increased price. The term is subjective and reflects both the qualities of the car and the brand image of its
manufacturer.
• In this dataset, the segment is a mix of high-end sports & luxury cars whose price range starts from $80,000.
• It includes brands like Aston, Bentley, BMW, Cadillac, Chevrolet, Audi, Ferrari, Acura, Dodge.
• Different ML algorithms were tried on the super premium segment for predicting the best MSRP: Multi-variate linear regression,
Random Forest & Boosted Regression Tree.
• Of these, Multi-variate linear regression gave the best results, so we finalized it for predicting MSRP for the super premium
segment.
• We found that 19 parameters out of the total 174 were most relevant to predict MSRP.
• Model accuracy was 98.2% in predicting MSRP using these 19 variables.
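As an illustration of the multi-variate linear regression chosen above, the sketch below fits MSRP against several numeric predictors by solving the normal equations in plain Python. It is not the study's implementation: the function names are invented here, and any feature values passed in (for example horsepower and a model-year offset) are hypothetical stand-ins for the 19 variables, which the slides do not list.

```python
def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk by least squares.

    Solves the normal equations (X'X) b = X'y with Gauss-Jordan
    elimination and returns [b0, b1, ..., bk].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot solve")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Predicted MSRP for one row of predictor values."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))
```

Categorical predictors such as Brand would first need to be encoded as numeric indicator columns before being passed to a fit like this.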
Super Premium Relevant Variables & Visualizations
Predictors
Year Brand
• A Multiple Regression Model (Supervised Learning) was implemented to achieve the end goal of this case study
– Relevant variables were validated and checked; the predictors are mostly relevant to the commercial segment only
– A set of 18 predictor variables does a good job of predicting the outcome (MSRP) variable
– The relationship between the predictors and the dependent variable MSRP is strong; model accuracy is also 88.84%
Model Factors | Values
Adjusted R-squared | 88.84%
P-value | < 2.2e-16
Root Mean Square Error | 2715
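The Adjusted R-squared reported above penalizes plain R-squared for the number of predictors in the model. A minimal sketch of the standard formula follows; the function name is chosen for illustration, and the sample size used to fit the commercial model is not given in the slides, so any value supplied for it is an assumption.

```python
def adjusted_r_squared(r2: float, n_samples: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)
```

With a large sample and only 18 predictors, the adjustment is small, so the adjusted value stays close to plain R-squared.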
Commercial Relevant Variables
Predictors