Capstone PPT Final

The document discusses predicting retail car prices in the American market. It analyzes a dataset of over 16,000 cars launched from 1990-2019, focusing on 8,741 cars from 2010-2019. The data is segmented into private vehicles (sedans, SUVs, etc.) and commercial vehicles. Private vehicles are further divided into economy, premium, and super premium segments based on price. The authors build separate pricing models for each segment using variables like brand, customer reviews, vehicle specs, and price. Advanced models like random forests achieve the best accuracy for predicting economy segment prices. Customer perception is found to impact retail prices.

Uploaded by

Nikhil jain
Predicting Retail Prices for Cars Launched by a Brand in the American Market, and Customer Perception of the Brand Impacting Retail Price

By:
Shaurya Gupta
Avik Kumar Debnath
Vishal Arora
Sourabh Sabharwal
Mentor: Mr. Vishal Gupta
Agenda

• Context & Background


• Data Overview
• Problem Statement
• Project Roadmap
• Methodology/ Modelling
• Analysis and Results
• Conclusion & Business Context
• Appendix
Brief Background
• The automotive industry has made a strong comeback from the 2008 financial crisis in the USA.

• Several factors contribute to the industry’s pricing challenge. Shifting market conditions, globalization, increased competition, cost pressure, and volatility are leading to a change in the market landscape.

– One area that has an opportunity to deliver a significant competitive advantage is analytics.

• In our study, we use analytics techniques like regression and sentiment analysis to predict:

– The retail pricing of various cars being launched in the USA market
– How different brands have been perceived by customers over the years.

• The study uses a data set of American cars that contains the Retail Price (Sticker Price / MSRP) of cars from various manufacturers over a span of 30 years.

– The period 2010-2019 is considered in our study

(Data has been taken from the reddit.com site for the new cars)
Data Overview
• The car dataset had details of cars launched in the USA from 1990 to 2019, without any segmentation.

– 16,383 rows (observations) and 174 columns (features)
– Around 18 brands, including Acura, Audi, Buick, Dodge, Ferrari, Honda and others

• A significant amount of data was missing for the period 1990-2009

– As a result, the period 2010 to 2019 was selected for the study, with 8,741 rows and 174 columns

• Segmentation was done to classify the data into different categories

– Private vehicles were divided into the categories Hatchback, Sedan, Coupe & Convertible, and SUV
– The commercial segment was divided into the categories Minivan/Van, Pickup Trucks & Specialty Vehicles

• Private vehicles: there was a lot of variation in the Retail Price. Hence, the dataset was further divided into 3 subcategories based on retail price range

– Economy, Premium & Super Premium segments


Data Overview Charts

[Chart: Private and Commercial Vehicles. Private Vehicles: 6063; Commercial Vehicles: 2678]

[Chart: Car Segments, Private and Commercial Vehicles. Economy Segment: 1942; Premium Segment: 3699; Super Premium Segment: 422; Commercial Vehicles: 2678]
Problem Statement

To build a pricing model for automobile companies that can predict the best market value of a car that is pending launch in the American market

– What are the important variables that decide the retail pricing?

– Does customer perception of a brand impact retail price?


Project Roadmap Outlining the Logical Approach

Car Dataset (174*8741)
– Private Vehicle Segment (174*6063)
   – Economy (<=$30,000): Economy EDA -> Economy Model
   – Premium ($30,000-$80,000): Premium EDA -> Premium Model
   – Super Premium (>$80,000): Super Premium EDA -> Super Premium Model
– Commercial Vehicle Segment (174*2678): Commercial EDA -> Commercial Model

• A single pricing model didn’t yield good results due to a lot of variation in terms of pricing, make and type
• Broadly, the dataset was divided into Private & Commercial; further segmentation was done on the basis of retail pricing within the private vehicle segment
• Individual models were built on the respective datasets to come up with accurate predictions
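The price-based split of private vehicles described in the roadmap can be sketched as below. This is a minimal illustration in Python, not the authors' code: the function names, the `msrp` key and the handling of a price of exactly $80,000 (placed in Premium here) are assumptions; only the $30,000 and $80,000 thresholds come from the slide.

```python
# Thresholds taken from the roadmap slide; everything else is hypothetical.
ECONOMY_MAX = 30_000   # Economy: <= $30,000
PREMIUM_MAX = 80_000   # Premium: up to $80,000 (boundary handling is an assumption)


def price_segment(msrp: float) -> str:
    """Return the private-vehicle price segment for a retail price in USD."""
    if msrp <= ECONOMY_MAX:
        return "Economy"
    if msrp <= PREMIUM_MAX:
        return "Premium"
    return "Super Premium"


def split_by_segment(cars):
    """Group car records (dicts with an 'msrp' key) by price segment."""
    groups = {"Economy": [], "Premium": [], "Super Premium": []}
    for car in cars:
        groups[price_segment(car["msrp"])].append(car)
    return groups


# Hypothetical records, for illustration only
sample = [{"model": "A", "msrp": 24_500}, {"model": "B", "msrp": 52_000},
          {"model": "C", "msrp": 210_000}]
for segment, cars in split_by_segment(sample).items():
    print(segment, [c["model"] for c in cars])
```

Each resulting group would then get its own EDA and its own model, as the roadmap describes.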
Does Customer Perception about a Brand Impact Retail Price?

• Collected customer reviews on cars of various brands
– Customer reviews collected for ~18 brands from genuine car review sites over the period 2010-2019

• Text mining on customer reviews
– Extracted important keywords using a frequency bar plot and a wordcloud

• Performed sentiment analysis
– Analyzed each review using get_nrc_sentiment on the basis of 8 emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust)

• Calculated Customer_Score against each review
– Net Score = Overall Positive – Overall Negative (Positive & Negative scores obtained from 8 emotions)

• Introduced Customer_Score in each private vehicle dataset
– Mapped Customer_Score into the private datasets based on the brands applicable to each one of them

• Impact of Customer_Score & Brand on retail pricing
– Studied brand name and customer score along with other predictors to select important predictors
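The Net Score step can be illustrated with a small sketch. The deck used get_nrc_sentiment (an R function) on real reviews; the Python below is only a stand-in under stated assumptions: the two word sets are a toy lexicon invented for illustration, and the function names are hypothetical. Only the formula (positive minus negative) and the per-brand summing come from the slides.

```python
import re
from collections import defaultdict

# Toy lexicon for illustration only; the NRC lexicon used in the deck is far larger.
POSITIVE_WORDS = {"reliable", "comfortable", "smooth", "love", "great"}
NEGATIVE_WORDS = {"noisy", "unreliable", "expensive", "poor", "broke"}


def net_score(review: str) -> int:
    """Net Score = count of positive words minus count of negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    positive = sum(w in POSITIVE_WORDS for w in words)
    negative = sum(w in NEGATIVE_WORDS for w in words)
    return positive - negative


def customer_score_by_brand(reviews):
    """Sum the net scores per brand; `reviews` is an iterable of (brand, text)."""
    totals = defaultdict(int)
    for brand, text in reviews:
        totals[brand] += net_score(text)
    return dict(totals)


# Hypothetical reviews
reviews = [("Honda", "Reliable and smooth"), ("Honda", "A bit noisy"),
           ("Fiat", "Poor build and unreliable")]
print(customer_score_by_brand(reviews))
```

The per-brand totals are what would be mapped into each private vehicle dataset as Customer_Score.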
Economy Car Segment
• Consists of 1942 records covering 10 brands: Acura, Audi, Buick, Chevrolet, Dodge, Fiat, Ford, GMC, Honda, and Hyundai.

• Studied the mapped variable (Customer_Score) along with other predictors to build a pricing model using linear regression

• Relevant Variables (as per MLR)

Model Name            EPA Class
Segment               Engine
Passenger Capacity    Min Ground Clearance
Front Leg Room        Length Overall
Transmission          Fuel Economy
Drivetrain            Turning Diameter
Fuel System           Front Wheel Material
Traction Control      Fog Lamps
Parking Aid           Backup Camera
Air Bags              Daytime Running Lights

[Chart note: Cars having traction control are priced at a relatively higher rate.]
Economy Modeling
• Based on the above predictors, a linear regression model was built that has an R2 value of 84.23% (train) and an RMSE of 1626 on test data

• The model did not satisfy the linear regression assumptions, so it was not a good fit.

– [Plot: residuals vs. fitted values] A pattern was observed between the residuals and the fitted values, which suggests heteroscedasticity

• Taking a logarithmic transformation of retail price drastically reduced the RMSE value but only marginally increased R2 (to 84.87%)

• Advanced supervised models were built using CART, random forest, and boosted regression trees.

Algorithm            R2 or variance explained (%)    RMSE (Test)
CART                 NA                              2810.40
Random Forest        89.25%                          1290.52
Gradient Boosting    99.97%                          56.99
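For reference, the two metrics reported above can be computed as in the sketch below. The slide does not say on which scale the RMSE was measured after the log transformation; one possible reason for a drastic drop is that errors on log(price) are in log units and not in dollars, so the sketch also shows predictions being converted back to dollars before the comparison. All numbers and names here are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Share of variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical test prices and predictions from a model fitted on log(price)
actual_price = [18000.0, 22000.0, 27000.0, 29500.0]
predicted_log_price = [math.log(17500.0), math.log(23000.0),
                       math.log(26000.0), math.log(30000.0)]

# RMSE in log units: small, but not comparable with a dollar RMSE
rmse_log_scale = rmse([math.log(p) for p in actual_price], predicted_log_price)

# RMSE in dollars: exponentiate the predictions first
predicted_price = [math.exp(v) for v in predicted_log_price]
rmse_dollars = rmse(actual_price, predicted_price)

print(rmse_log_scale, rmse_dollars, r_squared(actual_price, predicted_price))
```

Comparing models on the same scale (dollars) keeps the RMSE column of the table consistent across algorithms.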
Economy Modeling (Contd.)
• Finalized the random forest model from among the models built
– Optimal accuracy achieved
– Relevant set of variables found

• Model built with parameters
– mtry -> no. of variables randomly tried at each split -> 5
– ntree -> no. of trees in the forest -> 500
– importance = TRUE

• The plot shown explains the relevance of each variable
– E.g., Front Wheel Material is an important variable.
– Removing this will reduce the prediction power of the model by 50%.

[Plot note: Variables with the highest importance scores are the ones that give the best prediction and contribute most to the model]
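The idea behind such an importance score can be sketched with permutation importance: shuffle one variable's values across rows and measure how much the prediction error grows. This is a simplified Python illustration of the general technique, not the exact computation of the random forest package used in the deck; the toy model, the feature names (`alloy_wheels`, `fog_lamps`, `paint_code`) and the data are all hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model(row) against the known targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature, seed=0):
    """Percent increase in MSE after shuffling one feature's values across rows."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    return 100.0 * (mse(model, permuted, targets) - baseline) / baseline


def toy_model(row):
    """Stand-in for a fitted model; it ignores paint_code on purpose."""
    return 18000 + 4000 * row["alloy_wheels"] + 200 * row["fog_lamps"]


rows = [
    {"alloy_wheels": 1, "fog_lamps": 0, "paint_code": 3},
    {"alloy_wheels": 0, "fog_lamps": 1, "paint_code": 1},
    {"alloy_wheels": 1, "fog_lamps": 1, "paint_code": 4},
    {"alloy_wheels": 0, "fog_lamps": 0, "paint_code": 2},
    {"alloy_wheels": 1, "fog_lamps": 0, "paint_code": 5},
    {"alloy_wheels": 0, "fog_lamps": 1, "paint_code": 6},
]
residuals = [100, -100, 50, -50, 80, -80]  # small errors so the baseline MSE is not zero
targets = [toy_model(r) + e for r, e in zip(rows, residuals)]

for name in ("alloy_wheels", "fog_lamps", "paint_code"):
    print(name, round(permutation_importance(toy_model, rows, targets, name), 1))
```

A variable the model never uses scores exactly zero, while a variable with a large effect on the prediction scores high once its values are shuffled.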
Do Brand and Customer Score Impact Pricing?

• Customer score is not strongly correlated with the retail price

• Brands like Acura, Audi, and Buick have a higher average MSRP

Note: The Customer_Score is the sum of the scores calculated for a brand over the period 2010-2019
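A correlation check of this kind can be sketched with the Pearson coefficient. The deck does not state which correlation measure was used, so Pearson is an assumption here, and the brand-level numbers below are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical brand-level data: summed Customer_Score and average MSRP in USD
customer_score = [120, 340, 95, 410, 220]
average_msrp = [27500.0, 24800.0, 26900.0, 25200.0, 28100.0]

print(round(pearson_r(customer_score, average_msrp), 3))
```

A coefficient near zero would be consistent with the slide's observation; a value near +1 or -1 would not.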
Final Model Results
Datasets                      R2 Value or Variance Explained (%)    RMSE (Test)
Economy Car Segment           89.25%                                1290.52
Premium Car Segment           84.96%                                6292
Super Premium Car Segment     98.25%                                2416.10
Commercial Vehicle Segment    88.84%                                2714.73

- For each segment, individual models were built.
- The table includes the best results obtained from any of the supervised learning algorithms (multiple linear regression, non-linear regression, regression trees (CART), random forest, and boosted regression trees)
Glimpse of Results

[Plot: Actual v/s Predicted MSRP (Test), Economy Segment. Actual and predicted MSRP are closely related (R2 ~90%)]

[Plot: Actual v/s Predicted MSRP (Test), Commercial Segment. Actual and predicted MSRP are closely related (R2 ~88%)]
Conclusion

Objective 1:
• A pricing model was built for automobile companies that can predict the best
market value in each of the 4 segments, viz. Economy, Premium, Super
Premium and Commercial Car segments for the American market.

• The main predictors which govern the price of the automobile in the above 4
segments were identified and the coefficients for each of these predictors
were found and the model was built.
Snapshot of the Relevant Predictors
Predictors            Description
Model Name            Brand of the vehicle (e.g., Acura, Audi, Honda, etc.)
EPA Class             The United States Environmental Protection Agency (EPA) classifies vehicles as large, compact, midsize, sports cars, etc.
Segment               Based on body style, such as hatchback, sedan, SUV, pickup trucks, etc.
Engine                Engine type of the car (CNG, diesel, gasoline, etc.)
Fuel Economy          Mileage of the car
Passenger Capacity    Seating capacity of the car
Front Leg Room        Amount of space available in a car in front of one’s legs
Night Vision          A system that increases the driver’s perception & vision in darkness

Legend: for each segment (Economy, Premium, Super Premium, Commercial), a predictor is marked as Important Variable or Not Important Variable


Commercial Segment Differentiators
Predictors                        Description
Second Leg Room                   Amount of space available in a car for one’s legs in the second row
Second Shoulder Room              Measurement from one door panel to another.
Fourth Gear Ratio                 In 4th gear, a gear ratio of 1:1 means that the engine and the transmission's output rotate at the same speed.
Gross Axle Weight Rating Front    Max distributed weight that can be supported by an axle of a vehicle
Cargo Volume                      Space calculated behind the vehicle driver and passenger seating area
Cargo Box Width Floor             Width of the cargo box
Cargo Box Length Floor            Length of the cargo box

Legend: for each segment (Economy, Premium, Super Premium, Commercial), a predictor is marked as Important Variable or Not Important Variable

Note: This is not the complete list of predictors. For the complete list, kindly refer to the report
Conclusion (Contd.)

Objective 2:

• Our study revealed that Brand Value had a significant impact on the MSRP or
the list price.

• In all the segments we found that there were certain brands which were the
key predictors. It can thus be concluded that these Brands had a positive
effect on determining the MSRP in each of the segments.
Future Scope of Study

Our study has not considered the following and their effects on MSRP:

– Advertisement expenditure

– Post sales service network

– Distribution/ Supply chain/ Logistics expenditure

There is also scope for future work on how a vehicle can be automatically assigned to a segment given its predictors/features
THANK YOU
APPENDIX
Super Premium Segment
• A super premium vehicle provides increased levels of comfort, equipment, amenities, quality, performance, and status relative to regular cars, for an increased price. The term is subjective and reflects both the qualities of the car and the brand image of its manufacturer.
• In this dataset, the segment is a mix of high-end sports & luxury cars whose price range starts from $80,000 USD
• It includes brands like Aston, Bentley, BMW, Cadillac, Chevrolet, Audi, Ferrari, Acura, and Dodge.
• Different ML algorithms were tried on the super premium segment for predicting the best MSRP: multivariate linear regression, random forest & boosted regression trees.
• Of these, multivariate linear regression gave the best results, hence we finalized it for predicting MSRP for the super premium segment.
• We found that 19 parameters out of the total 174 were most relevant for predicting MSRP.
• Model accuracy was 98.2% in predicting MSRP with the above-mentioned 19 variables.
Super Premium Relevant Variables & Visualizations
Predictors

Year Brand

Segment EPA Class

Passenger Capacity Engine Type

Torque Displacement (cc)

Night Vision Stability Control

Transmission Reverse Ratio

Drivetrain Miles Overall Length

Trunk Volume Maintenance Miles


Commercial Segment
• The commercial segment is a vital part of the dataset, with 2678 records.
• It includes Minivan/Van, Pickup Trucks, and Specialty Vehicles from brands such as Chevrolet, Dodge, Ford, GMC, Honda, and Chrysler.
• Data cleaning was done so that the data is valid, accurate, complete, consistent and/or uniform.
• EDA is one of the crucial steps in data science; it yields insights and statistical measures, and was used to define and refine the selection of feature variables for our model.

[Chart: record counts by category (Minivan/Van, Pickup Trucks, Specialty Vehicle); values shown: 2162, 458, 58]

• A multiple regression model (supervised learning) was implemented to achieve the end goal of this case study
– Relevant variables were validated and checked; most of the predictors are relevant to the commercial segment only
– A set of 18 predictor variables does a good job of predicting the outcome (MSRP) variable
– The relationship between the predictors and the dependent variable MSRP is strong; model accuracy is 88.84%

Model Factors              Values
Adjusted R-squared         88.84%
P-value                    < 2.2e-16
Root Mean Square Error     2715
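Adjusted R-squared, reported above, penalizes plain R-squared for the number of predictors. A minimal sketch of the standard formula follows; the function name and the example inputs are hypothetical, since the deck does not give the training-set size or the unadjusted R-squared.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical inputs: 18 predictors (as on the slide) and an assumed 2000 training rows
print(round(adjusted_r_squared(0.89, 2000, 18), 4))
```

With thousands of rows and 18 predictors, the adjustment is small, so the adjusted and unadjusted values would be close.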
Commercial Relevant Variables
Predictors

Model Name Torque

Segment Fourth Gear Ratio

Engine Gross Axle Weight Rating Front

Drivetrain Daytime Running Lights

Wheelbase Fog Lamps

Front Leg Room Parking Aid

Second Leg Room Cargo Volume

Second Shoulder Room Back Up Camera

Fuel Tank Capacity Cargo Box Width Floor
