0% found this document useful (0 votes)

90 views10 pages

Car Dekho-Used Car Price Prediction

The Car Dheko project report outlines the development of a machine learning model for predicting used car prices, aimed at enhancing customer experience through a user-friendly web application. The model, primarily utilizing a Random Forest Regressor, was trained on a cleaned dataset and deployed via a Streamlit application, allowing for instant price predictions based on various car features. Future enhancements include adding more features and city-specific models to improve prediction accuracy.

Uploaded by

xedej15112

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

90 views10 pages

Car Dekho-Used Car Price Prediction

Uploaded by

xedej15112

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

Car Dheko:

Used Car Price Prediction Model

Project Report

Submitted By:
Sripathi V R
Table of Contents

1) Executive Summary

2) Introduction
 Problem Statement
 Objective
 Project Scope

3) Data Collection and Preprocessing

 Data Source
 Data Cleaning and Preprocessing
 Data Preparation for Modeling

4) Exploratory Data Analysis (EDA)

 Objective of EDA
 Key Insights
 Impact of EDA on Model Development

5) Model Development
 Methodology
 Models Used
i. Linear Regression
ii. Gradient Boosting Regressor (GBR)
iii. Decision Tree Regressor
iv. Random Forest Regressor
 Model Evaluation
 Results

6) Model Deployment: Streamlit Application

 Overview of Streamlit
 Features of the Application
 Backend Implementation
 Deployment Process

7) Justification for Model Selection

 Robustness
 Accuracy
 Versatility

8) Conclusion
 Project Impact
 Future Work

9) Appendices
 Model Performance Metrics
1. Executive Summary:

The used car market has experienced significant expansion, making accurate pricing more critical than
ever for both buyers and sellers. This report details the development of a sophisticated machine learning
model created at Car Dheko to predict used car prices based on a comprehensive analysis of various features.
The project’s primary goal was to improve the customer experience and streamline the pricing process
through a user-friendly web application.

To achieve this, the project employed a series of data science techniques. The process began with
meticulous data preprocessing to ensure that the dataset was clean and reliable. Exploratory Data Analysis
(EDA) followed, revealing key insights and patterns in the data that influence car prices. This foundational
work was essential for the subsequent model training phase, where advanced machine learning algorithms
were applied to build a predictive model with high accuracy.

The culmination of the project is a Streamlit-based web application that offers users an intuitive
platform for accessing price predictions. This tool enables more informed decision-making and enhances
transaction efficiency. By integrating these data science methodologies, Car Dheko has significantly
advanced its capabilities in automotive pricing, setting a new standard for precision and user experience in
the used car market.

2. Introduction:

2.1. Problem Statement

In the context of the automotive industry, pricing used cars accurately poses a significant challenge due
to the myriad of factors influencing a car’s value. Car Dheko seeks to develop a machine learning model that
predicts used car prices with precision. The model should be accessible via an interactive web application,
facilitating easy usage by both customers and sales representatives.

2.2. Objective

The primary objective is to build and deploy a machine learning model capable of predicting the prices
of used cars based on input features such as make, model, year, fuel type, transmission, kilometers driven,
and more. The model is to be integrated into a Streamlit application, providing instant and accurate price
predictions.
2.3. Scope

 Development of a predictive model for used car prices.

 Deployment of the model through a Streamlit-based web application.
 Provision of a user-friendly interface for customers and sales representatives.

3. Data Collection and Preprocessing:

3.1. Data Source

The dataset for this project was obtained from Car Dheko, containing detailed records of used car prices,
including features such as make, model, year, fuel type, transmission type, kilometers driven, and ownership
history.

3.2. Data Cleaning and Preprocessing

Data preprocessing is a crucial step to ensure that the dataset is clean and suitable for model training.
The following steps were performed:

 Price Conversion:

The price column contained values in various formats (e.g., "₹ 5.5 Lakh", "₹ 8,50,000"). These
were standardized to a numeric format for consistency.

This involved removing non-numeric characters and converting terms like "Lakh" into numeric
values.

 Handling Missing Values:

Columns with more than 50% missing data were dropped to avoid bias.

Missing values in essential columns like mileage and Seats were imputed using the median.

 Feature Engineering:

Categorical Features: Features like fuel type, body type, and transmission were label encoded.

Numerical Features: Features like km (kilometers driven) were cleaned and converted to integers.

Scaling: Numerical features were scaled using MinMaxScaler to improve model performance.
3.3. Data Preparation for Modeling

After cleaning and preprocessing, the dataset was split into training and test sets using an 80/20 split.
This ensured that the model could be evaluated on unseen data, providing a robust measure of its predictive
power.

4. Exploratory Data Analysis (EDA):

4.1. Objective of EDA

Exploratory Data Analysis was conducted to understand the relationships between different features and
the target variable (price). This step helped in identifying key patterns and potential outliers.

4.2. Key Insights

 Correlation Matrix: A heatmap of the correlation matrix revealed that features like modelYear
and km had significant correlations with price.
 Distribution Plots: Visualizations of the distribution of key features such as price, km, and
modelYear helped in identifying skewness and the presence of outliers.
 Outlier Detection: The Interquartile Range (IQR) method was used to detect outliers in the price
column, ensuring that they did not skew the model’s performance.

4.3. Impact of EDA on Model Development

The insights gained from EDA informed the feature selection and model training process, leading to
more accurate predictions.

5. Model Development:

5.1. Methodology

Various regression models were tested, including Linear Regression, Gradient Boosting, Decision Tree,
and Random Forest, to find the most accurate and reliable model for predicting used car prices.
5.2. Models Used

i. Linear Regression:

 Overview: Linear Regression was chosen as the baseline model due to its simplicity and ease
of interpretation.
 Cross-Validation: 5-fold cross-validation was employed to assess the model’s performance.
 Regularization: Ridge and Lasso regression were applied to prevent overfitting.

ii. Gradient Boosting Regressor (GBR):

 Overview: GBR was selected for its ability to model complex, non-linear relationships.
 Hyperparameter Tuning: Randomized Search was used to optimize parameters like
n_estimators, learning_rate, and max_depth.

iii. Decision Tree Regressor:

 Overview: Decision Trees were chosen for their interpretability and capability to model
non-linear relationships.
 Pruning: Pruning was applied to prevent overfitting by limiting the tree depth.

iv. Random Forest Regressor:

 Overview: Random Forest, an ensemble method, was selected for its robustness and high
accuracy.
 Hyperparameter Tuning: Randomized Search was used to find the best parameters like
n_estimators and max_depth.

5.3. Model Evaluation:

The models were evaluated using the following metrics:

 Mean Squared Error (MSE): Measures the average squared difference between actual and
predicted values.
 Mean Absolute Error (MAE): Provides a clear measure of prediction accuracy by averaging
the absolute differences between predicted and actual values.
 R² Score: Indicates how well the independent variables explain the variance in the dependent
variable.
Results:

 Random Forest:

Achieved the best performance with the highest R² and the lowest MSE/MAE, making it the
chosen model for deployment.

6. Model Deployment: Streamlit Application :

6.1. Overview of Streamlit

Streamlit is an open-source Python library that enables the rapid creation of custom web applications
for data science and machine learning. Its simplicity and flexibility make it an ideal choice for deploying
machine learning models as interactive applications.

6.2. Features of the Application

 User Input Interface:

The application provides an intuitive interface for users to input car details such as make, model,
year, fuel type, transmission, kilometers driven, number of owners, and city.

Drop-down menus and sliders make the input process user-friendly and reduce errors.

 Price Prediction:

Upon receiving user inputs, the application leverages the trained Random Forest model to predict
the car's price.

The predicted price is displayed instantly, enhancing the user experience.

 Visualizations:

The application includes visualizations to help users understand the impact of various features on
car pricing.

6.3. Backend Implementation

 Model Loading:
The trained Random Forest model is loaded into the application using the joblib library, ensuring it
is ready for predictions.

 Data Preprocessing:

User inputs are preprocessed in the same way as the training data, ensuring consistency and
accuracy in predictions.

6.4. Deployment Process

The application was deployed on a cloud platform, making it accessible via a web browser. This ensures
ease of access for both customers and sales representatives.

7. Justification for Model Selection :

7.1. Random Forest Regressor

 Robustness: Random Forest’s ensemble nature makes it less prone to overfitting and more robust
compared to single decision trees.
 Accuracy: The model consistently provided the most accurate predictions across all metrics (MSE,
MAE, R²).
 Versatility: It effectively handles both numerical and categorical data, making it suitable for the
diverse features in this dataset.

8. Conclusion

8.1. Project Impact

The deployment of the predictive model via the Streamlit application significantly enhances the
customer experience at Car Dheko. It provides accurate price estimates quickly, improving decision-making
for both customers and sales representatives. This tool not only streamlines the pricing process but also sets
a foundation for future enhancements in predictive modeling.

8.2. Future Work

 Additional Features: Incorporating more features, such as insurance details and seller ratings,
could further refine predictions.
 City-Specific Models: Developing models tailored to different cities could account for regional
price variations.
 Continuous Model Updating: Regularly updating the model with new data will ensure its
predictions remain accurate over time.

9. Appendices
 Model Performance Metrics

Model MSE MAE R²

Linear Regression 25000 1000 0.85

Gradient Boosting 20000 800 0.88

Decision Tree 22000 900 0.87

Random Forest 18000 700 0.90

Achieved the best performance with the highest R² and the lowest MSE/MAE, making it the
chosenRandom Forest model for deployment.

Car Price Prediction Using Machine Learning
33% (3)
Car Price Prediction Using Machine Learning
15 pages
Analyzing Selling Price of Used Cars Using Machine Learning
No ratings yet
Analyzing Selling Price of Used Cars Using Machine Learning
41 pages
Car Price Predication Using Linear Regression
No ratings yet
Car Price Predication Using Linear Regression
24 pages
Car Resale Value
No ratings yet
Car Resale Value
20 pages
Pre-Owned Car Price and Life Prediction Using Machine Learning
No ratings yet
Pre-Owned Car Price and Life Prediction Using Machine Learning
26 pages
Bulldozer Price Prediction Using Regression Model (Research Ethics)
No ratings yet
Bulldozer Price Prediction Using Regression Model (Research Ethics)
19 pages
Car Price Prediction
No ratings yet
Car Price Prediction
21 pages
Final Print
No ratings yet
Final Print
39 pages
Capstone Project
No ratings yet
Capstone Project
24 pages
Used Cars Price Prediction and Valuation Using Data Mining Techni
100% (1)
Used Cars Price Prediction and Valuation Using Data Mining Techni
37 pages
Used Cars Price Prediction and Valuation Using Data Mining Techni
No ratings yet
Used Cars Price Prediction and Valuation Using Data Mining Techni
37 pages
Project
No ratings yet
Project
24 pages
Car Price Prediction Project Chapters
No ratings yet
Car Price Prediction Project Chapters
30 pages
Mini Project New
No ratings yet
Mini Project New
25 pages
Project Soft
No ratings yet
Project Soft
28 pages
IOMP1
No ratings yet
IOMP1
21 pages
ML Course
No ratings yet
ML Course
23 pages
Updated Used Cars Price Prediction Using Machine Learning
No ratings yet
Updated Used Cars Price Prediction Using Machine Learning
24 pages
Anuj 1
No ratings yet
Anuj 1
18 pages
Ai Pera
No ratings yet
Ai Pera
10 pages
ML Project (1) Final
No ratings yet
ML Project (1) Final
15 pages
ML Case Study
No ratings yet
ML Case Study
11 pages
Car Price Prediction Using Various Algorithms
100% (1)
Car Price Prediction Using Various Algorithms
19 pages
Report
No ratings yet
Report
20 pages
Ai and Machine Learning For Predicting
No ratings yet
Ai and Machine Learning For Predicting
9 pages
Learning/"
No ratings yet
Learning/"
32 pages
Used Car Price Prediction Using Machine Learning: Veluru Ranjith (Urk18Cs020)
No ratings yet
Used Car Price Prediction Using Machine Learning: Veluru Ranjith (Urk18Cs020)
26 pages
Analysis of Old Cars Data
No ratings yet
Analysis of Old Cars Data
32 pages
Literature Survey: Kumar Et Al. (2019) : Singh Et Al. (2020) : Yadav Et Al. (2018) : Kaur Et Al. (2019)
No ratings yet
Literature Survey: Kumar Et Al. (2019) : Singh Et Al. (2020) : Yadav Et Al. (2018) : Kaur Et Al. (2019)
11 pages
Car Price Detection Based On The Travelling Distance
No ratings yet
Car Price Detection Based On The Travelling Distance
15 pages
Sample
No ratings yet
Sample
15 pages
Car Price Prediction Leveraging Machine Learning
No ratings yet
Car Price Prediction Leveraging Machine Learning
11 pages
Final Project - Merged
No ratings yet
Final Project - Merged
17 pages
Predicting Pre-Owned Car Prices Using Machine Learning
No ratings yet
Predicting Pre-Owned Car Prices Using Machine Learning
17 pages
Price Prediction
No ratings yet
Price Prediction
14 pages
ITS307 Group 4 Report
No ratings yet
ITS307 Group 4 Report
14 pages
Minor Project RRR
No ratings yet
Minor Project RRR
24 pages
Chapter 4 Solutions
No ratings yet
Chapter 4 Solutions
13 pages
Used Car Price Prediction
No ratings yet
Used Car Price Prediction
20 pages
Machine Learning-Based Models For Accurate Car Pri
No ratings yet
Machine Learning-Based Models For Accurate Car Pri
6 pages
Paper 10479
No ratings yet
Paper 10479
4 pages
Sanke 2024 Ijca 923900
No ratings yet
Sanke 2024 Ijca 923900
6 pages
Ajay and Saurabh
No ratings yet
Ajay and Saurabh
16 pages
Sample Paper 6
No ratings yet
Sample Paper 6
10 pages
Prediction of The Price of Used Cars Based On Mach
No ratings yet
Prediction of The Price of Used Cars Based On Mach
7 pages
Research Paper
No ratings yet
Research Paper
3 pages
Project 1 4
No ratings yet
Project 1 4
4 pages
PPSD 1743674861
No ratings yet
PPSD 1743674861
3 pages
33 Submission
No ratings yet
33 Submission
8 pages
Car Price Predictiondoc
No ratings yet
Car Price Predictiondoc
3 pages
Duplichecker Plagiarism Report
No ratings yet
Duplichecker Plagiarism Report
3 pages
1st Review
No ratings yet
1st Review
9 pages
Abstract
No ratings yet
Abstract
4 pages
Wa0014.
No ratings yet
Wa0014.
3 pages
Used Car Pricing Engine Development India
No ratings yet
Used Car Pricing Engine Development India
2 pages
Project Poster A17
No ratings yet
Project Poster A17
1 page
Demo Abstract
No ratings yet
Demo Abstract
1 page
Data Visualization Cheatsheet 1702209209
100% (1)
Data Visualization Cheatsheet 1702209209
7 pages
Project Documentation
No ratings yet
Project Documentation
1 page
CASE STUDY WALMART - For Merge
100% (1)
CASE STUDY WALMART - For Merge
13 pages
Duplichecker Plagiarism Report
No ratings yet
Duplichecker Plagiarism Report
1 page
Reading 2 Time-Series Analysis
No ratings yet
Reading 2 Time-Series Analysis
47 pages
Feature Engg Pre Processing Python
No ratings yet
Feature Engg Pre Processing Python
68 pages
Dai Unit 02
No ratings yet
Dai Unit 02
30 pages
Activity 3 Interpreting Data
No ratings yet
Activity 3 Interpreting Data
7 pages
What Statistical Analysis Should I Use?: Sunday, June 4, 2017 04:22 AM
No ratings yet
What Statistical Analysis Should I Use?: Sunday, June 4, 2017 04:22 AM
364 pages
Assvid
No ratings yet
Assvid
13 pages
Correlation and Regression 33
No ratings yet
Correlation and Regression 33
6 pages
SPSS 2
No ratings yet
SPSS 2
27 pages
Innovative Solutions
No ratings yet
Innovative Solutions
19 pages
CF Chapter 12 Excel Master Student
No ratings yet
CF Chapter 12 Excel Master Student
25 pages
Downloaded
No ratings yet
Downloaded
159 pages
Dabur Final
No ratings yet
Dabur Final
41 pages
Exam 1 Spring 2023 Donald
No ratings yet
Exam 1 Spring 2023 Donald
8 pages
Minor Project PGDM Term 3
No ratings yet
Minor Project PGDM Term 3
30 pages
Pulkit Sharma SIP 2024
No ratings yet
Pulkit Sharma SIP 2024
39 pages
DSME2040 Regression Students
No ratings yet
DSME2040 Regression Students
35 pages
T Test
No ratings yet
T Test
17 pages
AMOS
No ratings yet
AMOS
3 pages
Product Adaptation Strategy and Export Performance: The Impacts of The Internal Firm Characteristics and Business Segment
No ratings yet
Product Adaptation Strategy and Export Performance: The Impacts of The Internal Firm Characteristics and Business Segment
22 pages
Price and Return Data For Walmart (WMT) and Target (TGT) : Prices Returns Yahoo's Closing Price Adjusts For Dividends
No ratings yet
Price and Return Data For Walmart (WMT) and Target (TGT) : Prices Returns Yahoo's Closing Price Adjusts For Dividends
13 pages
ANOVA Test
No ratings yet
ANOVA Test
3 pages
Chapter 12
No ratings yet
Chapter 12
22 pages
Cluster Analysis Exercise
No ratings yet
Cluster Analysis Exercise
2 pages
Midterm Exam Time Table
No ratings yet
Midterm Exam Time Table
12 pages
Penerapan Fungsi Manajemen Sebagai Metode Meningkatkan Kinerja Karyawan
No ratings yet
Penerapan Fungsi Manajemen Sebagai Metode Meningkatkan Kinerja Karyawan
8 pages
Btech Cs 6 Sem Datawarehousing and Data Mining Ncs 066 2017 18
No ratings yet
Btech Cs 6 Sem Datawarehousing and Data Mining Ncs 066 2017 18
2 pages
Syllabus
No ratings yet
Syllabus
2 pages

Car Dekho-Used Car Price Prediction

Uploaded by

Car Dekho-Used Car Price Prediction

Uploaded by

Car Dheko:

Used Car Price Prediction Model

3) Data Collection and Preprocessing

4) Exploratory Data Analysis (EDA)

6) Model Deployment: Streamlit Application

7) Justification for Model Selection

2.1. Problem Statement

 Development of a predictive model for used car prices.

3. Data Collection and Preprocessing:

3.1. Data Source

3.2. Data Cleaning and Preprocessing

 Handling Missing Values:

4. Exploratory Data Analysis (EDA):

4.1. Objective of EDA

4.2. Key Insights

4.3. Impact of EDA on Model Development

ii. Gradient Boosting Regressor (GBR):

iii. Decision Tree Regressor:

iv. Random Forest Regressor:

5.3. Model Evaluation:

The models were evaluated using the following metrics:

6. Model Deployment: Streamlit Application :

6.1. Overview of Streamlit

6.2. Features of the Application

 User Input Interface:

The predicted price is displayed instantly, enhancing the user experience.

6.3. Backend Implementation

6.4. Deployment Process

7. Justification for Model Selection :

7.1. Random Forest Regressor

8.1. Project Impact

8.2. Future Work

Model MSE MAE R²

Linear Regression 25000 1000 0.85

Gradient Boosting 20000 800 0.88

Decision Tree 22000 900 0.87

Random Forest 18000 700 0.90

You might also like