Topic:- Car Price Prediction Using Linear Regression
Analysis
Code
Step1:- Summary of the dataset
> summary(raw_data2)
     car_ID      symboling         CarName            fueltype
 Min.   :  1   Min.   :-2.0000   Length:205         Length:205
 1st Qu.: 52   1st Qu.: 0.0000   Class :character   Class :character
 Median :103   Median : 1.0000   Mode  :character   Mode  :character
 Mean   :103   Mean   : 0.8341
 3rd Qu.:154   3rd Qu.: 2.0000
 Max.   :205   Max.   : 3.0000
  aspiration         doornumber          carbody           drivewheel
 Length:205         Length:205         Length:205         Length:205
 Class :character   Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character   Mode  :character
 enginelocation       wheelbase        carlength        carwidth
 Length:205         Min.   : 86.60   Min.   :141.1   Min.   :60.30
 Class :character   1st Qu.: 94.50   1st Qu.:166.3   1st Qu.:64.10
 Mode  :character   Median : 97.00   Median :173.2   Median :65.50
                    Mean   : 98.76   Mean   :174.0   Mean   :65.91
                    3rd Qu.:102.40   3rd Qu.:183.1   3rd Qu.:66.90
                    Max.   :120.90   Max.   :208.1   Max.   :72.30
   carheight       curbweight    enginetype         cylindernumber
 Min.   :47.80   Min.   :1488   Length:205         Length:205
 1st Qu.:52.00   1st Qu.:2145   Class :character   Class :character
 Median :54.10   Median :2414   Mode  :character   Mode  :character
 Mean   :53.72   Mean   :2556
 3rd Qu.:55.50   3rd Qu.:2935
 Max.   :59.80   Max.   :4066
   enginesize     fuelsystem           boreratio        stroke
 Min.   : 61.0   Length:205         Min.   :2.54   Min.   :2.070
 1st Qu.: 97.0   Class :character   1st Qu.:3.15   1st Qu.:3.110
 Median :120.0   Mode  :character   Median :3.31   Median :3.290
 Mean   :126.9                      Mean   :3.33   Mean   :3.255
 3rd Qu.:141.0                      3rd Qu.:3.58   3rd Qu.:3.410
 Max.   :326.0                      Max.   :3.94   Max.   :4.170
 compressionratio   horsepower       peakrpm        citympg
 Min.   : 7.00    Min.   : 48.0   Min.   :4150   Min.   :13.00
 1st Qu.: 8.60    1st Qu.: 70.0   1st Qu.:4800   1st Qu.:19.00
 Median : 9.00    Median : 95.0   Median :5200   Median :24.00
 Mean   :10.14    Mean   :104.1   Mean   :5125   Mean   :25.22
 3rd Qu.: 9.40    3rd Qu.:116.0   3rd Qu.:5500   3rd Qu.:30.00
 Max.   :23.00    Max.   :288.0   Max.   :6600   Max.   :49.00
   highwaympg        price
 Min.   :16.00   Min.   : 5118
 1st Qu.:25.00   1st Qu.: 7788
 Median :30.00   Median :10295
 Mean   :30.75   Mean   :13277
 3rd Qu.:34.00   3rd Qu.:16503
 Max.   :54.00   Max.   :45400
> str(data)
tibble [10,000 × 8] (S3: tbl_df/tbl/data.frame)
 $ web-scraper-order    : chr [1:10000] "1680204632-1" "1680204632-2" "1680204632-3" "1680204632-4" ...
 $ Car Model            : chr [1:10000] "Skoda Octavia A8 2022" "Skoda Octavia A8 2022" "Skoda Octavia A8 2022" "Skoda Octavia A8 2022" ...
 $ Month/Year           : chr [1:10000] "2023-03" "2023-02" "2023-01" "2022-12" ...
 $ Average_price        : chr [1:10000] "967,000 EGP" "979,000 EGP" "917,000 EGP" "881,000 EGP" ...
 $ Minimum_Price        : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ Maximum_Price        : chr [1:10000] "1,017,000 EGP" "1,045,000 EGP" "950,000 EGP" "950,000 EGP" ...
 $ Average_Price        : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
 $ Average_price_Cleaned: num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
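The str(data) output shows the price columns stored as text such as "967,000 EGP", while Average_price_Cleaned is NA in every row displayed. The following is a minimal sketch of one way such strings could be converted to numbers, assuming every value follows the "digits, commas, EGP" pattern seen above. The helper name clean_price is hypothetical and does not appear in the original code.

```r
# Hypothetical helper: drop every character that is not a digit, then convert.
# Assumes whole-number prices formatted like "967,000 EGP".
clean_price <- function(x) {
  as.numeric(gsub("[^0-9]", "", x))
}

# The four sample values printed by str(data) for Average_price
raw_prices <- c("967,000 EGP", "979,000 EGP", "917,000 EGP", "881,000 EGP")
cleaned <- clean_price(raw_prices)
print(cleaned)
```

On the full tibble this could be applied as data$Average_price_Cleaned <- clean_price(data$Average_price), provided the column really contains only values in this format.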
> str(data1)
'data.frame': 205 obs. of 26 variables:
$ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
$ CarName : chr "alfa-romero giulia" "alfa-romero stelvio" "alfa-romero Quadrifoglio" "audi 100 ls" ...
$ fueltype : chr "gas" "gas" "gas" "gas" ...
$ aspiration : chr "std" "std" "std" "std" ...
$ doornumber : chr "two" "two" "two" "four" ...
$ carbody : chr "convertible" "convertible" "hatchback" "sedan" ...
$ drivewheel : chr "rwd" "rwd" "rwd" "fwd" ...
$ enginelocation : chr "front" "front" "front" "front" ...
$ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
$ carlength : num 169 169 171 177 177 ...
$ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
$ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
$ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
$ enginetype : chr "dohc" "dohc" "ohcv" "ohc" ...
$ cylindernumber : chr "four" "four" "six" "four" ...
$ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
$ fuelsystem : chr "mpfi" "mpfi" "mpfi" "mpfi" ...
$ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
$ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
$ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
$ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
$ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
$ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
$ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
$ price : num 13495 16500 16500 13950 17450 ...
Step2:- Data Visualization (scatter plots of horsepower vs price and peak RPM vs price):-
> library(ggplot2)
> # data1 is the 205-row car data frame shown by str(data1) above
> ggplot(data1, aes(x = horsepower, y = price)) +
+ geom_point() +
+ labs(title = "Scatter Plot of horsepower Vs price")
> ggplot(data1, aes(x = peakrpm, y = price)) +
+ geom_point() +
+ labs(title = "Scatter Plot of RPM Vs price")
Step3:- Fitting the linear model:-
Code:-
> lm_model <- lm(price ~ highwaympg + symboling + stroke, data = data1)
> summary(lm_model)
Output:-
Call:
lm(formula = price ~ highwaympg + symboling + stroke, data = data1)

Residuals:
   Min     1Q Median     3Q    Max
 -8208  -3513  -1490   1461  20534

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  34294.2     4637.8   7.395 3.76e-12 ***
highwaympg    -804.6       58.4 -13.776  < 2e-16 ***
symboling     -356.4      322.7  -1.104    0.271
stroke        1235.3     1281.8   0.964    0.336
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5736 on 201 degrees of freedom
Multiple R-squared:  0.4921,  Adjusted R-squared:  0.4845
F-statistic: 64.92 on 3 and 201 DF,  p-value: < 2.2e-16
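Reading the output: only highwaympg has a p-value below 0.05. symboling (0.271) and stroke (0.336) are not statistically significant in this model, and the Multiple R-squared of 0.4921 means the three predictors together explain roughly 49% of the variance in price. As a quick consistency check, the sketch below recomputes the Adjusted R-squared and the F-statistic from the printed R-squared using the standard formulas. It uses only the rounded numbers shown above, so it can agree with the report only up to rounding.

```r
# Values copied from the summary(lm_model) output above
r_squared <- 0.4921   # Multiple R-squared (rounded as printed)
n <- 205              # observations in data1
p <- 3                # predictors: highwaympg, symboling, stroke

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Overall F-statistic for the null hypothesis that all slopes are zero
f_stat <- (r_squared / p) / ((1 - r_squared) / (n - p - 1))

print(round(adj_r_squared, 4))
print(round(f_stat, 2))
```

Both results agree with the printed Adjusted R-squared (0.4845) and F-statistic (64.92 on 3 and 201 DF).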
Step4:- Visualizing Linear Model:-
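No code or figure survives for this step, so the following is only a sketch of one common way to visualize a linear fit: a scatter plot of price against the one significant predictor, highwaympg, with a fitted regression line drawn on top. Several things here are assumptions. It uses base R graphics rather than ggplot2 so that it runs without extra packages. It fits a one-predictor model instead of the three-predictor lm_model. The data frame sample_cars is a stand-in holding just the first five rows printed by str(data1).

```r
# Stand-in for data1: only the first five rows, copied from str(data1) above.
# With the real data, use data1 in place of sample_cars.
sample_cars <- data.frame(
  highwaympg = c(27, 27, 26, 30, 22),
  price      = c(13495, 16500, 16500, 13950, 17450)
)

# One-predictor fit, used here only to have a single line to draw
fit_line <- lm(price ~ highwaympg, data = sample_cars)

# Scatter plot with the fitted regression line on top
plot(sample_cars$highwaympg, sample_cars$price,
     xlab = "highwaympg", ylab = "price",
     main = "price vs highwaympg with fitted line")
abline(fit_line, col = "red")
```

For the full three-predictor model, calling plot(lm_model) produces R's standard diagnostic plots (residuals vs fitted, normal Q-Q, and so on), which is another common reading of "visualizing the model".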
Step5:- Plotting the Predicted Values:-
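The original plot for this step is not available. As a hedged illustration, the sketch below computes predicted prices by hand from the coefficients printed in the summary(lm_model) output and plots them against the actual prices. The data frame sample_cars is a stand-in containing only the first five rows shown by str(data1). The coefficients are the rounded values from the report, so the results are approximate.

```r
# Coefficients copied from the summary(lm_model) output above (rounded)
coefs <- c(intercept = 34294.2, highwaympg = -804.6,
           symboling = -356.4, stroke = 1235.3)

# First five rows of data1, copied from the str(data1) output
sample_cars <- data.frame(
  highwaympg = c(27, 27, 26, 30, 22),
  symboling  = c(3, 3, 1, 2, 2),
  stroke     = c(2.68, 2.68, 3.47, 3.40, 3.40),
  price      = c(13495, 16500, 16500, 13950, 17450)
)

# Predicted price = intercept + sum of (coefficient * predictor)
sample_cars$predicted <- coefs[["intercept"]] +
  coefs[["highwaympg"]] * sample_cars$highwaympg +
  coefs[["symboling"]]  * sample_cars$symboling +
  coefs[["stroke"]]     * sample_cars$stroke

print(sample_cars[, c("price", "predicted")])

# Actual vs predicted; points on the dashed line would be perfect predictions
plot(sample_cars$price, sample_cars$predicted,
     xlab = "Actual price", ylab = "Predicted price",
     main = "Actual vs predicted price")
abline(a = 0, b = 1, lty = 2)
```

With the fitted model object available, predict(lm_model, newdata = data1) returns the same kind of predictions for all 205 cars without the rounding introduced by copying coefficients by hand.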