Submitted By:-ShaikShahanaAfroz - CMS20MBA093: 1. Identify the Shape of the Data
Notebook
Submitted By:-ShaikShahanaAfroz_CMS20MBA093
In [2]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
In [3]: df = pd.read_csv("training.csv")
df
Out[3]:
       RefId  IsBadBuy  PurchDate  Auction  VehYear  VehicleAge  Make   Model                 Trim  ...
1          2         0  12/7/2009  ADESA       2004           5  DODGE  1500 RAM PICKUP 2WD   SP    ...
2          3         0  12/7/2009  ADESA       2005           4  DODGE  STRATUS V6            SX?   ...
...
72980  73012         0  12/2/2009  ADESA       2005           4  JEEP   GRAND CHEROKEE 2WD V  La    ...
In [4]: df.shape
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72983 entries, 0 to 72982
Data columns (total 34 columns):
# Column Non-Null Count Dtype
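df.shape returns a (rows, columns) tuple. The summary above reports a RangeIndex of 72983 entries and 34 data columns, so for training.csv that tuple should be (72983, 34). A minimal sketch on a small made-up frame (only the column names come from the dataset; the values are invented) shows how the tuple lines up with len(df) and len(df.columns):

```python
import pandas as pd

# Toy frame with invented values; only the column names are borrowed from training.csv
toy = pd.DataFrame({
    "RefId": [1, 2, 3],
    "IsBadBuy": [0, 0, 1],
    "Make": ["DODGE", "DODGE", "JEEP"],
})

n_rows, n_cols = toy.shape          # shape is a (rows, columns) tuple
print(n_rows, n_cols)               # 3 3
assert n_rows == len(toy)           # rows = length of the index
assert n_cols == len(toy.columns)   # columns = number of column labels
```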
Out[6]:
RefId 72983
IsBadBuy 72983
PurchDate 72983
Auction 72983
VehYear 72983
VehicleAge 72983
Make 72983
Model 72983
Trim 70623
SubModel 72975
Color 72975
Transmission 72974
WheelTypeID 69814
WheelType 69809
VehOdo 72983
Nationality 72978
Size 72978
TopThreeAmericanName 72978
MMRAcquisitionAuctionAveragePrice 72965
MMRAcquisitionAuctionCleanPrice 72965
MMRAcquisitionRetailAveragePrice 72965
MMRAcquisitonRetailCleanPrice 72965
MMRCurrentAuctionAveragePrice 72668
MMRCurrentAuctionCleanPrice 72668
MMRCurrentRetailAveragePrice 72668
MMRCurrentRetailCleanPrice 72668
PRIMEUNIT 3419
AUCGUART 3419
BYRNO 72983
VNZIP1 72983
VNST 72983
VehBCost 72983
IsOnlineSale 72983
WarrantyCost 72983
dtype: int64
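The counts in Out[6] are non-null entries per column, and each one plus the matching missing-value figure in Out[7] gives back the 72983 rows (Trim: 70623 + 2360 = 72983; PRIMEUNIT: 3419 + 69564 = 72983). A minimal sketch on a made-up frame, assuming DataFrame.count() and isnull().sum() as the two calls, checks that identity:

```python
import numpy as np
import pandas as pd

# Made-up frame with a few gaps (None / np.nan mark missing values)
toy = pd.DataFrame({
    "Trim": ["A", None, "B", "C"],
    "WheelTypeID": [1.0, np.nan, 2.0, np.nan],
})

non_null = toy.count()            # non-null entries per column
missing = toy.isnull().sum()      # missing entries per column
print(non_null.to_dict())         # {'Trim': 3, 'WheelTypeID': 2}
print(missing.to_dict())          # {'Trim': 1, 'WheelTypeID': 2}

# For every column, non-null + missing equals the number of rows
assert (non_null + missing == len(toy)).all()
```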
In [7]: df.isnull().sum()
Out[7]: RefId 0
IsBadBuy 0
PurchDate 0
Auction 0
VehYear 0
VehicleAge 0
Make 0
Model 0
Trim 2360
SubModel 8
Color 8
Transmission 9
WheelTypeID 3169
WheelType 3174
VehOdo 0
Nationality 5
Size 5
TopThreeAmericanName 5
MMRAcquisitionAuctionAveragePrice 18
MMRAcquisitionAuctionCleanPrice 18
MMRAcquisitionRetailAveragePrice 18
MMRAcquisitonRetailCleanPrice 18
MMRCurrentAuctionAveragePrice 315
MMRCurrentAuctionCleanPrice 315
MMRCurrentRetailAveragePrice 315
MMRCurrentRetailCleanPrice 315
PRIMEUNIT 69564
AUCGUART 69564
BYRNO 0
VNZIP1 0
VNST 0
VehBCost 0
IsOnlineSale 0
WarrantyCost 0
dtype: int64
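Raw missing counts are easier to compare as a share of the 72983 rows: PRIMEUNIT and AUCGUART are missing in 69564 rows, about 95%, while most other columns are missing in well under 5%. A minimal sketch on a made-up frame that turns isnull() into percentages and lists the columns above a cut-off (the 50% threshold is an arbitrary choice for illustration, not something the notebook uses):

```python
import numpy as np
import pandas as pd

# Made-up values; column names borrowed from training.csv
toy = pd.DataFrame({
    "PRIMEUNIT": [np.nan, np.nan, np.nan, "NO"],
    "Color": ["RED", "SILVER", None, "WHITE"],
    "VehBCost": [7100.0, 7600.0, 4900.0, 4100.0],
})

# Mean of a boolean mask = fraction missing; times 100 gives a percentage
pct_missing = toy.isnull().mean() * 100
print(pct_missing.round(1).to_dict())  # {'PRIMEUNIT': 75.0, 'Color': 25.0, 'VehBCost': 0.0}

# Columns missing in more than half the rows (illustrative threshold)
THRESHOLD = 50.0
mostly_missing = pct_missing[pct_missing > THRESHOLD].index.tolist()
print(mostly_missing)  # ['PRIMEUNIT']
```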
Out[8]:
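# Assumed definition (the cell that built this list is missing from the export):
# names of columns whose values exactly repeat an earlier column
duplicateColNames = df.columns[df.T.duplicated()].tolist()
print('Duplicate Columns are: ')
for column in duplicateColNames:
    print('Column Name : ', column)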
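The notebook does not show how duplicateColNames was built. One common approach, sketched here as an assumption on a made-up frame, compares every pair of columns with Series.equals, which requires the same dtype and values and treats NaNs in the same positions as equal. The helper name find_duplicate_columns and the column VehYearCopy are hypothetical:

```python
import numpy as np
import pandas as pd

def find_duplicate_columns(frame: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns that repeat an earlier column."""
    duplicates = []
    cols = frame.columns
    for i in range(len(cols)):
        if cols[i] in duplicates:
            continue  # already known to be a copy of an earlier column
        for j in range(i + 1, len(cols)):
            if cols[j] not in duplicates and frame.iloc[:, i].equals(frame.iloc[:, j]):
                duplicates.append(cols[j])
    return duplicates

# Made-up frame: "VehYearCopy" repeats "VehYear", including the NaN position
toy = pd.DataFrame({
    "VehYear": [2004, 2005, np.nan],
    "VehicleAge": [5, 4, 4],
    "VehYearCopy": [2004, 2005, np.nan],
})

duplicateColNames = find_duplicate_columns(toy)
print('Duplicate Columns are: ')
for column in duplicateColNames:
    print('Column Name : ', column)   # Column Name :  VehYearCopy
```

The pairwise loop is quadratic in the number of columns, which is cheap for 34 columns and avoids transposing a 72983-row frame.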
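In [11]: plt.figure(figsize=(20, 10))
# numeric_only=True skips text columns such as Make; pandas >= 2.0 raises an error without it
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='rainbow')
plt.show()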
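sns.heatmap only draws the matrix it is given, so the heatmap depends on the correlation step. In pandas 2.0 and later, DataFrame.corr() raises an error when text columns such as Make or Auction are present unless numeric_only=True is passed (or the text columns are dropped first). A minimal sketch on a made-up frame that builds the matrix without plotting:

```python
import pandas as pd

# Made-up values; column names borrowed from training.csv
toy = pd.DataFrame({
    "VehYear": [2004, 2005, 2006, 2007],
    "VehicleAge": [5, 4, 3, 2],
    "VehBCost": [7100.0, 7600.0, 4900.0, 8000.0],
    "Make": ["DODGE", "DODGE", "JEEP", "FORD"],   # text column, excluded below
})

corr = toy.corr(numeric_only=True)   # Pearson correlation of numeric columns only
print(corr.columns.tolist())         # ['VehYear', 'VehicleAge', 'VehBCost']
print(round(corr.loc["VehYear", "VehicleAge"], 6))  # -1.0
```

In this toy frame VehYear + VehicleAge is constant, so the two columns are perfectly negatively correlated; the resulting corr frame can be passed straight to sns.heatmap(corr, annot=True, cmap='rainbow').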