CMS Business School MBA 2022-24
Course: Marketing Analytics Code: 21MBADSE433
Semester-3 BLOCK – 1
CIA-1: Individual Report
Topic of the Assignment: EDA using Python – GOODS
Group No.:
S. No.  Name (in Caps only)         USN          Signature
1       TEJA SAI PAVAN SURAGOWNI    22MBAR0423
Date of Submission: 13-09-23
Submitted to Dr. AVINASH RANA
Marketing Analytics
Conducting exploratory data analysis using tables, plots, and
descriptive statistics.
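import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Goods dataset (path from the Colab session)
df = pd.read_csv('/content/Goods.csv')

# Tables and descriptive statistics
print(df.head())
print(df.describe())
print(df.info())  # info() prints directly and returns None, hence the trailing "None"

# Histograms of the numeric columns
df.hist(figsize=(12, 8))
plt.show()

# Box plots of the numeric columns
sns.boxplot(data=df)
plt.show()

# Pairwise scatter plots of the numeric columns
sns.pairplot(df)
plt.show()

# Count of rows per Price value (Price is loaded as text, see df.info())
sns.countplot(x='Price', data=df)
plt.show()

# Correlation heatmap; numeric_only=True skips text columns and avoids
# the FutureWarning that a bare df.corr() raises in the output below
corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()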
OUTPUT: -
   ProductID CompanyID         ProductType     Price  Unit   Color  \
0       5001    BAC200              Seesaw  1,394.00  Each  Yellow
1       5003    AND225  Small picnic table       830  Each   Green
2       5004    AND225  Large picnic table       990  Each    Blue
3       5006    BAC200   Spiderweb climber  1,722.00  Each   Green
4       5008    AND225   Small round table       575  Each     Red

              Material             Size  Weight Discount
0  Powder-coated steel     1.5 x 9 feet   112.0       No
1  Powder-coated steel       7 x 6 feet   299.0       No
2  Powder-coated steel       9 x 6 feet   357.0       No
3  Powder-coated steel   84 x 84 inches   146.0       No
4  Powder-coated steel  5 feet diameter   213.0       No
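The first rows include categorical columns such as Color, and a frequency table is one simple way to summarise them. The sketch below uses only the five Color values visible in the head output, so the counts are illustrative and not those of the full dataset; the names `colors` and `color_counts` are introduced here for illustration.

```python
import pandas as pd

# Color values copied from the five rows printed by df.head();
# the full column has more rows and some missing entries.
colors = pd.Series(["Yellow", "Green", "Blue", "Green", "Red"], name="Color")

# dropna=False keeps missing values visible as their own category.
color_counts = colors.value_counts(dropna=False)
print(color_counts)
```

On the loaded data the same call would be `df['Color'].value_counts(dropna=False)`.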
         ProductID       Weight
count    62.000000    30.000000
mean   5263.870968   399.900000
std     210.520314   591.957148
min    5001.000000    17.000000
25%    5031.750000    60.750000
50%    5229.000000   186.000000
75%    5401.750000   362.250000
max    5628.000000  2100.000000
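describe() summarises only ProductID and Weight. Price is absent, most likely because values such as 1,394.00 contain a thousands separator and the column was read as text. A minimal sketch of one way to convert it follows, using the five Price values from the head output; the helper name `to_numeric_price` is hypothetical.

```python
import pandas as pd

# Price values copied from the df.head() output; the real column has 62 rows.
prices = pd.Series(["1,394.00", "830", "990", "1,722.00", "575"], name="Price")

def to_numeric_price(col: pd.Series) -> pd.Series:
    """Strip thousands separators and convert to float; unparseable values become NaN."""
    cleaned = col.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")

price_num = to_numeric_price(prices)
print(price_num.describe())
```

An alternative, assuming the file uses commas only as thousands separators, is to pass `thousands=','` to `pd.read_csv` so that Price is parsed as a number on load.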
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   ProductID    62 non-null     int64
 1   CompanyID    62 non-null     object
 2   ProductType  62 non-null     object
 3   Price        62 non-null     object
 4   Unit         62 non-null     object
 5   Color        46 non-null     object
 6   Material     52 non-null     object
 7   Size         49 non-null     object
 8   Weight       30 non-null     float64
 9   Discount     62 non-null     object
dtypes: float64(1), int64(1), object(8)
memory usage: 5.0+ KB
None
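The non-null counts show that Color, Material, Size and Weight are incomplete, with Weight present in only 30 of the 62 rows. As a rough sketch, the missing-value share per column can be tabulated from those counts; the numbers below are copied from the df.info() listing, and the variable names are illustrative.

```python
import pandas as pd

# Non-null counts copied from the df.info() output (62 rows in total).
n_rows = 62
non_null = pd.Series({
    "ProductID": 62, "CompanyID": 62, "ProductType": 62, "Price": 62,
    "Unit": 62, "Color": 46, "Material": 52, "Size": 49,
    "Weight": 30, "Discount": 62,
})

# Missing count and percentage per column.
missing = n_rows - non_null
missing_pct = (missing / n_rows * 100).round(1)

# Keep only columns with gaps, largest gap first.
summary = pd.DataFrame({"missing": missing, "missing_pct": missing_pct})
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```

With the DataFrame loaded, `df.isna().sum()` gives the same missing counts directly.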
<ipython-input-13-21ee30c3803a>:16: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  corr_matrix = df.corr()