Number of rows: 115609
Number of columns: 14
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 115609 entries, 0 to 115608
Data columns (total 14 columns):
 #   Column                         Non-Null Count   Dtype
---  ------                         --------------   -----
 0   order_id                       115609 non-null  object
 1   customer_unique_id             115609 non-null  object
 2   order_status                   115609 non-null  object
 3   order_purchase_timestamp       115609 non-null  object
 4   order_approved_at              115595 non-null  object
 5   order_delivered_carrier_date   114414 non-null  object
 6   order_delivered_customer_date  113209 non-null  object
 7   order_estimated_delivery_date  115609 non-null  object
 8   order_item_id                  115609 non-null  int64
 9   product_id                     115609 non-null  object
 10  price                          115609 non-null  float64
 11  payment_value                  115609 non-null  float64
 12  review_score                   115609 non-null  int64
 13  product_category_name_english  115609 non-null  object
dtypes: float64(2), int64(2), object(10)
memory usage: 12.3+ MB
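The summary above lists all five timestamp and date columns with dtype object, which suggests they are stored as strings. A minimal sketch of how they might be converted to datetime values follows. The column names come from the summary; the helper `parse_date_columns`, the toy values, and the assumption that the strings are in a format `pd.to_datetime` can parse are illustrative only, since the output does not show the raw values.

```python
import pandas as pd

# Column names taken from the df.info() summary above.
DATE_COLUMNS = [
    "order_purchase_timestamp",
    "order_approved_at",
    "order_delivered_carrier_date",
    "order_delivered_customer_date",
    "order_estimated_delivery_date",
]


def parse_date_columns(df: pd.DataFrame, columns=DATE_COLUMNS) -> pd.DataFrame:
    """Return a copy of df with the listed columns converted to datetime.

    Hypothetical helper: columns that are absent from df are skipped.
    """
    out = df.copy()
    for col in columns:
        if col in out.columns:
            # errors="coerce" turns unparseable or missing values into NaT
            # instead of raising an exception.
            out[col] = pd.to_datetime(out[col], errors="coerce")
    return out


# Toy frame with made-up values (not taken from the real dataset).
toy = pd.DataFrame({
    "order_purchase_timestamp": ["2018-01-01 10:00:00", "2018-01-02 11:30:00"],
    "order_approved_at": ["2018-01-01 10:15:00", None],
})
parsed = parse_date_columns(toy)
print(parsed.dtypes)
```

After conversion, missing entries appear as NaT, so the non-null counts reported by `df.info()` should stay the same as long as every existing string parses.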
Missing data percentage per column:
order_id 0.000000
customer_unique_id 0.000000
order_status 0.000000
order_purchase_timestamp 0.000000
order_approved_at 0.012110
order_delivered_carrier_date 1.033657
order_delivered_customer_date 2.075963
order_estimated_delivery_date 0.000000
order_item_id 0.000000
product_id 0.000000
price 0.000000
payment_value 0.000000
review_score 0.000000
product_category_name_english 0.000000
dtype: float64
Columns with missing data:
order_approved_at 0.012110
order_delivered_carrier_date 1.033657
order_delivered_customer_date 2.075963
dtype: float64
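The percentages above are consistent with the non-null counts in the summary: for example, order_approved_at has 115609 - 115595 = 14 missing rows, and 14 / 115609 * 100 ≈ 0.012110. A minimal sketch of one way such figures could be computed is shown below. The helper `missing_percentage` and the toy frame are hypothetical; the script that produced this output may have done it differently.

```python
import pandas as pd


def missing_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column (hypothetical helper)."""
    # isna() yields booleans; their mean is the fraction of missing rows.
    return df.isna().mean() * 100


# Toy frame with made-up values (not taken from the real dataset).
toy = pd.DataFrame({
    "a": [1.0, None, 3.0, 4.0],
    "b": ["x", "y", "z", "w"],
})

pct = missing_percentage(toy)
print("Missing data percentage per column:")
print(pct)

# Keep only the columns that actually have missing values.
missing_only = pct[pct > 0]
print("Columns with missing data:")
print(missing_only)
```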
Value counts for order_status:
order_status
delivered 113210
shipped 1138
canceled 536
invoiced 358
processing 357
unavailable 7
approved 3
Name: count, dtype: int64
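Comparing these counts with the summary shows 113210 rows with status delivered but only 113209 non-null values in order_delivered_customer_date, so at least one delivered row has no delivery date. A sketch of how the missing dates might be broken down by status is shown below. The helper `missing_by_status` and the toy values are hypothetical; only the two column names come from the output.

```python
import pandas as pd


def missing_by_status(
    df: pd.DataFrame,
    status_col: str = "order_status",
    date_col: str = "order_delivered_customer_date",
) -> pd.Series:
    """Count rows with a missing date_col for each status (hypothetical helper)."""
    # Summing the boolean mask within each status group counts missing rows.
    return df[date_col].isna().groupby(df[status_col]).sum().astype(int)


# Toy frame with made-up values (not taken from the real dataset).
toy = pd.DataFrame({
    "order_status": ["delivered", "delivered", "shipped", "canceled"],
    "order_delivered_customer_date": ["2018-01-05", None, None, None],
})

counts = missing_by_status(toy)
print(counts)
```

On the real data, a nonzero count for delivered would identify the rows worth inspecting before computing delivery times.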
Value counts for product_category_name_english:
product_category_name_english
bed_bath_table 11847
health_beauty 9944
sports_leisure 8942
furniture_decor 8743
computers_accessories 8105
...
arts_and_craftmanship 24
la_cuisine 15
cds_dvds_musicals 14
fashion_childrens_clothes 8
security_and_services 2
Name: count, Length: 71, dtype: int64