Pandas Practice 2
Introduction
This assignment will help you to consolidate the concepts learnt in the session.
import pandas as pd
import numpy as np
%matplotlib inline
df = pd.read_csv('https://raw.githubusercontent.com/jackiekazil/data-wrangling/master/data/chp3/data-text.csv')
df.head(2)
df1 = pd.read_csv('https://raw.githubusercontent.com/kjam/data-wrangling-pycon/master/data/berlin_weather_oldest.csv')
df1.head(2)
Expected Output:
2. Get the row names from the above files.
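One possible approach, sketched on a small made-up frame rather than the downloaded files: pandas keeps row names in the `index` attribute. The name `toy` and its values are assumptions for illustration only.

```python
import pandas as pd

# Toy frame standing in for df / df1 (values are made up for illustration).
toy = pd.DataFrame({"Indicator": ["a", "b", "c"], "Value": [1.0, 2.0, 3.0]})

# Row names (labels) live in the index; a frame built without an explicit
# index, like one read from a CSV with default options, gets a RangeIndex.
row_names = toy.index
print(row_names)

# Convert to a plain list when an explicit sequence is needed.
row_name_list = toy.index.tolist()
print(row_name_list)
```

The same two expressions should work unchanged on `df` and `df1`.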
Expected Output:
Expected Output:
4. Change a column name in any of the above files and store the change
permanently.
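A minimal sketch of one way to do this, assuming a made-up frame `toy` and a hypothetical new column name. `rename` returns a new frame by default, so the result is assigned back to keep the change.

```python
import pandas as pd

# Toy frame; the column names here are illustrative, not from the real files.
toy = pd.DataFrame({"Indicator": ["a", "b"], "Value": [1.0, 2.0]})

# rename() does not modify the frame unless the result is stored.
toy = toy.rename(columns={"Indicator": "indicator_name"})
print(toy.columns.tolist())
```

Passing `inplace=True` to `rename` is an alternative to reassigning the result.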
Expected Output:
Expected Output:
Expected Output:
Expected Output:
9. Get the column array using a variable
Expected Output:
Expected Output:
Expected Output:
users = pd.read_csv('https://raw.githubusercontent.com/ben519/DataWrangling/master/Data/users.csv')
sessions = pd.read_csv('https://raw.githubusercontent.com/ben519/DataWrangling/master/Data/sessions.csv')
products = pd.read_csv('https://raw.githubusercontent.com/ben519/DataWrangling/master/Data/products.csv')
transactions = pd.read_csv('https://raw.githubusercontent.com/ben519/DataWrangling/master/Data/transactions.csv')
users.head()
sessions.head()
transactions.head()
12. Join users to transactions, keeping all rows from transactions and only matching rows from
users (left join)
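A left join can be sketched with `DataFrame.merge` and `how="left"`. The toy rows below are made up and only mimic the `UserID` key shared by the real tables.

```python
import pandas as pd

# Made-up stand-ins for the real users and transactions tables.
users = pd.DataFrame({"UserID": [1, 2, 3], "User": ["Ann", "Bob", "Cy"]})
transactions = pd.DataFrame({
    "TransactionID": [10, 11, 12],
    "UserID": [1, 1, 9],   # UserID 9 has no matching row in users
    "Quantity": [2, 1, 5],
})

# how="left" keeps every row of the left frame (transactions) and fills
# user columns with NaN where no match exists.
left_joined = transactions.merge(users, how="left", on="UserID")
print(left_joined)
```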
Expected Output:
13. Which transactions have a UserID not in users?
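One way to answer this is an anti-join built from a boolean mask, sketched here on made-up rows.

```python
import pandas as pd

# Made-up stand-ins for the real tables.
users = pd.DataFrame({"UserID": [1, 2, 3]})
transactions = pd.DataFrame({
    "TransactionID": [10, 11, 12],
    "UserID": [1, 1, 9],
})

# isin() marks transactions whose UserID appears in users; ~ inverts the mask.
unmatched = transactions[~transactions["UserID"].isin(users["UserID"])]
print(unmatched)
```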
Expected Output:
14. Join users to transactions, keeping only rows from transactions and users that match via
UserID (inner join)
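A sketch of the inner join on made-up rows; only the `how` argument differs from a left join.

```python
import pandas as pd

# Made-up stand-ins for the real tables.
users = pd.DataFrame({"UserID": [1, 2, 3], "User": ["Ann", "Bob", "Cy"]})
transactions = pd.DataFrame({
    "TransactionID": [10, 11, 12],
    "UserID": [1, 1, 9],
})

# how="inner" drops rows from either side that have no partner on UserID.
inner_joined = transactions.merge(users, how="inner", on="UserID")
print(inner_joined)
```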
Expected Output:
15. Join users to transactions, displaying all matching rows AND all non-matching rows (full
outer join)
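A sketch of the full outer join on made-up rows. The `indicator=True` option is not required by the exercise; it is added here only to show which side each row came from.

```python
import pandas as pd

# Made-up stand-ins for the real tables.
users = pd.DataFrame({"UserID": [1, 2, 3], "User": ["Ann", "Bob", "Cy"]})
transactions = pd.DataFrame({
    "TransactionID": [10, 11, 12],
    "UserID": [1, 1, 9],
})

# how="outer" keeps matched rows plus unmatched rows from both sides.
# indicator=True adds a _merge column: both, left_only or right_only.
outer_joined = transactions.merge(users, how="outer", on="UserID", indicator=True)
print(outer_joined)
```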
Expected Output:
16. Determine which sessions occurred on the same day each user registered
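A possible approach is to join sessions to users and compare calendar dates. The sketch below assumes that sessions has `SessionID`, `SessionDate` and `UserID` columns and that `Registered` holds a date; the rows themselves are made up.

```python
import pandas as pd

# Made-up rows; the sessions column names are an assumption.
users = pd.DataFrame({
    "UserID": [1, 2],
    "Registered": ["2016-01-01", "2016-02-10"],
})
sessions = pd.DataFrame({
    "SessionID": [1, 2, 3],
    "SessionDate": ["2016-01-01 08:30:00", "2016-01-05 10:00:00", "2016-02-10 23:15:00"],
    "UserID": [1, 1, 2],
})

# Parse the text columns so dates can be compared.
users["Registered"] = pd.to_datetime(users["Registered"])
sessions["SessionDate"] = pd.to_datetime(sessions["SessionDate"])

# Attach each user's registration date to their sessions, then keep rows
# where both timestamps fall on the same calendar day.
merged = sessions.merge(users, on="UserID", how="inner")
same_day = merged[
    merged["SessionDate"].dt.normalize() == merged["Registered"].dt.normalize()
]
print(same_day)
```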
Expected Output:
17. Build a dataset with every possible (UserID, ProductID) pair (cross join)
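A cross join can be sketched with `merge(how="cross")`, which is available in pandas 1.2 and later. The rows and the `Product` column below are made up.

```python
import pandas as pd

# Made-up stand-ins; only the ID columns matter for the cross join.
users = pd.DataFrame({"UserID": [1, 2], "User": ["Ann", "Bob"]})
products = pd.DataFrame({"ProductID": [1, 2, 3], "Product": ["A", "B", "C"]})

# how="cross" pairs every row on the left with every row on the right.
pairs = users[["UserID"]].merge(products[["ProductID"]], how="cross")
print(pairs)
```

On older pandas versions, a common workaround is to add a constant helper column to both frames and merge on it.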
Expected Output:
18. Determine the total quantity of each product purchased by each user
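One reading of this task is a grouped sum over the transactions table, sketched here on made-up rows. (UserID, ProductID) pairs with no purchases do not appear in this result; including them would need a join against every possible pair.

```python
import pandas as pd

# Made-up transactions.
transactions = pd.DataFrame({
    "UserID": [1, 1, 1, 2],
    "ProductID": [1, 1, 2, 1],
    "Quantity": [2, 3, 1, 4],
})

# Sum Quantity within each (UserID, ProductID) group.
totals = transactions.groupby(["UserID", "ProductID"], as_index=False)["Quantity"].sum()
print(totals)
```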
Expected Output:
19. For each user, get each possible pair of transactions (TransactionID1, TransactionID2)
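A self-join on `UserID` is one way to build these pairs. The sketch uses made-up rows and returns every ordered pair, including a transaction paired with itself; whether those should be kept depends on how the task is read, so an optional filter is shown as well.

```python
import pandas as pd

# Made-up transactions.
transactions = pd.DataFrame({
    "UserID": [1, 1, 2],
    "TransactionID": [10, 11, 12],
})

tx = transactions[["UserID", "TransactionID"]]

# Joining the table to itself on UserID yields all ordered pairs per user;
# the suffixes turn TransactionID into TransactionID1 and TransactionID2.
pairs = tx.merge(tx, on="UserID", suffixes=("1", "2"))
print(pairs)

# Optional: keep each unordered pair once and drop self-pairs.
distinct_pairs = pairs[pairs["TransactionID1"] < pairs["TransactionID2"]]
print(distinct_pairs)
```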
Expected Output:
20. Join each user to his/her first occurring transaction in the transactions table
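A sketch on made-up rows, assuming "first occurring" means first in table order. If the earliest transaction by date is wanted instead, sorting by `TransactionDate` before dropping duplicates would be the natural change.

```python
import pandas as pd

# Made-up stand-ins for the real tables.
users = pd.DataFrame({"UserID": [1, 2, 3], "User": ["Ann", "Bob", "Cy"]})
transactions = pd.DataFrame({
    "TransactionID": [10, 11, 12],
    "UserID": [1, 1, 2],
})

# Keep only the first row seen for each UserID (table order).
first_tx = transactions.drop_duplicates(subset="UserID", keep="first")

# Left join so users without any transaction are still listed.
users_first_tx = users.merge(first_tx, how="left", on="UserID")
print(users_first_tx)
```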
Expected Output:
my_columns = list(data.columns)
my_columns
['UserID',
'User',
'Gender',
'Registered',
'Cancelled',
'TransactionID',
'TransactionDate',
'ProductID',
'Quantity']
missing_info = list(data.columns[data.isnull().any()])
missing_info
print('number missing for column {}: {}'.format(col, num_missing)) #count of missing data
col, percent_missing))
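The two print fragments above appear to belong to a loop over `missing_info`. A minimal sketch of such a loop follows, using a made-up `data` frame. The wording of the percentage message and the `report` dictionary are assumptions, since the original lines are incomplete.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for the merged dataset called data.
data = pd.DataFrame({
    "UserID": [1, 2, 3, 4],
    "Cancelled": ["2016-01-05", np.nan, np.nan, np.nan],
    "Quantity": [1.0, np.nan, 2.0, 3.0],
})

# Columns that contain at least one missing value.
missing_info = list(data.columns[data.isnull().any()])

report = {}  # hypothetical helper: column -> (count, percent)
for col in missing_info:
    num_missing = data[col].isnull().sum()            # count of missing data
    percent_missing = 100 * num_missing / len(data)   # share of rows missing
    print('number missing for column {}: {}'.format(col, num_missing))
    # The message text below is an assumption, not the original wording.
    print('percent missing for column {}: {:.1f}'.format(col, percent_missing))
    report[col] = (int(num_missing), float(percent_missing))
```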