Joining Data With Pandas


Uploaded by

Zhuo Yang


Inner join

JOINING DATA WITH PANDAS

Aaren Stubberfield

Instructor

For clarity

Tables = DataFrames

Merging = Joining

1 Photo by David Travis on Unsplash

JOINING DATA WITH PANDAS

Chicago data portal dataset

1 Photo by Pedro Lastra on Unsplash

Datasets for example

1 Ward image By Alissapump, Own work, CC BY-SA 3.0

The ward data
wards = pd.read_csv('Ward_Offices.csv')
print(wards.head())
print(wards.shape)

ward alderman address zip

0 1 Proco "Joe" ... 2058 NORTH W... 60647
1 2 Brian Hopkins 1400 NORTH ... 60622
2 3 Pat Dowell 5046 SOUTH S... 60609
3 4 William D. B... 435 EAST 35T... 60616
4 5 Leslie A. Ha... 2325 EAST 71... 60649
(50, 4)

Census data
census = pd.read_csv('Ward_Census.csv')
print(census.head())
print(census.shape)

ward pop_2000 pop_2010 change address zip

0 1 52951 56149 6% 2765 WEST SA... 60647
1 2 54361 55805 3% WM WASTE MAN... 60622
2 3 40385 53039 31% 17 EAST 38TH... 60653
3 4 51953 54589 5% 31ST ST HARB... 60653
4 5 55302 51455 -7% JACKSON PARK... 60637
(50, 6)

Merging tables
ward alderman address zip
0 1 Proco "Joe" ... 2058 NORTH W... 60647
1 2 Brian Hopkins 1400 NORTH ... 60622
2 3 Pat Dowell 5046 SOUTH S... 60609
3 4 William D. B... 435 EAST 35T... 60616
4 5 Leslie A. Ha... 2325 EAST 71... 60649

ward pop_2000 pop_2010 change address zip

Inner join
wards_census = wards.merge(census, on='ward')
print(wards_census.head(4))

ward alderman address_x zip_x pop_2000 pop_2010 change address_y zip_y

0 1 Proco "Joe" ... 2058 NORTH W... 60647 52951 56149 6% 2765 WEST SA... 60647
1 2 Brian Hopkins 1400 NORTH ... 60622 54361 55805 3% WM WASTE MAN... 60622
2 3 Pat Dowell 5046 SOUTH S... 60609 40385 53039 31% 17 EAST 38TH... 60653
3 4 William D. B... 435 EAST 35T... 60616 51953 54589 5% 31ST ST HARB... 60653

print(wards_census.shape)

(50, 9)

Suffixes
print(wards_census.columns)

Index(['ward', 'alderman', 'address_x', 'zip_x', 'pop_2000', 'pop_2010', 'change',

'address_y', 'zip_y'],
dtype='object')

Suffixes
wards_census = wards.merge(census, on='ward', suffixes=('_ward','_cen'))
print(wards_census.head())
print(wards_census.shape)

ward alderman address_ward zip_ward pop_2000 pop_2010 change address_cen zi

0 1 Proco "Joe" ... 2058 NORTH W... 60647 52951 56149 6% 2765 WEST SA... 60
1 2 Brian Hopkins 1400 NORTH ... 60622 54361 55805 3% WM WASTE MAN... 60
2 3 Pat Dowell 5046 SOUTH S... 60609 40385 53039 31% 17 EAST 38TH... 60
3 4 William D. B... 435 EAST 35T... 60616 51953 54589 5% 31ST ST HARB... 60
4 5 Leslie A. Ha... 2325 EAST 71... 60649 55302 51455 -7% JACKSON PARK... 60
(50, 9)
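The inner join and suffix behaviour can be tried on a small scale. The sketch below uses hypothetical three-row stand-ins for the wards and census tables (the names and values are made up for illustration, not taken from the Chicago data).

```python
import pandas as pd

# Hypothetical miniature stand-ins for the wards and census tables
wards = pd.DataFrame({
    "ward": [1, 2, 3],
    "alderman": ["Alder A", "Alder B", "Alder C"],
    "zip": ["60647", "60622", "60609"],
})
census = pd.DataFrame({
    "ward": [1, 2, 4],
    "pop_2010": [56149, 55805, 54589],
    "zip": ["60647", "60622", "60653"],
})

# how='inner' is the default: wards 3 and 4 have no partner and are dropped.
# The shared non-key column 'zip' gets the custom suffixes.
wards_census = wards.merge(census, on="ward", suffixes=("_ward", "_cen"))
print(wards_census.columns.tolist())
print(wards_census.shape)
```

Only the overlapping non-key columns are renamed; the key column `ward` appears once.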

Let's practice!
One-to-many relationships

One-to-one

One-To-One = Every row in the left table is related to only one row in the right table

One-to-one example
ward alderman address zip
0 1 Proco "Joe" ... 2058 NORTH W... 60647
1 2 Brian Hopkins 1400 NORTH ... 60622
2 3 Pat Dowell 5046 SOUTH S... 60609
3 4 William D. B... 435 EAST 35T... 60616
4 5 Leslie A. Ha... 2325 EAST 71... 60649

ward pop_2000 pop_2010 change address zip

One-to-many

One-To-Many = Every row in the left table is related to one or more rows in the right table

One-to-many example
licenses = pd.read_csv('Business_Licenses.csv')
print(licenses.head())
print(licenses.shape)

account ward aid business address zip

0 307071 3 743 REGGIE'S BAR... 2105 S STATE ST 60616
1 10 10 829 HONEYBEERS 13200 S HOUS... 60633
2 10002 14 775 CELINA DELI 5089 S ARCHE... 60632
3 10005 12 nan KRAFT FOODS ... 2005 W 43RD ST 60609
4 10044 44 638 NEYBOUR'S TA... 3651 N SOUTH... 60613
(10000, 6)

One-to-many example
ward alderman address zip
0 1 Proco "Joe" ... 2058 NORTH W... 60647
1 2 Brian Hopkins 1400 NORTH ... 60622
2 3 Pat Dowell 5046 SOUTH S... 60609
3 4 William D. B... 435 EAST 35T... 60616
4 5 Leslie A. Ha... 2325 EAST 71... 60649

account ward aid business address zip

One-to-many example
ward_licenses = wards.merge(licenses, on='ward', suffixes=('_ward','_lic'))
print(ward_licenses.head())

ward alderman address_ward zip_ward account aid business address_lic

0 1 Proco "Joe" ... 2058 NORTH W... 60647 12024 nan DIGILOG ELEC... 1038 N ASHLA...
1 1 Proco "Joe" ... 2058 NORTH W... 60647 14446 743 EMPTY BOTTLE... 1035 N WESTE...
2 1 Proco "Joe" ... 2058 NORTH W... 60647 14624 775 LITTLE MEL'S... 2205 N CALIF...
3 1 Proco "Joe" ... 2058 NORTH W... 60647 14987 nan MR. BROWN'S ... 2301 W CHICA...
4 1 Proco "Joe" ... 2058 NORTH W... 60647 15642 814 Beat Kitchen 2000-2100 W ...

One-to-many example
print(wards.shape)

(50, 4)

print(ward_licenses.shape)

(10000, 9)
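The row growth in a one-to-many merge can be seen with a toy example. The tables below are hypothetical stand-ins, assuming two wards and three licenses.

```python
import pandas as pd

# Hypothetical data: ward 1 has two licensed businesses, ward 2 has one
wards = pd.DataFrame({"ward": [1, 2], "alderman": ["Alder A", "Alder B"]})
licenses = pd.DataFrame({
    "account": [101, 102, 103],
    "ward": [1, 1, 2],
    "business": ["DELI", "TAVERN", "PIZZA"],
})

# Each ward row is repeated once per matching license row
ward_licenses = wards.merge(licenses, on="ward")
print(wards.shape)
print(ward_licenses.shape)
```

The merged table has as many rows as there are matching rows on the "many" side, not as many as the left table.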

Let's practice!
Merging multiple DataFrames

Merging multiple tables

Remembering the licenses table
print(licenses.head())

account ward aid business address zip

Remembering the wards table
print(wards.head())

ward alderman address zip

Review new data
grants = pd.read_csv('Small_Business_Grant_Agreements.csv')
print(grants.head())

address zip grant company

0 1000 S KOSTN... 60624 148914.50 NATIONWIDE F...
1 1000 W 35TH ST 60609 100000.00 SMALL BATCH,...
2 1000 W FULTO... 60612 34412.50 FULTON MARKE...
3 10008 S WEST... 60643 12285.32 LAW OFFICES ...
4 1002 W ARGYL... 60640 28998.75 MASALA'S IND...

Tables to merge
address zip grant company
0 1031 N CICER... 60651 150000.00 1031 HANS LLC
1 1375 W LAKE ST 60612 150000.00 1375 W LAKE ...
2 1800 W LAKE ST 60612 47700.00 1800 W LAKE LLC
3 4311 S HALST... 60609 87350.63 4311 S. HALS...
4 1747 W CARRO... 60612 50000.00 ACE STYLINE ...

account ward aid business address zip

Theoretical merge
grants_licenses = grants.merge(licenses, on='zip')
print(grants_licenses.loc[grants_licenses['business']=="REGGIE'S BAR & GRILL",
['grant','company','account','ward','business']])

grant company account ward business

0 136443.07 CEDARS MEDIT... 307071 3 REGGIE'S BAR...
1 39943.15 DARRYL & FYL... 307071 3 REGGIE'S BAR...
2 31250.0 JGF MANAGEMENT 307071 3 REGGIE'S BAR...
3 143427.79 HYDE PARK AN... 307071 3 REGGIE'S BAR...
4 69500.0 ZBERRY INC 307071 3 REGGIE'S BAR...

Single merge
grants.merge(licenses, on=['address','zip'])

address zip grant company account ward aid business

0 1020 N KOLMA... 60651 68309.8 TRITON INDUS... 7689 37 929 TRITON INDUS...
1 10241 S COMM... 60617 33275.5 SOUTH CHICAG... 246598 10 nan SOUTH CHICAG...
2 11612 S WEST... 60643 30487.5 BEVERLY RECO... 3705 19 nan BEVERLY RECO...
3 1600 S KOSTN... 60623 128513.7 CHARTER STEE... 293825 24 nan LEELO STEEL,...
4 1647 W FULTO... 60612 5634.0 SN PECK BUIL... 85595 27 673 S.N. PECK BU...

Merging multiple tables
grants_licenses_ward = grants.merge(licenses, on=['address','zip']) \
.merge(wards, on='ward', suffixes=('_bus','_ward'))
grants_licenses_ward.head()

address_bus zip_bus grant company account ward aid business alderma

0 1020 N KOLMA... 60651 68309.8 TRITON INDUS... 7689 37 929 TRITON INDUS... Emma M.
1 10241 S COMM... 60617 33275.5 SOUTH CHICAG... 246598 10 nan SOUTH CHICAG... Susan S
2 11612 S WEST... 60643 30487.5 BEVERLY RECO... 3705 19 nan BEVERLY RECO... Matthew
3 3502 W 111TH ST 60655 50000.0 FACE TO FACE... 263274 19 704 FACE TO FACE Matthew
4 1600 S KOSTN... 60623 128513.7 CHARTER STEE... 293825 24 nan LEELO STEEL,... Michael

Results
import matplotlib.pyplot as plt
grants_licenses_ward.groupby('ward').agg('sum').plot(kind='bar', y='grant')
plt.show()

Merging even more...
Three tables:

df1.merge(df2, on='col') \
.merge(df3, on='col')

Four tables:

df1.merge(df2, on='col') \
.merge(df3, on='col') \
.merge(df4, on='col')
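A runnable sketch of the chained pattern, assuming tiny hypothetical grants, licenses, and wards tables (all values are invented). Parentheses are used here instead of backslashes for line continuation; both styles work.

```python
import pandas as pd

# Hypothetical miniature tables
grants = pd.DataFrame({
    "address": ["1 MAIN ST", "2 OAK AVE"],
    "zip": ["60601", "60602"],
    "grant": [1000.0, 2500.0],
})
licenses = pd.DataFrame({
    "address": ["1 MAIN ST", "2 OAK AVE", "3 ELM RD"],
    "zip": ["60601", "60602", "60603"],
    "ward": [1, 2, 2],
    "business": ["DELI", "TAVERN", "PIZZA"],
})
wards = pd.DataFrame({
    "ward": [1, 2],
    "alderman": ["Alder A", "Alder B"],
    "address": ["10 WARD OFFICE", "20 WARD OFFICE"],
    "zip": ["60601", "60602"],
})

# First merge on two columns, then merge the result with the third table
merged = (
    grants.merge(licenses, on=["address", "zip"])
          .merge(wards, on="ward", suffixes=("_bus", "_ward"))
)

# Total grant amount per ward
grant_by_ward = merged.groupby("ward")["grant"].sum()
print(grant_by_ward)
```

Selecting the `grant` column before summing avoids aggregating the text columns.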

Let's practice!
Left join

Quick review

Left join

New dataset

Movies table
movies = pd.read_csv('tmdb_movies.csv')
print(movies.head())
print(movies.shape)

id original_title popularity release_date

0 257 Oliver Twist 20.415572 2005-09-23
1 14290 Better Luck ... 3.877036 2002-01-12
2 38365 Grown Ups 38.864027 2010-06-24
3 9672 Infamous 3.6808959999... 2006-11-16
4 12819 Alpha and Omega 12.300789 2010-09-17
(4803, 4)

Tagline table
taglines = pd.read_csv('tmdb_taglines.csv')
print(taglines.head())
print(taglines.shape)

id tagline
0 19995 Enter the World of Pandora.
1 285 At the end of the world, the adventure begins.
2 206647 A Plan No One Escapes
3 49026 The Legend Ends
4 49529 Lost in our world, found in another.
(3955, 2)

Merge with left join
movies_taglines = movies.merge(taglines, on='id', how='left')
print(movies_taglines.head())

id original_title popularity release_date tagline

0 257 Oliver Twist 20.415572 2005-09-23 NaN
1 14290 Better Luck ... 3.877036 2002-01-12 Never undere...
2 38365 Grown Ups 38.864027 2010-06-24 Boys will be...
3 9672 Infamous 3.6808959999... 2006-11-16 There's more...
4 12819 Alpha and Omega 12.300789 2010-09-17 A Pawsome 3D...

Number of rows returned
print(movies_taglines.shape)

(4805, 5)
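A small left-join sketch, assuming hypothetical three-row movies and taglines tables. When the right table's key is unique, a left join returns exactly as many rows as the left table; duplicate keys on the right would add rows.

```python
import pandas as pd

# Hypothetical miniature tables (taglines are invented)
movies = pd.DataFrame({
    "id": [257, 14290, 38365],
    "title": ["Oliver Twist", "Better Luck", "Grown Ups"],
})
taglines = pd.DataFrame({
    "id": [14290, 38365, 19995],
    "tagline": ["Tagline A", "Tagline B", "Tagline C"],
})

# Keep every movie; movies without a tagline get NaN
movies_taglines = movies.merge(taglines, on="id", how="left")
print(movies_taglines)

# Count the movies that found no match in the right table
n_missing = movies_taglines["tagline"].isna().sum()
print(n_missing)
```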

Let's practice!
Other joins

Right join

Looking at data
movie_to_genres = pd.read_csv('tmdb_movie_to_genres.csv')
tv_genre = movie_to_genres[movie_to_genres['genre'] == 'TV Movie']
print(tv_genre)

movie_id genre
4998 10947 TV Movie
5994 13187 TV Movie
7443 22488 TV Movie
10061 78814 TV Movie
10790 153397 TV Movie
10835 158150 TV Movie
11096 205321 TV Movie
11282 231617 TV Movie

Filtering the data
m = movie_to_genres['genre'] == 'TV Movie'
tv_genre = movie_to_genres[m]
print(tv_genre)

movie_id genre
4998 10947 TV Movie
5994 13187 TV Movie
7443 22488 TV Movie
10061 78814 TV Movie
10790 153397 TV Movie
10835 158150 TV Movie
11096 205321 TV Movie
11282 231617 TV Movie

Data to merge
id title popularity release_date
0 257 Oliver Twist 20.415572 2005-09-23
1 14290 Better Luck ... 3.877036 2002-01-12
2 38365 Grown Ups 38.864027 2010-06-24
3 9672 Infamous 3.6808959999... 2006-11-16
4 12819 Alpha and Omega 12.300789 2010-09-17

movie_id genre
4998 10947 TV Movie
5994 13187 TV Movie
7443 22488 TV Movie
10061 78814 TV Movie
10790 153397 TV Movie

Merge with right join
tv_movies = movies.merge(tv_genre, how='right',
left_on='id', right_on='movie_id')
print(tv_movies.head())

id title popularity release_date movie_id genre

0 153397 Restless 0.812776 2012-12-07 153397 TV Movie
1 10947 High School ... 16.536374 2006-01-20 10947 TV Movie
2 231617 Signed, Seal... 1.444476 2013-10-13 231617 TV Movie
3 78814 We Have Your... 0.102003 2011-11-12 78814 TV Movie
4 158150 How to Fall ... 1.923514 2012-07-21 158150 TV Movie

Outer join

Datasets for outer join
m = movie_to_genres['genre'] == 'Family'
family = movie_to_genres[m].head(3)

movie_id genre
0 12 Family
1 35 Family
2 105 Family

m = movie_to_genres['genre'] == 'Comedy'
comedy = movie_to_genres[m].head(3)

movie_id genre
0 5 Comedy
1 13 Comedy
2 35 Comedy

Merge with outer join
family_comedy = family.merge(comedy, on='movie_id', how='outer',
suffixes=('_fam', '_com'))
print(family_comedy)

movie_id genre_fam genre_com

0 12 Family NaN
1 35 Family Comedy
2 105 Family NaN
3 5 NaN Comedy
4 13 NaN Comedy
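The outer join above can be reproduced by building the two three-row tables by hand. This is a sketch under that assumption; the row order of an outer join may differ between pandas versions, so the example checks contents rather than order.

```python
import pandas as pd

# The two small tables from the slide, constructed directly
family = pd.DataFrame({"movie_id": [12, 35, 105], "genre": ["Family"] * 3})
comedy = pd.DataFrame({"movie_id": [5, 13, 35], "genre": ["Comedy"] * 3})

# Outer join keeps every movie_id from either table
family_comedy = family.merge(comedy, on="movie_id", how="outer",
                             suffixes=("_fam", "_com"))
print(family_comedy)

# Rows with no NaN are the movies present in both tables
in_both = family_comedy.dropna()
print(in_both)
```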

Let's practice!
Merging a table to itself

Sequel movie data
print(sequels.head())

id title sequel
0 19995 Avatar NaN
1 862 Toy Story 863
2 863 Toy Story 2 10193
3 597 Titanic NaN
4 24428 The Avengers NaN

Merging a table to itself
original_sequels = sequels.merge(sequels, left_on='sequel', right_on='id',
suffixes=('_org','_seq'))
print(original_sequels.head())

id_org title_org sequel_org id_seq title_seq sequel_seq

0 862 Toy Story 863 863 Toy Story 2 10193
1 863 Toy Story 2 10193 10193 Toy Story 3 NaN
2 675 Harry Potter... 767 767 Harry Potter... NaN
3 121 The Lord of ... 122 122 The Lord of ... NaN
4 120 The Lord of ... 121 121 The Lord of ... 122

Continue format results
print(original_sequels[['title_org','title_seq']].head())

title_org title_seq
0 Toy Story Toy Story 2
1 Toy Story 2 Toy Story 3
2 Harry Potter... Harry Potter...
3 The Lord of ... The Lord of ...
4 The Lord of ... The Lord of ...

Merging a table to itself with left join
original_sequels = sequels.merge(sequels, left_on='sequel', right_on='id',
how='left', suffixes=('_org','_seq'))
print(original_sequels.head())

id_org title_org sequel_org id_seq title_seq sequel_seq

0 19995 Avatar NaN NaN NaN NaN
1 862 Toy Story 863 863 Toy Story 2 10193
2 863 Toy Story 2 10193 10193 Toy Story 3 NaN
3 597 Titanic NaN NaN NaN NaN
4 24428 The Avengers NaN NaN NaN NaN
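A self-merge sketch on a hypothetical four-row sequels table (a subset modelled on the slide). The `sequel` column holds the `id` of the follow-up film, or NaN when there is none.

```python
import pandas as pd

# Hypothetical subset of the sequels table
sequels = pd.DataFrame({
    "id": [862, 863, 10193, 597],
    "title": ["Toy Story", "Toy Story 2", "Toy Story 3", "Titanic"],
    "sequel": [863.0, 10193.0, float("nan"), float("nan")],
})

# Inner self-merge: match each film's 'sequel' value to another row's 'id'
original_sequels = sequels.merge(sequels, left_on="sequel", right_on="id",
                                 suffixes=("_org", "_seq"))
print(original_sequels[["title_org", "title_seq"]])

# Left self-merge: films without a sequel are kept, with NaN on the right
all_films = sequels.merge(sequels, left_on="sequel", right_on="id",
                          how="left", suffixes=("_org", "_seq"))
print(all_films[["title_org", "title_seq"]])
```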

When to merge a table to itself
Common situations:

Hierarchical relationships

Sequential relationships

Graph data

Let's practice!
Merging on indexes

Table with an index
id title popularity release_date
0 257 Oliver Twist 20.415572 2005-09-23
1 14290 Better Luck ... 3.877036 2002-01-12
2 38365 Grown Ups 38.864027 2010-06-24
3 9672 Infamous 3.680896 2006-11-16
4 12819 Alpha and Omega 12.300789 2010-09-17

title popularity release_date

id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16
12819 Alpha and Omega 12.300789 2010-09-17

Setting an index
movies = pd.read_csv('tmdb_movies.csv', index_col=['id'])
print(movies.head())

title popularity release_date

id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16
12819 Alpha and Omega 12.300789 2010-09-17

Merge index datasets
title popularity release_date
id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16

tagline
id
19995 Enter the Wo...
285 At the end o...
206647 A Plan No On...
49026 The Legend Ends

Merging on index
movies_taglines = movies.merge(taglines, on='id', how='left')
print(movies_taglines.head())

title popularity release_date tagline

id
257 Oliver Twist 20.415572 2005-09-23 NaN
14290 Better Luck ... 3.877036 2002-01-12 Never undere...
38365 Grown Ups 38.864027 2010-06-24 Boys will be...
9672 Infamous 3.680896 2006-11-16 There's more...
12819 Alpha and Omega 12.300789 2010-09-17 A Pawsome 3D...

MultiIndex datasets
samuel = pd.read_csv('samuel.csv',
index_col=['movie_id',
'cast_id'])
print(samuel.head())

name
movie_id cast_id
184 3 Samuel L. Jackson
319 13 Samuel L. Jackson
326 2 Samuel L. Jackson
329 138 Samuel L. Jackson
393 21 Samuel L. Jackson

casts = pd.read_csv('casts.csv',
index_col=['movie_id',
'cast_id'])
print(casts.head())

character
movie_id cast_id
5 22 Jezebel
23 Diana
24 Athena
25 Elspeth
26 Eva

MultiIndex merge
samuel_casts = samuel.merge(casts, on=['movie_id','cast_id'])
print(samuel_casts.head())
print(samuel_casts.shape)

name character
movie_id cast_id
184 3 Samuel L. Jackson Ordell Robbie
319 13 Samuel L. Jackson Big Don
326 2 Samuel L. Jackson Neville Flynn
329 138 Samuel L. Jackson Arnold
393 21 Samuel L. Jackson Rufus
(67, 2)

Index merge with left_on and right_on
title popularity release_date
id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16

genre
movie_id
5 Crime
5 Comedy
11 Science Fiction
11 Action

Index merge with left_on and right_on
movies_genres = movies.merge(movie_to_genres, left_on='id', left_index=True,
right_on='movie_id', right_index=True)
print(movies_genres.head())

id title popularity release_date genre

5 5 Four Rooms 22.876230 1995-12-09 Crime
5 5 Four Rooms 22.876230 1995-12-09 Comedy
11 11 Star Wars 126.393695 1977-05-25 Science Fiction
11 11 Star Wars 126.393695 1977-05-25 Action
11 11 Star Wars 126.393695 1977-05-25 Adventure
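Depending on the pandas version, passing `left_on` together with `left_index=True` can raise a MergeError. A variant that relies only on the index flags is sketched below; it assumes hypothetical miniature tables whose indexes are named `id` and `movie_id`, and it is an alternative to the slide's call, not the instructor's code.

```python
import pandas as pd

# Hypothetical tables whose indexes have different names
movies = pd.DataFrame(
    {"title": ["Four Rooms", "Star Wars"]},
    index=pd.Index([5, 11], name="id"),
)
movie_to_genres = pd.DataFrame(
    {"genre": ["Crime", "Comedy", "Action"]},
    index=pd.Index([5, 5, 11], name="movie_id"),
)

# Join index-to-index; the index names do not need to match
movies_genres = movies.merge(movie_to_genres,
                             left_index=True, right_index=True)
print(movies_genres)
```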

Let's practice!
Filtering joins

Mutating versus filtering joins
Mutating joins:

Combines data from two tables based on matching observations in both tables

Filtering joins:

Filter observations from table based on whether or not they match an observation in
another table

What is a semi-join?

Semi-joins
Returns the intersection, similar to an inner join

Returns only columns from the left table and not the right

No duplicates

Musical dataset

1 Photo by Vlad Bagacian from Pexels

Example datasets
gid name
0 1 Rock
1 2 Jazz
2 3 Metal
3 4 Alternative ...
4 5 Rock And Roll

tid name aid mtid gid composer u_price

0 1 For Those Ab... 1 1 1 Angus Young,... 0.99
1 2 Balls to the... 2 2 1 nan 0.99
2 3 Fast As a Shark 3 2 1 F. Baltes, S... 0.99
3 4 Restless and... 3 2 1 F. Baltes, R... 0.99
4 5 Princess of ... 3 2 1 Deaffy & R.A... 0.99

Step 1 - semi-join
genres_tracks = genres.merge(top_tracks, on='gid')
print(genres_tracks.head())

gid name_x tid name_y aid mtid composer u_price

0 1 Rock 2260 Don't Stop M... 185 1 Mercury, Fre... 0.99
1 1 Rock 2933 Mysterious Ways 232 1 U2 0.99
2 1 Rock 2618 Speed Of Light 212 1 Billy Duffy/... 0.99
3 1 Rock 2998 When Love Co... 237 1 Bono/Clayton... 0.99
4 1 Rock 685 Who'll Stop ... 54 1 J. C. Fogerty 0.99

Step 2 - semi-join
genres['gid'].isin(genres_tracks['gid'])

0 True
1 True
2 True
3 True
4 False
Name: gid, dtype: bool

Step 3 - semi-join
genres_tracks = genres.merge(top_tracks, on='gid')
top_genres = genres[genres['gid'].isin(genres_tracks['gid'])]
print(top_genres.head())

gid name
0 1 Rock
1 2 Jazz
2 3 Metal
3 4 Alternative & Punk
4 6 Blues
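The three semi-join steps can be run end to end on toy data. The tables below are hypothetical stand-ins for `genres` and `top_tracks`.

```python
import pandas as pd

# Hypothetical miniature tables
genres = pd.DataFrame({"gid": [1, 2, 3, 4],
                       "name": ["Rock", "Jazz", "Metal", "Blues"]})
top_tracks = pd.DataFrame({"tid": [10, 11, 12], "gid": [1, 1, 3]})

# Step 1: inner join to find the genres that have at least one top track
genres_tracks = genres.merge(top_tracks, on="gid")

# Step 2: boolean mask over the left table
mask = genres["gid"].isin(genres_tracks["gid"])

# Step 3: filter the left table; only left columns, no duplicated rows
top_genres = genres[mask]
print(top_genres)
```

Rock matches two tracks but appears once in the result, which is the "no duplicates" property of a semi-join.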

What is an anti-join?

Anti-join:
Returns the left table, excluding the intersection

Returns only columns from the left table and not the right

Step 1 - anti-join
genres_tracks = genres.merge(top_tracks, on='gid', how='left', indicator=True)
print(genres_tracks.head())

gid name_x tid name_y aid mtid composer u_price _merge

0 1 Rock 2260.0 Don't Stop M... 185.0 1.0 Mercury, Fre... 0.99 both
1 1 Rock 2933.0 Mysterious Ways 232.0 1.0 U2 0.99 both
2 1 Rock 2618.0 Speed Of Light 212.0 1.0 Billy Duffy/... 0.99 both
3 1 Rock 2998.0 When Love Co... 237.0 1.0 Bono/Clayton... 0.99 both
4 5 Rock And Roll NaN NaN NaN NaN NaN NaN left_only

Step 2 - anti-join
gid_list = genres_tracks.loc[genres_tracks['_merge'] == 'left_only', 'gid']
print(gid_list.head())

23 5
34 9
36 11
37 12
38 13
Name: gid, dtype: int64

Step 3 - anti-join
genres_tracks = genres.merge(top_tracks, on='gid', how='left', indicator=True)
gid_list = genres_tracks.loc[genres_tracks['_merge'] == 'left_only','gid']
non_top_genres = genres[genres['gid'].isin(gid_list)]
print(non_top_genres.head())

gid name
0 5 Rock And Roll
1 9 Pop
2 11 Bossa Nova
3 12 Easy Listening
4 13 Heavy Metal
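The anti-join steps, sketched on hypothetical toy versions of `genres` and `top_tracks`:

```python
import pandas as pd

# Hypothetical miniature tables
genres = pd.DataFrame({"gid": [1, 2, 3, 4],
                       "name": ["Rock", "Jazz", "Metal", "Blues"]})
top_tracks = pd.DataFrame({"tid": [10, 11, 12], "gid": [1, 1, 3]})

# Step 1: left join with indicator=True adds a '_merge' column
genres_tracks = genres.merge(top_tracks, on="gid", how="left", indicator=True)

# Step 2: collect keys that appear only in the left table
gid_list = genres_tracks.loc[genres_tracks["_merge"] == "left_only", "gid"]

# Step 3: filter the original left table by those keys
non_top_genres = genres[genres["gid"].isin(gid_list)]
print(non_top_genres)
```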

Let's practice!
Concatenate DataFrames together vertically

Concatenate two tables vertically
The pandas pd.concat() function can concatenate both vertically and horizontally.
axis=0 concatenates vertically

Basic concatenation
3 different tables
Same column names
Table variable names:
inv_jan (top)
inv_feb (middle)
inv_mar (bottom)

iid cid invoice_date total
0 1 2 2009-01-01 1.98
1 2 4 2009-01-02 3.96
2 3 8 2009-01-03 5.94

iid cid invoice_date total
0 7 38 2009-02-01 1.98
1 8 40 2009-02-01 1.98
2 9 42 2009-02-02 3.96

iid cid invoice_date total
0 14 17 2009-03-04 1.98
1 15 19 2009-03-04 1.98
2 16 21 2009-03-05 3.96

Basic concatenation
pd.concat([inv_jan, inv_feb, inv_mar])

iid cid invoice_date total
0 1 2 2009-01-01 1.98
1 2 4 2009-01-02 3.96
2 3 8 2009-01-03 5.94
0 7 38 2009-02-01 1.98
1 8 40 2009-02-01 1.98
2 9 42 2009-02-02 3.96
0 14 17 2009-03-04 1.98
1 15 19 2009-03-04 1.98
2 16 21 2009-03-05 3.96

Ignoring the index
pd.concat([inv_jan, inv_feb, inv_mar],
ignore_index=True)

iid cid invoice_date total
0 1 2 2009-01-01 1.98
1 2 4 2009-01-02 3.96
2 3 8 2009-01-03 5.94
3 7 38 2009-02-01 1.98
4 8 40 2009-02-01 1.98
5 9 42 2009-02-02 3.96
6 14 17 2009-03-04 1.98
7 15 19 2009-03-04 1.98
8 16 21 2009-03-05 3.96

Setting labels to original tables
pd.concat([inv_jan, inv_feb, inv_mar],
ignore_index=False,
keys=['jan','feb','mar'])

iid cid invoice_date total
jan 0 1 2 2009-01-01 1.98
1 2 4 2009-01-02 3.96
2 3 8 2009-01-03 5.94
feb 0 7 38 2009-02-01 1.98
1 8 40 2009-02-01 1.98
2 9 42 2009-02-02 3.96
mar 0 14 17 2009-03-04 1.98
1 15 19 2009-03-04 1.98
2 16 21 2009-03-05 3.96

Concatenate tables with different column names
Table: inv_jan

iid cid invoice_date total

0 1 2 2009-01-01 1.98
1 2 4 2009-01-02 3.96
2 3 8 2009-01-03 5.94

Table: inv_feb

iid cid invoice_date total bill_ctry

0 7 38 2009-02-01 1.98 Germany
1 8 40 2009-02-01 1.98 France
2 9 42 2009-02-02 3.96 France

Concatenate tables with different column names
pd.concat([inv_jan, inv_feb],
sort=True)

bill_ctry cid iid invoice_date total
0 NaN 2 1 2009-01-01 1.98
1 NaN 4 2 2009-01-02 3.96
2 NaN 8 3 2009-01-03 5.94
0 Germany 38 7 2009-02-01 1.98
1 France 40 8 2009-02-01 1.98
2 France 42 9 2009-02-02 3.96

Concatenate tables with different column names
pd.concat([inv_jan, inv_feb],
join='inner')

iid cid invoice_date total
1 2 2009-01-01 1.98
2 4 2009-01-02 3.96
3 8 2009-01-03 5.94
7 38 2009-02-01 1.98
8 40 2009-02-01 1.98
9 42 2009-02-02 3.96
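The concatenation options can be compared side by side on toy data. The sketch assumes hypothetical two-row invoice tables in which only `inv_feb` has a `bill_ctry` column.

```python
import pandas as pd

# Hypothetical miniature invoice tables
inv_jan = pd.DataFrame({"iid": [1, 2], "cid": [2, 4], "total": [1.98, 3.96]})
inv_feb = pd.DataFrame({"iid": [7, 8], "cid": [38, 40], "total": [1.98, 1.98],
                        "bill_ctry": ["Germany", "France"]})

# Default join='outer': all columns kept, gaps filled with NaN
outer = pd.concat([inv_jan, inv_feb], ignore_index=True)

# join='inner': only the columns present in every table
inner = pd.concat([inv_jan, inv_feb], join="inner", ignore_index=True)

# keys=...: label each source table in an outer index level
labeled = pd.concat([inv_jan, inv_feb], keys=["jan", "feb"])

print(outer.shape, inner.shape)
print(labeled.loc["feb"])
```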

Using append method
.append()
Simplified version of the .concat() method

Supports: ignore_index , and sort

Does Not Support: keys and join

Always join = outer

Append these tables
iid cid invoice_date total
0 1 2 2009-01-01 1.98
1 2 4 2009-01-02 3.96
2 3 8 2009-01-03 5.94

iid cid invoice_date total bill_ctry

0 7 38 2009-02-01 1.98 Germany
1 8 40 2009-02-01 1.98 France
2 9 42 2009-02-02 3.96 France

iid cid invoice_date total

0 14 17 2009-03-04 1.98
1 15 19 2009-03-04 1.98
2 16 21 2009-03-05 3.96

Append the tables
inv_jan.append([inv_feb, inv_mar],
ignore_index=True,
sort=True)

bill_ctry cid iid invoice_date total
0 NaN 2 1 2009-01-01 1.98
1 NaN 4 2 2009-01-02 3.96
2 NaN 8 3 2009-01-03 5.94
3 Germany 38 7 2009-02-01 1.98
4 France 40 8 2009-02-01 1.98
5 France 42 9 2009-02-02 3.96
6 NaN 17 14 2009-03-04 1.98
7 NaN 19 15 2009-03-04 1.98
8 NaN 21 16 2009-03-05 3.96
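DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same result is usually obtained with pd.concat(). A sketch with hypothetical two-row tables:

```python
import pandas as pd

# Hypothetical miniature invoice tables
inv_jan = pd.DataFrame({"iid": [1, 2], "cid": [2, 4], "total": [1.98, 3.96]})
inv_feb = pd.DataFrame({"iid": [7, 8], "cid": [38, 40], "total": [1.98, 1.98],
                        "bill_ctry": ["Germany", "France"]})
inv_mar = pd.DataFrame({"iid": [14, 15], "cid": [17, 19], "total": [1.98, 1.98]})

# Equivalent of inv_jan.append([inv_feb, inv_mar], ignore_index=True, sort=True)
combined = pd.concat([inv_jan, inv_feb, inv_mar],
                     ignore_index=True, sort=True)
print(combined)
```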

Let's practice!
Verifying integrity

Let's check our data
Possible merging issue:

Unintentional one-to-many relationship

Unintentional many-to-many relationship

Possible concatenating issue:

Duplicate records possibly unintentionally introduced

Validating merges
.merge(validate=None) :

Checks if merge is of specified type

'one_to_one'

'one_to_many'

'many_to_one'

'many_to_many'

Merge dataset for example
Table Name: tracks

tid name aid mtid gid u_price

0 2 Balls to the... 2 2 1 0.99
1 3 Fast As a Shark 3 2 1 0.99
2 4 Restless and... 3 2 1 0.99

Table Name: specs

tid milliseconds bytes

0 2 342562 5510424
1 3 230619 3990994
2 2 252051 4331779

Merge validate: one_to_one
tracks.merge(specs, on='tid',
validate='one_to_one')

Traceback (most recent call last):

MergeError: Merge keys are not unique in right dataset; not a one-to-one merge

Merge validate: one_to_many
albums.merge(tracks, on='aid',
validate='one_to_many')

aid title artid tid name mtid gid u_price

0 2 Balls to the... 2 2 Balls to the... 2 1 0.99
1 3 Restless and... 2 3 Fast As a Shark 2 1 0.99
2 3 Restless and... 2 4 Restless and... 2 1 0.99
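A sketch of how `validate` can be used defensively, assuming hypothetical `tracks` and `specs` tables in which `specs` contains a duplicated `tid`:

```python
import pandas as pd

# Hypothetical tables; tid 2 appears twice in specs
tracks = pd.DataFrame({"tid": [2, 3, 4], "name": ["Track A", "Track B", "Track C"]})
specs = pd.DataFrame({"tid": [2, 3, 2], "milliseconds": [342562, 230619, 252051]})

# A one-to-one check fails because the right keys are not unique
try:
    tracks.merge(specs, on="tid", validate="one_to_one")
    one_to_one_ok = True
except pd.errors.MergeError as err:
    one_to_one_ok = False
    print("validation failed:", err)

# A one-to-many check passes because the left keys are unique
checked = tracks.merge(specs, on="tid", validate="one_to_many")
print(checked)
```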

Verifying concatenations
.concat(verify_integrity=False) :

Check whether the new concatenated index contains duplicates

Default value is False

Dataset for .concat() example
Table Name: inv_feb

cid invoice_date total
iid
7 38 2009-02-01 1.98
8 40 2009-02-01 1.98
9 42 2009-02-02 3.96

Table Name: inv_mar

cid invoice_date total
iid
9 17 2009-03-04 1.98
15 19 2009-03-04 1.98
16 21 2009-03-05 3.96

Verifying concatenation: example
pd.concat([inv_feb, inv_mar],
verify_integrity=True)

Traceback (most recent call last):
ValueError: Indexes have overlapping values: Int64Index([9], dtype='int64', name='iid')

pd.concat([inv_feb, inv_mar],
verify_integrity=False)

cid invoice_date total
iid
7 38 2009-02-01 1.98
8 40 2009-02-01 1.98
9 42 2009-02-02 3.96
9 17 2009-03-04 1.98
15 19 2009-03-04 1.98
16 21 2009-03-05 3.96
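A sketch of the integrity check on hypothetical invoice tables indexed by `iid`, where the value 9 appears in both indexes:

```python
import pandas as pd

# Hypothetical tables; iid 9 is present in both
inv_feb = pd.DataFrame({"cid": [38, 40, 42], "total": [1.98, 1.98, 3.96]},
                       index=pd.Index([7, 8, 9], name="iid"))
inv_mar = pd.DataFrame({"cid": [17, 19, 21], "total": [1.98, 1.98, 3.96]},
                       index=pd.Index([9, 15, 16], name="iid"))

# verify_integrity=True raises when the combined index has duplicates
try:
    pd.concat([inv_feb, inv_mar], verify_integrity=True)
    overlap_detected = False
except ValueError as err:
    overlap_detected = True
    print("integrity check failed:", err)

# Without the check, the duplicate index value passes through silently
combined = pd.concat([inv_feb, inv_mar])
print(combined.index.is_unique)
```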

Why verify integrity and what to do
Why:

Real world data is often NOT clean

What to do:

Fix incorrect data

Drop duplicate rows

Let's practice!
Using merge_ordered()

merge_ordered()

Method comparison
.merge() method:

Column(s) to join on
on , left_on , and right_on

Type of join
how (left, right, inner, outer)
default inner

Overlapping column names
suffixes

Calling the method
df1.merge(df2)

merge_ordered() method:

Column(s) to join on
on , left_on , and right_on

Type of join
how (left, right, inner, outer)
default outer

Overlapping column names
suffixes

Calling the function
pd.merge_ordered(df1, df2)

Financial dataset

1 Photo by Markus Spiske on Unsplash

Stock data
Table Name: appl

date close
0 2007-02-01 12.087143
1 2007-03-01 13.272857
2 2007-04-01 14.257143
3 2007-05-01 17.312857
4 2007-06-01 17.434286

Table Name: mcd

date close
0 2007-01-01 44.349998
1 2007-02-01 43.689999
2 2007-03-01 45.049999
3 2007-04-01 48.279999
4 2007-05-01 50.549999

Merging stock data
import pandas as pd
pd.merge_ordered(appl, mcd, on='date', suffixes=('_aapl','_mcd'))

date close_aapl close_mcd

0 2007-01-01 NaN 44.349998
1 2007-02-01 12.087143 43.689999
2 2007-03-01 13.272857 45.049999
3 2007-04-01 14.257143 48.279999
4 2007-05-01 17.312857 50.549999
5 2007-06-01 17.434286 NaN

Forward fill

Forward fill example
pd.merge_ordered(appl, mcd, on='date',
suffixes=('_aapl','_mcd'),
fill_method='ffill')

date close_aapl close_mcd
0 2007-01-01 NaN 44.349998
1 2007-02-01 12.087143 43.689999
2 2007-03-01 13.272857 45.049999
3 2007-04-01 14.257143 48.279999
4 2007-05-01 17.312857 50.549999
5 2007-06-01 17.434286 50.549999

pd.merge_ordered(appl, mcd, on='date',
suffixes=('_aapl','_mcd'))

date close_aapl close_mcd
0 2007-01-01 NaN 44.349998
1 2007-02-01 12.087143 43.689999
2 2007-03-01 13.272857 45.049999
3 2007-04-01 14.257143 48.279999
4 2007-05-01 17.312857 50.549999
5 2007-06-01 17.434286 NaN
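A compact sketch of the forward-fill behaviour, assuming hypothetical two-row price tables with rounded values. Forward fill can only copy a value downward, so a gap in the first row stays NaN.

```python
import pandas as pd

# Hypothetical miniature price tables
aapl = pd.DataFrame({"date": pd.to_datetime(["2007-02-01", "2007-03-01"]),
                     "close": [12.09, 13.27]})
mcd = pd.DataFrame({"date": pd.to_datetime(["2007-01-01", "2007-02-01"]),
                    "close": [44.35, 43.69]})

# Default how='outer'; the result is sorted by the key column
merged = pd.merge_ordered(aapl, mcd, on="date", suffixes=("_aapl", "_mcd"))

# Same merge, with missing values filled from the previous row
filled = pd.merge_ordered(aapl, mcd, on="date", suffixes=("_aapl", "_mcd"),
                          fill_method="ffill")
print(merged)
print(filled)
```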

When to use merge_ordered()?
Ordered data / time series

Filling in missing values

Let's practice!
Using merge_asof()

Using merge_asof()

Similar to a merge_ordered() left join

Similar features as merge_ordered()

Matches on the nearest key value (by default, the closest earlier or equal value) rather than requiring exact matches.

Merged "on" columns must be sorted.

Datasets
Table Name: visa

date_time close
0 2017-11-17 16:00:00 110.32
1 2017-11-17 17:00:00 110.24
2 2017-11-17 18:00:00 110.065
3 2017-11-17 19:00:00 110.04
4 2017-11-17 20:00:00 110.0
5 2017-11-17 21:00:00 109.9966
6 2017-11-17 22:00:00 109.82

Table Name: ibm

date_time close
0 2017-11-17 15:35:12 149.3
1 2017-11-17 15:40:34 149.13
2 2017-11-17 15:45:50 148.98
3 2017-11-17 15:50:20 148.99
4 2017-11-17 15:55:10 149.11
5 2017-11-17 16:00:03 149.25
6 2017-11-17 16:05:06 149.5175
7 2017-11-17 16:10:12 149.57
8 2017-11-17 16:15:30 149.59
9 2017-11-17 16:20:32 149.82
10 2017-11-17 16:25:47 149.96

merge_asof() example
pd.merge_asof(visa, ibm, on='date_time',
suffixes=('_visa','_ibm'))

date_time close_visa close_ibm
0 2017-11-17 16:00:00 110.32 149.11
1 2017-11-17 17:00:00 110.24 149.83
2 2017-11-17 18:00:00 110.065 149.59
3 2017-11-17 19:00:00 110.04 149.505
4 2017-11-17 20:00:00 110.0 149.42
5 2017-11-17 21:00:00 109.9966 149.26
6 2017-11-17 22:00:00 109.82 148.97

Table Name: ibm

date_time close
0 2017-11-17 15:35:12 149.3
1 2017-11-17 15:40:34 149.13
2 2017-11-17 15:45:50 148.98
3 2017-11-17 15:50:20 148.99
4 2017-11-17 15:55:10 149.11
5 2017-11-17 16:00:03 149.25
6 2017-11-17 16:05:06 149.5175
7 2017-11-17 16:10:12 149.57
8 2017-11-17 16:15:30 149.59
9 2017-11-17 16:20:32 149.82
10 2017-11-17 16:25:47 149.96

merge_asof() example with direction
pd.merge_asof(visa, ibm, on=['date_time'],
              suffixes=('_visa','_ibm'),
              direction='forward')

   date_time            close_visa  close_ibm
0  2017-11-17 16:00:00  110.32      149.25
1  2017-11-17 17:00:00  110.24      149.6184
2  2017-11-17 18:00:00  110.065     149.59
3  2017-11-17 19:00:00  110.04      149.505
4  2017-11-17 20:00:00  110.0       149.42
5  2017-11-17 21:00:00  109.9966    149.26
6  2017-11-17 22:00:00  109.82      148.97
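
To see how the direction argument changes the match, here is a small sketch using the first visa row and the two ibm rows on either side of it from the Datasets slide. The direction='nearest' call is an addition for comparison and does not appear on the slides.

```python
import pandas as pd

# One visa row and the two ibm rows that surround it in time.
visa = pd.DataFrame({
    "date_time": pd.to_datetime(["2017-11-17 16:00:00"]),
    "close": [110.32],
})
ibm = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2017-11-17 15:55:10",
        "2017-11-17 16:00:03",
    ]),
    "close": [149.11, 149.25],
})

# direction='forward': first ibm row at or after the visa date_time.
forward = pd.merge_asof(visa, ibm, on="date_time",
                        suffixes=("_visa", "_ibm"),
                        direction="forward")

# direction='nearest': closest ibm row in either direction.
nearest = pd.merge_asof(visa, ibm, on="date_time",
                        suffixes=("_visa", "_ibm"),
                        direction="nearest")

print(forward)
print(nearest)
```

Both calls select the 16:00:03 ibm row here, because it is the first row after 16:00:00 and also only three seconds away from it.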

JOINING DATA WITH PANDAS

When to use merge_asof()
Data sampled from a process

Developing a training set (no data leakage)
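
One way the training-set case could look is sketched below. The features and labels tables, their column names, and their values are hypothetical and chosen only for illustration. Passing allow_exact_matches=False is one option for making sure each label only sees readings taken strictly before it.

```python
import pandas as pd

# Hypothetical readings (features) and later outcomes (labels).
features = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 09:00",
                            "2020-01-01 10:00",
                            "2020-01-01 11:00"]),
    "reading": [1.0, 2.0, 3.0],
})
labels = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 10:00",
                            "2020-01-01 11:30"]),
    "outcome": [0, 1],
})

# Look backward only, and exclude readings taken at the same instant
# as the label, so no information from the label time or later is used.
training = pd.merge_asof(labels, features, on="time",
                         direction="backward",
                         allow_exact_matches=False)
print(training)
```

The 10:00 label is paired with the 09:00 reading rather than the 10:00 one, and the 11:30 label with the 11:00 reading.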

JOINING DATA WITH PANDAS

Let's practice!
JOINING DATA WITH PANDAS
Selecting data with .query()
JOINING DATA WITH PANDAS

Aaren Stubberfield

Instructor
The .query() method
.query('SOME SELECTION STATEMENT')

Accepts an input string

The input string determines which rows are returned

The input string is similar to the condition after the WHERE clause in a SQL statement

Prior knowledge of SQL is not necessary

JOINING DATA WITH PANDAS

Querying on a single condition
This table is stocks

   date        disney      nike
0  2019-07-01  143.009995  86.029999
1  2019-08-01  137.259995  84.5
2  2019-09-01  130.320007  93.919998
3  2019-10-01  129.919998  89.550003
4  2019-11-01  151.580002  93.489998
5  2019-12-01  144.630005  101.309998
6  2020-01-01  138.309998  96.300003
7  2020-02-01  117.650002  89.379997
8  2020-03-01  96.599998   82.739998
9  2020-04-01  99.580002   84.629997

stocks.query('nike >= 90')

   date        disney      nike
2  2019-09-01  130.320007  93.919998
4  2019-11-01  151.580002  93.489998
5  2019-12-01  144.630005  101.309998
6  2020-01-01  138.309998  96.300003
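
For readers who want to run the single-condition query, the sketch below rebuilds the stocks table from the slide and compares .query() with an ordinary boolean mask. The mask version is an addition for comparison and is not part of the slides.

```python
import pandas as pd

# The stocks table as printed on the slide.
stocks = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-07-01", "2019-08-01", "2019-09-01", "2019-10-01",
        "2019-11-01", "2019-12-01", "2020-01-01", "2020-02-01",
        "2020-03-01", "2020-04-01",
    ]),
    "disney": [143.009995, 137.259995, 130.320007, 129.919998,
               151.580002, 144.630005, 138.309998, 117.650002,
               96.599998, 99.580002],
    "nike": [86.029999, 84.5, 93.919998, 89.550003, 93.489998,
             101.309998, 96.300003, 89.379997, 82.739998, 84.629997],
})

# Select rows with a query string ...
via_query = stocks.query("nike >= 90")

# ... and the same rows with a boolean mask.
via_mask = stocks[stocks["nike"] >= 90]

print(via_query)
```

Both selections return rows 2, 4, 5, and 6, matching the slide output.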

JOINING DATA WITH PANDAS

Querying on multiple conditions, "and", "or"
This table is stocks

   date        disney      nike
0  2019-07-01  143.009995  86.029999
1  2019-08-01  137.259995  84.5
2  2019-09-01  130.320007  93.919998
3  2019-10-01  129.919998  89.550003
4  2019-11-01  151.580002  93.489998
5  2019-12-01  144.630005  101.309998
6  2020-01-01  138.309998  96.300003
7  2020-02-01  117.650002  89.379997
8  2020-03-01  96.599998   82.739998
9  2020-04-01  99.580002   84.629997

stocks.query('nike > 90 and disney < 140')

   date        disney      nike
2  2019-09-01  130.320007  93.919998
6  2020-01-01  138.309998  96.300003

stocks.query('nike > 96 or disney < 98')

   date        disney      nike
5  2019-12-01  144.630005  101.309998
6  2020-01-01  138.309998  96.300003
8  2020-03-01  96.599998   82.739998
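
The two multi-condition queries can be reproduced with the sketch below, which rebuilds the stocks table from the slide. The last query uses the @ prefix to refer to a Python variable; the variable name threshold is a made-up example and that form is not shown on the slides.

```python
import pandas as pd

# The stocks table as printed on the slide (dates kept as strings).
stocks = pd.DataFrame({
    "date": ["2019-07-01", "2019-08-01", "2019-09-01", "2019-10-01",
             "2019-11-01", "2019-12-01", "2020-01-01", "2020-02-01",
             "2020-03-01", "2020-04-01"],
    "disney": [143.009995, 137.259995, 130.320007, 129.919998,
               151.580002, 144.630005, 138.309998, 117.650002,
               96.599998, 99.580002],
    "nike": [86.029999, 84.5, 93.919998, 89.550003, 93.489998,
             101.309998, 96.300003, 89.379997, 82.739998, 84.629997],
})

# Both conditions must hold.
both = stocks.query("nike > 90 and disney < 140")

# At least one condition must hold.
either = stocks.query("nike > 96 or disney < 98")

# Same "or" query, with the cutoff taken from a local variable via @.
threshold = 96  # hypothetical name, for illustration only
either_var = stocks.query("nike > @threshold or disney < 98")

print(both)
print(either)
```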

JOINING DATA WITH PANDAS

Updated dataset
This table is stocks_long

date stock close

0 2019-07-01 disney 143.009995
1 2019-08-01 disney 137.259995
2 2019-09-01 disney 130.320007
3 2019-10-01 disney 129.919998
4 2019-11-01 disney 151.580002
5 2019-07-01 nike 86.029999
6 2019-08-01 nike 84.5
7 2019-09-01 nike 93.919998
8 2019-10-01 nike 89.550003
9 2019-11-01 nike 93.489998

JOINING DATA WITH PANDAS

Using .query() to select text
stocks_long.query('stock=="disney" or (stock=="nike" and close < 90)')

date stock close
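
The text query can be tried with the sketch below, which rebuilds stocks_long from the rows printed on the Updated dataset slide. It assumes those ten rows are the whole table.

```python
import pandas as pd

# stocks_long as printed on the "Updated dataset" slide.
stocks_long = pd.DataFrame({
    "date": ["2019-07-01", "2019-08-01", "2019-09-01",
             "2019-10-01", "2019-11-01"] * 2,
    "stock": ["disney"] * 5 + ["nike"] * 5,
    "close": [143.009995, 137.259995, 130.320007, 129.919998,
              151.580002, 86.029999, 84.5, 93.919998,
              89.550003, 93.489998],
})

# String values are wrapped in double quotes inside the
# single-quoted query string; parentheses group the "and" condition.
selected = stocks_long.query(
    'stock=="disney" or (stock=="nike" and close < 90)'
)
print(selected)
```

With these rows, the query keeps every disney row plus the nike rows that closed below 90 (index 5, 6, and 8).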

JOINING DATA WITH PANDAS

Let's practice!
JOINING DATA WITH PANDAS
Reshaping data with .melt()
JOINING DATA WITH PANDAS

Aaren Stubberfield

Instructor
Wide versus long data
Wide Format Long Format

JOINING DATA WITH PANDAS

What does the .melt() method do?
The melt method will allow us to unpivot our dataset

JOINING DATA WITH PANDAS

Dataset in wide format
This table is called social_fin

financial company 2019 2018 2017 2016

0 total_revenue twitter 3459329 3042359 2443299 2529619
1 gross_profit twitter 2322288 2077362 1582057 1597379
2 net_income twitter 1465659 1205596 -108063 -456873
3 total_revenue facebook 70697000 55838000 40653000 27638000
4 gross_profit facebook 57927000 46483000 35199000 23849000
5 net_income facebook 18485000 22112000 15934000 10217000

JOINING DATA WITH PANDAS

Example of .melt()
social_fin_tall = social_fin.melt(id_vars=['financial','company'])
print(social_fin_tall.head(10))

financial company variable value

0 total_revenue twitter 2019 3459329
1 gross_profit twitter 2019 2322288
2 net_income twitter 2019 1465659
3 total_revenue facebook 2019 70697000
4 gross_profit facebook 2019 57927000
5 net_income facebook 2019 18485000
6 total_revenue twitter 2018 3042359
7 gross_profit twitter 2018 2077362
8 net_income twitter 2018 1205596
9 total_revenue facebook 2018 55838000
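
A self-contained version of this example is sketched below, with social_fin rebuilt from the wide-format slide. The year column names are assumed to be strings, as the later value_vars=['2018','2017'] call suggests.

```python
import pandas as pd

# social_fin in wide format, as printed on the slide.
social_fin = pd.DataFrame({
    "financial": ["total_revenue", "gross_profit", "net_income"] * 2,
    "company": ["twitter"] * 3 + ["facebook"] * 3,
    "2019": [3459329, 2322288, 1465659, 70697000, 57927000, 18485000],
    "2018": [3042359, 2077362, 1205596, 55838000, 46483000, 22112000],
    "2017": [2443299, 1582057, -108063, 40653000, 35199000, 15934000],
    "2016": [2529619, 1597379, -456873, 27638000, 23849000, 10217000],
})

# Keep 'financial' and 'company' as identifiers; every other column
# is unpivoted into a 'variable'/'value' pair.
social_fin_tall = social_fin.melt(id_vars=["financial", "company"])

print(social_fin_tall.shape)
print(social_fin_tall.head(10))
```

The 6 rows and 4 year columns of the wide table become 24 rows in the long table, one per row and year combination.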

JOINING DATA WITH PANDAS

Melting with value_vars
social_fin_tall = social_fin.melt(id_vars=['financial','company'],
                                  value_vars=['2018','2017'])
print(social_fin_tall.head(9))

financial company variable value

0 total_revenue twitter 2018 3042359
1 gross_profit twitter 2018 2077362
2 net_income twitter 2018 1205596
3 total_revenue facebook 2018 55838000
4 gross_profit facebook 2018 46483000
5 net_income facebook 2018 22112000
6 total_revenue twitter 2017 2443299
7 gross_profit twitter 2017 1582057
8 net_income twitter 2017 -108063
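
To check the effect of value_vars, the sketch below rebuilds social_fin from the wide-format slide and melts only two of the year columns. String column names are again an assumption.

```python
import pandas as pd

# social_fin in wide format, as printed on the slide.
social_fin = pd.DataFrame({
    "financial": ["total_revenue", "gross_profit", "net_income"] * 2,
    "company": ["twitter"] * 3 + ["facebook"] * 3,
    "2019": [3459329, 2322288, 1465659, 70697000, 57927000, 18485000],
    "2018": [3042359, 2077362, 1205596, 55838000, 46483000, 22112000],
    "2017": [2443299, 1582057, -108063, 40653000, 35199000, 15934000],
    "2016": [2529619, 1597379, -456873, 27638000, 23849000, 10217000],
})

# Only the columns listed in value_vars are unpivoted; the 2019 and
# 2016 columns are dropped from the result.
social_fin_tall = social_fin.melt(id_vars=["financial", "company"],
                                  value_vars=["2018", "2017"])

print(social_fin_tall["variable"].unique())
print(social_fin_tall.shape)
```

The output keeps the order given in value_vars, so all 2018 rows come before the 2017 rows.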

JOINING DATA WITH PANDAS

Melting with column names
social_fin_tall = social_fin.melt(id_vars=['financial','company'],
                                  value_vars=['2018','2017'],
                                  var_name='year', value_name='dollars')
print(social_fin_tall.head(8))

financial company year dollars
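
A runnable sketch of the renamed output follows, with social_fin rebuilt from the wide-format slide. It passes var_name as a plain string, which is an assumption about what current pandas versions accept most reliably for a table with ordinary (non-MultiIndex) columns.

```python
import pandas as pd

# social_fin in wide format, as printed on the slide.
social_fin = pd.DataFrame({
    "financial": ["total_revenue", "gross_profit", "net_income"] * 2,
    "company": ["twitter"] * 3 + ["facebook"] * 3,
    "2019": [3459329, 2322288, 1465659, 70697000, 57927000, 18485000],
    "2018": [3042359, 2077362, 1205596, 55838000, 46483000, 22112000],
    "2017": [2443299, 1582057, -108063, 40653000, 35199000, 15934000],
    "2016": [2529619, 1597379, -456873, 27638000, 23849000, 10217000],
})

# var_name renames the 'variable' column, value_name the 'value' column.
social_fin_tall = social_fin.melt(id_vars=["financial", "company"],
                                  value_vars=["2018", "2017"],
                                  var_name="year",
                                  value_name="dollars")

print(social_fin_tall.head(8))
```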

JOINING DATA WITH PANDAS

Let's practice!
JOINING DATA WITH PANDAS
Course wrap-up
JOINING DATA WITH PANDAS

Aaren Stubberfield

Instructor
You're this high performance race car now

1 Photo by jae park from Pexels

JOINING DATA WITH PANDAS

Data merging basics
Inner join using .merge()

One-to-one and one-to-many relationships

Merging multiple tables

JOINING DATA WITH PANDAS

Merging tables with different join types
Inner join using .merge()

One-to-one and one-to-many relationships

Merging multiple tables

Left, right, and outer joins

Merging a table to itself and merging on indexes

JOINING DATA WITH PANDAS

Advanced merging and concatenating
Inner join using .merge()

One-to-one and one-to-many relationships

Merging multiple tables

Left, right, and outer joins

Merging a table to itself and merging on indexes

Filtering joins
semi and anti joins

Combining data vertically with .concat()

Verify data integrity

JOINING DATA WITH PANDAS

Merging ordered and time-series data
Inner join using .merge()

One-to-one and one-to-many relationships

Merging multiple tables

Left, right, and outer joins

Merging a table to itself and merging on indexes

Filtering joins
semi and anti joins

Combining data vertically with .concat()

Verify data integrity

Ordered data
merge_ordered() and merge_asof()

Manipulating data with .melt()

JOINING DATA WITH PANDAS

Thank you!
JOINING DATA WITH PANDAS
