0% found this document useful (0 votes)
54 views23 pages

Data Filtering in Pandas

The document discusses various SQL operators and their Pandas equivalents for filtering data in a DataFrame. It provides examples of using single condition, multiple condition, IN, LIKE, IS NULL, and date filtering operators in SQL queries and equivalent Pandas code using the DataFrame and query methods. Key Pandas methods discussed include loc(), isin(), match(), isna(), notna(), and query().
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
54 views23 pages

Data Filtering in Pandas

The document discusses various SQL operators and their Pandas equivalents for filtering data in a DataFrame. It provides examples of using single condition, multiple condition, IN, LIKE, IS NULL, and date filtering operators in SQL queries and equivalent Pandas code using the DataFrame and query methods. Key Pandas methods discussed include loc(), isin(), match(), isna(), notna(), and query().
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 23

with sql codes

yunusemrecevik
TABLE OF
CONTENTS
SINGLE CONDITION 2
MULTIPLE CONDITION 4
IN OPERATOR 6
LIKE OPERATOR 8
IS NULL OPERATOR 11
FILTER DATES 14
BASIC OPERATIONS 17
DATA
TABLE
ID NAME SURNAME COUNTRY AGE SALARY

1 ADAM SMITH USA 25 150000

2 PAUL WAGNER FRANCE 35 250000

3 JACK RYAN GERMANY 32 200000

4 JOHN WICK CANADA 40 300000

5 ADAM WAYNE USA 21 225000

6 JENNIFER NICK CANADA 24 200000

7 WANDA RYAN USA 36 150000

8 JOHNY BRYAN FRANCE 19 100000

pandas

1
SINGLE CONDITION

SQL
SELECT * FROM TABLE WHERE ID > 5

ID NAME SURNAME COUNTRY AGE SALARY

6 JENNIFER NICK CANADA 24 200000

7 WANDA RYAN USA 36 150000

8 JOHNY BRYAN FRANCE 19 100000

PANDAS
METHOD 1: DATAFRAME WAY

You can use all logical operators.

Pandas loc() method same as dataframe way for filtering.

METHOD 2: QUERY METHOD

2
SINGLE CONDITION (NOT)

SQL
SELECT * FROM TABLE WHERE NOT ID > 5

ID NAME SURNAME COUNTRY AGE SALARY

1 ADAM SMITH USA 25 150000

2 PAUL WAGNER FRANCE 35 250000

3 JACK RYAN GERMANY 32 200000

4 JOHN WICK CANADA 40 300000

5 ADAM WAYNE USA 21 225000

PANDAS
METHOD 1: DATAFRAME WAY
bracket notation

METHOD 2: QUERY METHOD

( ~ ) is a unary operator that performs bitwise inversion

3
MULTIPLE CONDITION (AND)

SQL
SELECT * FROM TABLE WHERE ID > 5 AND COUNTRY = 'CANADA'

ID NAME SURNAME COUNTRY AGE SALARY

6 JENNIFER NICK CANADA 24 200000

PANDAS
METHOD 1: DATAFRAME WAY

dot notation make sure to use parentheses for multiple conditions

METHOD 2: QUERY METHOD

Should you use "dot notation" or "bracket notation" ?

Dot notation is faster to write and clearer to read. You can use whichever you like.
For more details:
https://www.dataschool.io/pandas-dot-notation-vs-brackets/

4
MULTIPLE CONDITION (AND & OR)

SQL
SELECT * FROM TABLE WHERE ID > 5 AND (COUNTRY = 'CANADA'
OR COUNTRY = 'USA')

ID NAME SURNAME COUNTRY AGE SALARY

6 JENNIFER NICK CANADA 24 200000

7 WANDA RYAN USA 36 150000

PANDAS
METHOD 1: DATAFRAME WAY

you can use " & " for and operator you can use " | " for or operator

METHOD 2: QUERY METHOD

you can use "and" or "&" for and operator you can use "or" or "|" for and operator

5
IN OPERATOR

SQL
SELECT * FROM TABLE WHERE COUNTRY IN ('GERMANY', 'FRANCE')

ID NAME SURNAME COUNTRY AGE SALARY

2 PAUL WAGNER FRANCE 35 250000

3 JACK RYAN GERMANY 32 200000

8 JOHNY BRYAN FRANCE 19 100000

PANDAS

METHOD 1: DATAFRAME WAY


isin() method helps for selecting multiple value.
you cant use "in" operator for dataframe way

METHOD 2: QUERY METHOD


you can use "in" operator for query() method

6
NOT IN OPERATOR

SQL
SELECT * FROM TABLE WHERE COUNTRY NOT IN ('USA', 'FRANCE')

ID NAME SURNAME COUNTRY AGE SALARY

3 JACK RYAN GERMANY 32 200000

4 JOHN WICK CANADA 40 300000

6 JENNIFER NICK CANADA 24 200000

PANDAS
METHOD 1: DATAFRAME WAY

you can use tilde sign for "not" operator

METHOD 2: QUERY METHOD


you can use "not" or tilde sign for "not" operator

7
LIKE OPERATOR - startswith

SQL
SELECT * FROM TABLE WHERE SURNAME LIKE "WA%"

ID NAME SURNAME COUNTRY AGE SALARY

2 PAUL WAGNER FRANCE 35 250000

5 ADAM WAYNE USA 21 225000

PANDAS
METHOD 1: DATAFRAME WAY

METHOD 2: QUERY METHOD

LIKE REGEX
str.match() determine if each string matches a regular expression.
For more detail :
https://pandas.pydata.org/docs/reference/api/pandas.Series.str.match.html

8
LIKE OPERATOR - endswith

SQL
SELECT * FROM TABLE WHERE SURNAME LIKE "%YAN"

ID NAME SURNAME COUNTRY AGE SALARY

3 JACK RYAN GERMANY 32 200000

7 WANDA RYAN USA 36 150000

8 JOHNY BRYAN FRANCE 19 100000

PANDAS
METHOD 1: DATAFRAME WAY

METHOD 2: QUERY METHOD

9
LIKE OPERATOR - contains

SQL
SELECT * FROM TABLE WHERE NAME LIKE "%OH%"

ID NAME SURNAME COUNTRY AGE SALARY

4 JOHN WICK CANADA 40 300000

8 JOHNY BRYAN FRANCE 19 100000

PANDAS

METHOD 1: DATAFRAME WAY

METHOD 2: QUERY METHOD

MORE strıng methods

For more string metods visit https://pandas.pydata.org/docs/user_guide/text.html

10
IS NULL OPERATOR

ID NAME SURNAME COUNTRY AGE SALARY

1 ADAM SMITH NaN 25 150000

2 PAUL WAGNER FRANCE 35 250000

3 JACK RYAN GERMANY 32 200000

4 JOHN WICK CANADA 40 300000

5 ADAM WAYNE NaN 21 225000

6 JENNIFER NICK CANADA 24 200000

7 WANDA RYAN NaN 36 150000

8 JOHNY BRYAN FRANCE 19 100000

USA records updated to NULL

SQL
SELECT * FROM TABLE WHERE COUNTRY IS NULL

ID NAME SURNAME COUNTRY AGE SALARY

1 ADAM SMITH NaN 25 150000

5 ADAM WAYNE NaN 21 225000

7 WANDA RYAN NaN 36 150000

11
IS NULL OPERATOR

PANDAS

METHOD 1: DATAFRAME WAY

you can use also isna() method

METHOD 2: QUERY METHOD

you can use also isna() method

12
IS NOT NULL OPERATOR

SQL
SELECT * FROM TABLE WHERE COUNTRY IS NOT NULL

ID NAME SURNAME COUNTRY AGE SALARY

2 PAUL WAGNER FRANCE 35 250000

3 JACK RYAN GERMANY 32 200000

4 JOHN WICK CANADA 40 300000

6 JENNIFER NICK CANADA 24 200000

8 JOHNY BRYAN FRANCE 19 100000

PANDAS
METHOD 1: DATAFRAME WAY
you can use also notna() method

METHOD 2: QUERY METHOD


you can use also notna() method

13
FILTER DATES

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

1 ADAM SMITH NaN 25 150000 2021-02-14

2 PAUL WAGNER FRANCE 35 250000 2005-03-17

3 JACK RYAN GERMANY 32 200000 2000-03-01

4 JOHN WICK CANADA 40 300000 1999-01-19

5 ADAM WAYNE NaN 21 225000 2011-04-24

6 JENNIFER NICK CANADA 24 200000 2008-11-10

7 WANDA RYAN NaN 36 150000 2015-11-30

8 JOHNY BRYAN FRANCE 19 100000 2019-04-19

MEMBER_DATE column added to data

PANDAS

14
FILTER STRING DATES

SQL
SELECT * FROM TABLE WHERE MEMBER_DATE > '2015-01-01'

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

1 ADAM SMITH NaN 25 150000 2021-02-14

7 WANDA RYAN NaN 36 150000 2015-11-30

8 JOHNY BRYAN FRANCE 19 100000 2019-04-19

PANDAS

METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

15
FILTER DATATIME DATES
SQL
SELECT * FROM TABLE WHERE YEAR(MEMBER_DATE) < 2005

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

3 JACK RYAN GERMANY 32 200000 2000-03-01

4 JOHN WICK CANADA 40 300000 1999-01-19

PANDAS

METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

16
BASIC OPERATIONS
SQL
SELECT * FROM TABLE WHERE SALARY / 20 > 12000

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

2 PAUL WAGNER FRANCE 35 250000 2005-03-17

4 JOHN WICK CANADA 40 300000 1999-01-19

PANDAS

METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

17
BASIC OPERATIONS

SQL
SELECT * FROM TABLE WHERE NAME > "JO"

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

2 PAUL WAGNER FRANCE 35 250000 2005-03-17

4 JOHN WICK CANADA 40 300000 1999-01-19

7 WANDA RYAN NaN 36 150000 2015-11-30

8 JOHNY BRYAN FRANCE 19 100000 2019-04-19

PANDAS

METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

18
BASIC OPERATIONS
SQL
SELECT * FROM TABLE WHERE SALARY * 2 > @VALUE

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

4 JOHN WICK CANADA 40 300000 1999-01-19

PANDAS
METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

19
BASIC OPERATIONS

SQL
SELECT * FROM TABLE WHERE CONCAT(NAME, ' ' ,SURNAME) = 'PAUL
WAGNER'

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

2 PAUL WAGNER FRANCE 35 250000 2005-03-17

PANDAS
METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

20
DOWNLOAD PANDAS CODES

https://github.com/egitimplus/helpers/blob/main/notebook/filter.ipynb

FOLLOW ME ON

@yunusemrecevik

@yunusemrecevik

@EGITIMPLUS

21

You might also like