
Drop Duplicate Rows in Pandas Series
One of the main uses of the pandas package is analysing data for Data Science and Machine Learning applications. Deleting duplicate values is a common data cleaning task during that analysis.
To remove duplicate values from a pandas series object, we can use the drop_duplicates() method. By default, it does not alter the original series object; instead, it returns a new series with the duplicate rows removed.
By setting “inplace=True”, we can apply the changes to the original series object instead. In that case the method returns None.
The other important parameter of the drop_duplicates() method is “keep”. Its default value is “first”, which drops every duplicate except the first occurrence. We can also set it to “last”, which keeps only the last occurrence, or to False, which drops every occurrence of a duplicated value.
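As a rough sketch of how the three keep settings differ, the snippet below applies each one to a small series. The sample values are the ones used in the examples in this article; the variable names keep_first, keep_last and keep_none are only illustrative.

```python
import pandas as pd

# Sample data: 'John' and 'Richard' each appear twice
series = pd.Series(
    ['John', 'Garyooo', 'John', 'Richard', 'Peter', 'Richard', 'Gary'],
    index=['East', 'West', 'North', 'South', 'East', 'West', 'North'])

keep_first = series.drop_duplicates()             # default: keep='first'
keep_last = series.drop_duplicates(keep='last')   # keep the last occurrence
keep_none = series.drop_duplicates(keep=False)    # drop every duplicated value

print(keep_first.tolist())  # ['John', 'Garyooo', 'Richard', 'Peter', 'Gary']
print(keep_last.tolist())   # ['Garyooo', 'John', 'Peter', 'Richard', 'Gary']
print(keep_none.tolist())   # ['Garyooo', 'Peter', 'Gary']
```

With keep=False, neither 'John' nor 'Richard' survives, because both values occur more than once.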
Example 1
In the following example, we create a pandas series from a list of strings and assign index labels through the index parameter.
# import pandas package
import pandas as pd

# create pandas series with duplicate values
series = pd.Series(
    ['John', 'Garyooo', 'John', 'Richard', 'Peter', 'Richard', 'Gary'],
    index=['East', 'West', 'North', 'South', 'East', 'West', 'North'])
print(series)

# delete duplicate values
result = series.drop_duplicates()
print('Output:', result)
Explanation
After creating the series object, we applied the drop_duplicates() method without changing the default parameters.
The Pandas series is given below −
East        John
West     Garyooo
North       John
South    Richard
East       Peter
West     Richard
North       Gary
dtype: object
Output
The output is as follows −
East        John
West     Garyooo
South    Richard
East       Peter
North       Gary
dtype: object
The drop_duplicates() method returns a new series object with the duplicate rows removed. The original series object is not affected by this method.
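A quick way to confirm this behaviour is to compare the original and the returned series after the call. The following is a minimal check, assuming the same sample data as Example 1.

```python
import pandas as pd

series = pd.Series(
    ['John', 'Garyooo', 'John', 'Richard', 'Peter', 'Richard', 'Gary'],
    index=['East', 'West', 'North', 'South', 'East', 'West', 'North'])

result = series.drop_duplicates()

# The original still has all 7 rows; the result has 5
print(len(series), len(result))

# The result is a separate object, not the original series
print(result is series)

# Surviving rows keep their original index labels
print(result.index.tolist())
```

Note that the result keeps the index labels of the surviving rows, so the label 'East' still appears twice: drop_duplicates() compares values, not index labels.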
Example 2
This example uses the same series, but changes the inplace parameter from its default value, False, to True.
# import pandas package
import pandas as pd

# create pandas series with duplicate values
series = pd.Series(
    ['John', 'Garyooo', 'John', 'Richard', 'Peter', 'Richard', 'Gary'],
    index=['East', 'West', 'North', 'South', 'East', 'West', 'North'])
print(series)

# delete duplicate values with inplace=True
result = series.drop_duplicates(inplace=True)
print('Output:\n', result)
print(series)
Explanation
By setting the inplace parameter to True, we modify the original series object directly, and the method returns None.
The Pandas series is as follows −
East        John
West     Garyooo
North       John
South    Richard
East       Peter
West     Richard
North       Gary
dtype: object
Output
The output is given below −
Output:
 None
East        John
West     Garyooo
South    Richard
East       Peter
North       Gary
dtype: object
By setting inplace=True, we have updated the original series object itself, so it no longer contains the duplicate rows. In the output block above, the value “None” is what the drop_duplicates() method returned, and the series printed after it is the modified original.
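Because the in-place call returns None, assigning its return value to a variable is rarely useful. The sketch below compares the in-place call with simply reassigning the returned series to the same name. It assumes a pandas version that accepts the inplace argument, and the variable names in_place, returned and reassigned are only illustrative.

```python
import pandas as pd

data = ['John', 'Garyooo', 'John', 'Richard', 'Peter', 'Richard', 'Gary']
labels = ['East', 'West', 'North', 'South', 'East', 'West', 'North']

# Option 1: modify the series in place; the call itself returns None
in_place = pd.Series(data, index=labels)
returned = in_place.drop_duplicates(inplace=True)

# Option 2: keep the default inplace=False and rebind the name
reassigned = pd.Series(data, index=labels)
reassigned = reassigned.drop_duplicates()

print(returned is None)              # True
print(in_place.equals(reassigned))   # True
```

Both options leave the name bound to the same deduplicated data; the reassignment form has the advantage that it can be chained with other method calls.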