
Get Unique Values from a Column in Python Pandas
There are several ways to work with the unique values of a DataFrame column in Python pandas, including unique(), drop_duplicates(), and nunique(). The pandas library is mostly used for data analysis and manipulation, and locating the unique values in a column is a common step in both.
Some common methods to get unique values from a column are as follows:
- unique(): This method returns the unique values of a Series or DataFrame column as a NumPy array.
- drop_duplicates(): This method removes duplicate values from a DataFrame or Series.
- nunique(): This method returns the number of unique values in a Series or DataFrame column.
Using the 'unique()' Method
'unique()' returns a NumPy array containing the unique values. It is an efficient way to find the unique values in a single column.
Import library
import pandas as pd
Creating a DataFrame
# Create a DataFrame from a dictionary of columns
data = {
    'Name': ['Robert', 'John', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
}
df = pd.DataFrame(data)
Example
In the code below, the unique() method eliminates duplicates and shows each unique entry once, in order of first appearance.
import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Robert', 'Naveen', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
}
df = pd.DataFrame(data)

# Get unique values from the 'Name' column
unique_names = df['Name'].unique()
print("Unique Names:", unique_names)
Output
Unique Names: ['Robert' 'Naveen' 'Charlie' 'Kumar']
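As a small illustration of how unique() treats order and missing values, the sketch below uses a hypothetical Series named names that contains one missing entry. It assumes you want a plain Python list without the missing value; the variable names are illustrative and not part of the original example.

```python
import pandas as pd

# Hypothetical sample column with a repeated name and one missing value
names = pd.Series(['Robert', 'Naveen', None, 'Robert', 'Kumar'])

# unique() keeps values in order of first appearance
# and treats the missing value as one distinct entry
all_unique = names.unique()
print(len(all_unique))

# Drop missing values first, then convert the result to a Python list
unique_list = list(names.dropna().unique())
print(unique_list)
```

Dropping missing values before calling unique() is one option; whether a missing value should count as a distinct entry depends on the analysis.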
Using the 'drop_duplicates()' Method
This method returns a DataFrame or Series with duplicate values removed. drop_duplicates() is mostly used for removing duplicates based on multiple columns or entire rows.
Syntax
In the syntax below, keep controls which duplicates, if any, are retained: 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False removes every row that has a duplicate.
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
Example
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Robert', 'John', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
})

# Remove duplicates from the 'Name' column, keeping the first occurrence
unique_df = df.drop_duplicates(subset='Name', keep='first')
print(unique_df)
Following is the output:
| | Name | Age |
|---|---|---|
| 0 | Robert | 26 |
| 1 | John | 30 |
| 2 | Charlie | 35 |
| 4 | Kumar | 40 |
| 5 | Naveen | 30 |
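To make the three keep options from the syntax above concrete, here is a minimal sketch on a smaller, hypothetical DataFrame in which 'Robert' appears twice with different ages. The variable names (kept_first, kept_last, kept_none, names_only) are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data: 'Robert' appears at index 0 and index 2
df = pd.DataFrame({
    'Name': ['Robert', 'John', 'Robert', 'Kumar'],
    'Age': [26, 30, 27, 40],
})

# keep='first' retains index 0 and drops index 2
kept_first = df.drop_duplicates(subset='Name', keep='first')

# keep='last' retains index 2 and drops index 0
kept_last = df.drop_duplicates(subset='Name', keep='last')

# keep=False drops every row whose 'Name' occurs more than once
kept_none = df.drop_duplicates(subset='Name', keep=False)

# drop_duplicates() also works on a single column (a Series)
names_only = df['Name'].drop_duplicates()

print(kept_first)
print(kept_last)
print(kept_none)
print(names_only)
```

Unlike unique(), the Series returned by drop_duplicates() keeps the original index labels, which can help when the unique values need to be traced back to their rows.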
Using the 'nunique()' Method
The nunique() method counts unique values. Called on a Series or a single DataFrame column, it returns an integer; called on a DataFrame, it returns a Series with the count for each column. It is an efficient way to get the number of unique entries.
Example
import pandas as pd

# Create a DataFrame from the dictionary
df = pd.DataFrame({
    'Name': ['Robert', 'John', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
})

# Count unique values in the 'Name' column
unique_name_count = df['Name'].nunique()
print(f"Num of Unique Names: {unique_name_count}")

# Count unique values for each column in the DataFrame
unique_counts = df.nunique()
print(unique_counts)
Output
Num of Unique Names: 5
Name    5
Age     4
dtype: int64
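One detail worth knowing is that nunique() ignores missing values by default. The sketch below, using a hypothetical DataFrame with one missing name, shows the dropna parameter changing the count; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical data with one missing value in the 'Name' column
df = pd.DataFrame({
    'Name': ['Robert', 'John', None, 'Robert'],
    'Age': [26, 30, 35, 26],
})

# By default, missing values are not counted
count_default = df['Name'].nunique()

# With dropna=False, the missing value counts as one more distinct entry
count_with_na = df['Name'].nunique(dropna=False)

print(count_default)   # 2
print(count_with_na)   # 3
```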