Get Unique Values from a Column in Python Pandas



There are several ways to work with the unique values of a DataFrame column in Python pandas, including unique(), drop_duplicates(), and nunique(). The pandas library is widely used for data analysis and manipulation, and finding the distinct values in a column is a common task.

Some common methods to get unique values from a column are as follows:

  • unique(): This method will return the unique values of a Series or DataFrame column as a NumPy array.

  • drop_duplicates(): This method removes duplicate values from a DataFrame or Series and returns the result as a new DataFrame or Series.

  • nunique(): This method returns the number of unique values in a Series or DataFrame column.
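
As a quick orientation, the three methods can be compared side by side on one column. The following is a minimal sketch, assuming a standalone Series named ages (a name chosen here for illustration) that reuses the Age values from the examples in this article:

```python
import pandas as pd

# Illustrative Series; the name 'ages' is an assumption for this sketch
ages = pd.Series([26, 30, 35, 26, 40, 30], name="Age")

unique_values = ages.unique()           # array of distinct values
deduplicated = ages.drop_duplicates()   # Series that keeps the original index labels
distinct_count = ages.nunique()         # plain integer count

print(unique_values)
print(deduplicated.index.tolist())
print(distinct_count)
```

The main practical difference is the return type: unique() gives back an array, drop_duplicates() keeps the pandas object and its index, and nunique() gives only a count.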

Using the 'unique()' Method

'unique()' returns a NumPy array containing the unique values in their order of first appearance. It is an efficient way to find the unique values in a single column.

Import library

import pandas as pd

Creating a DataFrame

# Build a dictionary of columns and create a DataFrame from it
data = {
    'Name': ['Robert', 'Naveen', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
}

df = pd.DataFrame(data)

Example

In the code below, unique() returns each distinct name once, dropping the repeated entries.

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Robert', 'Naveen', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
}

df = pd.DataFrame(data)

# Get unique values from 'Name' column
unique_names = df['Name'].unique()

print("Unique Names:", unique_names)

Output

Unique Names: ['Robert' 'Naveen' 'Charlie' 'Kumar']
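
Two behaviors of unique() are worth knowing: the result is not sorted, and a missing value (NaN) is reported as one of the unique values. The sketch below illustrates both, assuming a made-up Series named cities that is not part of the example above:

```python
import numpy as np
import pandas as pd

# Hypothetical column containing repeated and missing entries
cities = pd.Series(["Pune", "Delhi", np.nan, "Pune", "Agra", np.nan])

# Values come back in order of first appearance; NaN is included once
values = cities.unique()

# To get a sorted list without missing values, drop NaN first and then sort
sorted_values = sorted(cities.dropna().unique())

print(values)
print(sorted_values)
```

Dropping NaN before sorting avoids comparing strings with a float, which would otherwise raise an error.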

Using the 'drop_duplicates()' Method

This method returns a DataFrame or Series with the duplicate values removed. drop_duplicates() is mostly used to remove duplicates based on several columns or on entire rows.

Syntax

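In the syntax below, subset names the column or columns to check for duplicates (all columns by default), and keep controls which duplicate is retained: 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False removes every duplicated row. Setting inplace=True modifies the DataFrame directly instead of returning a new one.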

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

Example

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Robert', 'John', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
})

# Removing duplicates from the 'Name' column, keeping the first occurrence
unique_df = df.drop_duplicates(subset='Name', keep='first')

print(unique_df)

Output

      Name  Age
0   Robert   26
1     John   30
2  Charlie   35
4    Kumar   40
5   Naveen   30
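
The other values of keep can be tried on the same data. The following is a minimal sketch that reuses the DataFrame from the example; the variable names last_kept, none_kept, and renumbered are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Robert', 'John', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
})

# keep='last' drops the earlier 'Robert' (row 0) and keeps the later one (row 3)
last_kept = df.drop_duplicates(subset='Name', keep='last')

# keep=False drops every row whose 'Name' occurs more than once
none_kept = df.drop_duplicates(subset='Name', keep=False)

# The original index labels survive; reset_index(drop=True) renumbers from 0
renumbered = last_kept.reset_index(drop=True)

print(last_kept.index.tolist())
print(none_kept.index.tolist())
print(renumbered.index.tolist())
```

Because the surviving rows keep their original index labels, gaps in the index (such as the missing 3 in the output above) are expected.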

Using the 'nunique()' Method

The nunique() method counts the unique values. Called on a Series or a single DataFrame column, it returns an integer; called on a whole DataFrame, it returns a Series holding the count for each column. It is an efficient way to get the number of unique entries without listing them.

Example

import pandas as pd

# Create a DataFrame from the dictionary
df = pd.DataFrame({
    'Name': ['Robert', 'John', 'Charlie', 'Robert', 'Kumar', 'Naveen'],
    'Age': [26, 30, 35, 26, 40, 30]
})

# Counting unique values from the 'Name' column
unique_name_count = df['Name'].nunique()

print(f"Num of Unique Names: {unique_name_count}")

# Counting unique values for each column in the DataFrame
unique_counts = df.nunique()

print(unique_counts)

Output

Num of Unique Names: 5
Name    5
Age     4
dtype: int64
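
By default, nunique() does not count missing values, so its result can differ from the length of the array returned by unique(). The sketch below shows the difference, assuming a made-up Series named scores that is not part of the example above:

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries
scores = pd.Series([90, 85, np.nan, 90, np.nan])

count_without_nan = scores.nunique()            # NaN is ignored by default
count_with_nan = scores.nunique(dropna=False)   # NaN is counted as one value
length_of_unique = len(scores.unique())         # unique() includes NaN

print(count_without_nan, count_with_nan, length_of_unique)
```

Passing dropna=False is useful when a missing entry should be treated as a category of its own.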
Updated on: 2024-09-23T14:00:32+05:30
