
How to Categorize Floating Values in Python Using Pandas Library

Last Updated : 05 Apr, 2025

Categorizing floating values is a common task when working with numerical data: you take a continuous variable and divide it into distinct groups, or bins.

In pandas, pd.cut() is a common method for categorizing floating-point values.

Python
import pandas as pd
data = {
    'Values': [0.1, 0.5, 1.2, 1.9, 2.4, 3.3, 4.7, 5.6]
}

df = pd.DataFrame(data)

# Define the bin edges
bins = [0, 1, 2, 3, 4, 5, 6]

# Categorize the 'Values' column into bins
df['Categories'] = pd.cut(df['Values'], bins)
print(df)

Output:

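Each value is labeled with the interval that contains it: 0.1 and 0.5 fall in (0, 1], 1.2 and 1.9 in (1, 2], 2.4 in (2, 3], 3.3 in (3, 4], 4.7 in (4, 5], and 5.6 in (5, 6].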

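The pd.cut() function assigns each value to the interval it falls into. By default each interval excludes its left edge and includes its right edge, so a value of exactly 1.0 would land in (0, 1]. Values that lie outside the outermost bin edges are assigned NaN.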
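The edge behavior of pd.cut() is easiest to see on values that sit exactly on a boundary. The following is a minimal sketch using made-up sample values (not the article's dataset) and the `include_lowest` and `right` parameters of pd.cut():

```python
import pandas as pd

# Illustrative values chosen to sit on or outside the bin edges
values = pd.Series([0.0, 1.0, 2.5, 7.0])
bins = [0, 1, 2, 3]

# Default: right-closed intervals (a, b]; 0.0 sits on the excluded left edge
default_cut = pd.cut(values, bins)

# include_lowest=True also keeps values equal to the first edge
with_lowest = pd.cut(values, bins, include_lowest=True)

# right=False switches to left-closed intervals [a, b)
left_closed = pd.cut(values, bins, right=False)

print(pd.DataFrame({
    "value": values,
    "default": default_cut,
    "include_lowest": with_lowest,
    "right=False": left_closed,
}))
```

In every variant, 7.0 lies beyond the last edge and comes back as NaN. With the default settings, 0.0 is NaN as well, because the first interval does not include its left edge.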

Categorizing floating values can help in:

  • Simplifying complex data for better analysis.
  • Grouping continuous data into discrete intervals or bins.
  • Preparing data for machine learning models that require categorical input.

Beyond a basic pd.cut() call, there are other ways to categorize floating values:

1. Using pd.qcut() for Quantile-Based Categorization

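pd.qcut() divides data into quantiles, so that each bin contains roughly the same number of data points (exactly equal counts are only possible when the number of values divides evenly and there are no ties at the edges). It is useful when you want the bins to follow the distribution of the data rather than fixed cut points.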

Python
import pandas as pd
data = {
    'Values': [0.1, 0.5, 1.2, 1.9, 2.4, 3.3, 4.7, 5.6]
}
df = pd.DataFrame(data)

# Categorize using pd.qcut (3 quantiles)
df['Category'] = pd.qcut(df['Values'], q=3)
print(df)

Output:

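With q=3, the computed cut points fall at approximately 1.43 and 3.0. The first bin holds 0.1, 0.5, and 1.2; the second holds 1.9 and 2.4; the third holds 3.3, 4.7, and 5.6.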

The pd.qcut() function calculates the bin edges automatically from the data distribution, so you specify only the number of quantiles.
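To inspect the edges that pd.qcut() chose, you can pass `retbins=True`, which returns the edges alongside the categories. A short sketch on the same values (the variable names `categories` and `edges` are only illustrative):

```python
import pandas as pd

values = pd.Series([0.1, 0.5, 1.2, 1.9, 2.4, 3.3, 4.7, 5.6])

# retbins=True also returns the computed quantile edges
categories, edges = pd.qcut(values, q=3, retbins=True)

# Count how many values landed in each bin, in bin order
counts = categories.value_counts().sort_index()

print(edges)
print(counts)
```

Eight values cannot be split evenly three ways, so the bins here hold three, two, and three values.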

2. Creating Custom Categorization Logic with apply()

For more advanced categorization, you can use apply() with a custom function. This method gives you full control over how you categorize the data based on your own logic.

Python
import pandas as pd

data = {
    'Values': [0.1, 0.5, 1.2, 1.9, 2.4, 3.3, 4.7, 5.6]
}
df = pd.DataFrame(data)

# Custom categorization function
def categorize(value):
    if value < 1:
        return 'Low'
    elif value < 3:
        return 'Medium'
    elif value < 5:
        return 'High'
    else:
        return 'Very High'

# Apply the custom function to categorize the values
df['Category'] = df['Values'].apply(categorize)
print(df)

Output:

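The values 0.1 and 0.5 are labeled 'Low'; 1.2, 1.9, and 2.4 are 'Medium'; 3.3 and 4.7 are 'High'; and 5.6 is 'Very High'.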

The function tests each value against the thresholds 1, 3, and 5 in order and returns the first matching category: 'Low', 'Medium', 'High', or 'Very High'.
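When the custom logic is a simple chain of thresholds like this one, the same result can usually be obtained without a Python-level function. The sketch below is an alternative, not the method used above: it assumes pd.cut() with open-ended edges (`-np.inf`, `np.inf`) and `right=False`, so that the intervals match the `value < threshold` checks.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Values": [0.1, 0.5, 1.2, 1.9, 2.4, 3.3, 4.7, 5.6]})

# right=False gives [a, b) intervals, mirroring the `value < threshold` tests
df["Category"] = pd.cut(
    df["Values"],
    bins=[-np.inf, 1, 3, 5, np.inf],
    labels=["Low", "Medium", "High", "Very High"],
    right=False,
)
print(df)
```

apply() remains the better fit when the rule cannot be written as a set of intervals, for example when it depends on several columns or on non-numeric conditions.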

3. Labeling Data with Custom Categories

If you prefer more intuitive labels, you can use pd.cut() with custom labels. This makes the output more readable, especially when you want your categories to reflect a specific interpretation of the bins.

Python
import pandas as pd
data = {
    'Values': [0.1, 0.5, 1.2, 1.9, 2.4, 3.3, 4.7, 5.6]
}
df = pd.DataFrame(data)

# Define custom labels
labels = ['Very Low', 'Low', 'Medium', 'High']

# Categorize using pd.cut with custom labels
df['Category'] = pd.cut(df['Values'], bins=[0, 1, 2, 4, 6], labels=labels)
print(df)

Output:

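The values 0.1 and 0.5 are labeled 'Very Low'; 1.2 and 1.9 are 'Low'; 2.4 and 3.3 are 'Medium'; and 4.7 and 5.6 are 'High'.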

We used the custom labels 'Very Low', 'Low', 'Medium', and 'High' to describe the bins. The pd.cut() function assigns the label of whichever bin a value falls into. Note that the number of labels must be one less than the number of bin edges: five edges define four bins here.
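Labels produced by pd.cut() form an ordered categorical by default, which makes them convenient for filtering and for turning into numbers. A brief sketch (the `Code` column and `at_least_medium` name are introduced here only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Values": [0.1, 0.5, 1.2, 1.9, 2.4, 3.3, 4.7, 5.6]})
labels = ["Very Low", "Low", "Medium", "High"]
df["Category"] = pd.cut(df["Values"], bins=[0, 1, 2, 4, 6], labels=labels)

# Ordered categories can be compared against one of their labels
at_least_medium = df[df["Category"] >= "Medium"]

# Integer codes follow the label order (0 = 'Very Low', ..., 3 = 'High')
df["Code"] = df["Category"].cat.codes

print(at_least_medium)
print(df)
```

The integer codes are one simple way to pass the categories to code that expects numeric input.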
