【Pandas】pandas Index objects CategoricalIndex.remove_categories

Latest recommended article published on 2025-09-06 00:15:00

liuweidong0802

Licensed under CC 4.0 BY-SA

Category: Pandas Index objects  Tags: pandas

Article link: https://blog.csdn.net/weixin_39648905/article/details/151115037

Pandas2.2 Index objects

Categorical components

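Method	Description
CategoricalIndex.codes	Returns the integer code of each element in the categorical index
CategoricalIndex.categories	Returns all categories of the categorical index
CategoricalIndex.ordered	Indicates whether the categories have an ordered relationship
CategoricalIndex.rename_categories(*args, …)	Renames the categories of the categorical index
CategoricalIndex.reorder_categories(*args, …)	Reorders the categories of the categorical index
CategoricalIndex.add_categories(*args, **kwargs)	Adds new categories to the categorical index
CategoricalIndex.remove_categories(*args, …)	Removes the specified categories from the categorical index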

pandas.CategoricalIndex.remove_categories

pandas.CategoricalIndex.remove_categories is a method of CategoricalIndex that removes the specified categories from the index. It is typically used to drop categories that are no longer needed.

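Details

Purpose: remove the specified categories from a CategoricalIndex
Parameters:
- removals: the category to remove, either a single category or a list of categories
- There is no inplace option in pandas 2.x: the inplace keyword was deprecated in pandas 1.3 and removed in pandas 2.0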
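Returns:
- A new CategoricalIndex; the original index is left unchanged
- For an unordered index, some pandas versions may list the remaining categories in sorted order rather than in their original order, so the category order printed in the examples can vary by version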
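Exceptions:
- Removing a category that does not exist raises ValueError
- Removing a category that is still in use does not raise: the values that belonged to it are set to NaN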
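
The behaviour of the method can be checked with a short sketch. It assumes pandas 2.x, and the index contents and variable names are illustrative only, not taken from the article.

```python
import pandas as pd

# Illustrative index: 'c' is declared as a category but never used.
idx = pd.CategoricalIndex(["a", "b"], categories=["a", "b", "c"])

# A scalar and a list-like are both accepted as removals.
trimmed_scalar = idx.remove_categories("c")
trimmed_list = idx.remove_categories(["c"])
print(trimmed_scalar.equals(trimmed_list))

# A new object is returned; the original index still has 'c'.
print("c" in idx.categories, "c" in trimmed_scalar.categories)

# Removing a category that does not exist raises ValueError.
try:
    idx.remove_categories("z")
    raised = False
except ValueError:
    raised = True
print(raised)

# Removing a category that is in use turns the matching values into NaN.
with_nan = idx.remove_categories("a")
print(list(with_nan.isna()))
```

The last call is the one to watch in practice: it silently introduces missing values instead of failing.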

Example code and results

Example 1: Basic usage - removing a single unused category

import pandas as pd

# Create a CategoricalIndex
cat_index = pd.CategoricalIndex(['small', 'large', 'medium'], 
                               categories=['small', 'medium', 'large', 'extra_large'])
print("Original CategoricalIndex:")
print(cat_index)
print("Original categories:")
print(cat_index.categories)

# Remove the unused category
new_cat_index = cat_index.remove_categories('extra_large')
print("\nAfter removing the 'extra_large' category:")
print(new_cat_index)
print("New categories:")
print(new_cat_index.categories)

Output:

Original CategoricalIndex:
CategoricalIndex(['small', 'large', 'medium'], categories=['small', 'medium', 'large', 'extra_large'], ordered=False, dtype='category')
Original categories:
Index(['small', 'medium', 'large', 'extra_large'], dtype='object')

After removing the 'extra_large' category:
CategoricalIndex(['small', 'large', 'medium'], categories=['small', 'medium', 'large'], ordered=False, dtype='category')
New categories:
Index(['small', 'medium', 'large'], dtype='object')

Example 2: Removing several unused categories

import pandas as pd

# Create a CategoricalIndex
cat_index = pd.CategoricalIndex(['red', 'blue'], 
                               categories=['red', 'blue', 'green', 'yellow', 'purple'])
print("Original CategoricalIndex:")
print(cat_index)
print("All categories:")
print(cat_index.categories)

# Remove several unused categories at once
new_cat_index = cat_index.remove_categories(['green', 'yellow', 'purple'])
print("\nAfter removing several categories:")
print(new_cat_index)
print("Remaining categories:")
print(new_cat_index.categories)

Output:

Original CategoricalIndex:
CategoricalIndex(['red', 'blue'], categories=['red', 'blue', 'green', 'yellow', 'purple'], ordered=False, dtype='category')
All categories:
Index(['red', 'blue', 'green', 'yellow', 'purple'], dtype='object')

After removing several categories:
CategoricalIndex(['red', 'blue'], categories=['red', 'blue'], ordered=False, dtype='category')
Remaining categories:
Index(['red', 'blue'], dtype='object')

Example 3: Error handling - removing a category that does not exist

import pandas as pd

# Create a CategoricalIndex
cat_index = pd.CategoricalIndex(['low', 'medium'], 
                               categories=['low', 'medium', 'high'])
print("Original CategoricalIndex:")
print(cat_index)

# Try to remove a category that does not exist (raises ValueError)
try:
    new_cat_index = cat_index.remove_categories('extra_high')  # 'extra_high' is not a category
    print(new_cat_index)
except ValueError as e:
    print(f"\nError: {e}")

# Correct approach: remove a category that exists (here it is also unused)
new_cat_index = cat_index.remove_categories('high')
print("\nResult after removing an existing category:")
print(new_cat_index)
print("Remaining categories:")
print(new_cat_index.categories)

Output:

Original CategoricalIndex:
CategoricalIndex(['low', 'medium'], categories=['low', 'medium', 'high'], ordered=False, dtype='category')

Error: removals must all be in old categories: {'extra_high'}

Result after removing an existing category:
CategoricalIndex(['low', 'medium'], categories=['low', 'medium'], ordered=False, dtype='category')
Remaining categories:
Index(['low', 'medium'], dtype='object')
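
When the list of removals comes from outside the program, one way to avoid the ValueError is to filter it against the existing categories first. The sketch below does this with a hypothetical helper, remove_categories_if_present. The helper is not part of pandas and is only an illustration.

```python
import pandas as pd

def remove_categories_if_present(index, removals):
    """Hypothetical helper (not a pandas API): skip removals that are not categories."""
    existing = [c for c in removals if c in index.categories]
    return index.remove_categories(existing)

cat_index = pd.CategoricalIndex(["low", "medium"],
                                categories=["low", "medium", "high"])

# 'extra_high' is not a category, so it is ignored instead of raising.
result = remove_categories_if_present(cat_index, ["high", "extra_high"])
print(sorted(result.categories))  # ['low', 'medium']
```

The helper only guards against unknown names. A removal that names a category still in use is passed through, and the corresponding values become NaN.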

Example 4: Removing a category that is in use

import pandas as pd

# Create a CategoricalIndex in which every category is in use
cat_index = pd.CategoricalIndex(['small', 'large', 'medium'], 
                               categories=['small', 'medium', 'large'])
print("Original CategoricalIndex:")
print(cat_index)

# Removing a category that is in use does not raise;
# the values that belonged to it become NaN
new_cat_index = cat_index.remove_categories('small')
print("\nValues after removing 'small':")
print(list(new_cat_index))
print("Missing-value mask:")
print(new_cat_index.isna())
print("Is 'small' still a category?", 'small' in new_cat_index.categories)

# To avoid introducing NaN, remove only categories that are not in use
cat_index_with_unused = pd.CategoricalIndex(['small', 'large'], 
                                           categories=['small', 'medium', 'large'])
new_cat_index = cat_index_with_unused.remove_categories('medium')
print("\nValues after removing the unused category 'medium':")
print(list(new_cat_index))
print("Any missing values?", new_cat_index.isna().any())

Output:

Original CategoricalIndex:
CategoricalIndex(['small', 'large', 'medium'], categories=['small', 'medium', 'large'], ordered=False, dtype='category')

Values after removing 'small':
[nan, 'large', 'medium']
Missing-value mask:
[ True False False]
Is 'small' still a category? False

Values after removing the unused category 'medium':
['small', 'large']
Any missing values? False

Example 5: Use in data analysis

import pandas as pd

# Create a CategoricalIndex of product ratings that declares some unused categories
product_ratings = pd.CategoricalIndex(['good', 'excellent', 'good', 'excellent'], 
                                     categories=['poor', 'fair', 'good', 'very_good', 'excellent', 'outstanding'])
print("Original product ratings:")
print(product_ratings)
print("All categories:")
print(product_ratings.categories)
print("Unused categories:", sorted(set(product_ratings.categories) - set(product_ratings)))

# Remove the unused categories so that only the ratings actually present remain
optimized_ratings = product_ratings.remove_categories(['poor', 'fair', 'very_good', 'outstanding'])
print("\nProduct ratings after cleanup:")
print(optimized_ratings)
print("Remaining categories:")
print(optimized_ratings.categories)

Output:

Original product ratings:
CategoricalIndex(['good', 'excellent', 'good', 'excellent'], categories=['poor', 'fair', 'good', 'very_good', 'excellent', 'outstanding'], ordered=False, dtype='category')
All categories:
Index(['poor', 'fair', 'good', 'very_good', 'excellent', 'outstanding'], dtype='object')
Unused categories: ['fair', 'outstanding', 'poor', 'very_good']

Product ratings after cleanup:
CategoricalIndex(['good', 'excellent', 'good', 'excellent'], categories=['good', 'excellent'], ordered=False, dtype='category')
Remaining categories:
Index(['good', 'excellent'], dtype='object')
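
Instead of listing the unused categories by hand, they can be computed from the data. The sketch below does that and compares the result with CategoricalIndex.remove_unused_categories, a related pandas method that drops every unused category in one call. Variable names are illustrative.

```python
import pandas as pd

product_ratings = pd.CategoricalIndex(
    ["good", "excellent", "good", "excellent"],
    categories=["poor", "fair", "good", "very_good", "excellent", "outstanding"],
)

# Work out which declared categories never occur among the values.
used = set(product_ratings)
unused = [c for c in product_ratings.categories if c not in used]

# Remove them explicitly ...
via_removals = product_ratings.remove_categories(unused)
# ... or let pandas find and drop them.
via_unused = product_ratings.remove_unused_categories()

# Both approaches leave the same set of categories.
print(set(via_removals.categories) == set(via_unused.categories))  # True
```

remove_categories is the better fit when only specific categories should go, while remove_unused_categories suits a general cleanup.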

Example 6: Aligning categories before comparing or concatenating

import pandas as pd

# Create two CategoricalIndex objects, one of which has an extra category
cat_index1 = pd.CategoricalIndex(['A', 'B'], categories=['A', 'B', 'C'])
cat_index2 = pd.CategoricalIndex(['A', 'B'], categories=['A', 'B'])

print("First CategoricalIndex:")
print(cat_index1)
print("Unused categories:", set(cat_index1.categories) - set(cat_index1))

print("\nSecond CategoricalIndex:")
print(cat_index2)

# Remove the unused category from the first index
cleaned_cat_index1 = cat_index1.remove_categories('C')
print("\nFirst index after cleanup:")
print(cleaned_cat_index1)

# The two indexes now share the same categories, which simplifies comparison and concatenation
print("\nDo the two indexes have the same categories?", cleaned_cat_index1.categories.equals(cat_index2.categories))

Output:

First CategoricalIndex:
CategoricalIndex(['A', 'B'], categories=['A', 'B', 'C'], ordered=False, dtype='category')
Unused categories: {'C'}

Second CategoricalIndex:
CategoricalIndex(['A', 'B'], categories=['A', 'B'], ordered=False, dtype='category')

First index after cleanup:
CategoricalIndex(['A', 'B'], categories=['A', 'B'], ordered=False, dtype='category')

Do the two indexes have the same categories? True
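
The effect on pandas.concat is easiest to see with Series, since concatenating categorical Series keeps the categorical dtype only when their categories match. The following is a minimal sketch assuming pandas 2.x. Wrapping the indexes in pd.Series is an illustrative choice, not something the article does.

```python
import pandas as pd

cat_index1 = pd.CategoricalIndex(["A", "B"], categories=["A", "B", "C"])
cat_index2 = pd.CategoricalIndex(["A", "B"], categories=["A", "B"])

s1 = pd.Series(cat_index1)
s2 = pd.Series(cat_index2)

# Different category sets: the result falls back to a non-categorical dtype.
mixed = pd.concat([s1, s2], ignore_index=True)
print(isinstance(mixed.dtype, pd.CategoricalDtype))

# After removing the extra category the dtypes match and 'category' is kept.
s1_clean = pd.Series(cat_index1.remove_categories("C"))
aligned = pd.concat([s1_clean, s2], ignore_index=True)
print(isinstance(aligned.dtype, pd.CategoricalDtype))
```

Keeping the categorical dtype through a concat preserves the compact integer-code representation, which is the practical reason for aligning the categories first.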