[Pandas] pandas Index objects: CategoricalIndex.as_ordered

Pandas 2.2 Index objects

Categorical components

Attribute or method: Description
CategoricalIndex.codes: The integer code of each element in the categorical index
CategoricalIndex.categories: All categories of the categorical index
CategoricalIndex.ordered: Whether the categories have an ordered relationship
CategoricalIndex.rename_categories(*args, …): Rename the categories
CategoricalIndex.reorder_categories(*args, …): Rearrange the order of the categories
CategoricalIndex.add_categories(*args, **kwargs): Add new categories
CategoricalIndex.remove_categories(*args, …): Remove the specified categories
CategoricalIndex.remove_unused_categories(…): Remove categories that are not used by any element
CategoricalIndex.set_categories(*args, **kwargs): Set the categories to the specified new categories
CategoricalIndex.as_ordered(*args, **kwargs): Convert an unordered categorical index to an ordered one

pandas.CategoricalIndex.as_ordered()

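pandas.CategoricalIndex.as_ordered() is a method of CategoricalIndex that converts an unordered categorical index to an ordered one.

Details
  • Purpose: convert an unordered CategoricalIndex to an ordered CategoricalIndex; the values and categories stay the same and only the ordered flag becomes True
  • Return value: a new ordered CategoricalIndex; the original index is not modified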
Example code and results
Example 1: Basic usage - converting an unordered categorical to an ordered one
import pandas as pd

# Create an unordered CategoricalIndex
values = ['small', 'large', 'medium', 'small', 'large']
categories = ['small', 'medium', 'large']
cat_index = pd.CategoricalIndex(values, categories=categories, ordered=False)
print("Original CategoricalIndex:")
print(cat_index)
print("Ordered:", cat_index.ordered)

# Convert to an ordered categorical
ordered_cat_index = cat_index.as_ordered()
print("\nAfter converting to an ordered categorical:")
print(ordered_cat_index)
print("Ordered:", ordered_cat_index.ordered)

Output:

Original CategoricalIndex:
CategoricalIndex(['small', 'large', 'medium', 'small', 'large'],
                 categories=['small', 'medium', 'large'],
                 ordered=False, dtype='category')
Ordered: False

After converting to an ordered categorical:
CategoricalIndex(['small', 'large', 'medium', 'small', 'large'],
                 categories=['small', 'medium', 'large'],
                 ordered=True, dtype='category')
Ordered: True
Example 2: Using it with a DataFrame
import pandas as pd

# Create a DataFrame that uses an unordered CategoricalIndex as its index
data = {
    'sales': [100, 200, 150, 300],
    'profit': [20, 40, 30, 60]
}
values = ['North', 'South', 'East', 'West']
categories = ['North', 'South', 'East', 'West']
cat_index = pd.CategoricalIndex(values, categories=categories, ordered=False, name='region')
df = pd.DataFrame(data, index=cat_index)

print("Original DataFrame:")
print(df)
print("Index ordered:", df.index.ordered)

# as_ordered() returns a new index, so assign it back to df.index
df.index = df.index.as_ordered()
print("\nAfter converting the index to an ordered categorical:")
print(df)
print("Index ordered:", df.index.ordered)

Output:

Original DataFrame:
        sales  profit
region              
North     100      20
South     200      40
East      150      30
West      300      60
Index ordered: False

After converting the index to an ordered categorical:
        sales  profit
region              
North     100      20
South     200      40
East      150      30
West      300      60
Index ordered: True
Example 3: Calling as_ordered() on an already ordered categorical
import pandas as pd

# Create an ordered CategoricalIndex
values = ['A', 'C', 'B', 'A']
categories = ['A', 'B', 'C']
cat_index = pd.CategoricalIndex(values, categories=categories, ordered=True)
print("Original ordered CategoricalIndex:")
print(cat_index)
print("Ordered:", cat_index.ordered)

# Call as_ordered() on the already ordered categorical
new_cat_index = cat_index.as_ordered()
print("\nAfter calling as_ordered() on an ordered categorical:")
print(new_cat_index)
# as_ordered() always builds a new index, even when nothing changes
print("Same object:", cat_index is new_cat_index)
print("Equal contents:", cat_index.equals(new_cat_index))
print("Ordered:", new_cat_index.ordered)

Output:

Original ordered CategoricalIndex:
CategoricalIndex(['A', 'C', 'B', 'A'],
                 categories=['A', 'B', 'C'],
                 ordered=True, dtype='category')
Ordered: True

After calling as_ordered() on an ordered categorical:
CategoricalIndex(['A', 'C', 'B', 'A'],
                 categories=['A', 'B', 'C'],
                 ordered=True, dtype='category')
Same object: False
Equal contents: True
Ordered: True
Example 4: Comparison with as_unordered()
import pandas as pd

# Create an unordered CategoricalIndex
values = ['red', 'blue', 'green', 'red']
categories = ['red', 'blue', 'green']
cat_index = pd.CategoricalIndex(values, categories=categories, ordered=False)
print("Original unordered CategoricalIndex:")
print(cat_index)
print("Ordered:", cat_index.ordered)

# Convert to ordered with as_ordered()
ordered_cat_index = cat_index.as_ordered()
print("\nAfter as_ordered():")
print(ordered_cat_index)
print("Ordered:", ordered_cat_index.ordered)

# Convert back to unordered with as_unordered()
unordered_cat_index = ordered_cat_index.as_unordered()
print("\nAfter as_unordered():")
print(unordered_cat_index)
print("Ordered:", unordered_cat_index.ordered)

Output:

Original unordered CategoricalIndex:
CategoricalIndex(['red', 'blue', 'green', 'red'],
                 categories=['red', 'blue', 'green'],
                 ordered=False, dtype='category')
Ordered: False

After as_ordered():
CategoricalIndex(['red', 'blue', 'green', 'red'],
                 categories=['red', 'blue', 'green'],
                 ordered=True, dtype='category')
Ordered: True

After as_unordered():
CategoricalIndex(['red', 'blue', 'green', 'red'],
                 categories=['red', 'blue', 'green'],
                 ordered=False, dtype='category')
Ordered: False
Example 5: Sorting unordered and ordered indexes
import pandas as pd

# Create an unordered and an ordered CategoricalIndex from the same data
values = ['C', 'A', 'B', 'C']
categories = ['A', 'B', 'C']

unordered_cat = pd.CategoricalIndex(values, categories=categories, ordered=False)
ordered_cat = pd.CategoricalIndex(values, categories=categories, ordered=True)

# sort_values() follows the category order in both cases,
# so the two sorted results differ only in the ordered flag
print("Unordered CategoricalIndex:")
print(unordered_cat)
print("Sorted:")
print(unordered_cat.sort_values())

print("\nOrdered CategoricalIndex:")
print(ordered_cat)
print("Sorted:")
print(ordered_cat.sort_values())

Output:

Unordered CategoricalIndex:
CategoricalIndex(['C', 'A', 'B', 'C'],
                 categories=['A', 'B', 'C'],
                 ordered=False, dtype='category')
Sorted:
CategoricalIndex(['A', 'B', 'C', 'C'],
                 categories=['A', 'B', 'C'],
                 ordered=False, dtype='category')

Ordered CategoricalIndex:
CategoricalIndex(['C', 'A', 'B', 'C'],
                 categories=['A', 'B', 'C'],
                 ordered=True, dtype='category')
Sorted:
CategoricalIndex(['A', 'B', 'C', 'C'],
                 categories=['A', 'B', 'C'],
                 ordered=True, dtype='category')
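Sorting alone does not separate the two kinds of index, because sort_values() follows the category order either way. The sketch below shows one place where the ordered flag does matter: inequality comparisons against a category. It reuses the data from Example 5; the variable names less_than_c, unordered_raised, and equal_to_c are illustrative only.

```python
import pandas as pd

values = ['C', 'A', 'B', 'C']
categories = ['A', 'B', 'C']
unordered_cat = pd.CategoricalIndex(values, categories=categories, ordered=False)
ordered_cat = unordered_cat.as_ordered()

# Inequality against a category works once the index is ordered
less_than_c = ordered_cat < 'C'
print(less_than_c)

# The same comparison on the unordered index raises TypeError
try:
    unordered_cat < 'C'
    unordered_raised = False
except TypeError as exc:
    unordered_raised = True
    print("TypeError:", exc)

# Equality checks are allowed in both cases
equal_to_c = unordered_cat == 'C'
print(equal_to_c)
```

The scalar on the right-hand side has to be one of the categories; comparing against a value outside them raises TypeError as well.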
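Use cases
  1. Data analysis: convert an unordered categorical to an ordered one when the data needs inequality comparisons or min/max
  2. Data visualization: an ordered categorical makes the intended display order of the categories explicit when plotting
  3. Statistical analysis: an ordered categorical carries rank information that an unordered one does not
  4. Data processing: when the processing depends on the category order, for example filtering by a threshold category
  5. Machine learning: some algorithms may treat ordered categoricals specially, for example as ordinal features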
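As a rough illustration of the use cases above, the sketch below filters a DataFrame by a threshold category, sorts it by category order, and reads the integer codes as a simple ordinal encoding. The size and price data are invented for this example.

```python
import pandas as pd

sizes = pd.CategoricalIndex(
    ['small', 'large', 'medium', 'small'],
    categories=['small', 'medium', 'large'],
    name='size',
)
df = pd.DataFrame({'price': [10, 30, 20, 12]}, index=sizes)
df.index = df.index.as_ordered()

# Data processing: keep the rows whose size is at least 'medium'
at_least_medium = df[df.index >= 'medium']
print(at_least_medium)

# Data analysis: sort the rows by category order rather than alphabetically
sorted_df = df.sort_index()
print(sorted_df)

# Machine learning: the integer codes follow the category order
# (small=0, medium=1, large=2) and can serve as an ordinal feature
codes = df.index.codes
print(codes)
```

The codes are the same before and after as_ordered(); the ordered flag only states that their numeric order is meaningful.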
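Notes
  • The method is defined on CategoricalIndex (Categorical and the Series.cat accessor provide the same method); a plain Index does not have it
  • Calling it on a CategoricalIndex that is already ordered returns a new, equal index, not the original object
  • Once the index is ordered, inequality comparisons (<, >, <=, >=) are supported
  • The order of the categories is the order given in the categories parameter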
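A minimal sketch of the first and last notes, assuming a plain Index of labels that should be ranked low < mid < high. The labels and variable names are made up for illustration.

```python
import pandas as pd

plain_index = pd.Index(['low', 'high'])

# A plain Index has no as_ordered() method
has_method = hasattr(plain_index, 'as_ordered')
print("Plain Index has as_ordered:", has_method)

# Build a CategoricalIndex first; the order comes from `categories`,
# not from the alphabetical order of the labels
levels = ['low', 'mid', 'high']
cat_index = pd.CategoricalIndex(plain_index, categories=levels).as_ordered()
smallest = cat_index.min()
largest = cat_index.max()
print(smallest, largest)

# The same conversion is available for a categorical Series through .cat
series = pd.Series(['low', 'high'], dtype='category')
series_ordered = series.cat.as_ordered()
print(series_ordered.cat.ordered)
```

Alphabetically, 'high' sorts before 'low', so the min() and max() results here reflect the declared category order.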

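With as_ordered(), an unordered categorical index can be converted to an ordered one in a single call, which is useful whenever analysis or processing depends on the order of the categories.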
