对表格进行操作：转换表格数据类型，剔除指定元素，表格元素替换

最新推荐文章于 2022-10-30 15:08:42 发布

AI搬砖小能手

最新推荐文章于 2022-10-30 15:08:42 发布

阅读量370

点赞数

CC 4.0 BY-SA版权

文章标签： python 列表

本文链接：https://blog.csdn.net/weixin_42076509/article/details/106074572

本文解决在使用pandas的groupby方法时遇到的问题，当数据中包含非数值型字段，如带有“%”的湿度数据，直接进行mean()操作会引发错误。通过替换“%”符号并将数据转换为数值型，成功完成数据聚合。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

对表格使用groupby进行操作时，发现少了几个数据

#查看列表中有多少个不同的元素
from collections import *
weather = pd.read_csv('dataset/train_data/city_A/weather.csv', 
            header=None,
            names=['data', 'hour', 'temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather'])
weather=weather.dropna(axis=0,how='any') #drop all rows that have any NaN values

weather
wind_direction_dict = {'North': 0, 'Northwest': 1, 'East': 2, 'Northeast': 3, 'South': 4, 'Southeast': 5, 'Southwest': 6, 'West': 7, 'Quiet': 8}
wind_speed_dict = {'<12km/h': 0, '16-24km/h': 1, '24-34km/h': 2}
weather_dict = {'Sunny': 0, 'Cloudy': 1, 'Overcast': 2, 'Rain': 3, 'Fog': 4}
weather=weather.replace(wind_direction_dict) 
weather=weather.replace(wind_speed_dict) 
# weather=weather.replace({'%':''}, regex=True)
df=weather.replace(weather_dict)

# df = df.astype('int') #对原始数据进行转换并覆盖原始数据列
x1 = df['wind_force']
print(Counter(x1))
# print(df)
df = df.groupby(['data']).mean().reset_index() 
# df = df[['data','temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather']]
df

原表格：
在这里插入图片描述
新表格：

原因是有些数据不是数值型数据，不能进行mean()操作。既然类型不对，那就转类型呗。

#查看列表中有多少个不同的元素
from collections import *
weather = pd.read_csv('dataset/train_data/city_A/weather.csv', 
            header=None,
            names=['data', 'hour', 'temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather'])
weather=weather.dropna(axis=0,how='any') #drop all rows that have any NaN values

weather
wind_direction_dict = {'North': 0, 'Northwest': 1, 'East': 2, 'Northeast': 3, 'South': 4, 'Southeast': 5, 'Southwest': 6, 'West': 7, 'Quiet': 8}
wind_speed_dict = {'<12km/h': 0, '16-24km/h': 1, '24-34km/h': 2}
weather_dict = {'Sunny': 0, 'Cloudy': 1, 'Overcast': 2, 'Rain': 3, 'Fog': 4}
weather=weather.replace(wind_direction_dict) 
weather=weather.replace(wind_speed_dict) 
# weather=weather.replace({'%':''}, regex=True)
df=weather.replace(weather_dict)

df = df.astype('int') #对原始数据进行转换并覆盖原始数据列
x1 = df['wind_force']
print(Counter(x1))
# print(df)
df = df.groupby(['data']).mean().reset_index() 
# df = df[['data','temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather']]
df

发现报错：ValueError: invalid literal for int() with base 10: ‘76%’
即“%”不能转为int数据，那就把“%”去掉,把数值除以100，转换成小数

#查看列表中有多少个不同的元素
from collections import *
weather = pd.read_csv('dataset/train_data/city_A/weather.csv', 
            header=None,
            names=['data', 'hour', 'temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather'])
weather=weather.dropna(axis=0,how='any') #drop all rows that have any NaN values

weather
wind_direction_dict = {'North': 0, 'Northwest': 1, 'East': 2, 'Northeast': 3, 'South': 4, 'Southeast': 5, 'Southwest': 6, 'West': 7, 'Quiet': 8}
wind_speed_dict = {'<12km/h': 0, '16-24km/h': 1, '24-34km/h': 2}
weather_dict = {'Sunny': 0, 'Cloudy': 1, 'Overcast': 2, 'Rain': 3, 'Fog': 4}
weather=weather.replace(wind_direction_dict) 
weather=weather.replace(wind_speed_dict) 
weather=weather.replace({'%':''}, regex=True)
df=weather.replace(weather_dict)

df = df.astype('int') #对原始数据进行转换并覆盖原始数据列
x1 = df['wind_force']
print(Counter(x1))
# print(df)
df = df.groupby(['data']).mean().reset_index() 
df['humidity'] = df['humidity']/100
#df = df[['data','temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather']]
df