对表格进行操作:转换表格数据类型,剔除指定元素,表格元素替换

本文解决在使用pandas的groupby方法时遇到的问题,当数据中包含非数值型字段,如带有“%”的湿度数据,直接进行mean()操作会引发错误。通过替换“%”符号并将数据转换为数值型,成功完成数据聚合。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

对表格使用groupby进行操作时,发现少了几个数据

#查看列表中有多少个不同的元素
from collections import *
weather = pd.read_csv('dataset/train_data/city_A/weather.csv', 
            header=None,
            names=['data', 'hour', 'temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather'])
weather=weather.dropna(axis=0,how='any') #drop all rows that have any NaN values

weather
wind_direction_dict = {'North': 0, 'Northwest': 1, 'East': 2, 'Northeast': 3, 'South': 4, 'Southeast': 5, 'Southwest': 6, 'West': 7, 'Quiet': 8}
wind_speed_dict = {'<12km/h': 0, '16-24km/h': 1, '24-34km/h': 2}
weather_dict = {'Sunny': 0, 'Cloudy': 1, 'Overcast': 2, 'Rain': 3, 'Fog': 4}
weather=weather.replace(wind_direction_dict) 
weather=weather.replace(wind_speed_dict) 
# weather=weather.replace({'%':''}, regex=True)
df=weather.replace(weather_dict)

# df = df.astype('int') #对原始数据进行转换并覆盖原始数据列
x1 = df['wind_force']
print(Counter(x1))
# print(df)
df = df.groupby(['data']).mean().reset_index() 
# df = df[['data','temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather']]
df

原表格:
在这里插入图片描述
新表格:
在这里插入图片描述
原因是有些数据不是数值型数据,不能进行mean()操作。既然类型不对,那就转类型呗。

#查看列表中有多少个不同的元素
from collections import *
weather = pd.read_csv('dataset/train_data/city_A/weather.csv', 
            header=None,
            names=['data', 'hour', 'temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather'])
weather=weather.dropna(axis=0,how='any') #drop all rows that have any NaN values

weather
wind_direction_dict = {'North': 0, 'Northwest': 1, 'East': 2, 'Northeast': 3, 'South': 4, 'Southeast': 5, 'Southwest': 6, 'West': 7, 'Quiet': 8}
wind_speed_dict = {'<12km/h': 0, '16-24km/h': 1, '24-34km/h': 2}
weather_dict = {'Sunny': 0, 'Cloudy': 1, 'Overcast': 2, 'Rain': 3, 'Fog': 4}
weather=weather.replace(wind_direction_dict) 
weather=weather.replace(wind_speed_dict) 
# weather=weather.replace({'%':''}, regex=True)
df=weather.replace(weather_dict)

df = df.astype('int') #对原始数据进行转换并覆盖原始数据列
x1 = df['wind_force']
print(Counter(x1))
# print(df)
df = df.groupby(['data']).mean().reset_index() 
# df = df[['data','temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather']]
df

发现报错:ValueError: invalid literal for int() with base 10: ‘76%’
即“%”不能转为int数据,那就把“%”去掉,把数值除以100,转换成小数

#查看列表中有多少个不同的元素
from collections import *
weather = pd.read_csv('dataset/train_data/city_A/weather.csv', 
            header=None,
            names=['data', 'hour', 'temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather'])
weather=weather.dropna(axis=0,how='any') #drop all rows that have any NaN values

weather
wind_direction_dict = {'North': 0, 'Northwest': 1, 'East': 2, 'Northeast': 3, 'South': 4, 'Southeast': 5, 'Southwest': 6, 'West': 7, 'Quiet': 8}
wind_speed_dict = {'<12km/h': 0, '16-24km/h': 1, '24-34km/h': 2}
weather_dict = {'Sunny': 0, 'Cloudy': 1, 'Overcast': 2, 'Rain': 3, 'Fog': 4}
weather=weather.replace(wind_direction_dict) 
weather=weather.replace(wind_speed_dict) 
weather=weather.replace({'%':''}, regex=True)
df=weather.replace(weather_dict)

df = df.astype('int') #对原始数据进行转换并覆盖原始数据列
x1 = df['wind_force']
print(Counter(x1))
# print(df)
df = df.groupby(['data']).mean().reset_index() 
df['humidity'] = df['humidity']/100
#df = df[['data','temperature', 'humidity','wind_direction', 'wind_speed', 'wind_force','weather']]
df


结果十分nice:
在这里插入图片描述
参考链接:
https://www.jb51.net/article/139630.htm
https://www.icode9.com/content-1-191446.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值