[Python] Data Analysis with pandas: Series

Original article · last modified 2022-06-09 16:56:31

License: CC 4.0 BY-SA

Tags: #python #data-analysis #pandas

First published 2020-05-22 21:48:08

Data Analysis with pandas: Introduction to Data Structures

Introduction to pandas

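pandas is a high-level data analysis library built on top of NumPy, and one of the most important data analysis libraries in Python. It provides many high-level functions that greatly simplify the data analysis workflow. Its main features include:

Labeled data structures, mainly the Series and the DataFrame
Simple indexing and hierarchical (multi-level) indexing
Built-in aggregation and transformation of data sets
Generation of specific kinds of data, such as date ranges
Importing data from Excel, CSV and other text-based formats, and efficient reading and writing in PyTables/HDF5 format
Efficient handling of data sets that contain missing values
Routine statistical analysis (regression is typically done together with companion libraries such as statsmodels)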
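
To make the point about incomplete data sets concrete, here is a minimal sketch. The readings are made-up values used only for illustration; pandas marks a missing entry as NaN, and reductions such as the mean skip it by default.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; np.nan marks a missing observation
readings = pd.Series([1.5, np.nan, 3.0, np.nan])

missing_count = int(readings.isna().sum())  # how many entries are missing
mean_value = readings.mean()                # NaN entries are skipped by default
filled = readings.fillna(0.0)               # replace NaN with a chosen default

print(missing_count)
print(mean_value)
print(filled.tolist())
```

Whether to skip missing entries or fill them with a default depends on the analysis; both options are shown only to illustrate the available calls.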

Series

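A Series is a one-dimensional labeled array that can hold integers, floats, strings and other data types (the labels are collectively called the index).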

1. Creating a Series

import string
import pandas as pd

# Create a Series from a list; a default integer index (0..n-1) is assigned
t1 = pd.Series([15,46,19,24,33])
print(t1)
# index  value
# 0    15
# 1    46
# 2    19
# 3    24
# 4    33
# dtype: int64

# Pass an explicit index; equivalent to index=["a","b","c","d","e"]
t2 = pd.Series([15,46,19,24,33],index=list("abcde"))
print(t2)
# a    15
# b    46
# c    19
# d    24
# e    33
# dtype: int64

# Create a Series from a dict (keys become the index, values become the data)
temp_dict = {"name":"小明","age":20,"tel":10086}
t3 = pd.Series(temp_dict)
print(t3)
# name       小明
# age        20
# tel     10086
# dtype: object

t4 = {string.ascii_uppercase[i]:i for i in range(5)}
print(t4)
# {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}
print(pd.Series(t4))
# A    0
# B    1
# C    2
# D    3
# E    4
# dtype: int64

2. Specifying a New Index for a Series

t4 = {string.ascii_uppercase[i]:i for i in range(5)}
print(t4)
# {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}
print(pd.Series(t4))
# A    0
# B    1
# C    2
# D    3
# E    4
# dtype: int64
# When a new index is given, labels with no match among the original keys get NaN
t5 = pd.Series(t4,index=list(string.ascii_uppercase[3:8]))
print(t5)
# D    3.0
# E    4.0
# F    NaN
# G    NaN
# H    NaN
# dtype: float64

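When a new index is specified, every new label that has no counterpart among the old labels is assigned NaN.
In t5 = pd.Series(t4,index=list(string.ascii_uppercase[3:8])), t5 is built from the dict t4 with a new index. The slice [3:8] selects the 4th through 8th uppercase letters (the end position 8 is excluded), that is, D to H. The original keys only go up to E, so F, G and H have no matching value and become NaN. Because NaN is a floating-point value, the dtype of the result changes from int64 to float64.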
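
The same behavior can be reproduced on an existing Series with Series.reindex. The sketch below is an illustration rather than part of the original walkthrough; the variable names are hypothetical, and fill_value is used to show one way of avoiding NaN.

```python
import string
import pandas as pd

# Same data as t4 above: {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}
base = pd.Series({string.ascii_uppercase[i]: i for i in range(5)})
new_labels = list(string.ascii_uppercase[3:8])  # ['D', 'E', 'F', 'G', 'H']

# reindex keeps the values of matching labels and inserts NaN for the rest
reindexed = base.reindex(new_labels)

# fill_value supplies a replacement for unmatched labels,
# so no NaN appears and the integer dtype is kept
reindexed_filled = base.reindex(new_labels, fill_value=0)

print(reindexed)
print(reindexed_filled)
```

Choosing 0 as the fill value is only an example; a sensible default depends on what the data represents.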

3. Slicing and Indexing a Series

# Recap of the Series created earlier
t1 = pd.Series([15,46,19,24,33])
print(t1)
# index  value
# 0    15
# 1    46
# 2    19
# 3    24
# 4    33
# dtype: int64
t3 = pd.Series(temp_dict)
print(t3)
# name       小明
# age        20
# tel     10086
# dtype: object
print(pd.Series(t4))
# A    0
# B    1
# C    2
# D    3
# E    4
# dtype: int64
print(t5)
# D    3.0
# E    4.0
# F    NaN
# G    NaN
# H    NaN
# dtype: float64

# Slicing and indexing
print(t4["A"])  # t4 is still a plain dict here; pd.Series(t4)["A"] gives the same value
# 0
print(t3[["name","age"]])
# name    小明
# age     20
# dtype: object
print(t3.iloc[1])  # positional access; plain t3[1] on a label index is deprecated in recent pandas
# 20
print(t5[["D","E","F"]])
# D    3.0
# E    4.0
# F    NaN
# dtype: float64
print(t1[t1>20])
# 1    46
# 3    24
# 4    33
# dtype: int64
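
The examples above select single elements, lists of labels and boolean masks. As a complement, here is a minimal sketch of range slicing with .loc (by label) and .iloc (by position). The Series s is a hypothetical stand-in that reuses the values of t2.

```python
import pandas as pd

s = pd.Series([15, 46, 19, 24, 33], index=list("abcde"))

by_label = s.loc["b":"d"]      # label slice: both endpoints are included
by_position = s.iloc[1:3]      # positional slice: the end position is excluded
single_label = s.loc["c"]      # one element by label
single_position = s.iloc[0]    # one element by position

# Conditions are combined with & (and) or | (or); each needs its own parentheses
in_range = s[(s > 20) & (s < 40)]

print(by_label.tolist())
print(by_position.tolist())
print(in_range.tolist())
```

Using .loc and .iloc explicitly avoids any ambiguity about whether an integer is meant as a label or as a position.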

4. Decomposing a Series (Index and Values)

A Series is essentially made up of two arrays: one holds the keys (the index) and the other holds the values. Each key is paired with exactly one value.

# Create a Series
t1 = pd.Series([15,46,19,24,33])
print(t1)
# index  value
# 0    15
# 1    46
# 2    19
# 3    24
# 4    33
# dtype: int64

# Extract the index and the values of the Series
print(t1.index)
# RangeIndex(start=0, stop=5, step=1)
print(t1.values)
# [15 46 19 24 33]
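
To illustrate the pairing of keys and values, the following sketch splits a Series into its two parts and puts them back together. The Series t and the other names are hypothetical and used only for this example.

```python
import pandas as pd

t = pd.Series([15, 46, 19, 24, 33], index=list("abcde"))

labels = t.index.tolist()   # the keys as a plain Python list
values = t.to_numpy()       # the values as a NumPy array
pairs = list(t.items())     # (label, value) tuples, in order

# Rebuilding from the two parts gives back an equal Series
rebuilt = pd.Series(values, index=t.index)

print(labels)
print(values)
print(rebuilt.equals(t))
```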

5. Arithmetic on Series

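An important difference between a Series and an array is that when two Series are combined arithmetically, their elements are first aligned by index label and the operation is then applied to each aligned pair. Labels that are not shared by both Series appear in the result as NaN.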

# Arithmetic on Series
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5),index=['a','b','c','d','e'])
print(s)
# a    0
# b    1
# c    2
# d    3
# e    4
# dtype: int32  (platform-dependent; int64 on most systems)
# s[1:] has labels b..e and s[3:] has labels d, e; only d and e are shared
S = s[1:]+s[3:]
print(S)
# b    NaN
# c    NaN
# d    6.0
# e    8.0
# dtype: float64
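
If NaN for the unshared labels is not wanted, two common options are to treat the missing side as a default value or to keep only the shared labels. A minimal sketch, using Series.add with fill_value and dropna (the variable names are hypothetical):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=list("abcde"))
left, right = s.iloc[1:], s.iloc[3:]     # labels b..e and d..e

plain = left + right                     # b and c have no partner, so they become NaN
filled = left.add(right, fill_value=0)   # a missing partner is treated as 0
common = (left + right).dropna()         # keep only labels present in both

print(plain)
print(filled)
print(common)
```

Which option fits depends on whether an absent label really means zero or means that no comparison is possible.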