Python pandas core data structures

1. Series (one-dimensional): creating a Series from a dictionary

import numpy as np
import pandas as pd
d = {'a': 0, 'b': 1, 'd': 3}
s = pd.Series(d, index=list('abcd'))  # 'c' is not a key in d, so s['c'] is NaN

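Features:
Positional indexing and slicing. In current pandas, use .iloc for positions. Writing s[0] on a Series with string labels falls back to position, but that fallback is deprecated. The output below was produced with a Series of random floats and the default integer index, so its values differ from s above.

print(s.iloc[0])    # first element
print(s.iloc[:2])   # first two elements
print(s.iloc[1:3])  # elements at positions 1 and 2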

-0.887761065211812
0   -0.887761
1    0.904833
dtype: float64
1    0.904833
2   -0.525255
dtype: float64
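The following minimal sketch uses the same dict to show the two behaviours above. First, a label that is missing from the dict gets NaN, which also turns the integers into floats. Second, .iloc selects by position while .loc selects by label, and a label slice includes its end point.

```python
import numpy as np
import pandas as pd

d = {'a': 0, 'b': 1, 'd': 3}
s = pd.Series(d, index=list('abcd'))  # 'c' is missing from d -> NaN

first = s.iloc[0]          # by position: the value stored under 'a'
head = s.iloc[:2]          # positions 0 and 1 -> labels 'a', 'b'
by_label = s.loc['b':'d']  # label slice, end point included -> 'b', 'c', 'd'

print(s.dtype)             # float64, because NaN forces a float dtype
print(first, list(head.index), list(by_label.index))
```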

Label alignment

s1 = pd.Series(np.random.randn(3), index=['a', 'c', 'e'])
s2 = pd.Series(np.random.randn(3), index=['a', 'd', 'e'])
print('{0}\n\n{1}'.format(s1,s2))

a    0.737636
c    1.285566
e   -0.011916
dtype: float64

a   -1.239792
d    0.393916
e    1.061057
dtype: float64

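Compute s1 + s2. pandas aligns the two Series by label, so a is added to a and e is added to e. The labels c and d exist in only one of the two Series, so the result is NaN for them. This output comes from a different random draw than the printout above.

print(s1 + s2)

a   -1.515469
c         NaN
d         NaN
e   -1.398320
dtype: float64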
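Random numbers make the alignment hard to check by eye. The sketch below uses fixed values instead; these values are made up for illustration. It also shows Series.add with fill_value, which treats a label missing on one side as 0 rather than producing NaN.

```python
import pandas as pd

# Illustrative values; the labels match s1 and s2 above.
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'c', 'e'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['a', 'd', 'e'])

total = s1 + s2                    # union of labels; c and d become NaN
filled = s1.add(s2, fill_value=0)  # a missing label counts as 0 instead

print(total)
print(filled)
```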

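2. DataFrame (two-dimensional): think of it as a dict of Series. Each column is a Series, all columns share one row index, and selecting a single row also returns a Series.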

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

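Specify the row index. Rows are picked and ordered by these labels, and a label that is missing from a column (here 'd' in 'one') becomes NaN.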

df = pd.DataFrame(d, index=['a', 'd', 'c'])

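Specify the columns. Only the listed keys are used, and a name that is not in the dict (here 'four') becomes a column of NaN.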

df = pd.DataFrame(d, columns=['two', 'four'])
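Here is a small sketch that checks what the two constructor calls above actually produce. It uses the same d, so nothing here is assumed beyond the code already shown.

```python
import numpy as np
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df_rows = pd.DataFrame(d, index=['a', 'd', 'c'])    # select/reorder rows by label
df_cols = pd.DataFrame(d, columns=['two', 'four'])  # keep 'two'; 'four' is not in d

print(df_rows)
print(df_cols)
```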

From a dict of lists
All lists must have the same length; otherwise pandas raises a ValueError.

d = {'one': [1, 2, 3, 4],'two': [21, 22, 23, 24]}
df = pd.DataFrame(d)

   one  two
0    1   21
1    2   22
2    3   23
3    4   24
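This sketch shows the length rule in practice. It also shows a common workaround: wrapping each list in a Series makes pandas align the columns on the index, so the shorter one is padded with NaN instead of raising. The exact wording of the error message may vary between pandas versions.

```python
import pandas as pd

# Lists of different lengths (4 vs 3 items) cannot form columns directly.
try:
    pd.DataFrame({'one': [1, 2, 3, 4], 'two': [21, 22, 23]})
    error = None
except ValueError as exc:
    error = exc
print('ValueError:', error)

# Series are aligned on their index; the missing position becomes NaN.
padded = pd.DataFrame({'one': pd.Series([1, 2, 3, 4]),
                       'two': pd.Series([21, 22, 23])})
print(padded)
```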

From a list of tuples: each tuple becomes one row, and the columns default to 0, 1, 2.

d = [(1, 2.2, 'Hello'), (2, 3., "world")]
df = pd.DataFrame(d)

   0    1      2
0  1  2.2  Hello
1  2  3.0  world
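The default integer column labels can be replaced by passing columns=. The names id, value and word in this sketch are hypothetical, chosen only for illustration.

```python
import pandas as pd

rows = [(1, 2.2, 'Hello'), (2, 3., 'world')]
# Hypothetical column names; each tuple still becomes one row.
df = pd.DataFrame(rows, columns=['id', 'value', 'word'])

print(df)
print(df.dtypes)  # each column gets its own dtype (integer, float, object)
```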

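Build a more complex structure: in a dict of dicts, tuple keys become MultiIndex (hierarchical) columns and rows.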

d = {('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
     ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
     ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
     ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 10}}
df = pd.DataFrame(d)
print(df)

     a         b
     b  a  c   a
A B  1  4  5  10
  C  2  3  6   7
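The sketch below shows how the hierarchical labels are used for selection, with the same dict as above. Selecting a top-level column label returns every column beneath it, a full tuple selects one column, and a (row tuple, column tuple) pair selects one cell. pandas may emit a PerformanceWarning here because the column index is not sorted; the results are unaffected.

```python
import pandas as pd

d = {('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
     ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
     ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
     ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 10}}
df = pd.DataFrame(d)

print(df.columns.nlevels, df.index.nlevels)  # both 2: tuples become MultiIndex levels
print(df['a'])                               # all columns under top-level 'a'
print(df[('b', 'a')])                        # one column by its full tuple
print(df.loc[('A', 'C'), ('a', 'c')])        # one cell
```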

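3. Adding, deleting and inserting columns

df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'])

Each printout in this section comes from a separate run with fresh random data. As a result, the values, and in some cases the set of columns, differ from step to step.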

Replace column one with the element-wise sum of columns two and four:

df['one'] = df['two'] + df['four']

Delete a column:

del df['three']
print(df)

        one       two      four
0  1.051110  1.415449 -0.364339
1  1.344168  0.992644  0.351524
2 -1.459153 -0.801384 -0.657768
3  1.092571  1.420621 -0.328050
4  1.119014  0.678703  0.440311
5 -3.670993 -1.872445 -1.798548
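The sketch below repeats these steps with small fixed values (made up for illustration), so the effect of the replacement and the deletion can be checked exactly.

```python
import pandas as pd

# Illustrative fixed values instead of random numbers.
df = pd.DataFrame({'one': [0.0, 0.0], 'two': [1.0, 2.0],
                   'three': [5.0, 6.0], 'four': [10.0, 20.0]})

df['one'] = df['two'] + df['four']  # element-wise, aligned on the row index
del df['three']                     # removes the column in place

print(df)
```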

Add a boolean column that is True where one is greater than 0.2:

df['flag'] = df['one'] > 0.2

        one       two     three      four   flag
0 -1.501718  0.178474  2.304388 -1.680193  False
1  1.554432  1.467871 -1.819724  0.086561   True
2 -1.961002  0.095922 -1.456745 -2.056924  False
3 -0.314674  0.066831 -1.394350 -0.381506  False
4 -0.375624 -0.669872 -1.059675  0.294248  False
5  0.528540  0.625411 -1.994278 -0.096871   True

Assign a scalar to a new column. The value is broadcast to every row:

df['five'] = 5

        one       two     three      four  five
0 -1.692373 -0.793184 -1.022882 -0.899188     5
1 -2.281554 -0.200473  1.978368 -2.081081     5
2  2.077127  0.431241  0.361226  1.645886     5
3  1.991769  1.382926 -0.584086  0.608842     5
4  0.446861  0.076911 -1.591722  0.369950     5
5  1.833202  0.089428 -0.883797  1.743774     5
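Here is a deterministic sketch of the last two steps, using illustrative values. The comparison yields a bool column, and the scalar 5 is repeated for every row. The last line shows one common use of such a flag column: passing it back as a mask to keep only the rows where it is True.

```python
import pandas as pd

df = pd.DataFrame({'one': [-1.0, 0.5, 0.2]})  # illustrative values

df['flag'] = df['one'] > 0.2  # strict comparison: 0.2 itself gives False
df['five'] = 5                # scalar broadcast to all rows

print(df)
print(df[df['flag']])         # rows where flag is True
```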

Delete a column with pop, which removes it in place and returns it as a Series:

df.pop('four')

        one       two     three
0  2.895448  1.925215 -1.078841
1 -0.204535 -1.123640  0.509472
2  0.542504  0.501191 -0.399165
3 -3.501155 -1.851473  0.259770
4 -0.622400 -0.326245  1.275971
5  2.400736 -0.090141 -0.078183
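Unlike del, pop hands the removed column back, so it can be kept or reused. This sketch uses illustrative values.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'four': [5, 6]})

four = df.pop('four')  # removed from df and returned as a Series
print(type(four).__name__, four.tolist())
print(list(df.columns))
```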

Insert a column at position 1, i.e. as the second column:

df.insert(1, 'bar', df['one'] + df['two'])
print(df)

        one       bar       two     three      four
0 -0.101852  0.333351  0.435203  2.111786  0.643502
1 -0.123245  0.047569  0.170814 -1.519405 -1.135779
2 -1.913974 -2.248223 -0.334249  0.912003 -0.364429
3 -0.852416 -0.821420  0.030996  0.124609 -0.096879
4  1.445873  2.279212  0.833339 -0.606428  1.170697
5  1.511618  4.677563  3.165944  0.486133 -0.490859
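The first argument of insert is a position, not a label. By default, insert also refuses to add a column name that already exists. The sketch below, with illustrative values, demonstrates both points.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [10, 20]})
df.insert(1, 'bar', df['one'] + df['two'])  # position 1 -> second column
print(list(df.columns))

try:
    df.insert(0, 'bar', 0)  # 'bar' already exists
    dup_error = None
except ValueError as exc:
    dup_error = exc
print('ValueError:', dup_error)
```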

assign returns a new DataFrame with the extra column added; the original df is left unchanged:

print(df.assign(Ratio=df['one'] / df['two']))

        one       two     three      four     Ratio
0  1.674716  0.350798  0.561203  0.177807  4.774023
1  0.309234  0.509702  0.394534 -1.411585  0.606696
2  0.534071 -0.457463  0.224305 -1.268856 -1.167464
3  0.442194  0.816737 -0.315137  0.772761  0.541415
4  1.477292 -0.456929 -1.425932  0.172979 -3.233087
5  0.362322 -1.688482  0.160745  2.358249 -0.214584
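This sketch, using illustrative values, confirms the copy behaviour: the result has the Ratio column, while the original df does not. To keep the column, assign the result back (df = df.assign(...)).

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 3.0], 'two': [2.0, 4.0]})

out = df.assign(Ratio=df['one'] / df['two'])  # new DataFrame

print(list(df.columns))   # original is unchanged
print(list(out.columns))  # copy has the extra column
```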

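assign also accepts a function. pandas calls it with the DataFrame and uses its return value as the new column:

print(df.assign(Ratio=lambda x: x['one'] / x['two']))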
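One practical benefit of the function form is that later keyword arguments can refer to columns created earlier in the same call (supported in pandas 0.23 and later on Python 3.6+). The Pct column in this sketch is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 3.0], 'two': [2.0, 4.0]})

out = df.assign(
    Ratio=lambda x: x['one'] / x['two'],
    Pct=lambda x: x['Ratio'] * 100,  # hypothetical: uses Ratio from the line above
)
print(out)
```

The function form also fits method chains, where the intermediate DataFrame has no variable name to refer to.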

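4. Panel: a three-dimensional data structure

pd.Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so pn = pd.Panel(d) fails on current versions. The pandas documentation suggests a DataFrame with a MultiIndex (or the xarray package) instead. Calling pd.concat on a dict of DataFrames stacks them and uses the dict keys as the outer row level:

d = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
     'Item2': pd.DataFrame(np.random.randn(4, 2))}
pn = pd.concat(d)       # outer index level: 'Item1', 'Item2'
print(pn.loc['Item2'])  # Item2 has no column 2, so that column is NaN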
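With the stacked layout, a slice across all items becomes a cross-section on the inner index level. This sketch uses np.arange instead of random numbers so the results are predictable. It takes row 0 of every item with xs, roughly what slicing a Panel along its major axis used to do.

```python
import numpy as np
import pandas as pd

d = {'Item1': pd.DataFrame(np.arange(12.0).reshape(4, 3)),
     'Item2': pd.DataFrame(np.arange(8.0).reshape(4, 2))}
stacked = pd.concat(d)            # rows: (item, original row label)

item2 = stacked.loc['Item2']      # one "item", columns aligned to 0, 1, 2
row0 = stacked.xs(0, level=1)     # row 0 of every item, one row per item

print(item2)
print(row0)
```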
