[python] A detailed look at pandas.DataFrame.resample: time-based aggregation and resampling (Part 1)

First, let's look directly at the official documentation:

DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)

The time parameter for aggregation: rule
The parameters are as follows:

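rule : the offset string or object representing the target conversion; usually a time frequency such as 'M', 'A', 'Q', 'BM', 'BA', 'BQ', or 'W'.
axis : int, optional, default 0
closed : {'right', 'left'}; which side of the bin interval is closed. The default is 'left' for all frequency offsets except 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W', which all default to 'right'.
label : {'right', 'left'}; which bin edge is used to label each bin. The default is 'left' for all frequency offsets except 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W', which all default to 'right'.
convention : {'start', 'end', 's', 'e'}; for PeriodIndex only, controls whether to use the start or end of rule.
kind : {'timestamp', 'period'}, optional; pass 'timestamp' to convert the resulting index to a DatetimeIndex, or 'period' to convert it to a PeriodIndex. By default the input representation is retained.
loffset : adjusts the resampled time labels.
on : for a DataFrame, the column to use for resampling instead of the index. The column must be datetime-like.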
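
To make the loffset description concrete, here is a minimal sketch of what "adjusting the time labels" means. It shifts the labels by adding a Timedelta to the result index instead of passing loffset, on the assumption that newer pandas releases no longer accept that keyword. The 30-second shift and the 'min' alias (the newer spelling of 'T') are choices made for this sketch only.

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

# Aggregate into 3-minute bins, then move every label forward by 30 seconds.
# The aggregated values are unchanged; only the index moves.
result = series.resample('3min').sum()
result.index = result.index + pd.Timedelta(seconds=30)

print(result)
```

The values stay at 3, 12, 21, while the labels become 00:00:30, 00:03:30, and 00:06:30.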

Examples

1.1 Create a time index and compute the sum after aggregating by time

import pandas as pd

# First, create a Series with 9 one-minute timestamps
index = pd.date_range('1/1/2000', periods=9, freq='T')  # generate data at one-minute frequency
series = pd.Series(range(9), index=index)
series
Out[31]:
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64

series.resample('3T')
# '3T' is the rule; aggregation runs along axis=0 by default
Out[34]: DatetimeIndexResampler [freq=<3 * Minutes>, axis=0, closed=left, label=left, convention=start, base=0]

series.resample('3T').sum()
Out[35]:
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3T, dtype: int64
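
To see what the 3-minute sum is doing, the following sketch reproduces it by hand: it floors each timestamp to its 3-minute bin start and groups on that. It is only an illustration of the default closed='left', label='left' behaviour, not how pandas implements resample internally. It uses 'min', the newer spelling of the 'T' alias.

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

# Built-in resampling into 3-minute bins.
resampled = series.resample('3min').sum()

# Manual equivalent: map each timestamp to the start of its 3-minute bin,
# then sum the values that share a bin start.
bin_start = series.index.floor('3min')
manual = series.groupby(bin_start).sum()

print(resampled.tolist())  # [3, 12, 21]
print(manual.tolist())     # [3, 12, 21]
```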

1.2 Left and right sides for closed and label


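# Which side of the interval is closed: the default is 'left' for all frequency offsets
# except 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W', which all default to 'right'
series.resample('W')
Out[37]: DatetimeIndexResampler [freq=<Week: weekday=6>, axis=0, closed=right, label=right, convention=start, base=0]
series.resample('T')
Out[38]: DatetimeIndexResampler [freq=<Minute>, axis=0, closed=left, label=left, convention=start, base=0]

# With the minute frequency 'T', each aggregated bar (candlestick) starts at its bin's start time;
# with '2T', each bin starts at the first bar and counts 2 bars forward
series.resample('T').sum()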
Out[40]:
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64

# Comparing the two results shows that the bins are closed on the left
series.resample('2T').sum()
Out[42]:
2000-01-01 00:00:00     1
2000-01-01 00:02:00     5
2000-01-01 00:04:00     9
2000-01-01 00:06:00    13
2000-01-01 00:08:00     8
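Freq: 2T, dtype: int64

# With the weekly frequency 'W', each bin's result is assigned to the last day of the week
index = pd.date_range('1/1/2000', periods=9, freq='3D')  # 3-day spacing, matching the series printed below
series = pd.Series(range(9), index=index)
series.resample('W').sum()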
Out[53]:
2000-01-02     0
2000-01-09     3
2000-01-16    12
2000-01-23    13
2000-01-30     8
Freq: W-SUN, dtype: int64

series
Out[54]:
2000-01-01    0
2000-01-04    1
2000-01-07    2
2000-01-10    3
2000-01-13    4
2000-01-16    5
2000-01-19    6
2000-01-22    7
2000-01-25    8
Freq: 3D, dtype: int64
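
The weekly defaults can be checked by spelling them out. This sketch, using the same 3-day series, compares the default 'W' result with an explicit closed='right', label='right' call, and also computes the closed='left', label='left' variant for contrast. The variable names are illustrative.

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='3D')
series = pd.Series(range(9), index=index)

# Default weekly resampling: bins end on Sunday and are labeled with that Sunday.
default_weekly = series.resample('W').sum()

# Spelling out the defaults for 'W' should give the same result.
explicit_weekly = series.resample('W', closed='right', label='right').sum()

# Overriding both sides moves the bin boundaries and the labels.
left_weekly = series.resample('W', closed='left', label='left').sum()

print(default_weekly.tolist())   # [0, 3, 12, 13, 8]
print(explicit_weekly.tolist())  # [0, 3, 12, 13, 8]
print(left_weekly)
```

Whichever sides are chosen, every observation lands in exactly one bin, so the total across bins is always 36.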

1.3 Customizing label, i.e. which bin edge the aggregated data is assigned to

index = pd.date_range('1/1/2000', periods=9, freq='T')  # generate data at one-minute frequency
series = pd.Series(range(9), index=index)
series.resample('3T', label='right').sum()
Out[57]:
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3T, dtype: int64
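
A quick way to confirm that label changes only the timestamps and not the aggregated values is to compute both variants and compare them. This is a small sketch using the 'min' alias (the newer spelling of 'T').

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

left = series.resample('3min', label='left').sum()
right = series.resample('3min', label='right').sum()

# Same sums in both cases.
print(left.tolist())   # [3, 12, 21]
print(right.tolist())  # [3, 12, 21]

# Each right-hand label is exactly one bin width after the left-hand label.
shift = right.index - left.index
print((shift == pd.Timedelta(minutes=3)).all())  # True
```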

1.4 Customizing closed

index = pd.date_range('1/1/2000', periods=9, freq='T')  # generate data at one-minute frequency
series = pd.Series(range(9), index=index)
series
Out[63]:
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
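# Close the right side of each bin: a bin includes its right edge and excludes its left edge,
# so 00:00:00 falls into the bin labeled 1999-12-31 23:57:00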
series.resample('3T', closed='right').sum()
Out[64]:
1999-12-31 23:57:00     0
2000-01-01 00:00:00     6
2000-01-01 00:03:00    15
2000-01-01 00:06:00    15
Freq: 3T, dtype: int64
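
Since closed and label are independent, they can be combined. The sketch below closes each bin on the right and also labels it with its right edge, so every bin is named after the last timestamp it can contain. This combination is an illustration, not something the original example used.

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

# Bins are (start, end]; each bin is labeled with `end`.
right_closed = series.resample('3min', closed='right', label='right').sum()

print(right_closed.tolist())  # [0, 6, 15, 15]
print(right_closed.index[0])  # 2000-01-01 00:00:00
```

The sums match the closed='right' output above; only the labels have moved forward by one bin width.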

1.5 Upsampling to 30-second data with asfreq()

series.resample('30S').asfreq()
Out[66]:
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    1.0
2000-01-01 00:01:30    NaN
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    NaN
2000-01-01 00:03:00    3.0
2000-01-01 00:03:30    NaN
2000-01-01 00:04:00    4.0
2000-01-01 00:04:30    NaN
2000-01-01 00:05:00    5.0
2000-01-01 00:05:30    NaN
2000-01-01 00:06:00    6.0
2000-01-01 00:06:30    NaN
2000-01-01 00:07:00    7.0
2000-01-01 00:07:30    NaN
2000-01-01 00:08:00    8.0
Freq: 30S, dtype: float64
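
The effect of upsampling with asfreq() can be summarized by counting rows and gaps. This sketch also shows one possible way to fill the gaps afterwards, linear interpolation via Series.interpolate(), which is an addition for illustration and not part of the original example. It uses '30s' and 'min' as frequency aliases.

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

# Upsample from 1-minute to 30-second spacing; new slots have no data yet.
upsampled = series.resample('30s').asfreq()

print(len(upsampled))                # 17
print(int(upsampled.isna().sum()))   # 8
print(upsampled.dtype)               # float64, because NaN forces a float dtype

# One option for the gaps: linear interpolation between known neighbours.
interpolated = upsampled.interpolate()
print(interpolated.head().tolist())  # [0.0, 0.5, 1.0, 1.5, 2.0]
```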

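1.6 Upsample the series into 30-second bins and fill the NaN values with the pad method

# pad() forward-fills and is equivalent to ffill(); newer pandas versions provide only ffill()
series.resample('30S').pad()[0:5]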
Out[67]:
2000-01-01 00:00:00    0
2000-01-01 00:00:30    0
2000-01-01 00:01:00    1
2000-01-01 00:01:30    1
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64

1.7 Upsample the series into 30-second bins and fill the NaN values with the bfill method

series.resample('30S').bfill()[0:5]
Out[68]:
2000-01-01 00:00:00    0
2000-01-01 00:00:30    1
2000-01-01 00:01:00    1
2000-01-01 00:01:30    2
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64
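
Placing the three upsampling results side by side makes the difference between the fill directions easier to see. This is a minimal sketch; it uses ffill() as the forward-fill method, and the column names are chosen for this illustration.

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

comparison = pd.DataFrame({
    'asfreq': series.resample('30s').asfreq(),  # gaps stay NaN
    'ffill': series.resample('30s').ffill(),    # gaps take the previous value
    'bfill': series.resample('30s').bfill(),    # gaps take the next value
})

print(comparison.head())
```

In the first five rows the ffill column reads 0, 0, 1, 1, 2 and the bfill column reads 0, 1, 1, 2, 2, matching the two outputs above.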

1.8 Pass a custom function via apply

import numpy as np

def custom_resampler(array_like):
    # Sum the values in each bin, then add 5
    return np.sum(array_like) + 5

series.resample('3T').apply(custom_resampler)
Out[74]:
2000-01-01 00:00:00     8
2000-01-01 00:03:00    17
2000-01-01 00:06:00    26
Freq: 3T, dtype: int64
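
Besides apply, a resampler also accepts agg, which can compute several statistics per bin in one call. The sketch below shows both on the same series; the function name plus_five is hypothetical and mirrors the custom function above.

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)


def plus_five(values):
    """Sum the values of one bin and add 5."""
    return values.sum() + 5


# One custom statistic per bin.
custom = series.resample('3min').apply(plus_five)
print(custom.tolist())  # [8, 17, 26]

# Several built-in statistics per bin, returned as a DataFrame.
stats = series.resample('3min').agg(['sum', 'mean', 'max'])
print(stats)
```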

1.9 For a Series with a PeriodIndex, the keyword convention controls whether the start or the end of rule is used.

s = pd.Series([1, 2], index=pd.period_range('2012-01-01', freq='A', periods=2))
# Upsampling requires asfreq()
s.resample('M', convention='start').asfreq().head()
Out[76]:
2012-01    1.0
2012-02    NaN
2012-03    NaN
2012-04    NaN
2012-05    NaN
Freq: M, dtype: float64

s.resample('M', convention='end').asfreq()
Out[78]:
2012-12    1.0
2013-01    NaN
2013-02    NaN
2013-03    NaN
2013-04    NaN
2013-05    NaN
2013-06    NaN
2013-07    NaN
2013-08    NaN
2013-09    NaN
2013-10    NaN
2013-11    NaN
2013-12    2.0
Freq: M, dtype: float64

1.10 For DataFrame objects, the keyword on specifies a column to resample on instead of the index

df = pd.DataFrame(data=9*[range(4)], columns=['a', 'b', 'c', 'd'])
df['time'] = pd.date_range('1/1/2000', periods=9, freq='T')
df.resample('3T', on='time').sum()
Out[81]:
                     a  b  c  d
time
2000-01-01 00:00:00  0  3  6  9
2000-01-01 00:03:00  0  3  6  9
2000-01-01 00:06:00  0  3  6  9
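
Resampling with on='time' should give the same numbers as first moving that column into the index and then resampling. The sketch below checks this; the DataFrame is built from explicit lists rather than range objects, which is a choice made for this sketch.

```python
import pandas as pd

df = pd.DataFrame([list(range(4))] * 9, columns=['a', 'b', 'c', 'd'])
df['time'] = pd.date_range('2000-01-01', periods=9, freq='min')

# Resample on the 'time' column directly.
by_on = df.resample('3min', on='time').sum()

# Equivalent route: make 'time' the index first, then resample.
by_index = df.set_index('time').resample('3min').sum()

print(by_on)
print((by_on.values == by_index.values).all())  # True
```

The on keyword mainly saves the set_index step and leaves the original DataFrame's index untouched.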

1.11 how: a deprecated parameter that will not be supported in the future

index = pd.date_range('1/1/2000', periods=9, freq='T')  # generate data at one-minute frequency
series = pd.Series(range(9), index=index)
series.resample('2T', how='last')
Out[83]:
2000-01-01 00:00:00    1
2000-01-01 00:02:00    3
2000-01-01 00:04:00    5
2000-01-01 00:06:00    7
2000-01-01 00:08:00    8
Freq: 2T, dtype: int64

series.resample('2T', how='mean')
Out[84]:
2000-01-01 00:00:00    0.5
2000-01-01 00:02:00    2.5
2000-01-01 00:04:00    4.5
2000-01-01 00:06:00    6.5
2000-01-01 00:08:00    8.0
Freq: 2T, dtype: float64
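
Because how is on its way out, the same results are usually written by calling the aggregation as a method on the resampler. This is a minimal sketch of that replacement, assuming a pandas version where how may no longer be accepted, and using 'min' as the newer spelling of 'T'.

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

# Instead of resample('2T', how='last')
last_values = series.resample('2min').last()
print(last_values.tolist())  # [1, 3, 5, 7, 8]

# Instead of resample('2T', how='mean')
mean_values = series.resample('2min').mean()
print(mean_values.tolist())  # [0.5, 2.5, 4.5, 6.5, 8.0]

# The aggregation can also be named as a string through agg().
also_last = series.resample('2min').agg('last')
print(also_last.tolist())    # [1, 3, 5, 7, 8]
```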
