Data Grouping with GroupBy

- Importing the libraries
- Loading the data
- Grouping the data

Importing the libraries

import numpy as np
import pandas as pd
from pandas import Series,DataFrame

Loading the data

df=pd.read_csv('../homework/city_weather.csv')
df

	date	city	temperature	wind
0	03/01/2016	BJ	8	5
1	17/01/2016	BJ	12	2
2	31/01/2016	BJ	19	2
3	14/02/2016	BJ	-3	3
4	28/02/2016	BJ	19	2
5	13/03/2016	BJ	5	3
6	27/03/2016	SH	-4	4
7	10/04/2016	SH	19	3
8	24/04/2016	SH	20	3
9	08/05/2016	SH	17	3
10	22/05/2016	SH	4	2
11	05/06/2016	SH	-10	4
12	19/06/2016	SH	0	5
13	03/07/2016	SH	-9	5
14	17/07/2016	GZ	10	2
15	31/07/2016	GZ	-1	5
16	14/08/2016	GZ	1	5
17	28/08/2016	GZ	25	4
18	11/09/2016	SZ	20	1
19	25/09/2016	SZ	-10	4

Grouping the data

Group the DataFrame by its city column:

g=df.groupby(df['city'])
g

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000236B5D8B348>
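
The key can also be passed as a column name instead of a Series. A minimal sketch, assuming the frame is rebuilt inline from the table above (the CSV file itself is not available here) and leaving out the date column:

```python
import pandas as pd

# Inline copy of the city, temperature and wind columns shown above.
df = pd.DataFrame({
    "city": ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2,
    "temperature": [8, 12, 19, -3, 19, 5, -4, 19, 20, 17,
                    4, -10, 0, -9, 10, -1, 1, 25, 20, -10],
    "wind": [5, 2, 2, 3, 2, 3, 4, 3, 3, 3, 2, 4, 5, 5, 2, 5, 5, 4, 1, 4],
})

g_by_series = df.groupby(df["city"])  # key given as a Series
g_by_name = df.groupby("city")        # key given as a column name

# Both groupings should aggregate to the same per-city means.
same = g_by_series.mean(numeric_only=True).equals(
    g_by_name.mean(numeric_only=True)
)
print(same)
```

The column-name form is shorter and is the one most pandas code uses.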

The groups attribute returns a dict that maps each group name to the index labels of the rows in that group. Here there are four groups:

g.groups

{'BJ': Int64Index([0, 1, 2, 3, 4, 5], dtype='int64'),
 'GZ': Int64Index([14, 15, 16, 17], dtype='int64'),
 'SH': Int64Index([6, 7, 8, 9, 10, 11, 12, 13], dtype='int64'),
 'SZ': Int64Index([18, 19], dtype='int64')}
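
To see only how many rows each group holds, size() is a compact alternative to reading the index lists. A small sketch on an inline copy of the data (the variable names are illustrative):

```python
import pandas as pd

# Inline copy of the city and temperature columns shown above.
df = pd.DataFrame({
    "city": ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2,
    "temperature": [8, 12, 19, -3, 19, 5, -4, 19, 20, 17,
                    4, -10, 0, -9, 10, -1, 1, 25, 20, -10],
})

g = df.groupby("city")

group_names = sorted(g.groups)        # the group keys
bj_rows = list(g.groups["BJ"])        # row labels belonging to BJ
rows_per_city = g.size()              # Series: city -> number of rows

print(group_names)
print(bj_rows)
print(rows_per_city)
```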

The get_group method returns a new DataFrame containing only the rows whose city is 'BJ':

g.get_group('BJ')

	date	city	temperature	wind
0	03/01/2016	BJ	8	5
1	17/01/2016	BJ	12	2
2	31/01/2016	BJ	19	2
3	14/02/2016	BJ	-3	3
4	28/02/2016	BJ	19	2
5	13/03/2016	BJ	5	3
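
For a single group, get_group selects the same rows as an ordinary boolean filter. A sketch comparing the two on an inline copy of the data:

```python
import pandas as pd

# Inline copy of the city, temperature and wind columns shown above.
df = pd.DataFrame({
    "city": ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2,
    "temperature": [8, 12, 19, -3, 19, 5, -4, 19, 20, 17,
                    4, -10, 0, -9, 10, -1, 1, 25, 20, -10],
    "wind": [5, 2, 2, 3, 2, 3, 4, 3, 3, 3, 2, 4, 5, 5, 2, 5, 5, 4, 1, 4],
})

bj_from_group = df.groupby("city").get_group("BJ")
bj_from_mask = df[df["city"] == "BJ"]

# Compare the value columns of both selections.
cols = ["temperature", "wind"]
same_rows = bj_from_group[cols].equals(bj_from_mask[cols])
print(same_rows)
```

get_group pays off when the groupby object already exists and several groups are looked up in turn.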

Compute the mean of Beijing's two numeric columns, temperature and wind:

df_bj=g.get_group('BJ')
df_bj.mean(numeric_only=True)  # skip the string columns date and city

temperature    10.000000
wind            2.833333
dtype: float64

g.mean() produces a new DataFrame holding the mean temperature and wind for each city after grouping by city:

g.mean(numeric_only=True)

	temperature	wind
city
BJ	10.000	2.833333
GZ	8.750	4.000000
SH	4.625	3.625000
SZ	5.000	2.500000
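
A single column can be selected before aggregating, and agg accepts several function names at once. A minimal sketch on an inline copy of the data:

```python
import pandas as pd

# Inline copy of the city, temperature and wind columns shown above.
df = pd.DataFrame({
    "city": ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2,
    "temperature": [8, 12, 19, -3, 19, 5, -4, 19, 20, 17,
                    4, -10, 0, -9, 10, -1, 1, 25, 20, -10],
    "wind": [5, 2, 2, 3, 2, 3, 4, 3, 3, 3, 2, 4, 5, 5, 2, 5, 5, 4, 1, 4],
})

g = df.groupby("city")

# One column: the result is a Series indexed by city.
temp_mean = g["temperature"].mean()

# Several statistics at once: the columns become a two-level index.
summary = g[["temperature", "wind"]].agg(["mean", "max", "min"])

print(temp_mean)
print(summary)
```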

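The maximum works the same way. The date column holds strings, so its maximum is the lexicographically largest string, not the latest date: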

g.max()

	date	temperature	wind
city
BJ	31/01/2016	19	5
GZ	31/07/2016	25	5
SH	27/03/2016	20	5
SZ	25/09/2016	20	4

And the minimum:

g.min()

	date	temperature	wind
city
BJ	03/01/2016	-3	2
GZ	14/08/2016	-1	2
SH	03/07/2016	-10	2
SZ	11/09/2016	-10	1
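
The date values in the two tables above are compared as text, which is why BJ's maximum is 31/01/2016 rather than 13/03/2016. One possible fix, sketched here as an assumption rather than something the original does, is to parse the column with pd.to_datetime first. The dates are regenerated inline (they fall every 14 days from 03/01/2016), and date_parsed is a made-up column name:

```python
import pandas as pd

# Rebuild the date and city columns shown above.
df = pd.DataFrame({
    "date": pd.date_range("2016-01-03", periods=20, freq="14D").strftime("%d/%m/%Y"),
    "city": ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2,
})

# Maximum of the raw strings: lexicographic order.
string_max = df.groupby("city")["date"].max()

# Maximum of real timestamps: chronological order.
df["date_parsed"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
true_max = df.groupby("city")["date_parsed"].max()

print(string_max)
print(true_max)
```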

Taking the mean of a single DataFrame returns a Series:

type(df_bj.mean(numeric_only=True))

pandas.core.series.Series

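Taking the mean of a GroupBy object returns a DataFrame. The mean of one individual group is a Series; the grouped mean is obtained by splitting the rows by group, computing each group's mean, and combining the results into one DataFrame:

type(g.mean(numeric_only=True))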

pandas.core.frame.DataFrame
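
The split, apply and combine steps can be written out by hand to see where the DataFrame comes from. This is only an illustration on an inline copy of the data; pandas does not necessarily work this way internally:

```python
import pandas as pd

# Inline copy of the city, temperature and wind columns shown above.
df = pd.DataFrame({
    "city": ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2,
    "temperature": [8, 12, 19, -3, 19, 5, -4, 19, 20, 17,
                    4, -10, 0, -9, 10, -1, 1, 25, 20, -10],
    "wind": [5, 2, 2, 3, 2, 3, 4, 3, 3, 3, 2, 4, 5, 5, 2, 5, 5, 4, 1, 4],
})
cols = ["temperature", "wind"]
g = df.groupby("city")

# Split + apply: one Series of means per group.
per_group = {name: group[cols].mean() for name, group in g}

# Combine: stack the per-group Series into one DataFrame, one row per city.
combined = pd.DataFrame(per_group).T
combined.index.name = "city"

expected = g[cols].mean()
print(combined)
print(expected)
```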

Converting g to a list gives a list of (group name, DataFrame) tuples:

list(g)

[('BJ',
          date city  temperature  wind
  0  03/01/2016   BJ            8     5
  1  17/01/2016   BJ           12     2
  2  31/01/2016   BJ           19     2
  3  14/02/2016   BJ           -3     3
  4  28/02/2016   BJ           19     2
  5  13/03/2016   BJ            5     3),
 ('GZ',
           date city  temperature  wind
  14  17/07/2016   GZ           10     2
  15  31/07/2016   GZ           -1     5
  16  14/08/2016   GZ            1     5
  17  28/08/2016   GZ           25     4),
 ('SH',
           date city  temperature  wind
  6   27/03/2016   SH           -4     4
  7   10/04/2016   SH           19     3
  8   24/04/2016   SH           20     3
  9   08/05/2016   SH           17     3
  10  22/05/2016   SH            4     2
  11  05/06/2016   SH          -10     4
  12  19/06/2016   SH            0     5
  13  03/07/2016   SH           -9     5),
 ('SZ',
           date city  temperature  wind
  18  11/09/2016   SZ           20     1
  19  25/09/2016   SZ          -10     4)]

Convert that list to a dict: each key is a group name and each value is that group's DataFrame:

dict(list(g))

{'BJ':          date city  temperature  wind
 0  03/01/2016   BJ            8     5
 1  17/01/2016   BJ           12     2
 2  31/01/2016   BJ           19     2
 3  14/02/2016   BJ           -3     3
 4  28/02/2016   BJ           19     2
 5  13/03/2016   BJ            5     3,
 'GZ':           date city  temperature  wind
 14  17/07/2016   GZ           10     2
 15  31/07/2016   GZ           -1     5
 16  14/08/2016   GZ            1     5
 17  28/08/2016   GZ           25     4,
 'SH':           date city  temperature  wind
 6   27/03/2016   SH           -4     4
 7   10/04/2016   SH           19     3
 8   24/04/2016   SH           20     3
 9   08/05/2016   SH           17     3
 10  22/05/2016   SH            4     2
 11  05/06/2016   SH          -10     4
 12  19/06/2016   SH            0     5
 13  03/07/2016   SH           -9     5,
 'SZ':           date city  temperature  wind
 18  11/09/2016   SZ           20     1
 19  25/09/2016   SZ          -10     4}

dict(list(g))['BJ']

	date	city	temperature	wind
0	03/01/2016	BJ	8	5
1	17/01/2016	BJ	12	2
2	31/01/2016	BJ	19	2
3	14/02/2016	BJ	-3	3
4	28/02/2016	BJ	19	2
5	13/03/2016	BJ	5	3
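
The same mapping can be built with a dict comprehension over the groupby object, which skips the intermediate list. A sketch on an inline copy of the data (frames is an illustrative name):

```python
import pandas as pd

# Inline copy of the city, temperature and wind columns shown above.
df = pd.DataFrame({
    "city": ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2,
    "temperature": [8, 12, 19, -3, 19, 5, -4, 19, 20, 17,
                    4, -10, 0, -9, 10, -1, 1, 25, 20, -10],
    "wind": [5, 2, 2, 3, 2, 3, 4, 3, 3, 3, 2, 4, 5, 5, 2, 5, 5, 4, 1, 4],
})

# Iterating a groupby yields (group name, sub-DataFrame) pairs.
frames = {name: group for name, group in df.groupby("city")}

print(list(frames))
print(frames["SZ"])
```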

The data in a groupby object can also be accessed with a for loop:

for name, group_df in g:
    print(name)
    print(group_df)

BJ
         date city  temperature  wind
0  03/01/2016   BJ            8     5
1  17/01/2016   BJ           12     2
2  31/01/2016   BJ           19     2
3  14/02/2016   BJ           -3     3
4  28/02/2016   BJ           19     2
5  13/03/2016   BJ            5     3
GZ
          date city  temperature  wind
14  17/07/2016   GZ           10     2
15  31/07/2016   GZ           -1     5
16  14/08/2016   GZ            1     5
17  28/08/2016   GZ           25     4
SH
          date city  temperature  wind
6   27/03/2016   SH           -4     4
7   10/04/2016   SH           19     3
8   24/04/2016   SH           20     3
9   08/05/2016   SH           17     3
10  22/05/2016   SH            4     2
11  05/06/2016   SH          -10     4
12  19/06/2016   SH            0     5
13  03/07/2016   SH           -9     5
SZ
          date city  temperature  wind
18  11/09/2016   SZ           20     1
19  25/09/2016   SZ          -10     4
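
The loop body can do more than print. As a small sketch on an inline copy of the data, this computes the temperature spread (maximum minus minimum) of each city; spread is an illustrative name:

```python
import pandas as pd

# Inline copy of the city and temperature columns shown above.
df = pd.DataFrame({
    "city": ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2,
    "temperature": [8, 12, 19, -3, 19, 5, -4, 19, 20, 17,
                    4, -10, 0, -9, 10, -1, 1, 25, 20, -10],
})

spread = {}
for name, group_df in df.groupby("city"):
    temps = group_df["temperature"]
    spread[name] = int(temps.max() - temps.min())

print(spread)
```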

This corresponds to the GROUP BY operation in a database:

select column_1, count(*) from table_1 group by column_1
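
For comparison, here is a rough pandas counterpart of an aggregating GROUP BY query on the weather data. The table name city_weather in the comment is hypothetical, and the frame is an inline copy of the data:

```python
import pandas as pd

# Inline copy of the city and temperature columns shown above.
df = pd.DataFrame({
    "city": ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2,
    "temperature": [8, 12, 19, -3, 19, 5, -4, 19, 20, 17,
                    4, -10, 0, -9, 10, -1, 1, 25, 20, -10],
})

# Roughly: SELECT city, AVG(temperature) FROM city_weather GROUP BY city
# (city_weather is a hypothetical table name)
result = df.groupby("city")["temperature"].mean().reset_index()

print(result)
```

reset_index turns the city index back into an ordinary column, which matches the shape of a SQL result set.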
