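Hive Window Functions with Examples
Window functions
Strictly speaking, over() is the part that makes a function a window function. It is usually combined with other functions (aggregate, ranking, or offset functions), and that combination is what makes it so powerful.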
1. Syntax of over()
analytic_function over(partition by column order by column rows between start_position and end_position)
1.1 Window control: the window clause
over(partition by col order by sort_column rows between 1 preceding and 1 following)
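As a small illustration of the window clause, the sketch below computes a three-row moving average. It assumes the shop_data table created in the test data section of this article; the alias moving_avg is introduced only for this example.

```sql
-- Three-row moving average of ordamt per shop:
-- the frame covers the previous row, the current row, and the next row.
-- At the first and last row of a partition the frame simply holds fewer rows.
select shop_id, stat_date, ordamt,
avg(ordamt) over(
  partition by shop_id
  order by stat_date
  rows between 1 preceding and 1 following
) as moving_avg
from shop_data
where shop_id = 10028;
```

For shop 10028 (amounts 1687, 1298, 149) the first row averages only 1687 and 1298, giving 1492.5, and the last row averages only 1298 and 149, giving 723.5.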
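2. Common window functions
- sum(col) over() : sum of col over the window (a running total when the window is ordered)
- count(col) over() : count of col values over the window (a running count when the window is ordered)
- min(col) over() : minimum of col over the window
- max(col) over() : maximum of col over the window
- avg(col) over() : average of col over the window
- first_value(col) over() : the first col value in the window after sorting
- last_value(col) over() : the last col value in the window after sorting
- lag(col,n,DEFAULT) over() : the col value n rows before the current row. n is optional and defaults to 1. DEFAULT is returned when there is no row n rows back; if it is not specified, NULL is returned
- lead(col,n,DEFAULT) over() : the col value n rows after the current row. n is optional and defaults to 1. DEFAULT is returned when there is no row n rows ahead; if it is not specified, NULL is returned
- ntile(n) over() : splits the ordered rows of a partition into n buckets and returns the bucket number of the current row. Note: n must be of type int
- row_number() over() : ranking function that never repeats a value; suitable for generating keys or for rankings without ties
- rank() over() : ranking function that allows ties and leaves gaps after them, e.g. 1,1,3
- dense_rank() over() : ranking function that allows ties and leaves no gaps, e.g. 1,1,2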
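The difference between the three ranking functions only shows up when rows tie. The sketch below assumes the shop_data table from the test data section of this article; shops 10026 and 10027 share several stat_date values, so ordering by stat_date without a partition produces ties. The aliases rn, rk, and drk are made up for this example.

```sql
-- Compare the three ranking functions on rows that contain ties.
-- No partition by: both shops are ranked together by stat_date.
select shop_id, stat_date,
row_number() over(order by stat_date) as rn,
rank() over(order by stat_date) as rk,
dense_rank() over(order by stat_date) as drk
from shop_data
where shop_id in (10026, 10027);
```

The two rows with stat_date 201901230030 both get rk = 1 and drk = 1, while rn assigns them 1 and 2 in an unspecified order. The next pair (201901230100) gets rk = 3 but drk = 2.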
Test data
create table shop_data(
shop_id int,
stat_date string,
ordamt double
)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile;
-- Insert data
insert into shop_data values
(10026,'201901230030',5170),
(10026,'201901230100',5669),
(10026,'201901230130',2396),
(10026,'201901230200',1498),
(10026,'201901230230',1997),
(10026,'201901230300',1188),
(10026,'201901230330',598),
(10026,'201901230400',479),
(10026,'201901230430',1587),
(10026,'201901230530',799),
(10027,'201901230030',2170),
(10027,'201901230100',1623),
(10027,'201901230130',3397),
(10027,'201901230200',1434),
(10027,'201901230230',1001),
(10028,'201901230300',1687),
(10028,'201901230330',1298),
(10028,'201901230400',149),
(10029,'201901230430',2587),
(10029,'201901230530',589);
3. Examples
count
select shop_id, stat_date, ordamt,
-- Window: all rows returned by the query
count(shop_id) over() as count1,
-- Window: all rows with the same shop_id
count(shop_id) over(partition by shop_id) as count2,
-- Partition by shop_id, order by stat_date; with no frame given, the window runs from the partition start to the current row
count(shop_id) over(partition by shop_id order by stat_date) as count3,
-- Partition by shop_id, order by stat_date; window: previous row + current row + next 2 rows
count(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as count4,
-- Partition by shop_id, order by stat_date; window: partition start to partition end (same result as count2)
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as count5,
-- Partition by shop_id, order by stat_date; window: partition start to the row before the current row
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as count6,
-- Partition by shop_id, order by stat_date; window: partition start to the current row
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as count7,
-- Partition by shop_id, order by stat_date; window: current row to partition end
count(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as count8,
-- Partition by shop_id, order by stat_date; window: the row after the current row to partition end
count(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as count9,
-- Partition by shop_id, order by stat_date; window: the next row to the row 2 after the current row
count(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as count10
from shop_data;
-- Query result:
+----------+---------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
| shop_id | stat_date | ordamt | count1 | count2 | count3 | count4 | count5 | count6 | count7 | count8 | count9 | count10 |
+----------+---------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
| 10026 | 201901230030 | 5170.0 | 20 | 10 | 1 | 3 | 10 | 0 | 1 | 10 | 9 | 2 |
| 10026 | 201901230100 | 5669.0 | 20 | 10 | 2 | 4 | 10 | 1 | 2 | 9 | 8 | 2 |
| 10026 | 201901230130 | 2396.0 | 20 | 10 | 3 | 4 | 10 | 2 | 3 | 8 | 7 | 2 |
| 10026 | 201901230200 | 1498.0 | 20 | 10 | 4 | 4 | 10 | 3 | 4 | 7 | 6 | 2 |
| 10026 | 201901230230 | 1997.0 | 20 | 10 | 5 | 4 | 10 | 4 | 5 | 6 | 5 | 2 |
| 10026 | 201901230300 | 1188.0 | 20 | 10 | 6 | 4 | 10 | 5 | 6 | 5 | 4 | 2 |
| 10026 | 201901230330 | 598.0 | 20 | 10 | 7 | 4 | 10 | 6 | 7 | 4 | 3 | 2 |
| 10026 | 201901230400 | 479.0 | 20 | 10 | 8 | 4 | 10 | 7 | 8 | 3 | 2 | 2 |
| 10026 | 201901230430 | 1587.0 | 20 | 10 | 9 | 3 | 10 | 8 | 9 | 2 | 1 | 1 |
| 10026 | 201901230530 | 799.0 | 20 | 10 | 10 | 2 | 10 | 9 | 10 | 1 | 0 | 0 |
| 10027 | 201901230030 | 2170.0 | 20 | 5 | 1 | 3 | 5 | 0 | 1 | 5 | 4 | 2 |
| 10027 | 201901230100 | 1623.0 | 20 | 5 | 2 | 4 | 5 | 1 | 2 | 4 | 3 | 2 |
| 10027 | 201901230130 | 3397.0 | 20 | 5 | 3 | 4 | 5 | 2 | 3 | 3 | 2 | 2 |
| 10027 | 201901230200 | 1434.0 | 20 | 5 | 4 | 3 | 5 | 3 | 4 | 2 | 1 | 1 |
| 10027 | 201901230230 | 1001.0 | 20 | 5 | 5 | 2 | 5 | 4 | 5 | 1 | 0 | 0 |
| 10028 | 201901230300 | 1687.0 | 20 | 3 | 1 | 3 | 3 | 0 | 1 | 3 | 2 | 2 |
| 10028 | 201901230330 | 1298.0 | 20 | 3 | 2 | 3 | 3 | 1 | 2 | 2 | 1 | 1 |
| 10028 | 201901230400 | 149.0 | 20 | 3 | 3 | 2 | 3 | 2 | 3 | 1 | 0 | 0 |
| 10029 | 201901230430 | 2587.0 | 20 | 2 | 1 | 2 | 2 | 0 | 1 | 2 | 1 | 1 |
| 10029 | 201901230530 | 589.0 | 20 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 0 | 0 |
+----------+---------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
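In the result above, count3 (no frame) and count7 (explicit rows frame) agree because stat_date is unique within each shop. When only order by is given, Hive's default frame is range between unbounded preceding and current row, which treats rows that tie on the sort key as one group. The sketch below, again against the shop_data table, makes the difference visible by ordering two shops together; the aliases cnt_default and cnt_rows are made up for this example.

```sql
select shop_id, stat_date,
-- No frame given: the default range frame is used,
-- so rows that tie on stat_date are counted together.
count(ordamt) over(order by stat_date) as cnt_default,
-- Explicit rows frame: counts physical rows up to the current one.
count(ordamt) over(order by stat_date rows between unbounded preceding and current row) as cnt_rows
from shop_data
where shop_id in (10026, 10027);
```

Both rows with stat_date 201901230030 get cnt_default = 2, while cnt_rows gives them 1 and 2. Writing the rows frame explicitly is the safer choice when a strict row-by-row running value is intended.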
sum
select
shop_id, stat_date, ordamt,
-- Partition by shop_id, order by stat_date: running sales total per shop up to the current time (default frame: partition start to current row)
sum(ordamt) over(partition by shop_id order by stat_date) as sum_amt1,
-- Sales per shop from the previous half hour through the next hour (previous row + current row + next 2 rows)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as sum_amt2,
-- Total sales per shop (partition start to partition end)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as sum_amt3,
-- Sales total up to the previous half hour (partition start to the row before the current row)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as sum_amt4,
-- Running sales total up to the current time with the frame written out (partition start to current row); same result as sum_amt1 here
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as sum_amt5,
-- Sales total from the current time onward (current row to partition end)
sum(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as sum_amt6,
-- Sales total from the next half hour onward (the row after the current row to partition end)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as sum_amt7,
-- Sales between the next half hour and the next hour (the next row to the row 2 after the current row)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as sum_amt8
from shop_data;
Query result
+----------+---------------+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| shop_id | stat_date | ordamt | sum_amt1 | sum_amt2 | sum_amt3 | sum_amt4 | sum_amt5 | sum_amt6 | sum_amt7 | sum_amt8 |
+----------+---------------+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| 10026 | 201901230030 | 5170.0 | 5170.0 | 13235.0 | 21381.0 | NULL | 5170.0 | 21381.0 | 16211.0 | 8065.0 |
| 10026 | 201901230100 | 5669.0 | 10839.0 | 14733.0 | 21381.0 | 5170.0 | 10839.0 | 16211.0 | 10542.0 | 3894.0 |
| 10026 | 201901230130 | 2396.0 | 13235.0 | 11560.0 | 21381.0 | 10839.0 | 13235.0 | 10542.0 | 8146.0 | 3495.0 |
| 10026 | 201901230200 | 1498.0 | 14733.0 | 7079.0 | 21381.0 | 13235.0 | 14733.0 | 8146.0 | 6648.0 | 3185.0 |
| 10026 | 201901230230 | 1997.0 | 16730.0 | 5281.0 | 21381.0 | 14733.0 | 16730.0 | 6648.0 | 4651.0 | 1786.0 |
| 10026 | 201901230300 | 1188.0 | 17918.0 | 4262.0 | 21381.0 | 16730.0 | 17918.0 | 4651.0 | 3463.0 | 1077.0 |
| 10026 | 201901230330 | 598.0 | 18516.0 | 3852.0 | 21381.0 | 17918.0 | 18516.0 | 3463.0 | 2865.0 | 2066.0 |
| 10026 | 201901230400 | 479.0 | 18995.0 | 3463.0 | 21381.0 | 18516.0 | 18995.0 | 2865.0 | 2386.0 | 2386.0 |
| 10026 | 201901230430 | 1587.0 | 20582.0 | 2865.0 | 21381.0 | 18995.0 | 20582.0 | 2386.0 | 799.0 | 799.0 |
| 10026 | 201901230530 | 799.0 | 21381.0 | 2386.0 | 21381.0 | 20582.0 | 21381.0 | 799.0 | NULL | NULL |
| 10027 | 201901230030 | 2170.0 | 2170.0 | 7190.0 | 9625.0 | NULL | 2170.0 | 9625.0 | 7455.0 | 5020.0 |
| 10027 | 201901230100 | 1623.0 | 3793.0 | 8624.0 | 9625.0 | 2170.0 | 3793.0 | 7455.0 | 5832.0 | 4831.0 |
| 10027 | 201901230130 | 3397.0 | 7190.0 | 7455.0 | 9625.0 | 3793.0 | 7190.0 | 5832.0 | 2435.0 | 2435.0 |
| 10027 | 201901230200 | 1434.0 | 8624.0 | 5832.0 | 9625.0 | 7190.0 | 8624.0 | 2435.0 | 1001.0 | 1001.0 |
| 10027 | 201901230230 | 1001.0 | 9625.0 | 2435.0 | 9625.0 | 8624.0 | 9625.0 | 1001.0 | NULL | NULL |
| 10028 | 201901230300 | 1687.0 | 1687.0 | 3134.0 | 3134.0 | NULL | 1687.0 | 3134.0 | 1447.0 | 1447.0 |
| 10028 | 201901230330 | 1298.0 | 2985.0 | 3134.0 | 3134.0 | 1687.0 | 2985.0 | 1447.0 | 149.0 | 149.0 |
| 10028 | 201901230400 | 149.0 | 3134.0 | 1447.0 | 3134.0 | 2985.0 | 3134.0 | 149.0 | NULL | NULL |
| 10029 | 201901230430 | 2587.0 | 2587.0 | 3176.0 | 3176.0 | NULL | 2587.0 | 3176.0 | 589.0 | 589.0 |
| 10029 | 201901230530 | 589.0 | 3176.0 | 3176.0 | 3176.0 | 2587.0 | 3176.0 | 589.0 | NULL | NULL |
+----------+---------------+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
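The function list in section 2 also mentions lag and lead, which the count and sum examples do not exercise. A minimal sketch against the same shop_data table follows; the aliases prev_amt, next_amt, and diff_prev are made up for this example, and the default is written as cast(0 as double) on the assumption that it should match the type of ordamt.

```sql
-- Compare each time slot with its neighbours inside the same shop.
select shop_id, stat_date, ordamt,
-- Previous slot's amount; 0 is returned when there is no previous row.
lag(ordamt, 1, cast(0 as double)) over(partition by shop_id order by stat_date) as prev_amt,
-- Next slot's amount; NULL when there is no next row (no default given).
lead(ordamt) over(partition by shop_id order by stat_date) as next_amt,
-- Change versus the previous slot.
ordamt - lag(ordamt, 1, cast(0 as double)) over(partition by shop_id order by stat_date) as diff_prev
from shop_data
where shop_id = 10028;
```

For shop 10028 (amounts 1687, 1298, 149), prev_amt is 0, 1687, 1298; next_amt is 1298, 149, NULL; and diff_prev is 1687, -389, -1149.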