Window Functions

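Strictly speaking, only over() is the window function proper. It is usually combined with another function (an aggregate such as sum() or a ranking function such as rank()), and that combination is what makes it so powerful.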

I. Syntax of over()

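analytic_function over(partition by column order by column rows between start_position and end_position)

partition by splits the rows into groups, order by sorts the rows inside each group, and the rows between ... and ... clause chooses which of those rows the function sees for the current row.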

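1.1 Window control, i.e. the window clause

over(partition by col order by sort_col rows between 1 preceding and 1 following)

Each bound is one of unbounded preceding, n preceding, current row, n following or unbounded following. The example above gives a three-row frame (the previous row, the current row and the next row), clipped at the edges of the partition. If the rows clause is left out, the frame defaults to range between unbounded preceding and current row when order by is present, and to the whole partition when there is no order by.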
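The frame bounds can be made concrete with a small Python sketch (plain Python, not Hive) that lists which positions of one sorted partition fall into a rows between ... and ... frame. The helper names frame_rows and frame_sizes, and the encoding of bounds as signed offsets (-k for k preceding, 0 for current row, +k for k following, None for unbounded), are assumptions made for illustration.

```python
def frame_rows(i, n, start, end):
    """Positions in a ROWS frame for the row at position i.

    The partition has n rows, already sorted by the ORDER BY key.
    start/end are signed offsets from the current row:
      -k -> "k preceding", 0 -> "current row", +k -> "k following",
      None -> "unbounded preceding" (start) / "unbounded following" (end).
    """
    lo = 0 if start is None else max(0, i + start)
    hi = n - 1 if end is None else min(n - 1, i + end)
    # The frame is clipped to the partition and may be empty (lo > hi).
    return list(range(lo, hi + 1))


def frame_sizes(n, start, end):
    """Number of rows in the frame for every row of an n-row partition."""
    return [len(frame_rows(i, n, start, end)) for i in range(n)]


# rows between 1 preceding and 2 following
print(frame_sizes(10, -1, 2))     # [3, 4, 4, 4, 4, 4, 4, 4, 3, 2]
# rows between unbounded preceding and 1 preceding
print(frame_sizes(10, None, -1))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# rows between 1 following and 2 following
print(frame_sizes(10, 1, 2))      # [2, 2, 2, 2, 2, 2, 2, 2, 1, 0]
```

For a 10-row partition these sizes agree with the count4, count6 and count10 columns of shop 10026 in the count example.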

II. Common window functions

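  1. sum(col) over() : sum of col over the window; with order by inside over() it becomes a running (cumulative) total within each partition
  2. count(col) over() : number of non-NULL col values in the window; a running count when order by is given
  3. min(col) over() : minimum of col in the window
  4. max(col) over() : maximum of col in the window
  5. avg(col) over() : average of col in the window
  6. first_value(col) over() : the first col value in the window after sorting
  7. last_value(col) over() : the last col value in the window after sorting. With order by and no rows clause the frame ends at the current row, so this returns the current row's value; add rows between unbounded preceding and unbounded following to get the last value of the whole partition
  8. lag(col,n,DEFAULT) over() : the col value n rows before the current row. n is optional and defaults to 1; DEFAULT is returned when there is no row n positions back (the current row is near the start of the partition), and if DEFAULT is not given the result is NULL
  9. lead(col,n,DEFAULT) over() : the col value n rows after the current row. n is optional and defaults to 1; DEFAULT is returned when there is no row n positions ahead (the current row is near the end of the partition), and if DEFAULT is not given the result is NULL
  10. ntile(n) over() : splits the ordered rows of each partition into n buckets and returns the bucket number of the current row. Note: n must be an int
  11. row_number() over() : ranking function without ties, every row gets a distinct number; suitable for generating keys or strict, non-tied rankings
  12. rank() over() : ranking function with ties; numbers after a tie are skipped, e.g. 1,1,3
  13. dense_rank() over() : ranking function with ties; numbers stay consecutive, e.g. 1,1,2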
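The ranking, offset and bucketing functions above can be tried without a Hive cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in, assuming an SQLite build of version 3.25 or later (the first with window function support). The function names and over() syntax match HiveQL, but this is SQLite, not Hive; the table t and its scores are hypothetical demo data containing one tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (name text, score int)")
# Hypothetical demo rows: b and c tie with 90.
conn.executemany("insert into t values (?, ?)",
                 [("a", 95), ("b", 90), ("c", 90), ("d", 80)])

rows = conn.execute("""
    select name, score,
           -- name breaks the tie so row_number/ntile/lag/lead are deterministic
           row_number() over (order by score desc, name) as rn,
           rank()       over (order by score desc) as rk,
           dense_rank() over (order by score desc) as drk,
           ntile(2)     over (order by score desc, name) as tile,
           -- previous / next score, -1 when that row does not exist
           lag(score, 1, -1)  over (order by score desc, name) as prev_score,
           lead(score, 1, -1) over (order by score desc, name) as next_score,
           -- default frame: start of the partition up to the current row (and its ties)
           last_value(score) over (order by score desc) as last_default,
           -- explicit frame covering the whole partition
           last_value(score) over (order by score desc
               rows between unbounded preceding and unbounded following) as last_all
    from t
    order by score desc, name
""").fetchall()

for r in rows:
    print(r)
# ('a', 95, 1, 1, 1, 1, -1, 90, 95, 80)
# ('b', 90, 2, 2, 2, 1, 95, 90, 90, 80)
# ('c', 90, 3, 2, 2, 2, 90, 80, 90, 80)
# ('d', 80, 4, 4, 3, 2, 90, -1, 80, 80)
```

rank() yields 1, 2, 2, 4 and dense_rank() 1, 2, 2, 3 for the tie, and last_default simply repeats the current row's score, which is why last_value usually needs the explicit full-partition frame.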

Test data

create table shop_data(
shop_id int,
stat_date string,
ordamt double
)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile;
-- insert the data
insert into shop_data values
(10026,'201901230030',5170),
(10026,'201901230100',5669),
(10026,'201901230130',2396),
(10026,'201901230200',1498),
(10026,'201901230230',1997),
(10026,'201901230300',1188),
(10026,'201901230330',598),
(10026,'201901230400',479),
(10026,'201901230430',1587),
(10026,'201901230530',799),
(10027,'201901230030',2170),
(10027,'201901230100',1623),
(10027,'201901230130',3397),
(10027,'201901230200',1434),
(10027,'201901230230',1001),
(10028,'201901230300',1687),
(10028,'201901230330',1298),
(10028,'201901230400',149),
(10029,'201901230430',2587),
(10029,'201901230530',589);

III. Examples

count

select shop_id,stat_date,ordamt,
-- window: every row returned by the query (after any where filter)
count(shop_id) over() as count1,
-- window: all rows with the same shop_id
count(shop_id) over(partition by shop_id) as count2,
-- partition by shop_id, order by stat_date; default frame: start of the partition to the current row
count(shop_id) over(partition by shop_id order by stat_date) as count3,
-- partition by shop_id, order by stat_date; 1 row before, the current row and 2 rows after
count(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as count4,
-- partition by shop_id, order by stat_date; start to end of the partition (same result as count2)
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as count5,
-- partition by shop_id, order by stat_date; start of the partition to the row before the current row
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as count6,
-- partition by shop_id, order by stat_date; start of the partition to the current row
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as count7,
-- partition by shop_id, order by stat_date; current row to the end of the partition
count(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as count8,
-- partition by shop_id, order by stat_date; row after the current row to the end of the partition
count(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as count9,
-- partition by shop_id, order by stat_date; 1 row after to 2 rows after the current row
count(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as count10
from shop_data;
-- Query result:
+----------+---------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
| shop_id  |   stat_date   | ordamt  | count1  | count2  | count3  | count4  | count5  | count6  | count7  | count8  | count9  | count10  |
+----------+---------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
| 10026    | 201901230030  | 5170.0  | 20      | 10      | 1       | 3       | 10      | 0       | 1       | 10      | 9       | 2        |
| 10026    | 201901230100  | 5669.0  | 20      | 10      | 2       | 4       | 10      | 1       | 2       | 9       | 8       | 2        |
| 10026    | 201901230130  | 2396.0  | 20      | 10      | 3       | 4       | 10      | 2       | 3       | 8       | 7       | 2        |
| 10026    | 201901230200  | 1498.0  | 20      | 10      | 4       | 4       | 10      | 3       | 4       | 7       | 6       | 2        |
| 10026    | 201901230230  | 1997.0  | 20      | 10      | 5       | 4       | 10      | 4       | 5       | 6       | 5       | 2        |
| 10026    | 201901230300  | 1188.0  | 20      | 10      | 6       | 4       | 10      | 5       | 6       | 5       | 4       | 2        |
| 10026    | 201901230330  | 598.0   | 20      | 10      | 7       | 4       | 10      | 6       | 7       | 4       | 3       | 2        |
| 10026    | 201901230400  | 479.0   | 20      | 10      | 8       | 4       | 10      | 7       | 8       | 3       | 2       | 2        |
| 10026    | 201901230430  | 1587.0  | 20      | 10      | 9       | 3       | 10      | 8       | 9       | 2       | 1       | 1        |
| 10026    | 201901230530  | 799.0   | 20      | 10      | 10      | 2       | 10      | 9       | 10      | 1       | 0       | 0        |
| 10027    | 201901230030  | 2170.0  | 20      | 5       | 1       | 3       | 5       | 0       | 1       | 5       | 4       | 2        |
| 10027    | 201901230100  | 1623.0  | 20      | 5       | 2       | 4       | 5       | 1       | 2       | 4       | 3       | 2        |
| 10027    | 201901230130  | 3397.0  | 20      | 5       | 3       | 4       | 5       | 2       | 3       | 3       | 2       | 2        |
| 10027    | 201901230200  | 1434.0  | 20      | 5       | 4       | 3       | 5       | 3       | 4       | 2       | 1       | 1        |
| 10027    | 201901230230  | 1001.0  | 20      | 5       | 5       | 2       | 5       | 4       | 5       | 1       | 0       | 0        |
| 10028    | 201901230300  | 1687.0  | 20      | 3       | 1       | 3       | 3       | 0       | 1       | 3       | 2       | 2        |
| 10028    | 201901230330  | 1298.0  | 20      | 3       | 2       | 3       | 3       | 1       | 2       | 2       | 1       | 1        |
| 10028    | 201901230400  | 149.0   | 20      | 3       | 3       | 2       | 3       | 2       | 3       | 1       | 0       | 0        |
| 10029    | 201901230430  | 2587.0  | 20      | 2       | 1       | 2       | 2       | 0       | 1       | 2       | 1       | 1        |
| 10029    | 201901230530  | 589.0   | 20      | 2       | 2       | 2       | 2       | 1       | 2       | 1       | 0       | 0        |
+----------+---------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
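As a cross-check, the sketch below re-runs part of the count query with Python's built-in sqlite3 module, assuming an SQLite build of 3.25 or later. SQLite stands in for Hive here; the over() clauses are copied from the query above, with the Hive column types string and double mapped to SQLite's text and real.

```python
import sqlite3

# The 20 test rows from the insert statement above.
DATA = [
    (10026, "201901230030", 5170), (10026, "201901230100", 5669),
    (10026, "201901230130", 2396), (10026, "201901230200", 1498),
    (10026, "201901230230", 1997), (10026, "201901230300", 1188),
    (10026, "201901230330", 598), (10026, "201901230400", 479),
    (10026, "201901230430", 1587), (10026, "201901230530", 799),
    (10027, "201901230030", 2170), (10027, "201901230100", 1623),
    (10027, "201901230130", 3397), (10027, "201901230200", 1434),
    (10027, "201901230230", 1001), (10028, "201901230300", 1687),
    (10028, "201901230330", 1298), (10028, "201901230400", 149),
    (10029, "201901230430", 2587), (10029, "201901230530", 589),
]

conn = sqlite3.connect(":memory:")
# Hive's string/double columns mapped to SQLite's text/real.
conn.execute("create table shop_data (shop_id int, stat_date text, ordamt real)")
conn.executemany("insert into shop_data values (?, ?, ?)", DATA)

rows = conn.execute("""
    select shop_id, stat_date,
           count(shop_id) over () as count1,
           count(shop_id) over (partition by shop_id) as count2,
           count(shop_id) over (partition by shop_id order by stat_date) as count3,
           count(ordamt) over (partition by shop_id order by stat_date
               rows between 1 preceding and 2 following) as count4,
           count(ordamt) over (partition by shop_id order by stat_date
               rows between unbounded preceding and 1 preceding) as count6
    from shop_data
    order by shop_id, stat_date
""").fetchall()

for r in rows:
    print(r)
```

count6 is 0 on each shop's first row because its frame is empty; count returns 0 rather than NULL for an empty frame.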

sum

select
shop_id, stat_date, ordamt,
-- partition by shop_id, order by stat_date: each shop's sales total up to the current time (default frame: start of the partition to the current row)
sum(ordamt) over(partition by shop_id order by stat_date) as sum_amt1,
-- sales from half an hour before to one hour after the current time: 1 row before, the current row and 2 rows after (this matches the time span only when every half-hour slot has a row)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as sum_amt2,
-- each shop's total sales (start to end of the partition)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as sum_amt3,
-- total sales up to the previous half hour (start of the partition to the row before the current row)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as sum_amt4,
-- total sales up to the current time (start of the partition to the current row; same result as sum_amt1 here)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as sum_amt5,
-- sales at the current time and after (current row to the end of the partition)
sum(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as sum_amt6,
-- sales from the next half hour onward (row after the current row to the end of the partition)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as sum_amt7,
-- sales from the next half hour to the next hour (1 row after to 2 rows after the current row)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as sum_amt8
from shop_data;

Query result:

+----------+---------------+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| shop_id  |   stat_date   | ordamt  | sum_amt1  | sum_amt2  | sum_amt3  | sum_amt4  | sum_amt5  | sum_amt6  | sum_amt7  | sum_amt8  |
+----------+---------------+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| 10026    | 201901230030  | 5170.0  | 5170.0    | 13235.0   | 21381.0   | NULL      | 5170.0    | 21381.0   | 16211.0   | 8065.0    |
| 10026    | 201901230100  | 5669.0  | 10839.0   | 14733.0   | 21381.0   | 5170.0    | 10839.0   | 16211.0   | 10542.0   | 3894.0    |
| 10026    | 201901230130  | 2396.0  | 13235.0   | 11560.0   | 21381.0   | 10839.0   | 13235.0   | 10542.0   | 8146.0    | 3495.0    |
| 10026    | 201901230200  | 1498.0  | 14733.0   | 7079.0    | 21381.0   | 13235.0   | 14733.0   | 8146.0    | 6648.0    | 3185.0    |
| 10026    | 201901230230  | 1997.0  | 16730.0   | 5281.0    | 21381.0   | 14733.0   | 16730.0   | 6648.0    | 4651.0    | 1786.0    |
| 10026    | 201901230300  | 1188.0  | 17918.0   | 4262.0    | 21381.0   | 16730.0   | 17918.0   | 4651.0    | 3463.0    | 1077.0    |
| 10026    | 201901230330  | 598.0   | 18516.0   | 3852.0    | 21381.0   | 17918.0   | 18516.0   | 3463.0    | 2865.0    | 2066.0    |
| 10026    | 201901230400  | 479.0   | 18995.0   | 3463.0    | 21381.0   | 18516.0   | 18995.0   | 2865.0    | 2386.0    | 2386.0    |
| 10026    | 201901230430  | 1587.0  | 20582.0   | 2865.0    | 21381.0   | 18995.0   | 20582.0   | 2386.0    | 799.0     | 799.0     |
| 10026    | 201901230530  | 799.0   | 21381.0   | 2386.0    | 21381.0   | 20582.0   | 21381.0   | 799.0     | NULL      | NULL      |
| 10027    | 201901230030  | 2170.0  | 2170.0    | 7190.0    | 9625.0    | NULL      | 2170.0    | 9625.0    | 7455.0    | 5020.0    |
| 10027    | 201901230100  | 1623.0  | 3793.0    | 8624.0    | 9625.0    | 2170.0    | 3793.0    | 7455.0    | 5832.0    | 4831.0    |
| 10027    | 201901230130  | 3397.0  | 7190.0    | 7455.0    | 9625.0    | 3793.0    | 7190.0    | 5832.0    | 2435.0    | 2435.0    |
| 10027    | 201901230200  | 1434.0  | 8624.0    | 5832.0    | 9625.0    | 7190.0    | 8624.0    | 2435.0    | 1001.0    | 1001.0    |
| 10027    | 201901230230  | 1001.0  | 9625.0    | 2435.0    | 9625.0    | 8624.0    | 9625.0    | 1001.0    | NULL      | NULL      |
| 10028    | 201901230300  | 1687.0  | 1687.0    | 3134.0    | 3134.0    | NULL      | 1687.0    | 3134.0    | 1447.0    | 1447.0    |
| 10028    | 201901230330  | 1298.0  | 2985.0    | 3134.0    | 3134.0    | 1687.0    | 2985.0    | 1447.0    | 149.0     | 149.0     |
| 10028    | 201901230400  | 149.0   | 3134.0    | 1447.0    | 3134.0    | 2985.0    | 3134.0    | 149.0     | NULL      | NULL      |
| 10029    | 201901230430  | 2587.0  | 2587.0    | 3176.0    | 3176.0    | NULL      | 2587.0    | 3176.0    | 589.0     | 589.0     |
| 10029    | 201901230530  | 589.0   | 3176.0    | 3176.0    | 3176.0    | 2587.0    | 3176.0    | 589.0     | NULL      | NULL      |
+----------+---------------+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
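The same approach, again assuming SQLite 3.25 or later as a stand-in for Hive, reproduces some sum columns for shop 10026 only; partition by shop_id keeps the shops independent, so one partition is enough. It also puts sum and count side by side over the empty frame of the first row.

```python
import sqlite3

# Shop 10026's rows from the test data above.
DATA = [
    ("201901230030", 5170), ("201901230100", 5669), ("201901230130", 2396),
    ("201901230200", 1498), ("201901230230", 1997), ("201901230300", 1188),
    ("201901230330", 598), ("201901230400", 479), ("201901230430", 1587),
    ("201901230530", 799),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table shop_data (shop_id int, stat_date text, ordamt real)")
conn.executemany("insert into shop_data values (10026, ?, ?)", DATA)

rows = conn.execute("""
    select stat_date,
           sum(ordamt) over (partition by shop_id order by stat_date) as sum_amt1,
           sum(ordamt) over (partition by shop_id order by stat_date
               rows between 1 preceding and 2 following) as sum_amt2,
           sum(ordamt) over (partition by shop_id order by stat_date
               rows between unbounded preceding and 1 preceding) as sum_amt4,
           -- same empty-at-first-row frame as sum_amt4, but counted
           count(ordamt) over (partition by shop_id order by stat_date
               rows between unbounded preceding and 1 preceding) as count6
    from shop_data
    order by stat_date
""").fetchall()

for r in rows:
    print(r)
```

sum over an empty frame is NULL (None in Python), as in the sum_amt4 column, while count over the same frame is 0. The row 201901230400 also shows that a rows frame counts rows, not time: there is no 201901230500 row, so its "2 following" reaches 201901230530 and sum_amt2 is 3463.0, as in the table above.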
