3.5 Window Functions

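A window function performs a calculation across a set of table rows that are related to the current row. This is comparable to the kind of calculation an aggregate function performs. Unlike a non-window aggregate call, however, a window function does not collapse the rows into a single output row; each row keeps its own identity. In addition, a window function can access more than just the current row of the query result.

The following example shows how to compare each employee's salary with the average salary of that employee's department: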

mydb=# create table empsalary(depname varchar(20),empno int,salary numeric);
CREATE TABLE
mydb=# insert into empsalary values('develop',11,5200);
INSERT 0 1
mydb=# insert into empsalary values('develop',7,4200);
INSERT 0 1
mydb=# insert into empsalary values('develop',9,4500);
INSERT 0 1
mydb=# insert into empsalary values('develop',8,6000);
INSERT 0 1
mydb=# insert into empsalary values('develop',10,5200);
INSERT 0 1
mydb=# insert into empsalary values('personnel',5,3500);
INSERT 0 1
mydb=# insert into empsalary values('personnel',2,3900);
INSERT 0 1
mydb=# insert into empsalary values('sales',3,4800);
INSERT 0 1
mydb=# insert into empsalary values('sales',1,5000);
INSERT 0 1
mydb=# insert into empsalary values('sales',4,4800);
INSERT 0 1

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;

  depname  | empno | salary |          avg
-----------+-------+--------+-----------------------
 develop   |    11 |   5200 | 5020.0000000000000000
 develop   |     7 |   4200 | 5020.0000000000000000
 develop   |     9 |   4500 | 5020.0000000000000000
 develop   |     8 |   6000 | 5020.0000000000000000
 develop   |    10 |   5200 | 5020.0000000000000000
 personnel |     5 |   3500 | 3700.0000000000000000
 personnel |     2 |   3900 | 3700.0000000000000000
 sales     |     3 |   4800 | 4866.6666666666666667
 sales     |     1 |   5000 | 4866.6666666666666667
 sales     |     4 |   4800 | 4866.6666666666666667
(10 rows)

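The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column is the average salary taken over all rows that have the same depname value as the current row. (This is actually the same function as the ordinary avg aggregate, but the OVER clause causes it to be treated as a window function and computed across the window frame, which here is the set of rows sharing the current row's depname.)

A window function call always contains an OVER clause directly after the window function's name and arguments. This is what syntactically distinguishes it from an ordinary function or a non-window aggregate. The OVER clause determines how the rows returned by the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expressions (depname in the example above). For each row, the window function is computed across the rows that fall into the same partition as the current row (in the example above, the average of all salary values with the same depname).

You can also use ORDER BY within OVER to control the order in which the window function processes rows. For example: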
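As a minimal sketch of the comparison described above, the query below places the department average and each employee's difference from it side by side. It assumes the empsalary table created earlier in this section; the aliases dept_avg and diff_from_avg and the rounding to two decimals are illustrative choices, not part of the original example.

```sql
-- Assumes the empsalary table created earlier in this section.
-- dept_avg and diff_from_avg are hypothetical column names.
SELECT depname,
       empno,
       salary,
       -- average over all rows in the same department as the current row
       round(avg(salary) OVER (PARTITION BY depname), 2) AS dept_avg,
       -- how far the current row's salary is from that average
       round(salary - avg(salary) OVER (PARTITION BY depname), 2) AS diff_from_avg
FROM empsalary
ORDER BY depname, empno;
```

For employee 8 in develop, this should report a difference of 980.00, since 6000 minus the department average of 5020 is 980.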

SELECT depname, empno, salary,
       rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;

  depname  | empno | salary | rank
-----------+-------+--------+------
 develop   |     8 |   6000 |    1
 develop   |    10 |   5200 |    2
 develop   |    11 |   5200 |    2
 develop   |     9 |   4500 |    4
 develop   |     7 |   4200 |    5
 personnel |     2 |   3900 |    1
 personnel |     5 |   3500 |    2
 sales     |     1 |   5000 |    1
 sales     |     3 |   4800 |    2
 sales     |     4 |   4800 |    2
(10 rows)

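As shown above, the rank function numbers the rows within each partition according to the order defined by ORDER BY, and rows with equal ORDER BY values receive the same rank. rank needs no explicit argument, because its behavior is determined entirely by the OVER clause.

The rows a window function processes are those of the virtual table produced by the query's FROM clause, after filtering by its WHERE, GROUP BY, and HAVING clauses, if any. For example, a row removed by the WHERE condition is not seen by any window function. A query can contain several window functions, each with its own OVER clause, but they all operate on the same set of result rows.

As we have seen, ORDER BY can be omitted when the ordering of rows does not matter. PARTITION BY can also be omitted, in which case there is a single partition containing all rows.

There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame rather than on the whole partition. By default, if ORDER BY is supplied, the frame consists of all rows from the start of the partition through the current row, plus any following rows that are equal to the current row according to ORDER BY. If ORDER BY is omitted, the default frame consists of all rows in the partition. Here is an example using sum: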
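To make the handling of ties more concrete, the sketch below runs rank next to two related built-in window functions, dense_rank and row_number. These two functions are not discussed in this section and are included only for contrast; the aliases rnk, dense_rnk, and row_num are hypothetical, and the query assumes the empsalary table created earlier.

```sql
-- Assumes the empsalary table created earlier in this section.
-- dense_rank() and row_number() are standard PostgreSQL window functions
-- that this section does not cover; they are shown here only for comparison.
SELECT empno,
       salary,
       rank()       OVER (ORDER BY salary DESC) AS rnk,       -- ties share a rank, then a gap follows
       dense_rank() OVER (ORDER BY salary DESC) AS dense_rnk, -- ties share a rank, no gap
       row_number() OVER (ORDER BY salary DESC) AS row_num    -- always distinct; tie order is arbitrary
FROM empsalary
WHERE depname = 'develop';
```

For the develop salaries 6000, 5200, 5200, 4500, 4200, rank gives 1, 2, 2, 4, 5, while dense_rank gives 1, 2, 2, 3, 4. row_number assigns 1 through 5, but which of the two 5200 rows gets 2 and which gets 3 is not determined by this ORDER BY.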
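The following sketch illustrates both points from the paragraph above: rows removed by WHERE are invisible to every window function, and several window functions with different OVER clauses still work on the same filtered rows. The salary threshold of 4500 and the aliases dept_rank and rows_seen are assumptions chosen for illustration; the empsalary table is the one created earlier.

```sql
-- Assumes the empsalary table created earlier in this section.
-- The threshold 4500 and the column aliases are illustrative only.
SELECT depname,
       empno,
       salary,
       -- ranks only among the rows that survived the WHERE filter
       rank() OVER (PARTITION BY depname ORDER BY salary DESC) AS dept_rank,
       -- counts the rows of the filtered virtual table, not of the whole table
       count(*) OVER () AS rows_seen
FROM empsalary
WHERE salary >= 4500;
```

With the sample data, seven rows have a salary of at least 4500, so rows_seen should be 7 on every output row, and no personnel rows appear at all.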

SELECT salary, sum(salary) OVER () FROM empsalary;

 salary |  sum
--------+-------
   5200 | 47100
   5000 | 47100
   3500 | 47100
   4800 | 47100
   3900 | 47100
   4200 | 47100
   4500 | 47100
   4800 | 47100
   6000 | 47100
   5200 | 47100
(10 rows)

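In the example above, since the OVER clause contains no ORDER BY, the window frame is the same as the partition, and since there is no PARTITION BY, the partition is the whole table. In other words, each sum is taken over the whole table, so every output row shows the same value. If we add an ORDER BY clause, the results are quite different: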

SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;

 salary |  sum
--------+-------
   3500 |  3500
   3900 |  7400
   4200 | 11600
   4500 | 16100
   4800 | 25700
   4800 | 25700
   5000 | 30700
   5200 | 41100
   5200 | 41100
   6000 | 47100
(10 rows)

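Here the sum is taken from the lowest salary up through the current row, including any rows that duplicate the current salary (note the results for the two 4800 rows and the two 5200 rows).

Window functions are permitted only in the SELECT list and the ORDER BY clause of a query. They are forbidden elsewhere, such as in GROUP BY, HAVING, and WHERE clauses, because they logically execute after those clauses have been processed. Window functions also execute after non-window aggregate functions. This means an aggregate function call can appear in the arguments of a window function, but not the other way around.

If you need to filter or group rows after the window calculations are performed, you can use a subquery. For example: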
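The effect of the default frame can be seen by contrasting it with an explicitly row-based frame. The sketch below is illustrative only: the explicit frame clause (ROWS BETWEEN ... AND ...) is part of PostgreSQL's window syntax but is not covered in this section, and the aliases range_sum and rows_sum are hypothetical. It assumes the empsalary table created earlier.

```sql
-- Assumes the empsalary table created earlier in this section.
SELECT salary,
       -- default frame with ORDER BY: start of partition through the current row
       -- plus any following rows equal to the current row (its peers)
       sum(salary) OVER (ORDER BY salary) AS range_sum,
       -- explicit row-based frame: stops exactly at the current row,
       -- so peers that sort after it are not included
       sum(salary) OVER (ORDER BY salary
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
FROM empsalary
ORDER BY salary;
```

With the sample data, both 4800 rows show 25700 in range_sum, whereas rows_sum shows 20900 for one of them and 25700 for the other.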
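As a small sketch of an aggregate call inside a window function's arguments, the query below first aggregates salaries per department with GROUP BY and then applies window functions to the aggregated rows. The aliases dept_total, company_total, and total_rank are hypothetical, and the query assumes the empsalary table created earlier.

```sql
-- Assumes the empsalary table created earlier in this section.
SELECT depname,
       sum(salary) AS dept_total,                           -- ordinary aggregate, one row per department
       sum(sum(salary)) OVER () AS company_total,           -- window sum over the per-department totals
       rank() OVER (ORDER BY sum(salary) DESC) AS total_rank -- rank departments by their totals
FROM empsalary
GROUP BY depname;
```

With the sample data the department totals are 25100 for develop, 14600 for sales, and 7400 for personnel, so company_total should be 47100 on every row. The reverse nesting, a window function inside an aggregate's arguments, is rejected.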

mydb=# alter table empsalary add column enroll_date date;
ALTER TABLE

SELECT depname, empno, salary, enroll_date
FROM
  (SELECT depname, empno, salary, enroll_date,
          rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
     FROM empsalary
  ) AS ss
WHERE pos < 3;

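The above query returns only the rows of the inner query whose rank is less than 3.

When a query involves several window functions, each one can be written with its own OVER clause, but this is repetitive and error-prone if several functions need the same windowing behavior. A better approach is to name each windowing behavior in a WINDOW clause and then reference it in OVER. For example: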
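The same filtering can also be written with a common table expression instead of an inline subquery. This is only an alternative sketch, not the form used in the original text; the name ranked is hypothetical, and the query assumes the empsalary table with the enroll_date column added above.

```sql
-- Assumes the empsalary table (including enroll_date) from this section.
-- "ranked" is a hypothetical CTE name.
WITH ranked AS (
    SELECT depname, empno, salary, enroll_date,
           rank() OVER (PARTITION BY depname
                        ORDER BY salary DESC, empno) AS pos
    FROM empsalary
)
SELECT depname, empno, salary, enroll_date
FROM ranked
WHERE pos < 3;   -- filtering happens after the window function has been computed
```

Because empno is included as a tie-breaker in ORDER BY, no two rows in a department share a position, so at most two rows per department are returned.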

SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
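A named window can also serve as a base that an individual OVER clause extends. The sketch below goes slightly beyond the text above: it relies on PostgreSQL allowing an OVER clause to reference an existing window name and add an ORDER BY when the named window has none. The aliases dept_avg and dept_rank are hypothetical, and the empsalary table is the one created earlier.

```sql
-- Assumes the empsalary table created earlier in this section.
SELECT depname,
       empno,
       salary,
       avg(salary) OVER w AS dept_avg,                   -- uses the named window as is
       rank() OVER (w ORDER BY salary DESC) AS dept_rank -- reuses w's partitioning and adds an ordering
FROM empsalary
WINDOW w AS (PARTITION BY depname);
```

Keeping the partitioning in one place means that a later change to it applies to both functions, while each function can still choose whether it needs an ordering.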

More details about window functions can be found in Section 4.2.8, Section 9.22, Section 7.2.5, and the SELECT reference page.

3.5. Window Functions

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

Here is an example that shows how to compare each employee's salary with the average salary in his or her department:

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;

  depname  | empno | salary |          avg
-----------+-------+--------+-----------------------
 develop   |    11 |   5200 | 5020.0000000000000000
 develop   |     7 |   4200 | 5020.0000000000000000
 develop   |     9 |   4500 | 5020.0000000000000000
 develop   |     8 |   6000 | 5020.0000000000000000
 develop   |    10 |   5200 | 5020.0000000000000000
 personnel |     5 |   3500 | 3700.0000000000000000
 personnel |     2 |   3900 | 3700.0000000000000000
 sales     |     3 |   4800 | 4866.6666666666666667
 sales     |     1 |   5000 | 4866.6666666666666667
 sales     |     4 |   4800 | 4866.6666666666666667
(10 rows)

The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname value as the current row. (This actually is the same function as the non-window avg aggregate, but the OVER clause causes it to be treated as a window function and computed across the window frame.)

A window function call always contains an OVER clause directly following the window function's name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.

You can also control the order in which rows are processed by window functions using ORDER BY within OVER. (The window ORDER BY does not even have to match the order in which the rows are output.) Here is an example:

SELECT depname, empno, salary,
       rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;

  depname  | empno | salary | rank
-----------+-------+--------+------
 develop   |     8 |   6000 |    1
 develop   |    10 |   5200 |    2
 develop   |    11 |   5200 |    2
 develop   |     9 |   4500 |    4
 develop   |     7 |   4200 |    5
 personnel |     2 |   3900 |    1
 personnel |     5 |   3500 |    2
 sales     |     1 |   5000 |    1
 sales     |     4 |   4800 |    2
 sales     |     3 |   4800 |    2
(10 rows)

As shown here, the rank function produces a numerical rank for each distinct ORDER BY value in the current row's partition, using the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.

The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.

We already saw that ORDER BY can be omitted if the ordering of rows is not important. It is also possible to omit PARTITION BY, in which case there is a single partition containing all rows.

There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition. Here is an example using sum:

SELECT salary, sum(salary) OVER () FROM empsalary;

 salary |  sum
--------+-------
   5200 | 47100
   5000 | 47100
   3500 | 47100
   4800 | 47100
   3900 | 47100
   4200 | 47100
   4500 | 47100
   4800 | 47100
   6000 | 47100
   5200 | 47100
(10 rows)

Above, since there is no ORDER BY in the OVER clause, the window frame is the same as the partition, which for lack of PARTITION BY is the whole table; in other words each sum is taken over the whole table and so we get the same result for each output row. But if we add an ORDER BY clause, we get very different results:

SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;

 salary |  sum
--------+-------
   3500 |  3500
   3900 |  7400
   4200 | 11600
   4500 | 16100
   4800 | 25700
   4800 | 25700
   5000 | 30700
   5200 | 41100
   5200 | 41100
   6000 | 47100
(10 rows)

Here the sum is taken from the first (lowest) salary up through the current one, including any duplicates of the current one (notice the results for the duplicated salaries).

Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.

If there is a need to filter or group rows after the window calculations are performed, you can use a sub-select. For example:

SELECT depname, empno, salary, enroll_date
FROM
  (SELECT depname, empno, salary, enroll_date,
          rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
     FROM empsalary
  ) AS ss
WHERE pos < 3;

The above query only shows the rows from the inner query having rank less than 3.

When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER. For example:

SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);

More details about window functions can be found in Section 4.2.8, Section 9.21, Section 7.2.5, and the SELECT reference page.
