3.5 Window Functions


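A window function performs a calculation across a set of table rows that are related to the current row. This is comparable to the kind of calculation an aggregate function performs. Unlike a non-window aggregate call, however, a window function does not collapse the rows into a single output row; each row keeps its own identity. Behind the scenes, the window function can also access more than just the current row of the query result.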

The following example shows how to compare each employee's salary with the average salary in his or her department:

mydb=# create table empsalary(depname varchar(20),empno int,salary numeric);

CREATE TABLE

mydb=# insert into empsalary values('develop',11,5200);

INSERT 0 1

mydb=# insert into empsalary values('develop',7,4200);

INSERT 0 1

mydb=# insert into empsalary values('develop',9,4500);

INSERT 0 1

mydb=# insert into empsalary values('develop',8,6000);

INSERT 0 1

mydb=# insert into empsalary values('develop',10,5200);

INSERT 0 1

mydb=# insert into empsalary values('personnel',5,3500);

INSERT 0 1

mydb=# insert into empsalary values('personnel',2,3900);

INSERT 0 1

mydb=# insert into empsalary values('sales',3,4800);

INSERT 0 1

mydb=# insert into empsalary values('sales',1,5000);

INSERT 0 1

mydb=# insert into empsalary values('sales',4,4800);

INSERT 0 1

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY

depname) FROM empsalary;

depname | empno | salary | avg

-----------+-------+--------+-----------------------

develop | 11 | 5200 | 5020.0000000000000000

develop | 7 | 4200 | 5020.0000000000000000

develop | 9 | 4500 | 5020.0000000000000000

develop | 8 | 6000 | 5020.0000000000000000

develop | 10 | 5200 | 5020.0000000000000000

personnel | 5 | 3500 | 3700.0000000000000000

personnel | 2 | 3900 | 3700.0000000000000000

sales | 3 | 4800 | 4866.6666666666666667

sales | 1 | 5000 | 4866.6666666666666667

sales | 4 | 4800 | 4866.6666666666666667

(10 rows)

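The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column is the average salary across all rows that have the same depname as the current row. (This is in fact the same avg function as the ordinary aggregate, but the OVER clause causes it to be treated as a window function and computed across the window frame, which here is the set of rows sharing the current row's depname.)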

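A window function call always contains an OVER clause directly following the window function's name and arguments. This is what syntactically distinguishes it from an ordinary function or a non-window aggregate. The OVER clause determines how the rows returned by the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression (depname in the example above). For each row, the window function is computed across the rows that fall into the same partition as the current row (in the example above, the average of all salary values with the same depname).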

You can also use ORDER BY within OVER to control the order in which the window function processes rows. For example:

SELECT depname, empno, salary,

rank() OVER (PARTITION BY depname ORDER BY salary DESC)

FROM empsalary;

depname | empno | salary | rank

-----------+-------+--------+------

develop | 8 | 6000 | 1

develop | 10 | 5200 | 2

develop | 11 | 5200 | 2

develop | 9 | 4500 | 4

develop | 7 | 4200 | 5

personnel | 2 | 3900 | 1

personnel | 5 | 3500 | 2

sales | 1 | 5000 | 1

sales | 3 | 4800 | 2

sales | 4 | 4800 | 2

(10 rows)

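As shown above, the rank function assigns a rank to each row within its partition, using the order defined by the ORDER BY clause (rows with equal values receive the same rank). rank needs no explicit argument, because its behavior is determined entirely by the OVER clause.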

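The rows processed by a window function are those of the virtual table produced by the query's FROM clause, as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed by the WHERE condition is not seen by any window function. A query can contain multiple window functions, but they all act on the same set of result rows.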

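As we have already seen, ORDER BY can be omitted if the ordering of rows does not matter. PARTITION BY can be omitted as well, in which case there is a single partition containing all rows.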

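There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame rather than on the whole partition. By default, if ORDER BY is supplied, the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. If ORDER BY is omitted, the frame consists of all rows in the partition. Here is an example using sum: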

SELECT salary, sum(salary) OVER () FROM empsalary;

salary | sum

--------+-------

5200 | 47100

5000 | 47100

3500 | 47100

4800 | 47100

3900 | 47100

4200 | 47100

4500 | 47100

4800 | 47100

6000 | 47100

5200 | 47100

(10 rows)

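In the example above, the OVER clause has no ORDER BY, so the window frame is the same as the partition, and because there is no PARTITION BY, the partition is the whole table. In other words, each sum is taken over the whole table, so every output row shows the same value. Adding an ORDER BY clause gives very different results: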

SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;

salary | sum

--------+-------

3500 | 3500

3900 | 7400

4200 | 11600

4500 | 16100

4800 | 25700

4800 | 25700

5000 | 30700

5200 | 41100

5200 | 41100

6000 | 47100

(10 rows)

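Here the sum is taken from the lowest salary up through the current row, including any rows whose salary equals the current one, so rows with the same salary all show the same sum.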

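Window functions are permitted only in the SELECT list and the ORDER BY clause of a query. They are forbidden elsewhere, such as in GROUP BY, HAVING, and WHERE clauses, because they logically execute after those clauses have been processed. Window functions also execute after non-window aggregate functions. This means an aggregate function call may appear in the arguments of a window function, but not the other way around.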

If you need to filter or group rows after the window calculations have been performed, you can use a sub-select. For example:

mydb=# alter table empsalary add column enroll_date date;

ALTER TABLE

SELECT depname, empno, salary, enroll_date

FROM

(SELECT depname, empno, salary, enroll_date,

rank() OVER (PARTITION BY depname ORDER BY salary DESC,

empno) AS pos

FROM empsalary

) AS ss

WHERE pos < 3;

The above query returns only the rows of the inner query whose rank is less than 3.

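When a query involves multiple window functions, each one can be written with its own OVER clause, but this is repetitive and error-prone if several functions need the same windowing behavior. A better approach is to name each windowing behavior in a WINDOW clause and then reference it in OVER. For example: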

SELECT sum(salary) OVER w, avg(salary) OVER w

FROM empsalary

WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);

More information about window functions can be found in Section 4.2.8, Section 9.22, Section 7.2.5, and the SELECT reference page.

3.5. Window Functions

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

Here is an example that shows how to compare each employee's salary with the average salary in his or her department:

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY

depname) FROM empsalary;

depname | empno | salary | avg

-----------+-------+--------+-----------------------

develop | 11 | 5200 | 5020.0000000000000000

develop | 7 | 4200 | 5020.0000000000000000

develop | 9 | 4500 | 5020.0000000000000000

develop | 8 | 6000 | 5020.0000000000000000

develop | 10 | 5200 | 5020.0000000000000000

personnel | 5 | 3500 | 3700.0000000000000000

personnel | 2 | 3900 | 3700.0000000000000000

sales | 3 | 4800 | 4866.6666666666666667

sales | 1 | 5000 | 4866.6666666666666667

sales | 4 | 4800 | 4866.6666666666666667

(10 rows)

The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname value as the current row. (This actually is the same function as the non-window avg aggregate, but the OVER clause causes it to be treated as a window function and computed across the window frame.)
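As a minimal sketch of the query above, the following Python script loads the sample rows into an in-memory database and reports how far each salary is from its department average. Using SQLite (version 3.25 or later, which added window functions) through Python's sqlite3 module in place of PostgreSQL is an assumption made so the example is self-contained, and the alias dept_avg is illustrative only.

```python
import sqlite3

# Sample rows from the empsalary table used in this section.
ROWS = [
    ("develop", 11, 5200), ("develop", 7, 4200), ("develop", 9, 4500),
    ("develop", 8, 6000), ("develop", 10, 5200),
    ("personnel", 5, 3500), ("personnel", 2, 3900),
    ("sales", 3, 4800), ("sales", 1, 5000), ("sales", 4, 4800),
]

# Assumption: SQLite stands in for PostgreSQL so the script needs no server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO empsalary VALUES (?, ?, ?)", ROWS)

# Every input row survives; the window function only adds a column.
query = """
    SELECT depname, empno, salary,
           avg(salary) OVER (PARTITION BY depname) AS dept_avg
    FROM empsalary
    ORDER BY depname, empno
"""
result = conn.execute(query).fetchall()

for depname, empno, salary, dept_avg in result:
    # Positive difference: the employee earns more than the department average.
    print(f"{depname:<10} {empno:>3} {salary:>5} {salary - dept_avg:+9.2f}")
```

Because the rows are not collapsed, the per-row difference can be computed directly, which a plain GROUP BY query could not provide without a join back to the table.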

A window function call always contains an OVER clause directly following the window function's name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.

You can also control the order in which rows are processed by window functions using ORDER BY within OVER. (The window ORDER BY does not even have to match the order in which the rows are output.) Here is an example:

SELECT depname, empno, salary,

rank() OVER (PARTITION BY depname ORDER BY salary DESC)

FROM empsalary;

depname | empno | salary | rank

-----------+-------+--------+------

develop | 8 | 6000 | 1

develop | 10 | 5200 | 2

develop | 11 | 5200 | 2

develop | 9 | 4500 | 4

develop | 7 | 4200 | 5

personnel | 2 | 3900 | 1

personnel | 5 | 3500 | 2

sales | 1 | 5000 | 1

sales | 4 | 4800 | 2

sales | 3 | 4800 | 2

(10 rows)

As shown here, the rank function produces a numerical rank for each distinct ORDER BY value in the current row's partition, using the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.
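To make the handling of ties concrete, the sketch below puts rank next to dense_rank and row_number. Those two functions are not discussed in this section; they are standard window functions brought in here only for contrast. SQLite via Python's sqlite3 module is again an assumed stand-in for PostgreSQL, and the aliases rnk, dense_rnk and row_num are illustrative.

```python
import sqlite3

# Sample rows from the empsalary table used in this section.
ROWS = [
    ("develop", 11, 5200), ("develop", 7, 4200), ("develop", 9, 4500),
    ("develop", 8, 6000), ("develop", 10, 5200),
    ("personnel", 5, 3500), ("personnel", 2, 3900),
    ("sales", 3, 4800), ("sales", 1, 5000), ("sales", 4, 4800),
]

# Assumption: SQLite (3.25+) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO empsalary VALUES (?, ?, ?)", ROWS)

query = """
    SELECT depname, empno, salary,
           rank()       OVER (PARTITION BY depname ORDER BY salary DESC) AS rnk,
           dense_rank() OVER (PARTITION BY depname ORDER BY salary DESC) AS dense_rnk,
           row_number() OVER (PARTITION BY depname ORDER BY salary DESC) AS row_num
    FROM empsalary
    ORDER BY depname, salary DESC, empno
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

In the develop partition the two employees earning 5200 share rank 2 and the next rank is 4, whereas dense_rank continues with 3. row_number gives the tied rows distinct numbers, but which of them comes first is not determined by the ORDER BY.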

The rows considered by a window function are those of the “virtual table” produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.
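The effect of WHERE on the window input can be checked with a small experiment. The sketch below filters on an arbitrary threshold (salary >= 4500, chosen only for illustration) and uses two window functions with different OVER clauses. SQLite via Python's sqlite3 module is an assumed stand-in for PostgreSQL, and the aliases rows_seen and dept_avg are hypothetical.

```python
import sqlite3

# Sample rows from the empsalary table used in this section.
ROWS = [
    ("develop", 11, 5200), ("develop", 7, 4200), ("develop", 9, 4500),
    ("develop", 8, 6000), ("develop", 10, 5200),
    ("personnel", 5, 3500), ("personnel", 2, 3900),
    ("sales", 3, 4800), ("sales", 1, 5000), ("sales", 4, 4800),
]

# Assumption: SQLite (3.25+) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO empsalary VALUES (?, ?, ?)", ROWS)

# Two windows slice the same filtered row set in different ways:
# OVER () spans every surviving row, PARTITION BY splits by department.
query = """
    SELECT depname, empno, salary,
           count(*)    OVER ()                     AS rows_seen,
           avg(salary) OVER (PARTITION BY depname) AS dept_avg
    FROM empsalary
    WHERE salary >= 4500
    ORDER BY depname, empno
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

Only seven rows pass the filter, so rows_seen is 7 on every row, and the develop average rises from 5020 to 5225 because the 4200 salary never reaches the window function.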

We already saw that ORDER BY can be omitted if the ordering of rows is not important. It is also possible to omit PARTITION BY, in which case there is a single partition containing all rows.

There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition. Here is an example using sum:

SELECT salary, sum(salary) OVER () FROM empsalary;

salary | sum

--------+-------

5200 | 47100

5000 | 47100

3500 | 47100

4800 | 47100

3900 | 47100

4200 | 47100

4500 | 47100

4800 | 47100

6000 | 47100

5200 | 47100

(10 rows)

Above, since there is no ORDER BY in the OVER clause, the window frame is the same as the partition, which for lack of PARTITION BY is the whole table; in other words each sum is taken over the whole table and so we get the same result for each output row. But if we add an ORDER BY clause, we get very different results:

SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;

salary | sum

--------+-------

3500 | 3500

3900 | 7400

4200 | 11600

4500 | 16100

4800 | 25700

4800 | 25700

5000 | 30700

5200 | 41100

5200 | 41100

6000 | 47100

(10 rows)

Here the sum is taken from the first (lowest) salary up through the current one, including any duplicates of the current one (notice the results for the duplicated salaries).
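The way the default frame treats duplicates is easier to see next to a frame that stops exactly at the current row. The sketch below adds an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame for comparison. Explicit frame clauses are outside what this section covers, so treat that column as an illustrative assumption, along with the use of SQLite via Python's sqlite3 module in place of PostgreSQL and the aliases default_frame and rows_frame.

```python
import sqlite3

# Sample rows from the empsalary table used in this section.
ROWS = [
    ("develop", 11, 5200), ("develop", 7, 4200), ("develop", 9, 4500),
    ("develop", 8, 6000), ("develop", 10, 5200),
    ("personnel", 5, 3500), ("personnel", 2, 3900),
    ("sales", 3, 4800), ("sales", 1, 5000), ("sales", 4, 4800),
]

# Assumption: SQLite (3.25+) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO empsalary VALUES (?, ?, ?)", ROWS)

query = """
    SELECT salary,
           sum(salary) OVER (ORDER BY salary) AS default_frame,
           sum(salary) OVER (ORDER BY salary
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND CURRENT ROW) AS rows_frame
    FROM empsalary
    ORDER BY salary
"""
result = conn.execute(query).fetchall()

for salary, default_frame, rows_frame in result:
    print(f"{salary:>6} {default_frame:>7} {rows_frame:>7}")
```

With the default frame both 4800 rows report 25700, since each row's frame includes its equal-valued peer. With the row-based frame one of them reports 20900 and the other 25700, and which row gets which value depends on the otherwise unspecified order of the tied rows.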

Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.
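Both rules can be tried out directly. The sketch below first attempts to use rank in a WHERE clause and then nests an aggregate inside a window function to obtain a company-wide total alongside per-department totals. It assumes SQLite via Python's sqlite3 module as a stand-in for PostgreSQL; the error text printed is SQLite's own and will differ from PostgreSQL's, and the aliases dept_total and company_total are hypothetical.

```python
import sqlite3

# Sample rows from the empsalary table used in this section.
ROWS = [
    ("develop", 11, 5200), ("develop", 7, 4200), ("develop", 9, 4500),
    ("develop", 8, 6000), ("develop", 10, 5200),
    ("personnel", 5, 3500), ("personnel", 2, 3900),
    ("sales", 3, 4800), ("sales", 1, 5000), ("sales", 4, 4800),
]

# Assumption: SQLite (3.25+) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO empsalary VALUES (?, ?, ?)", ROWS)

# 1. A window function in WHERE is rejected when the statement is prepared.
try:
    conn.execute(
        "SELECT empno FROM empsalary WHERE rank() OVER (ORDER BY salary DESC) < 3"
    )
    where_rejected = False
except sqlite3.Error as exc:
    where_rejected = True
    print("rejected:", exc)

# 2. An aggregate inside a window function is fine: GROUP BY runs first,
#    then the window function sums the per-department totals.
totals = conn.execute("""
    SELECT depname,
           sum(salary)              AS dept_total,
           sum(sum(salary)) OVER () AS company_total
    FROM empsalary
    GROUP BY depname
    ORDER BY depname
""").fetchall()

for row in totals:
    print(row)
```

The inner sum(salary) is the ordinary aggregate evaluated per group, and the outer sum(...) OVER () then runs over the three grouped rows, which matches the evaluation order described above.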

If there is a need to filter or group rows after the window calculations are performed, you can use a sub-select. For example:

SELECT depname, empno, salary, enroll_date

FROM

(SELECT depname, empno, salary, enroll_date,

rank() OVER (PARTITION BY depname ORDER BY salary DESC,

empno) AS pos

FROM empsalary

) AS ss

WHERE pos < 3;

The above query only shows the rows from the inner query having rank less than 3.
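Run against the sample data, the sub-select pattern yields the two best-paid employees of each department. The sketch below is a simplified variant under stated assumptions: SQLite via Python's sqlite3 module stands in for PostgreSQL, the enroll_date column is left out because the sample rows carry no dates, and an outer ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

# Sample rows from the empsalary table used in this section.
ROWS = [
    ("develop", 11, 5200), ("develop", 7, 4200), ("develop", 9, 4500),
    ("develop", 8, 6000), ("develop", 10, 5200),
    ("personnel", 5, 3500), ("personnel", 2, 3900),
    ("sales", 3, 4800), ("sales", 1, 5000), ("sales", 4, 4800),
]

# Assumption: SQLite (3.25+) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO empsalary VALUES (?, ?, ?)", ROWS)

# The inner query computes the rank; the outer query can then filter on it,
# because by that point pos is an ordinary column.
query = """
    SELECT depname, empno, salary
    FROM (SELECT depname, empno, salary,
                 rank() OVER (PARTITION BY depname
                              ORDER BY salary DESC, empno) AS pos
          FROM empsalary) AS ss
    WHERE pos < 3
    ORDER BY depname, pos
"""
top_two = conn.execute(query).fetchall()

for row in top_two:
    print(row)
```

Adding empno to the window's ORDER BY breaks salary ties, so rank never repeats within a partition and exactly two rows per department pass the filter.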

When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER. For example:

SELECT sum(salary) OVER w, avg(salary) OVER w

FROM empsalary

WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
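For a look at what the named window produces, the sketch below runs the same query with two extra output columns and an outer ORDER BY added for readability. It assumes SQLite via Python's sqlite3 module as a stand-in for PostgreSQL; the aliases running_sum and running_avg are illustrative.

```python
import sqlite3

# Sample rows from the empsalary table used in this section.
ROWS = [
    ("develop", 11, 5200), ("develop", 7, 4200), ("develop", 9, 4500),
    ("develop", 8, 6000), ("develop", 10, 5200),
    ("personnel", 5, 3500), ("personnel", 2, 3900),
    ("sales", 3, 4800), ("sales", 1, 5000), ("sales", 4, 4800),
]

# Assumption: SQLite (3.25+) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO empsalary VALUES (?, ?, ?)", ROWS)

# Both functions reference the single definition of w, so the partitioning
# and ordering cannot drift apart between them.
query = """
    SELECT depname, salary,
           sum(salary) OVER w AS running_sum,
           avg(salary) OVER w AS running_avg
    FROM empsalary
    WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)
    ORDER BY depname, salary DESC
"""
result = conn.execute(query).fetchall()

for depname, salary, running_sum, running_avg in result:
    print(f"{depname:<10} {salary:>5} {running_sum:>6} {running_avg:>9.2f}")
```

Since w contains an ORDER BY, both columns are running values within each department: the last develop row shows the department total of 25100 and its overall average of 5020.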

More details about window functions can be found in Section 4.2.8, Section 9.21, Section 7.2.5, and the SELECT reference page.
