Skip to content

窗口函数调用

什么是窗口函数?

窗口函数是一种特殊的SQL函数,它允许我们对查询结果的一个"窗口"(即一组行)进行计算,而不会将这些行合并为单个输出行。与普通聚合函数不同,窗口函数保留所有原始行,但每行都可以访问相关行的数据。

想象一下,你正通过一个窗口查看数据表,这个窗口可以滑动或调整大小,让你在计算每行结果时看到相关的其他行。

窗口函数的常见用途

  • 计算累计总和或移动平均值
  • 对分组内的数据进行排名
  • 访问当前行前后的数据
  • 计算百分比或相对值

窗口函数调用语法

sql
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )

其中,window_definition的语法为:

sql
[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]

简单示例

sql
-- 计算每个部门员工的工资排名
SELECT 
    employee_name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM employees;

窗口定义的主要组成部分

1. PARTITION BY(分区)

PARTITION BY子句将数据分成多个组(分区),窗口函数会在每个分区内单独计算。如果省略此子句,整个结果集被视为一个分区。

sql
-- 按部门分区,计算每个部门的平均工资
SELECT 
    employee_name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg_salary
FROM employees;

2. ORDER BY(排序)

ORDER BY子句决定了窗口函数处理分区内行的顺序。对于某些窗口函数(如排名函数),排序是必需的;对于其他函数,排序会影响计算结果。

sql
-- 计算员工的累计工资总和(按工资从低到高排序)
SELECT 
    employee_name,
    salary,
    SUM(salary) OVER (ORDER BY salary) as running_total
FROM employees;

3. 窗口帧(Window Frame)

窗口帧定义了当前分区内哪些行参与计算当前行的函数结果。帧可以用RANGEROWSGROUPS模式指定。

窗口帧语法

sql
{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]

其中frame_startframe_end可以是:

  • UNBOUNDED PRECEDING:从分区的第一行开始
  • offset PRECEDING:从当前行前面指定数量的行/组/范围开始
  • CURRENT ROW:从当前行开始
  • offset FOLLOWING:从当前行后面指定数量的行/组/范围开始
  • UNBOUNDED FOLLOWING:到分区的最后一行结束

frame_exclusion可以是:

  • EXCLUDE CURRENT ROW:排除当前行
  • EXCLUDE GROUP:排除当前行及其排序对等行
  • EXCLUDE TIES:排除与当前行相同的行,但保留当前行
  • EXCLUDE NO OTHERS:不排除任何行(默认行为)

三种窗口帧模式比较

模式描述使用场景示例
ROWS基于物理行数需要精确控制包含多少行ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
RANGE基于当前行值的范围处理具有相同值的行组RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
GROUPS基于排序后的对等组数量处理具有相同排序值的组GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING

ROWS模式示例

sql
-- 计算当前员工及前两名员工的平均工资
SELECT 
    employee_id,
    employee_name,
    salary,
    AVG(salary) OVER (
        ORDER BY employee_id
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as avg_salary_3_employees
FROM employees;

RANGE模式示例

sql
-- 计算当前日期及前30天内的销售总额
SELECT 
    sale_date,
    amount,
    SUM(amount) OVER (
        ORDER BY sale_date
        RANGE BETWEEN INTERVAL '30 days' PRECEDING AND CURRENT ROW
    ) as sales_last_30_days
FROM sales;

GROUPS模式示例

sql
-- 计算当前部门及前后各一个部门的平均员工数
SELECT 
    department,
    employee_count,
    AVG(employee_count) OVER (
        ORDER BY department
        GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) as avg_employee_count_3_depts
FROM dept_stats;

默认窗口帧

如果未指定窗口帧:

  • ORDER BY:默认为RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  • ORDER BY:默认为整个分区

窗口函数的类型

1. 排名函数

函数描述示例结果
ROW_NUMBER()唯一的行号(即使值相同)1, 2, 3, 4...
RANK()相同值获得相同排名,会跳过重复的排名1, 1, 3, 4...
DENSE_RANK()相同值获得相同排名,不跳过排名1, 1, 2, 3...
sql
SELECT 
    product_name,
    price,
    ROW_NUMBER() OVER (ORDER BY price DESC) as row_num,
    RANK() OVER (ORDER BY price DESC) as rank,
    DENSE_RANK() OVER (ORDER BY price DESC) as dense_rank
FROM products;

2. 分析函数

函数描述
LEAD(expr, offset)访问后面第n行的值
LAG(expr, offset)访问前面第n行的值
FIRST_VALUE(expr)获取窗口帧中第一行的值
LAST_VALUE(expr)获取窗口帧中最后一行的值
sql
-- 计算当前产品与前一个产品的价格差异
SELECT 
    product_name,
    price,
    price - LAG(price, 1, 0) OVER (ORDER BY price) as price_diff_from_previous
FROM products;

3. 聚合函数作为窗口函数

任何聚合函数都可以作为窗口函数使用:

sql
-- 计算每个部门的工资总和和员工占比
SELECT 
    employee_name,
    department,
    salary,
    SUM(salary) OVER (PARTITION BY department) as dept_total_salary,
    (salary / SUM(salary) OVER (PARTITION BY department)) * 100 as salary_percentage
FROM employees;

FILTER子句

FILTER子句允许你在应用窗口函数之前过滤输入行。只有满足条件的行会被纳入计算。

sql
-- 计算每个部门中高工资员工的平均工资
SELECT 
    employee_name,
    department,
    salary,
    AVG(salary) FILTER (WHERE salary > 5000) OVER (PARTITION BY department) as high_earner_avg
FROM employees;

窗口函数的使用限制

  1. 窗口函数只能在SELECT列表和ORDER BY子句中使用
  2. 窗口函数不能嵌套使用
  3. 窗口函数不能在WHERE、HAVING、GROUP BY子句中使用
  4. 窗口特定函数不允许在函数参数列表中使用DISTINCT或ORDER BY

实际应用示例

示例1:计算月度销售额及同比增长率

sql
SELECT 
    month,
    sales,
    LAG(sales, 12) OVER (ORDER BY month) as sales_prev_year,
    (sales - LAG(sales, 12) OVER (ORDER BY month)) / LAG(sales, 12) OVER (ORDER BY month) * 100 as growth_rate
FROM monthly_sales;

示例2:识别连续登录天数

sql
SELECT 
    user_id,
    login_date,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) as login_streak
FROM user_logins;

示例3:计算移动平均值

sql
SELECT 
    product_id,
    date,
    price,
    AVG(price) OVER (
        PARTITION BY product_id 
        ORDER BY date 
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as seven_day_moving_avg
FROM daily_prices;

总结

窗口函数是SQL中强大的分析工具,它们允许你在保留详细数据的同时执行复杂的计算。通过适当地定义窗口(分区、排序和帧),你可以执行各种高级数据分析任务,如排名、累计、移动平均和相对比较。

理解窗口函数的关键是掌握如何定义"窗口"——即确定哪些行参与计算当前行的结果。通过实践和实验,你将能够充分利用这一强大功能。