Appearance
窗口函数调用
什么是窗口函数?
窗口函数是一种特殊的SQL函数,它允许我们对查询结果的一个"窗口"(即一组行)进行计算,而不会将这些行合并为单个输出行。与普通聚合函数不同,窗口函数保留所有原始行,但每行都可以访问相关行的数据。
想象一下,你正通过一个窗口查看数据表,这个窗口可以滑动或调整大小,让你在计算每行结果时看到相关的其他行。
窗口函数的常见用途
- 计算累计总和或移动平均值
- 对分组内的数据进行排名
- 访问当前行前后的数据
- 计算百分比或相对值
窗口函数调用语法
sql
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )
其中,window_definition
的语法为:
sql
[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]
简单示例
sql
-- 计算每个部门员工的工资排名
SELECT
employee_name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM employees;
窗口定义的主要组成部分
1. PARTITION BY(分区)
PARTITION BY
子句将数据分成多个组(分区),窗口函数会在每个分区内单独计算。如果省略此子句,整个结果集被视为一个分区。
sql
-- 按部门分区,计算每个部门的平均工资
SELECT
employee_name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary
FROM employees;
2. ORDER BY(排序)
ORDER BY
子句决定了窗口函数处理分区内行的顺序。对于某些窗口函数(如排名函数),排序是必需的;对于其他函数,排序会影响计算结果。
sql
-- 计算员工的累计工资总和(按工资从低到高排序)
SELECT
employee_name,
salary,
SUM(salary) OVER (ORDER BY salary) as running_total
FROM employees;
3. 窗口帧(Window Frame)
窗口帧定义了当前分区内哪些行参与计算当前行的函数结果。帧可以用RANGE
、ROWS
或GROUPS
模式指定。
窗口帧语法
sql
{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]
其中frame_start
和frame_end
可以是:
UNBOUNDED PRECEDING
:从分区的第一行开始offset PRECEDING
:从当前行前面指定数量的行/组/范围开始CURRENT ROW
:从当前行开始offset FOLLOWING
:从当前行后面指定数量的行/组/范围开始UNBOUNDED FOLLOWING
:到分区的最后一行结束
frame_exclusion
可以是:
EXCLUDE CURRENT ROW
:排除当前行EXCLUDE GROUP
:排除当前行及其排序对等行EXCLUDE TIES
:排除与当前行相同的行,但保留当前行EXCLUDE NO OTHERS
:不排除任何行(默认行为)
三种窗口帧模式比较
模式 | 描述 | 使用场景 | 示例 |
---|---|---|---|
ROWS | 基于物理行数 | 需要精确控制包含多少行 | ROWS BETWEEN 3 PRECEDING AND CURRENT ROW |
RANGE | 基于当前行值的范围 | 处理具有相同值的行组 | RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
GROUPS | 基于排序后的对等组数量 | 处理具有相同排序值的组 | GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING |
ROWS模式示例
sql
-- 计算当前员工及前两名员工的平均工资
SELECT
employee_id,
employee_name,
salary,
AVG(salary) OVER (
ORDER BY employee_id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as avg_salary_3_employees
FROM employees;
RANGE模式示例
sql
-- 计算当前日期及前30天内的销售总额
SELECT
sale_date,
amount,
SUM(amount) OVER (
ORDER BY sale_date
RANGE BETWEEN INTERVAL '30 days' PRECEDING AND CURRENT ROW
) as sales_last_30_days
FROM sales;
GROUPS模式示例
sql
-- 计算当前部门及前后各一个部门的平均员工数
SELECT
department,
employee_count,
AVG(employee_count) OVER (
ORDER BY department
GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) as avg_employee_count_3_depts
FROM dept_stats;
默认窗口帧
如果未指定窗口帧:
- 有
ORDER BY
:默认为RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- 无
ORDER BY
:默认为整个分区
窗口函数的类型
1. 排名函数
函数 | 描述 | 示例结果 |
---|---|---|
ROW_NUMBER() | 唯一的行号(即使值相同) | 1, 2, 3, 4... |
RANK() | 相同值获得相同排名,会跳过重复的排名 | 1, 1, 3, 4... |
DENSE_RANK() | 相同值获得相同排名,不跳过排名 | 1, 1, 2, 3... |
sql
SELECT
product_name,
price,
ROW_NUMBER() OVER (ORDER BY price DESC) as row_num,
RANK() OVER (ORDER BY price DESC) as rank,
DENSE_RANK() OVER (ORDER BY price DESC) as dense_rank
FROM products;
2. 分析函数
函数 | 描述 |
---|---|
LEAD(expr, offset) | 访问后面第n行的值 |
LAG(expr, offset) | 访问前面第n行的值 |
FIRST_VALUE(expr) | 获取窗口帧中第一行的值 |
LAST_VALUE(expr) | 获取窗口帧中最后一行的值 |
sql
-- 计算当前产品与前一个产品的价格差异
SELECT
product_name,
price,
price - LAG(price, 1, 0) OVER (ORDER BY price) as price_diff_from_previous
FROM products;
3. 聚合函数作为窗口函数
任何聚合函数都可以作为窗口函数使用:
sql
-- 计算每个部门的工资总和和员工占比
SELECT
employee_name,
department,
salary,
SUM(salary) OVER (PARTITION BY department) as dept_total_salary,
(salary / SUM(salary) OVER (PARTITION BY department)) * 100 as salary_percentage
FROM employees;
FILTER子句
FILTER
子句允许你在应用窗口函数之前过滤输入行。只有满足条件的行会被纳入计算。
sql
-- 计算每个部门中高工资员工的平均工资
SELECT
employee_name,
department,
salary,
AVG(salary) FILTER (WHERE salary > 5000) OVER (PARTITION BY department) as high_earner_avg
FROM employees;
窗口函数的使用限制
- 窗口函数只能在SELECT列表和ORDER BY子句中使用
- 窗口函数不能嵌套使用
- 窗口函数不能在WHERE、HAVING、GROUP BY子句中使用
- 窗口特定函数不允许在函数参数列表中使用DISTINCT或ORDER BY
实际应用示例
示例1:计算月度销售额及同比增长率
sql
SELECT
month,
sales,
LAG(sales, 12) OVER (ORDER BY month) as sales_prev_year,
(sales - LAG(sales, 12) OVER (ORDER BY month)) / LAG(sales, 12) OVER (ORDER BY month) * 100 as growth_rate
FROM monthly_sales;
示例2:识别连续登录天数
sql
SELECT
user_id,
login_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) as login_streak
FROM user_logins;
示例3:计算移动平均值
sql
SELECT
product_id,
date,
price,
AVG(price) OVER (
PARTITION BY product_id
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as seven_day_moving_avg
FROM daily_prices;
总结
窗口函数是SQL中强大的分析工具,它们允许你在保留详细数据的同时执行复杂的计算。通过适当地定义窗口(分区、排序和帧),你可以执行各种高级数据分析任务,如排名、累计、移动平均和相对比较。
理解窗口函数的关键是掌握如何定义"窗口"——即确定哪些行参与计算当前行的结果。通过实践和实验,你将能够充分利用这一强大功能。