Appearance
PostgreSQL 例行清理(Routine Vacuuming)学习笔记
📋 概述
PostgreSQL 的例行清理是数据库维护的核心组成部分。在高并发、高频率更新的业务场景中,如电商平台的订单系统、金融系统的交易记录等,理解和掌握清理机制对于保证数据库性能和稳定性至关重要。
本笔记将通过真实业务场景和完整示例,深入解析 PostgreSQL 的清理机制,帮助您在实际工作中有效管理数据库维护。
🎯 核心价值
PostgreSQL 的 VACUUM 命令解决了四个关键问题:
- 磁盘空间管理:回收删除和更新操作产生的死行空间
- 查询性能优化:维护统计信息,确保查询计划器做出最优决策
- 索引效率提升:更新可见性映射,支持仅索引扫描
- 数据安全保障:防止事务 ID 回绕导致的数据丢失
🔧 1. 清理基础原理
1.1 MVCC 与死行产生机制
PostgreSQL 使用多版本并发控制(MVCC)来处理并发事务。这意味着 UPDATE 和 DELETE 操作不会立即物理删除数据,而是创建新版本或标记为删除。
1.2 业务场景示例:电商订单系统
假设我们有一个电商平台的订单表,每天处理大量订单状态更新:
sql
-- 创建订单表
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INTEGER,
order_status VARCHAR(20) DEFAULT 'pending',
order_amount DECIMAL(10,2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- 插入测试数据
INSERT INTO orders (customer_id, order_amount)
SELECT
(random() * 10000)::INTEGER,
(random() * 1000 + 10)::DECIMAL(10,2)
FROM generate_series(1, 100000);
模拟高频更新场景:
sql
-- 模拟订单状态变更(产生大量死行)
UPDATE orders
SET order_status = 'processing',
updated_at = CURRENT_TIMESTAMP
WHERE order_status = 'pending'
AND order_id % 3 = 0;
UPDATE orders
SET order_status = 'shipped',
updated_at = CURRENT_TIMESTAMP
WHERE order_status = 'processing'
AND order_id % 5 = 0;
UPDATE orders
SET order_status = 'completed',
updated_at = CURRENT_TIMESTAMP
WHERE order_status = 'shipped'
AND order_id % 2 = 0;
查看表膨胀情况:
sql
-- 查看表的页面使用情况
SELECT
schemaname,
tablename,
attname,
n_distinct,
correlation
FROM pg_stats
WHERE tablename = 'orders';
-- 查看表大小和死行统计
SELECT
relname,
n_tup_ins, -- 插入的行数
n_tup_upd, -- 更新的行数
n_tup_del, -- 删除的行数
n_live_tup, -- 活跃行数
n_dead_tup, -- 死行数
round((n_dead_tup::float / (n_live_tup + n_dead_tup) * 100), 2) as dead_ratio
FROM pg_stat_user_tables
WHERE relname = 'orders';
分析输出示例:
relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | dead_ratio
---------+-----------+-----------+-----------+------------+------------+------------
orders | 100000 | 45000 | 0 | 100000 | 45000 | 31.03
当死行比例超过 20% 时,会显著影响查询性能和存储效率,此时应考虑执行 VACUUM 操作。 :::
1.3 VACUUM 类型对比
类型 | 执行速度 | 空间回收 | 锁级别 | 适用场景 |
---|---|---|---|---|
VACUUM | 快速 | 标记可重用 | 较低锁 | 日常维护 |
VACUUM FULL | 缓慢 | 完全回收 | 排他锁 | 严重膨胀 |
🗄️ 2. 磁盘空间回收策略
2.1 标准 VACUUM 的工作原理
2.2 实践示例:清理订单表
执行标准清理:
sql
-- 执行详细清理,查看处理过程
VACUUM (VERBOSE, ANALYZE) orders;
清理输出解析:
INFO: vacuuming "public.orders"
INFO: "orders": removed 45000 row versions in 500 pages
INFO: "orders": found 45000 removable, 100000 nonremovable row versions in 1000 pages
INFO: "orders": truncated 1000 to 800 pages
DETAIL: CPU: user: 0.15 s, system: 0.05 s, elapsed: 0.25 s
清理效果验证:
sql
-- 清理前后对比查询
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as table_size,
pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) as data_size,
n_dead_tup,
round((n_dead_tup::float / (n_live_tup + n_dead_tup) * 100), 2) as dead_ratio
FROM pg_stat_user_tables t
JOIN pg_tables pt ON t.relname = pt.tablename
WHERE t.relname = 'orders';
2.3 业务场景决策矩阵
场景 | 死行比例 | 表大小 | 推荐策略 | 执行频率 |
---|---|---|---|---|
订单系统 | 10-20% | < 10GB | 标准 VACUUM | 每日 |
日志表 | > 50% | > 50GB | VACUUM FULL | 周末维护窗口 |
配置表 | < 5% | < 1GB | 自动清理 | 系统自动 |
历史归档 | > 80% | 任意 | TRUNCATE + 重建 | 按需 |
2.4 空间回收策略实现
监控脚本:
sql
-- 创建监控视图,识别需要清理的表
CREATE OR REPLACE VIEW table_bloat_monitor AS
SELECT
schemaname,
tablename,
n_live_tup,
n_dead_tup,
round((n_dead_tup::float / GREATEST(n_live_tup + n_dead_tup, 1) * 100), 2) as dead_ratio,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as table_size,
last_vacuum,
last_autovacuum,
CASE
WHEN n_dead_tup > 10000 AND (n_dead_tup::float / GREATEST(n_live_tup + n_dead_tup, 1)) > 0.2
THEN 'URGENT'
WHEN n_dead_tup > 5000 AND (n_dead_tup::float / GREATEST(n_live_tup + n_dead_tup, 1)) > 0.1
THEN 'MEDIUM'
ELSE 'LOW'
END as priority
FROM pg_stat_user_tables
ORDER BY dead_ratio DESC;
-- 查看需要清理的表
SELECT * FROM table_bloat_monitor WHERE priority IN ('URGENT', 'MEDIUM');
自动化清理脚本:
sql
-- 批量清理高优先级表的存储过程
CREATE OR REPLACE FUNCTION auto_vacuum_tables()
RETURNS TEXT AS $$
DECLARE
table_record RECORD;
result_text TEXT := '';
BEGIN
-- 遍历需要清理的表
FOR table_record IN
SELECT schemaname, tablename, dead_ratio
FROM table_bloat_monitor
WHERE priority = 'URGENT'
LOOP
-- 执行清理
EXECUTE format('VACUUM (VERBOSE) %I.%I', table_record.schemaname, table_record.tablename);
result_text := result_text || format('Vacuumed %s.%s (dead_ratio: %s%%) ',
table_record.schemaname,
table_record.tablename,
table_record.dead_ratio) || E'\n';
END LOOP;
RETURN result_text;
END;
$$ LANGUAGE plpgsql;
-- 执行批量清理
SELECT auto_vacuum_tables();
生产环境最佳实践
- 监控优先:建立表膨胀监控机制,在问题恶化前主动处理
- 分时执行:在业务低峰期执行大表清理,避免影响正常业务
- 渐进策略:优先使用标准 VACUUM,仅在必要时使用 VACUUM FULL
- 空间预留:为 VACUUM FULL 预留足够的磁盘空间(至少表大小的 2 倍) :::
2.5 特殊场景处理
批量删除场景:
sql
-- 场景:清理90天前的订单数据
-- 不推荐的做法(产生大量死行)
-- DELETE FROM orders WHERE created_at < CURRENT_DATE - INTERVAL '90 days';
-- 推荐的做法:使用 TRUNCATE 配合分区
-- 1. 如果有分区表,直接删除分区
-- 2. 如果没有分区,考虑分批删除 + 及时清理
-- 分批删除示例
DO $$
DECLARE
deleted_count INTEGER;
BEGIN
LOOP
-- 每次删除1000条记录
DELETE FROM orders
WHERE order_id IN (
SELECT order_id
FROM orders
WHERE created_at < CURRENT_DATE - INTERVAL '90 days'
LIMIT 1000
);
GET DIAGNOSTICS deleted_count = ROW_COUNT;
-- 如果删除数量为0,退出循环
IF deleted_count = 0 THEN
EXIT;
END IF;
-- 每删除1000条执行一次清理
VACUUM orders;
-- 稍作暂停,避免长时间占用资源
PERFORM pg_sleep(0.1);
END LOOP;
END $$;
这样的分批处理策略可以:
- 避免长时间锁表
- 及时回收空间
- 减少对业务的影响
- 保持系统响应性
📊 3. 查询计划器统计信息更新
3.1 统计信息的重要性
PostgreSQL 查询计划器依赖统计信息来生成最优的执行计划。过时的统计信息会导致错误的成本估算,从而选择低效的查询路径。
3.2 业务场景:电商商品搜索优化
场景描述:电商平台的商品表经常有价格更新、库存变动等操作,需要及时更新统计信息以保证搜索性能。
sql
-- 创建商品表
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
category_id INTEGER,
product_name VARCHAR(200),
price DECIMAL(10,2),
stock_quantity INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- 创建索引
CREATE INDEX idx_products_category ON products(category_id);
CREATE INDEX idx_products_price ON products(price);
CREATE INDEX idx_products_stock ON products(stock_quantity);
-- 插入测试数据
INSERT INTO products (category_id, product_name, price, stock_quantity)
SELECT
(random() * 100)::INTEGER,
'Product ' || generate_series,
(random() * 1000 + 10)::DECIMAL(10,2),
(random() * 1000)::INTEGER
FROM generate_series(1, 500000);
3.3 统计信息分析实践
查看当前统计信息:
sql
-- 查看表的基础统计信息
SELECT
schemaname,
tablename,
attname, -- 列名
n_distinct, -- 不同值的数量估计
most_common_vals[1:5], -- 最常见的值
most_common_freqs[1:5], -- 最常见值的频率
correlation -- 物理存储与逻辑顺序的相关性
FROM pg_stats
WHERE tablename = 'products'
AND attname IN ('category_id', 'price', 'stock_quantity')
ORDER BY attname;
统计信息质量评估:
sql
-- 检查统计信息的新鲜度
SELECT
schemaname,
tablename,
n_tup_ins, -- 自上次分析以来插入的行数
n_tup_upd, -- 自上次分析以来更新的行数
n_tup_del, -- 自上次分析以来删除的行数
last_analyze, -- 最后一次手动分析时间
last_autoanalyze, -- 最后一次自动分析时间
-- 计算变化比例
round((n_tup_ins + n_tup_upd + n_tup_del)::numeric /
GREATEST(n_live_tup, 1) * 100, 2) as change_ratio
FROM pg_stat_user_tables
WHERE relname = 'products';
3.4 执行计划优化示例
模拟价格区间查询场景:
sql
-- 模拟大量价格更新(影响统计信息)
UPDATE products
SET price = price * (0.8 + random() * 0.4),
updated_at = CURRENT_TIMESTAMP
WHERE product_id % 10 = 0;
-- 查看执行计划(统计信息过时)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products
WHERE price BETWEEN 100 AND 200
AND stock_quantity > 50;
更新统计信息后的效果对比:
sql
-- 更新统计信息
ANALYZE products;
-- 再次查看执行计划
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products
WHERE price BETWEEN 100 AND 200
AND stock_quantity > 50;
执行计划对比分析:
-- 更新前(过时统计信息):
Seq Scan on products (cost=0.00..15000.00 rows=1000 width=50) (actual time=0.500..120.000 rows=15000 loops=1)
Filter: ((price >= 100) AND (price <= 200) AND (stock_quantity > 50))
Rows Removed by Filter: 485000
-- 更新后(准确统计信息):
Index Scan using idx_products_price on products (cost=0.42..800.00 rows=15000 width=50) (actual time=0.100..25.000 rows=15000 loops=1)
Index Cond: ((price >= 100) AND (price <= 200))
Filter: (stock_quantity > 50)
Rows Removed by Filter: 2000
性能提升说明更新统计信息后,查询计划器正确估算了结果集大小,选择了索引扫描而非全表扫描,执行时间从 120ms 降低到 25ms,提升了约 80% 的性能。 :::
3.5 分列统计信息调优
针对高选择性列调整统计信息详细程度:
sql
-- 为价格列设置更详细的统计信息
ALTER TABLE products ALTER COLUMN price SET STATISTICS 1000;
-- 为低选择性的类别列设置较少的统计信息
ALTER TABLE products ALTER COLUMN category_id SET STATISTICS 50;
-- 重新收集统计信息
ANALYZE products;
-- 查看调整后的统计信息分布
SELECT
attname,
n_distinct,
array_length(most_common_vals, 1) as mcv_count,
array_length(histogram_bounds, 1) as histogram_buckets
FROM pg_stats
WHERE tablename = 'products'
AND attname IN ('category_id', 'price');
👁️ 4. 可见性映射与仅索引扫描
4.1 可见性映射机制
可见性映射是 PostgreSQL 的一个重要优化特性,它跟踪哪些数据页包含的所有元组对所有事务都是可见的。
4.2 仅索引扫描优化实例
创建适合仅索引扫描的场景:
sql
-- 创建用户活动表(适合仅索引扫描的场景)
CREATE TABLE user_activities (
user_id INTEGER,
activity_date DATE,
activity_count INTEGER,
last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- 创建覆盖索引
CREATE INDEX idx_user_activities_covering
ON user_activities (user_id, activity_date)
INCLUDE (activity_count);
-- 插入大量历史数据(稳定数据,适合可见性映射优化)
INSERT INTO user_activities (user_id, activity_date, activity_count)
SELECT
(random() * 10000)::INTEGER,
CURRENT_DATE - (random() * 365)::INTEGER,
(random() * 100)::INTEGER
FROM generate_series(1, 1000000);
执行 VACUUM 以建立可见性映射:
sql
-- 执行清理以更新可见性映射
VACUUM (VERBOSE) user_activities;
验证仅索引扫描效果:
sql
-- 查询能够使用仅索引扫描的SQL
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, activity_date, activity_count
FROM user_activities
WHERE user_id = 1001
AND activity_date >= CURRENT_DATE - INTERVAL '30 days';
执行计划分析:
Index Only Scan using idx_user_activities_covering on user_activities
(cost=0.42..15.00 rows=5 width=12) (actual time=0.050..0.250 rows=5 loops=1)
Index Cond: ((user_id = 1001) AND (activity_date >= (CURRENT_DATE - '30 days'::interval)))
Heap Fetches: 0 -- 关键指标:没有访问堆页面
Buffers: shared hit=3
仅索引扫描优化要点
- Heap Fetches: 0 表示完全避免了堆页面访问,这是最佳状态
- 如果 Heap Fetches > 0,说明部分页面的可见性映射尚未更新,需要执行 VACUUM
- 仅索引扫描要求索引包含查询所需的所有列(使用 INCLUDE 子句) :::
4.3 可见性映射监控
创建可见性映射监控视图:
sql
-- 查看表的可见性映射状态
SELECT
schemaname,
tablename,
n_tup_ins,
n_tup_upd,
n_tup_del,
last_vacuum,
last_autovacuum,
-- 计算可能需要更新可见性映射的比例
CASE
WHEN last_vacuum IS NULL AND last_autovacuum IS NULL THEN 'NEVER_VACUUMED'
WHEN (n_tup_upd + n_tup_del) > n_live_tup * 0.1 THEN 'HIGH_ACTIVITY'
WHEN (n_tup_upd + n_tup_del) > n_live_tup * 0.05 THEN 'MEDIUM_ACTIVITY'
ELSE 'LOW_ACTIVITY'
END as vacuum_priority
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY (n_tup_upd + n_tup_del) DESC;
⚠️ 5. 防止事务 ID 回绕失败
5.1 事务 ID 回绕问题原理
PostgreSQL 使用 32 位事务 ID(XID),这意味着最多可以有约 42 亿个事务。当事务 ID 用完后会回绕到 0,这会导致数据可见性问题。
5.2 事务 ID 年龄监控
创建事务 ID 监控脚本:
sql
-- 查看数据库级别的事务ID年龄
SELECT
datname,
age(datfrozenxid) as db_age,
datfrozenxid,
-- 计算距离强制清理的剩余事务数
(2000000000 - age(datfrozenxid)) as transactions_until_wraparound
FROM pg_database
WHERE datname NOT IN ('template0', 'template1')
ORDER BY age(datfrozenxid) DESC;
-- 查看表级别的事务ID年龄
SELECT
schemaname,
tablename,
greatest(age(relfrozenxid), age(relminmxid)) as max_age,
age(relfrozenxid) as xid_age,
age(relminmxid) as mxid_age,
-- 评估紧急程度
CASE
WHEN greatest(age(relfrozenxid), age(relminmxid)) > 1500000000 THEN 'CRITICAL'
WHEN greatest(age(relfrozenxid), age(relminmxid)) > 1000000000 THEN 'WARNING'
WHEN greatest(age(relfrozenxid), age(relminmxid)) > 500000000 THEN 'ATTENTION'
ELSE 'NORMAL'
END as status
FROM pg_stat_user_tables t
JOIN pg_class c ON t.relid = c.oid
ORDER BY max_age DESC;
5.3 预防性清理策略
配置参数优化:
sql
-- 查看当前防回绕相关参数
SELECT
name,
setting,
unit,
short_desc
FROM pg_settings
WHERE name IN (
'autovacuum_freeze_max_age',
'vacuum_freeze_min_age',
'vacuum_freeze_table_age',
'autovacuum_multixact_freeze_max_age'
)
ORDER BY name;
紧急回绕处理程序:
sql
-- 创建紧急处理函数
CREATE OR REPLACE FUNCTION emergency_xid_cleanup()
RETURNS TABLE(
table_name TEXT,
xid_age BIGINT,
action_taken TEXT
) AS $$
DECLARE
table_record RECORD;
emergency_threshold BIGINT := 1800000000; -- 18亿事务的紧急阈值
BEGIN
-- 查找需要紧急处理的表
FOR table_record IN
SELECT
t.schemaname,
t.tablename,
greatest(age(c.relfrozenxid), age(c.relminmxid)) as max_age
FROM pg_stat_user_tables t
JOIN pg_class c ON t.relid = c.oid
WHERE greatest(age(c.relfrozenxid), age(c.relminmxid)) > emergency_threshold
ORDER BY max_age DESC
LOOP
-- 执行强制清理
EXECUTE format('VACUUM (FREEZE, VERBOSE) %I.%I',
table_record.schemaname,
table_record.tablename);
-- 返回处理结果
table_name := table_record.schemaname || '.' || table_record.tablename;
xid_age := table_record.max_age;
action_taken := 'VACUUM FREEZE executed';
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- 执行紧急清理(当收到回绕警告时)
SELECT * FROM emergency_xid_cleanup();
5.4 回绕预警系统
创建监控告警:
sql
-- 创建回绕预警视图
CREATE OR REPLACE VIEW xid_wraparound_alert AS
SELECT
'DATABASE' as object_type,
datname as object_name,
age(datfrozenxid) as current_age,
(2000000000 - age(datfrozenxid)) as remaining_transactions,
CASE
WHEN age(datfrozenxid) > 1800000000 THEN 'CRITICAL - Immediate action required'
WHEN age(datfrozenxid) > 1500000000 THEN 'HIGH - Schedule emergency maintenance'
WHEN age(datfrozenxid) > 1000000000 THEN 'MEDIUM - Plan maintenance window'
ELSE 'LOW - Normal monitoring'
END as alert_level
FROM pg_database
WHERE datname NOT IN ('template0', 'template1')
UNION ALL
SELECT
'TABLE' as object_type,
schemaname || '.' || tablename as object_name,
greatest(age(c.relfrozenxid), age(c.relminmxid)) as current_age,
(2000000000 - greatest(age(c.relfrozenxid), age(c.relminmxid))) as remaining_transactions,
CASE
WHEN greatest(age(c.relfrozenxid), age(c.relminmxid)) > 1800000000 THEN 'CRITICAL - Immediate action required'
WHEN greatest(age(c.relfrozenxid), age(c.relminmxid)) > 1500000000 THEN 'HIGH - Schedule emergency maintenance'
WHEN greatest(age(c.relfrozenxid), age(c.relminmxid)) > 1000000000 THEN 'MEDIUM - Plan maintenance window'
ELSE 'LOW - Normal monitoring'
END as alert_level
FROM pg_stat_user_tables t
JOIN pg_class c ON t.relid = c.oid
ORDER BY current_age DESC;
-- 查看当前告警状态
SELECT * FROM xid_wraparound_alert WHERE alert_level != 'LOW - Normal monitoring';
🤖 6. 自动清理守护进程(Autovacuum)
6.1 Autovacuum 工作机制
6.2 Autovacuum 配置优化
查看当前配置:
sql
-- 查看 autovacuum 相关配置
SELECT
name,
setting,
unit,
short_desc
FROM pg_settings
WHERE name LIKE '%autovacuum%'
OR name LIKE '%vacuum%threshold%'
OR name LIKE '%vacuum%scale%'
ORDER BY name;
业务场景配置实例:
sql
-- 针对不同业务表的个性化配置
-- 1. 高频交易表:降低阈值,增加清理频率
ALTER TABLE orders SET (
autovacuum_vacuum_threshold = 1000,
autovacuum_vacuum_scale_factor = 0.05,
autovacuum_analyze_threshold = 500,
autovacuum_analyze_scale_factor = 0.02
);
-- 2. 日志表:延长清理间隔,降低系统影响
ALTER TABLE user_logs SET (
autovacuum_vacuum_threshold = 10000,
autovacuum_vacuum_scale_factor = 0.2,
autovacuum_vacuum_cost_delay = 50
);
-- 3. 配置表:几乎不变化,减少自动分析
ALTER TABLE system_config SET (
autovacuum_analyze_threshold = 1000,
autovacuum_analyze_scale_factor = 0.1
);
-- 4. 归档表:禁用自动清理(手动管理)
ALTER TABLE historical_data SET (
autovacuum_enabled = false
);
6.3 Autovacuum 性能监控
创建监控仪表板:
sql
-- Autovacuum 活动监控
CREATE OR REPLACE VIEW autovacuum_activity AS
SELECT
datname,
usename,
application_name,
client_addr,
backend_start,
xact_start,
query_start,
state_change,
state,
query,
-- 计算运行时长
CASE
WHEN state = 'active' THEN
extract(epoch from now() - query_start)::INTEGER
ELSE NULL
END as running_seconds
FROM pg_stat_activity
WHERE application_name = 'autovacuum worker'
OR query LIKE '%autovacuum%'
ORDER BY query_start;
-- 表级别的 autovacuum 统计
CREATE OR REPLACE VIEW table_autovacuum_stats AS
SELECT
schemaname,
tablename,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup,
round((n_dead_tup::float / GREATEST(n_live_tup + n_dead_tup, 1) * 100), 2) as dead_ratio,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze,
vacuum_count,
autovacuum_count,
analyze_count,
autoanalyze_count,
-- 计算 autovacuum 效率
CASE
WHEN autovacuum_count > 0 THEN
round(vacuum_count::float / autovacuum_count, 2)
ELSE NULL
END as manual_to_auto_ratio
FROM pg_stat_user_tables
ORDER BY dead_ratio DESC;
-- 查看需要关注的表
SELECT * FROM table_autovacuum_stats
WHERE dead_ratio > 20 OR last_autovacuum < CURRENT_DATE - INTERVAL '7 days';
6.4 问题诊断与调优
常见问题诊断:
sql
-- 诊断 autovacuum 未按预期工作的原因
WITH autovacuum_config AS (
SELECT
schemaname,
tablename,
n_live_tup,
n_dead_tup,
-- 计算当前的清理阈值
COALESCE(
(SELECT setting::INTEGER FROM pg_settings WHERE name = 'autovacuum_vacuum_threshold'),
50
) as vacuum_threshold,
COALESCE(
(SELECT setting::FLOAT FROM pg_settings WHERE name = 'autovacuum_vacuum_scale_factor'),
0.2
) as vacuum_scale_factor
FROM pg_stat_user_tables
)
SELECT
schemaname,
tablename,
n_live_tup,
n_dead_tup,
vacuum_threshold,
vacuum_scale_factor,
-- 计算实际阈值
(vacuum_threshold + vacuum_scale_factor * n_live_tup)::INTEGER as calculated_threshold,
-- 判断是否应该触发清理
CASE
WHEN n_dead_tup >= (vacuum_threshold + vacuum_scale_factor * n_live_tup)
THEN 'SHOULD_VACUUM'
ELSE 'NOT_NEEDED'
END as vacuum_status,
-- 计算达到阈值还需要多少死行
GREATEST(0, (vacuum_threshold + vacuum_scale_factor * n_live_tup)::INTEGER - n_dead_tup) as rows_until_vacuum
FROM autovacuum_config
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC;
性能调优建议:
sql
-- 生成个性化的 autovacuum 调优建议
CREATE OR REPLACE FUNCTION autovacuum_tuning_advice()
RETURNS TABLE(
table_name TEXT,
current_dead_ratio NUMERIC,
advice TEXT,
suggested_config TEXT
) AS $$
DECLARE
table_record RECORD;
BEGIN
FOR table_record IN
SELECT
t.schemaname,
t.tablename,
t.n_live_tup,
t.n_dead_tup,
round((t.n_dead_tup::float / GREATEST(t.n_live_tup + t.n_dead_tup, 1) * 100), 2) as dead_ratio,
pg_size_pretty(pg_total_relation_size(t.schemaname||'.'||t.tablename)) as table_size,
t.autovacuum_count,
extract(days from now() - COALESCE(t.last_autovacuum, t.last_vacuum)) as days_since_vacuum
FROM pg_stat_user_tables t
WHERE t.n_live_tup > 1000 -- 只关注有一定规模的表
LOOP
table_name := table_record.schemaname || '.' || table_record.tablename;
current_dead_ratio := table_record.dead_ratio;
-- 根据不同情况给出建议
IF table_record.dead_ratio > 50 THEN
advice := 'Critical: High dead tuple ratio, consider immediate manual VACUUM';
suggested_config := format('autovacuum_vacuum_scale_factor = 0.1, autovacuum_vacuum_threshold = %s',
LEAST(1000, table_record.n_live_tup * 0.05)::INTEGER);
ELSIF table_record.dead_ratio > 30 THEN
advice := 'Warning: Moderate dead tuple ratio, tune autovacuum more aggressively';
suggested_config := format('autovacuum_vacuum_scale_factor = 0.15, autovacuum_vacuum_threshold = %s',
LEAST(2000, table_record.n_live_tup * 0.1)::INTEGER);
ELSIF table_record.days_since_vacuum > 7 AND table_record.dead_ratio > 10 THEN
advice := 'Info: Table not vacuumed recently, check autovacuum activity';
suggested_config := 'Check autovacuum_naptime and autovacuum_max_workers settings';
ELSIF table_record.autovacuum_count = 0 AND table_record.dead_ratio > 5 THEN
advice := 'Info: Autovacuum never ran, check if autovacuum is enabled';
suggested_config := 'Verify autovacuum = on and check table-specific settings';
ELSE
advice := 'OK: Table appears to be managed well by autovacuum';
suggested_config := 'Current configuration appears adequate';
END IF;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- 获取调优建议
SELECT * FROM autovacuum_tuning_advice()
WHERE advice NOT LIKE 'OK:%'
ORDER BY current_dead_ratio DESC;
生产环境注意事项
- 资源控制:合理设置
autovacuum_max_workers
和autovacuum_vacuum_cost_delay
- 业务影响:在业务高峰期适当延长
autovacuum_naptime
- 监控告警:建立 autovacuum 长时间运行的告警机制
- 分区表处理:分区表需要单独处理每个分区,合理规划分区大小 :::
📝 总结与最佳实践
核心要点回顾
- 理解 MVCC 机制:UPDATE/DELETE 产生死行,需要 VACUUM 回收空间
- 平衡清理策略:标准 VACUUM 用于日常维护,VACUUM FULL 仅用于特殊情况
- 重视统计信息:及时的 ANALYZE 是查询性能优化的基础
- 监控事务 ID 年龄:防止事务 ID 回绕导致的数据丢失
- 合理配置 Autovacuum:根据业务特点个性化配置清理参数
生产环境检查清单
- 建立表膨胀监控机制
- 配置事务 ID 年龄告警
- 根据业务特点调整 autovacuum 参数
- 定期检查长时间运行的 VACUUM 操作
- 为大表预留 VACUUM FULL 的维护窗口
- 监控可见性映射更新状态,优化仅索引扫描效果
通过系统性地应用这些清理策略,可以确保 PostgreSQL 数据库在高负载环境下保持优秀的性能和稳定性。