Skip to content

PostgreSQL 进度报告详解

概述

PostgreSQL 提供了强大的进度报告功能,让我们能够实时监控长时间运行的数据库操作。这对于生产环境中的运维管理至关重要,特别是在处理大型数据库时,能够帮助我们:

  • 预估完成时间:了解操作还需要多长时间完成
  • 资源监控:观察系统资源使用情况
  • 问题诊断:识别性能瓶颈和异常情况
  • 用户体验:为应用层提供进度反馈

INFO

PostgreSQL 当前支持进度报告的命令包括:ANALYZECLUSTERCREATE INDEXVACUUMCOPYBASE_BACKUP

ANALYZE 进度报告

业务场景

在数据仓库环境中,我们经常需要对大表执行 ANALYZE 来更新统计信息,这对查询优化器生成高效执行计划至关重要。

监控视图:pg_stat_progress_analyze

列名类型描述业务意义
pidinteger后端进程ID用于识别和管理特定的分析任务
datnamename数据库名称确认正在分析的数据库
relidoid表的OID识别正在分析的具体表
phasetext当前阶段了解分析进展到哪个步骤
sample_blks_totalbigint待采样总块数预估工作量
sample_blks_scannedbigint已扫描块数当前进度

实际应用示例

问题陈述:需要对一个包含1000万行记录的订单表执行分析,并实时监控进度。

解决方案

sql
-- 会话1:启动分析任务
ANALYZE orders;

-- 会话2:监控分析进度
SELECT 
    pid,
    datname,
    phase,
    sample_blks_scanned,
    sample_blks_total,
    ROUND(
        (sample_blks_scanned::numeric / sample_blks_total::numeric) * 100, 2
    ) AS progress_percentage,
    ext_stats_computed,
    ext_stats_total
FROM pg_stat_progress_analyze
WHERE relid = 'orders'::regclass;

分析过程

  1. 初始化阶段:PostgreSQL 准备分析任务,时间很短
  2. 采样阶段:按照 default_statistics_target 设置采样数据
  3. 统计计算:基于样本计算表的统计信息
  4. 更新系统表:将统计信息写入 pg_statistic

输入和输出示例

sql
-- 输入:监控查询
SELECT pid, phase, sample_blks_scanned, sample_blks_total 
FROM pg_stat_progress_analyze;

-- 输出:进度信息
 pid  |          phase           | sample_blks_scanned | sample_blks_total 
------+--------------------------+--------------------+-------------------
 1234 | acquiring sample rows    |                1500 |             25000

优化建议

对于大表分析,可以调整 default_statistics_target 参数来平衡分析精度和执行时间

CLUSTER 进度报告

业务场景

在数据整理和性能优化中,CLUSTER 操作能够根据索引物理重排表数据,提高查询性能。这在 OLAP 场景中特别有用。

监控视图:pg_stat_progress_cluster

sql
-- 监控 CLUSTER 操作进度
SELECT 
    pid,
    command,  -- CLUSTER 或 VACUUM FULL
    phase,
    cluster_index_relid::regclass AS cluster_index,
    heap_tuples_scanned,
    heap_tuples_written,
    heap_blks_total,
    heap_blks_scanned,
    ROUND(
        (heap_blks_scanned::numeric / heap_blks_total::numeric) * 100, 2
    ) AS scan_progress_percentage
FROM pg_stat_progress_cluster;

实际应用示例

问题陈述:对按时间分区的日志表执行 CLUSTER 操作,按时间戳索引重新排列数据以提高查询性能。

解决方案

sql
-- 创建测试表和索引
CREATE TABLE access_logs (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ,
    user_id INTEGER,
    action TEXT
);

-- 创建时间戳索引
CREATE INDEX idx_access_logs_timestamp ON access_logs(timestamp);

-- 执行 CLUSTER 操作
CLUSTER access_logs USING idx_access_logs_timestamp;

监控脚本

sql
-- 实时监控 CLUSTER 进度
WITH cluster_progress AS (
    SELECT 
        pid,
        phase,
        heap_tuples_scanned,
        heap_tuples_written,
        heap_blks_total,
        heap_blks_scanned,
        CASE 
            WHEN heap_blks_total > 0 THEN
                ROUND((heap_blks_scanned::numeric / heap_blks_total::numeric) * 100, 2)
            ELSE 0
        END AS progress_pct
    FROM pg_stat_progress_cluster
)
SELECT 
    pid,
    phase,
    progress_pct || '%' AS progress,
    heap_tuples_scanned AS tuples_processed,
    heap_tuples_written AS tuples_written
FROM cluster_progress;

注意事项

CLUSTER 操作会锁定表,在生产环境中建议在维护窗口期间执行

COPY 进度报告

业务场景

数据迁移和批量导入是数据库管理中的常见任务。COPY 命令的进度监控对于大文件导入非常重要。

监控视图:pg_stat_progress_copy

关键指标说明业务价值
bytes_processed已处理字节数了解数据传输进度
bytes_total总文件大小预估完成时间
tuples_processed已处理元组数了解记录导入数量
tuples_excluded被WHERE排除的元组了解过滤效果

实际应用示例

问题陈述:从 CSV 文件导入 500 万条用户数据,需要实时监控导入进度和错误处理。

解决方案

sql
-- 创建目标表
CREATE TABLE users_import (
    id INTEGER,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP,
    status VARCHAR(20)
);

-- 执行 COPY 操作(在一个会话中)
COPY users_import FROM '/data/users.csv' 
WITH (
    FORMAT csv,
    HEADER true,
    DELIMITER ',',
    ON_ERROR stop  -- 遇到错误时停止
);

监控脚本

sql
-- 监控 COPY 进度(在另一个会话中)
SELECT 
    pid,
    command,
    type,
    bytes_processed,
    bytes_total,
    CASE 
        WHEN bytes_total > 0 THEN
            ROUND((bytes_processed::numeric / bytes_total::numeric) * 100, 2)
        ELSE 0
    END AS progress_percentage,
    tuples_processed,
    tuples_excluded,
    tuples_skipped,
    -- 计算处理速度 (MB/s)
    ROUND(bytes_processed / 1024.0 / 1024.0, 2) AS mb_processed
FROM pg_stat_progress_copy;

输出示例

 pid  | command   | type | bytes_processed | bytes_total | progress_percentage | tuples_processed 
------+-----------+------+----------------+-------------+--------------------+------------------
 5678 | COPY FROM | FILE |     104857600  |  524288000  |               20.00 |           100000

高级监控查询

sql
-- 包含速度和预估时间的详细监控
WITH copy_stats AS (
    SELECT 
        pid,
        command,
        bytes_processed,
        bytes_total,
        tuples_processed,
        extract(epoch from (now() - pg_stat_get_backend_start(pid))) AS elapsed_seconds
    FROM pg_stat_progress_copy
)
SELECT 
    pid,
    command,
    ROUND(bytes_processed / 1024.0 / 1024.0, 2) AS mb_processed,
    ROUND(bytes_total / 1024.0 / 1024.0, 2) AS mb_total,
    ROUND((bytes_processed::numeric / bytes_total::numeric) * 100, 2) AS progress_pct,
    tuples_processed,
    ROUND(bytes_processed / 1024.0 / 1024.0 / elapsed_seconds, 2) AS mb_per_second,
    ROUND((bytes_total - bytes_processed) / 1024.0 / 1024.0 / 
          (bytes_processed / 1024.0 / 1024.0 / elapsed_seconds), 0) AS eta_seconds
FROM copy_stats
WHERE bytes_total > 0;
错误处理策略

使用 ON_ERROR 选项可以控制遇到错误时的行为:

  • stop:遇到错误立即停止(默认)
  • ignore:忽略错误行继续处理

CREATE INDEX 进度报告

业务场景

在生产环境中创建索引,特别是在大表上创建索引,可能需要几个小时甚至更长时间。进度监控帮助我们:

  • 评估索引创建对系统性能的影响
  • 决定是否需要并发索引创建
  • 监控不同阶段的资源使用

监控视图:pg_stat_progress_create_index

实际应用示例

问题陈述:在包含 2000 万行的交易表上创建复合索引,需要使用并发方式以减少对业务的影响。

解决方案

sql
-- 创建并发索引以减少锁定时间
CREATE INDEX CONCURRENTLY idx_transactions_user_date 
ON transactions(user_id, transaction_date) 
WHERE status = 'completed';

详细监控脚本

sql
-- 全面的索引创建进度监控
SELECT 
    pid,
    datname,
    schemaname,
    tablename,
    indexname,
    command,
    phase,
    -- 锁等待信息
    lockers_total,
    lockers_done,
    current_locker_pid,
    -- 处理进度
    blocks_total,
    blocks_done,
    CASE 
        WHEN blocks_total > 0 THEN
            ROUND((blocks_done::numeric / blocks_total::numeric) * 100, 2)
        ELSE 0
    END AS blocks_progress_pct,
    -- 元组处理进度
    tuples_total,
    tuples_done,
    CASE 
        WHEN tuples_total > 0 THEN
            ROUND((tuples_done::numeric / tuples_total::numeric) * 100, 2)
        ELSE 0
    END AS tuples_progress_pct,
    -- 分区信息(如果适用)
    partitions_total,
    partitions_done
FROM pg_stat_progress_create_index pci
LEFT JOIN pg_stat_user_indexes pui ON pci.index_relid = pui.indexrelid
LEFT JOIN pg_stat_user_tables put ON pci.relid = put.relid;

阶段解释和监控要点

sql
-- 查看当前是否在初始化阶段
SELECT pid, phase, command 
FROM pg_stat_progress_create_index 
WHERE phase = 'initializing';

-- 这个阶段通常很短,主要是准备工作
sql
-- 监控锁等待情况
SELECT 
    pid,
    phase,
    lockers_total,
    lockers_done,
    current_locker_pid,
    ROUND((lockers_done::numeric / lockers_total::numeric) * 100, 2) AS lock_progress
FROM pg_stat_progress_create_index 
WHERE phase LIKE '%waiting%';
sql
-- 监控索引构建进度
SELECT 
    pid,
    phase,
    blocks_done,
    blocks_total,
    tuples_done,
    tuples_total,
    ROUND((blocks_done::numeric / blocks_total::numeric) * 100, 2) AS progress
FROM pg_stat_progress_create_index 
WHERE phase = 'building index';

性能优化建议

索引创建优化

  1. 增加 work_mem:提高排序操作的内存使用
  2. 调整 maintenance_work_mem:影响索引创建速度
  3. 并发创建:使用 CREATE INDEX CONCURRENTLY 减少锁定时间
  4. 分批创建:对于多个索引,避免同时创建

VACUUM 进度报告

业务场景

VACUUM 是 PostgreSQL 维护的核心操作,定期执行以回收空间和更新统计信息。在大表上的 VACUUM 操作可能耗时很长。

监控视图:pg_stat_progress_vacuum

实际应用示例

问题陈述:对一个频繁更新的用户行为日志表执行 VACUUM 操作,需要监控空间回收效果。

解决方案

sql
-- 执行 VACUUM 操作
VACUUM (VERBOSE, ANALYZE) user_behavior_logs;

综合监控脚本

sql
-- VACUUM 进度的全面监控
WITH vacuum_progress AS (
    SELECT 
        pid,
        datname,
        schemaname,
        tablename,
        phase,
        heap_blks_total,
        heap_blks_scanned,
        heap_blks_vacuumed,
        index_vacuum_count,
        max_dead_tuple_bytes,
        dead_tuple_bytes,
        num_dead_item_ids,
        indexes_total,
        indexes_processed
    FROM pg_stat_progress_vacuum pv
    LEFT JOIN pg_stat_user_tables put ON pv.relid = put.relid
),
vacuum_metrics AS (
    SELECT 
        *,
        CASE 
            WHEN heap_blks_total > 0 THEN
                ROUND((heap_blks_scanned::numeric / heap_blks_total::numeric) * 100, 2)
            ELSE 0
        END AS scan_progress_pct,
        CASE 
            WHEN max_dead_tuple_bytes > 0 THEN
                ROUND((dead_tuple_bytes::numeric / max_dead_tuple_bytes::numeric) * 100, 2)
            ELSE 0
        END AS dead_tuple_fill_pct,
        CASE 
            WHEN indexes_total > 0 THEN
                ROUND((indexes_processed::numeric / indexes_total::numeric) * 100, 2)
            ELSE 0
        END AS index_progress_pct
    FROM vacuum_progress
)
SELECT 
    pid,
    tablename,
    phase,
    scan_progress_pct || '%' AS scan_progress,
    heap_blks_scanned || '/' || heap_blks_total AS blocks_scanned,
    heap_blks_vacuumed AS blocks_vacuumed,
    index_vacuum_count AS index_cycles,
    dead_tuple_fill_pct || '%' AS dead_tuple_buffer,
    ROUND(dead_tuple_bytes / 1024.0 / 1024.0, 2) AS dead_tuple_mb,
    index_progress_pct || '%' AS index_progress
FROM vacuum_metrics;

阶段详细分析

阶段作用监控重点优化建议
扫描堆标记已删除元组heap_blks_scanned / heap_blks_total增加 maintenance_work_mem
清理索引清理索引中的死元组index_vacuum_count监控索引数量和大小
清理堆回收堆中的空间heap_blks_vacuumed观察实际回收的块数
截断堆释放文件末尾的空页无特定指标确保有足够的 ACCESS EXCLUSIVE

死元组管理监控

sql
-- 监控死元组累积情况
SELECT 
    pid,
    tablename,
    phase,
    dead_tuple_bytes,
    max_dead_tuple_bytes,
    num_dead_item_ids,
    ROUND(
        (dead_tuple_bytes::numeric / max_dead_tuple_bytes::numeric) * 100, 2
    ) AS memory_usage_pct,
    CASE 
        WHEN dead_tuple_bytes >= max_dead_tuple_bytes THEN 
            'FULL - 将触发索引清理'
        WHEN dead_tuple_bytes > max_dead_tuple_bytes * 0.8 THEN
            'HIGH - 接近触发阈值'
        ELSE
            'OK'
    END AS memory_status
FROM pg_stat_progress_vacuum pv
LEFT JOIN pg_stat_user_tables put ON pv.relid = put.relid;

重要提醒

dead_tuple_bytes 达到 max_dead_tuple_bytes 时,VACUUM 将自动触发索引清理循环,这可能显著增加执行时间

基本备份进度报告

业务场景

基本备份是 PostgreSQL 灾难恢复的重要组成部分。对于大型数据库,备份可能需要几个小时,进度监控至关重要。

监控视图:pg_stat_progress_basebackup

实际应用示例

问题陈述:执行生产数据库的基本备份,需要监控备份进度和预估完成时间。

解决方案

bash
# 启动 pg_basebackup(在终端中执行)
pg_basebackup -D /backup/postgres_backup_$(date +%Y%m%d) \
  -Ft -z -P -v \
  --wal-method=stream \
  --write-recovery-conf

监控脚本

sql
-- 监控基本备份进度
WITH backup_progress AS (
    SELECT 
        pid,
        phase,
        backup_total,
        backup_streamed,
        tablespaces_total,
        tablespaces_streamed,
        CASE 
            WHEN backup_total > 0 THEN
                ROUND((backup_streamed::numeric / backup_total::numeric) * 100, 2)
            ELSE 0
        END AS backup_progress_pct,
        CASE 
            WHEN tablespaces_total > 0 THEN
                ROUND((tablespaces_streamed::numeric / tablespaces_total::numeric) * 100, 2)
            ELSE 0
        END AS tablespace_progress_pct
    FROM pg_stat_progress_basebackup
),
backup_metrics AS (
    SELECT 
        *,
        ROUND(backup_streamed / 1024.0 / 1024.0 / 1024.0, 2) AS gb_streamed,
        ROUND(backup_total / 1024.0 / 1024.0 / 1024.0, 2) AS gb_total,
        -- 估算剩余时间(假设当前速度持续)
        CASE 
            WHEN backup_streamed > 0 AND backup_total > backup_streamed THEN
                ROUND(
                    (backup_total - backup_streamed) / 
                    (backup_streamed / EXTRACT(epoch FROM (now() - pg_stat_get_backend_start(pid))) / 60), 
                    1
                )
            ELSE 0
        END AS eta_minutes
    FROM backup_progress
)
SELECT 
    pid,
    phase,
    backup_progress_pct || '%' AS backup_progress,
    gb_streamed || '/' || gb_total AS data_gb,
    tablespace_progress_pct || '%' AS tablespace_progress,
    tablespaces_streamed || '/' || tablespaces_total AS tablespaces,
    eta_minutes || ' 分钟' AS estimated_remaining
FROM backup_metrics;

备份速度监控

sql
-- 监控备份速度和性能指标
WITH backup_speed AS (
    SELECT 
        pid,
        phase,
        backup_streamed,
        EXTRACT(epoch FROM (now() - pg_stat_get_backend_start(pid))) AS elapsed_seconds,
        backup_total
    FROM pg_stat_progress_basebackup
)
SELECT 
    pid,
    phase,
    ROUND(backup_streamed / 1024.0 / 1024.0, 2) AS mb_completed,
    ROUND(elapsed_seconds / 60, 1) AS elapsed_minutes,
    ROUND(
        backup_streamed / 1024.0 / 1024.0 / elapsed_seconds, 2
    ) AS mb_per_second,
    ROUND(
        backup_streamed / 1024.0 / 1024.0 / elapsed_seconds * 60, 2
    ) AS mb_per_minute,
    CASE 
        WHEN backup_total > 0 THEN
            ROUND(
                (backup_total - backup_streamed) / 
                (backup_streamed / elapsed_seconds) / 60, 1
            )
        ELSE 0
    END AS eta_minutes
FROM backup_speed
WHERE elapsed_seconds > 0;

阶段解释

各阶段详细说明

初始化阶段

  • WAL 发送进程准备开始备份
  • 通常非常短暂,几秒钟内完成

等待检查点完成

  • 执行 pg_backup_start 准备备份
  • 等待备份开始检查点完成
  • 确保备份的一致性

估计备份大小

  • 计算需要传输的数据总量
  • 为进度计算提供基准

流式传输数据库文件

  • 主要的备份阶段
  • 传输所有数据库文件
  • 占用大部分备份时间

等待 WAL 归档完成

  • 确保备份期间的 WAL 文件已归档
  • 保证备份的完整性

传输 WAL 文件

  • 仅在使用 --wal-method=fetch 时执行
  • 传输备份期间生成的 WAL 文件

进度监控最佳实践

通用监控脚本

创建一个通用的进度监控脚本,能够监控所有类型的长时间运行操作:

sql
-- 通用进度监控视图
CREATE OR REPLACE VIEW v_operation_progress AS
SELECT 
    'ANALYZE' as operation_type,
    pid,
    datname,
    phase,
    COALESCE(
        ROUND((sample_blks_scanned::numeric / NULLIF(sample_blks_total, 0)::numeric) * 100, 2),
        0
    ) as progress_percentage,
    sample_blks_scanned::text || '/' || sample_blks_total::text as detail
FROM pg_stat_progress_analyze

UNION ALL

SELECT 
    'VACUUM' as operation_type,
    pid,
    datname,
    phase,
    COALESCE(
        ROUND((heap_blks_scanned::numeric / NULLIF(heap_blks_total, 0)::numeric) * 100, 2),
        0
    ) as progress_percentage,
    heap_blks_scanned::text || '/' || heap_blks_total::text as detail
FROM pg_stat_progress_vacuum

UNION ALL

SELECT 
    'CREATE INDEX' as operation_type,
    pid,
    datname,
    phase,
    COALESCE(
        ROUND((blocks_done::numeric / NULLIF(blocks_total, 0)::numeric) * 100, 2),
        0
    ) as progress_percentage,
    blocks_done::text || '/' || blocks_total::text as detail
FROM pg_stat_progress_create_index

UNION ALL

SELECT 
    'CLUSTER' as operation_type,
    pid,
    datname,
    phase,
    COALESCE(
        ROUND((heap_blks_scanned::numeric / NULLIF(heap_blks_total, 0)::numeric) * 100, 2),
        0
    ) as progress_percentage,
    heap_blks_scanned::text || '/' || heap_blks_total::text as detail
FROM pg_stat_progress_cluster

UNION ALL

SELECT 
    'COPY' as operation_type,
    pid,
    datname,
    'processing' as phase,
    COALESCE(
        ROUND((bytes_processed::numeric / NULLIF(bytes_total, 0)::numeric) * 100, 2),
        0
    ) as progress_percentage,
    ROUND(bytes_processed/1024.0/1024.0, 2)::text || '/' || 
    ROUND(bytes_total/1024.0/1024.0, 2)::text || ' MB' as detail
FROM pg_stat_progress_copy

UNION ALL

SELECT 
    'BASE BACKUP' as operation_type,
    pid,
    'backup' as datname,
    phase,
    COALESCE(
        ROUND((backup_streamed::numeric / NULLIF(backup_total, 0)::numeric) * 100, 2),
        0
    ) as progress_percentage,
    ROUND(backup_streamed/1024.0/1024.0/1024.0, 2)::text || '/' || 
    ROUND(backup_total/1024.0/1024.0/1024.0, 2)::text || ' GB' as detail
FROM pg_stat_progress_basebackup;

-- 使用通用监控视图
SELECT 
    operation_type,
    pid,
    datname,
    phase,
    progress_percentage || '%' as progress,
    detail,
    pg_stat_get_backend_start(pid) as start_time,
    now() - pg_stat_get_backend_start(pid) as elapsed_time
FROM v_operation_progress
ORDER BY operation_type, pid;

自动化监控脚本

bash
#!/bin/bash
# progress_monitor.sh - PostgreSQL 操作进度监控脚本

PGHOST=${PGHOST:-localhost}
PGPORT=${PGPORT:-5432}
PGUSER=${PGUSER:-postgres}
PGDATABASE=${PGDATABASE:-postgres}

echo "PostgreSQL 操作进度监控 - $(date)"
echo "========================================"

# 检查是否有正在运行的长时间操作
psql -h $PGHOST -p $PGPORT -U $PGUSER -d $PGDATABASE -c "
SELECT 
    operation_type,
    pid,
    datname,
    phase,
    progress_percentage || '%' as progress,
    detail,
    EXTRACT(epoch FROM (now() - pg_stat_get_backend_start(pid)))::int as elapsed_seconds
FROM v_operation_progress
ORDER BY operation_type, pid;
" 2>/dev/null

# 如果没有操作在运行,显示提示信息
if [ $? -eq 0 ]; then
    COUNT=$(psql -h $PGHOST -p $PGPORT -U $PGUSER -d $PGDATABASE -t -c "
        SELECT COUNT(*) FROM v_operation_progress;
    " 2>/dev/null | tr -d ' ')
    
    if [ "$COUNT" = "0" ]; then
        echo "当前没有长时间运行的操作"
    fi
fi

告警和通知

sql
-- 创建监控函数,当操作运行时间过长时发送告警
CREATE OR REPLACE FUNCTION check_long_running_operations(
    max_minutes INTEGER DEFAULT 60
) RETURNS TABLE(
    operation_type TEXT,
    pid INTEGER,
    database_name TEXT,
    elapsed_minutes NUMERIC,
    progress_pct NUMERIC,
    phase TEXT
) AS $$
BEGIN
    RETURN QUERY
    SELECT 
        op.operation_type::TEXT,
        op.pid::INTEGER,
        op.datname::TEXT,
        ROUND(EXTRACT(epoch FROM (now() - pg_stat_get_backend_start(op.pid))) / 60, 1) as elapsed_minutes,
        op.progress_percentage,
        op.phase::TEXT
    FROM v_operation_progress op
    WHERE EXTRACT(epoch FROM (now() - pg_stat_get_backend_start(op.pid))) / 60 > max_minutes;
END;
$$ LANGUAGE plpgsql;

-- 检查运行超过2小时的操作
SELECT * FROM check_long_running_operations(120);

监控建议

  1. 定期检查:每 5-10 分钟检查一次进度
  2. 设置告警:对运行时间异常长的操作设置告警
  3. 资源监控:结合系统资源使用情况(CPU、内存、I/O)
  4. 历史记录:保存历史进度数据用于性能分析
  5. 自动化:将监控集成到现有的监控系统中

总结

PostgreSQL 的进度报告功能为数据库管理员提供了强大的运维工具:

  1. 实时可见性:了解长时间运行操作的当前状态
  2. 性能优化:识别瓶颈并调整配置参数
  3. 容量规划:基于历史数据预估未来操作时间
  4. 用户体验:为应用层提供准确的进度信息
  5. 故障诊断:快速识别异常情况并采取行动

通过合理使用这些进度报告功能,可以显著提高数据库运维效率和系统可靠性。