Skip to content

PostgreSQL 锁监控与管理

概述

在多用户数据库环境中,锁机制是确保数据一致性和并发安全的核心技术。PostgreSQL 提供了强大的锁监控工具,其中 pg_locks 系统表是监控数据库锁状态的重要工具。通过有效的锁监控,我们可以及时发现性能瓶颈、死锁问题和资源争用情况。

为什么需要锁监控?

业务场景问题

在实际业务中,我们经常遇到以下问题:

  1. 订单处理系统:多个用户同时下单时,库存更新可能产生锁争用
  2. 财务报表生成:大批量数据查询与实时交易处理产生锁冲突
  3. 用户账户操作:转账、充值等操作需要确保数据一致性
  4. 批量数据导入:ETL 过程中的锁问题影响系统性能

WARNING

缺乏锁监控可能导致:

  • 应用响应缓慢
  • 死锁导致事务失败
  • 系统吞吐量下降
  • 用户体验恶化

pg_locks 系统表详解

核心功能

pg_locks 是 PostgreSQL 提供的系统视图,用于查看锁管理器中所有未完成的锁信息。

主要应用场景

1. 查看当前所有未完成的锁

问题陈述:系统响应变慢,需要检查当前数据库中所有的锁状态

解决方案

sql
-- 查看所有当前活跃的锁
SELECT 
    locktype,           -- 锁类型
    database,           -- 数据库 OID
    relation,           -- 关系(表)OID
    page,              -- 页面编号
    tuple,             -- 元组编号
    virtualxid,        -- 虚拟事务 ID
    transactionid,     -- 事务 ID
    classid,           -- 系统目录 OID
    objid,             -- 对象 OID
    objsubid,          -- 对象子 ID
    virtualtransaction, -- 虚拟事务 ID
    pid,               -- 进程 ID
    mode,              -- 锁模式
    granted,           -- 是否已授予锁
    fastpath           -- 是否为快速路径锁
FROM pg_locks
ORDER BY pid, locktype;

分析过程

  • locktype:表示锁的类型(relation、tuple、transactionid 等)
  • mode:锁模式,如 AccessShareLock、RowExclusiveLock 等
  • granted:true 表示已获得锁,false 表示正在等待
  • pid:持有或等待锁的进程 ID

实际输出示例

locktyperelationmodegrantedpid
relation16384AccessShareLockt1234
relation16385RowExclusiveLockt1235
transactionid-ExclusiveLockf1236

2. 查看特定数据库中关系上的所有锁

问题陈述:电商系统中商品表频繁出现锁等待,需要定位具体的锁争用情况

解决方案

sql
-- 查看特定数据库中表级锁的详细信息
SELECT 
    pl.locktype,
    pl.mode,
    pl.granted,
    pl.pid,
    pa.usename,                    -- 用户名
    pa.application_name,           -- 应用名称
    pa.client_addr,               -- 客户端地址
    pa.query_start,               -- 查询开始时间
    pa.state,                     -- 连接状态
    pg_stat_get_backend_activity_start(pa.pid) as activity_start,
    c.relname,                    -- 表名
    n.nspname                     -- 模式名
FROM pg_locks pl
JOIN pg_stat_activity pa ON pl.pid = pa.pid
LEFT JOIN pg_class c ON pl.relation = c.oid
LEFT JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE pl.locktype = 'relation'
  AND pa.datname = current_database()  -- 当前数据库
  AND c.relname IS NOT NULL
ORDER BY pl.granted, pa.query_start;

分析过程

  • 结合 pg_stat_activity 获取更多会话信息
  • 通过 pg_classpg_namespace 获取表名和模式名
  • 按锁状态和查询开始时间排序,便于分析

业务场景应用

sql
-- 针对电商商品表的锁监控
SELECT 
    pl.mode,
    pl.granted,
    pa.usename,
    pa.query,                     -- 正在执行的查询
    EXTRACT(EPOCH FROM (now() - pa.query_start))::int as duration_seconds,
    c.relname
FROM pg_locks pl
JOIN pg_stat_activity pa ON pl.pid = pa.pid
LEFT JOIN pg_class c ON pl.relation = c.oid
WHERE c.relname IN ('products', 'inventory', 'orders')  -- 关键业务表
  AND pl.locktype = 'relation'
ORDER BY duration_seconds DESC;

3. 查看特定关系上的所有锁

问题陈述:用户账户表在转账高峰期出现性能问题,需要专门监控该表的锁状态

解决方案

sql
-- 监控特定表的锁状态(以 user_accounts 表为例)
WITH table_locks AS (
    SELECT 
        pl.*,
        pa.usename,
        pa.application_name,
        pa.query,
        pa.state,
        EXTRACT(EPOCH FROM (now() - pa.query_start))::int as duration
    FROM pg_locks pl
    JOIN pg_stat_activity pa ON pl.pid = pa.pid
    JOIN pg_class c ON pl.relation = c.oid
    WHERE c.relname = 'user_accounts'  -- 指定表名
)
SELECT 
    locktype,
    mode,
    granted,
    usename,
    application_name,
    duration,
    CASE 
        WHEN granted THEN '已获得'
        ELSE '等待中'
    END as lock_status,
    LEFT(query, 50) || '...' as query_preview  -- 查询预览
FROM table_locks
ORDER BY granted, duration DESC;

输入数据:假设有多个转账事务同时进行 输出分析

modegrantedusenamedurationlock_statusquery_preview
RowExclusiveLocktapp_user45已获得UPDATE user_accounts SET balance = balance - 100...
RowExclusiveLockfapp_user12等待中UPDATE user_accounts SET balance = balance + 100...

性能优化建议

  • 持续时间超过 30 秒的锁需要重点关注
  • 等待锁数量过多时考虑优化事务粒度
  • 合理使用索引减少锁争用范围

4. 查看特定会话持有的所有锁

问题陈述:某个批量处理程序占用了过多资源,需要查看该进程持有的所有锁

解决方案

sql
-- 查看指定进程 ID 的所有锁信息
SELECT 
    locktype,
    CASE locktype
        WHEN 'relation' THEN c.relname
        WHEN 'transactionid' THEN transactionid::text
        WHEN 'virtualxid' THEN virtualxid::text
        ELSE objid::text
    END as object_name,
    mode,
    granted,
    CASE 
        WHEN NOT granted THEN '等待锁'
        ELSE '持有锁'
    END as status
FROM pg_locks pl
LEFT JOIN pg_class c ON pl.relation = c.oid
WHERE pid = 12345  -- 指定进程 ID
ORDER BY locktype, granted DESC;

实际应用场景

sql
-- 查找长时间运行的会话及其锁信息
WITH long_running_sessions AS (
    SELECT pid, usename, query_start, query
    FROM pg_stat_activity 
    WHERE state = 'active'
      AND query_start < now() - interval '5 minutes'  -- 运行超过5分钟
)
SELECT 
    lrs.pid,
    lrs.usename,
    EXTRACT(EPOCH FROM (now() - lrs.query_start))::int as runtime_seconds,
    pl.locktype,
    pl.mode,
    pl.granted,
    COUNT(*) as lock_count
FROM long_running_sessions lrs
JOIN pg_locks pl ON lrs.pid = pl.pid
GROUP BY lrs.pid, lrs.usename, runtime_seconds, pl.locktype, pl.mode, pl.granted
ORDER BY runtime_seconds DESC, lock_count DESC;

锁争用识别与分析

识别锁争用的关系

问题陈述:需要找出数据库中锁争用最严重的表,定位性能瓶颈

解决方案

sql
-- 统计各表的锁争用情况
SELECT 
    n.nspname as schema_name,
    c.relname as table_name,
    COUNT(*) as total_locks,
    COUNT(*) FILTER (WHERE NOT pl.granted) as waiting_locks,
    COUNT(*) FILTER (WHERE pl.granted) as granted_locks,
    ROUND(
        COUNT(*) FILTER (WHERE NOT pl.granted) * 100.0 / COUNT(*), 2
    ) as waiting_percentage
FROM pg_locks pl
JOIN pg_class c ON pl.relation = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE pl.locktype = 'relation'
  AND c.relkind = 'r'  -- 只查看普通表
GROUP BY n.nspname, c.relname
HAVING COUNT(*) > 1    -- 只显示有多个锁的表
ORDER BY waiting_locks DESC, total_locks DESC;

分析过程

  • waiting_percentage:等待锁的百分比,反映争用程度
  • 按等待锁数量排序,优先关注争用最严重的表
  • 结合业务逻辑分析争用原因

锁等待链分析

sql
-- 分析锁等待链(死锁检测)
WITH RECURSIVE lock_chain AS (
    -- 基础查询:找到所有等待锁的会话
    SELECT 
        pl.pid as waiter_pid,
        pl.locktype,
        pl.relation,
        pa.usename as waiter_user,
        pa.query as waiter_query
    FROM pg_locks pl
    JOIN pg_stat_activity pa ON pl.pid = pa.pid
    WHERE NOT pl.granted
    
    UNION ALL
    
    -- 递归查询:找到持有锁的会话
    SELECT 
        holder.pid,
        holder.locktype,
        holder.relation,
        ha.usename,
        ha.query
    FROM lock_chain lc
    JOIN pg_locks holder ON (
        lc.locktype = holder.locktype AND
        lc.relation = holder.relation AND
        holder.granted
    )
    JOIN pg_stat_activity ha ON holder.pid = ha.pid
    WHERE holder.pid != lc.waiter_pid
)
SELECT DISTINCT * FROM lock_chain
ORDER BY waiter_pid;

性能影响评估

锁对数据库性能的影响监控

sql
-- 创建锁监控视图
CREATE OR REPLACE VIEW lock_monitoring AS
SELECT 
    current_timestamp as check_time,
    COUNT(*) as total_locks,
    COUNT(*) FILTER (WHERE NOT granted) as waiting_locks,
    COUNT(*) FILTER (WHERE mode = 'ExclusiveLock') as exclusive_locks,
    COUNT(*) FILTER (WHERE locktype = 'transactionid') as transaction_locks,
    AVG(EXTRACT(EPOCH FROM (now() - pa.query_start))) as avg_lock_duration
FROM pg_locks pl
LEFT JOIN pg_stat_activity pa ON pl.pid = pa.pid
WHERE pa.state = 'active';

-- 定期监控锁状态
SELECT * FROM lock_monitoring;

锁超时处理

sql
-- 设置语句超时,避免长时间锁等待
SET statement_timeout = '30s';  -- 30秒超时

-- 设置锁等待超时
SET lock_timeout = '10s';       -- 10秒锁等待超时

-- 查看当前超时设置
SHOW statement_timeout;
SHOW lock_timeout;

实际应用:电商订单系统锁监控

业务场景设计

假设我们有一个电商订单系统,包含以下关键操作:

监控脚本实现

sql
-- 电商系统锁监控脚本
CREATE OR REPLACE FUNCTION monitor_ecommerce_locks()
RETURNS TABLE (
    table_name text,
    lock_mode text,
    waiting_count bigint,
    max_wait_time interval,
    blocking_queries text[]
) AS $$
BEGIN
    RETURN QUERY
    WITH ecommerce_tables AS (
        SELECT unnest(ARRAY['products', 'inventory', 'orders', 'users', 'payments']) as table_name
    ),
    lock_info AS (
        SELECT 
            c.relname,
            pl.mode,
            pl.granted,
            pl.pid,
            pa.query,
            pa.query_start
        FROM pg_locks pl
        JOIN pg_class c ON pl.relation = c.oid
        JOIN pg_stat_activity pa ON pl.pid = pa.pid
        WHERE c.relname = ANY(SELECT et.table_name FROM ecommerce_tables et)
          AND pl.locktype = 'relation'
    )
    SELECT 
        li.relname::text,
        li.mode::text,
        COUNT(*) FILTER (WHERE NOT li.granted),
        MAX(now() - li.query_start) FILTER (WHERE NOT li.granted),
        ARRAY_AGG(DISTINCT li.query) FILTER (WHERE li.granted)
    FROM lock_info li
    GROUP BY li.relname, li.mode
    HAVING COUNT(*) FILTER (WHERE NOT li.granted) > 0
    ORDER BY COUNT(*) FILTER (WHERE NOT li.granted) DESC;
END;
$$ LANGUAGE plpgsql;

-- 使用监控函数
SELECT * FROM monitor_ecommerce_locks();

输出示例

table_namelock_modewaiting_countmax_wait_timeblocking_queries
inventoryRowExclusiveLock300:00:45
ordersAccessShareLock100:00:12

最佳实践与优化建议

锁监控最佳实践

监控频率建议

  • 实时监控:每 5-10 秒检查一次锁状态
  • 告警阈值:等待锁超过 30 秒触发告警
  • 历史记录:保留 7 天的锁监控历史数据

性能优化策略

  1. 事务优化
sql
-- 优化前:长事务持有锁时间过长
BEGIN;
SELECT pg_sleep(60);  -- 模拟长时间处理
UPDATE products SET stock = stock - 1 WHERE id = 1;
COMMIT;

-- 优化后:缩短事务时间
BEGIN;
UPDATE products SET stock = stock - 1 WHERE id = 1;
COMMIT;
-- 将其他处理移到事务外部
  1. 索引优化
sql
-- 为经常更新的列创建合适的索引
CREATE INDEX CONCURRENTLY idx_products_stock ON products(stock) 
WHERE stock > 0;  -- 部分索引,减少锁争用
  1. 并发控制
sql
-- 使用 SELECT FOR UPDATE NOWAIT 避免阻塞
SELECT stock FROM products 
WHERE id = 1 
FOR UPDATE NOWAIT;  -- 立即返回,不等待锁

告警和自动化

sql
-- 创建锁告警函数
CREATE OR REPLACE FUNCTION check_lock_alerts()
RETURNS text AS $$
DECLARE
    waiting_locks_count integer;
    long_waiting_locks integer;
    alert_message text := '';
BEGIN
    -- 检查等待锁数量
    SELECT COUNT(*) INTO waiting_locks_count
    FROM pg_locks WHERE NOT granted;
    
    -- 检查长时间等待的锁
    SELECT COUNT(*) INTO long_waiting_locks
    FROM pg_locks pl
    JOIN pg_stat_activity pa ON pl.pid = pa.pid
    WHERE NOT pl.granted 
      AND pa.query_start < now() - interval '30 seconds';
    
    IF waiting_locks_count > 10 THEN
        alert_message := alert_message || format('警告:当前有 %s 个等待锁\n', waiting_locks_count);
    END IF;
    
    IF long_waiting_locks > 0 THEN
        alert_message := alert_message || format('严重:有 %s 个锁等待超过30秒\n', long_waiting_locks);
    END IF;
    
    RETURN COALESCE(alert_message, '锁状态正常');
END;
$$ LANGUAGE plpgsql;

-- 定期检查(可配合 cron 使用)
SELECT check_lock_alerts();

IMPORTANT

通过系统化的锁监控,我们可以:

  • 及时发现性能瓶颈和锁争用问题
  • 预防死锁和长时间阻塞情况
  • 优化数据库设计和查询性能
  • 提升系统整体稳定性和用户体验

锁监控是数据库管理的重要组成部分,结合 pg_locks 系统表和相关的监控工具,我们可以构建完整的数据库性能监控体系,确保业务系统的稳定运行。