Optimize CPU usage of dropping buffers during recovery

  • Jump to comment-1
    Jingtang Zhang<mrdrivingduck@gmail.com>
    Jan 27, 2026, 5:42 PM UTC
    Hi hackers,
    I happen to find a case where a transaction involves dropping a relation
    file, and a relation happens to be empty, the WAL recovery will waste
    a lot of time on scanning the whole buffer pool.
    In DropRelationBuffers/DropRelationsAllBuffers, we will take fast path
    to avoid scanning whole buffer pool, if the buffers to be invalidated
    are less than typically 1/32 of buffer pool. However, since no more WAL
    records are referencing an empty relation, if we want to drop it, startup
    process will never initialize smgrcachednblocks for that relation,
    causing fallback to slow path.
    The situation can be like refreshing a materialized view whose query
    returns no row, or truncating a table and then loading it but the query
    returns no row. It will generate WAL sequence like:
    rmgr: Standby len (rec/tot): 42/ 42, tx: 780, lsn: 0/41238520, prev 0/41238490, desc: LOCK xid 780 db 16384 rel 16416
    rmgr: Storage len (rec/tot): 42/ 42, tx: 780, lsn: 0/412385F0, prev 0/412385C0, desc: CREATE base/16384/16416
    ...
    rmgr: Transaction len (rec/tot): 1073/ 1073, tx: 783, lsn: 0/4136BF70, prev 0/41369F60, desc: COMMIT 2026-01-27 08:07:49.953301 UTC; rels: base/16384/16416
    This makes me think about initializing the cached nblocks to zero when
    redo CREATE record. Note that the 0-size may not correct, because the
    relation may have been zeroextended before server crash, and recovery
    starts from an earlier checkpoint. But we can correct that by invalidating
    the cached value when we really meet a WAL record referencing a block of
    that relation, so that a later smgrnblocks() will fill-in the correct
    file length. It can avoid extending the relation in the middle of a file
    by mistake.
    Quickly proposed a patch for it, with a Perl program for judging the
    performance difference.
    In my environment with 16GB shared buffers, -O2 optimized, debugging off,
    I've run a test of 10 client * 500 transactions using such workload, made
    the server crash and poll it until recovery is done. The CPU time:
    w/o patch: CPU user: 77.58 s, system: 0.27 s
    patched: CPU user: 0.14 s, system: 0.09 s
    Without the patch, the CPU is wasted on BufTagMatchesRelFileLocator:
    Overhead  Command   Shared Object       Symbol
    - 98.23% postgres postgres [.] DropRelationsAllBuffers
     98.21% DropRelationsAllBuffers
        smgrdounlinkall
        DropRelationFiles
        xact_redo_commit
        xact_redo
        PerformWalRecovery
        StartupXLOG
        StartupProcessMain
        postmaster_child_launch
        StartChildProcess
        PostmasterStateMachine
        ServerLoop.isra.0
     0.06%  postgres  postgres            [.] hash_search_with_hash_value
    Not sure if there will be any problem of correctness. Also, not sure about
    what to do about INIT fork.
    Any thoughts?
    --
    Regards, Jingtang