[WIP] Pipelined Recovery

    Imran Zaheer<imran.zhir@gmail.com>
    Jan 30, 2026, 2:53 PM UTC
    Hi,
    Based on a suggestion by my colleague Ants Aasma, I worked on the
    idea of adding parallelism to the WAL recovery process.
    The crux of the idea is to decode the WAL using parallel workers, so
    that the replay process receives already-decoded records directly from
    a shared memory queue. This takes some CPU load off the recovery process.
    In my tests this reduced recovery times by roughly 20% to 43% (see
    the table below), but results may differ based on workload. I have
    attached some benchmarks for different workloads.
    Following are some recovery tests with the default configs. p0 is
    the baseline and p1 is the run with the pipeline enabled; db size is
    the size of the backup database on which the recovery runs. More
    detail on the benchmarks is in the attached file
    `recoveries-benchmark-v01`.
                     elapsed (p0)   elapsed (p1)   % perf   db size
    inserts.sql      272s 10ms      197s 570ms     27.37%   480 MB
    updates.sql      177s 420ms     117s 80ms      34.01%   480 MB
    hot-updates.sql  36s 940ms      29s 240ms      20.84%   480 MB
    nonhot.sql       36s 570ms      28s 980ms      20.75%   480 MB
    simple-update    20s 160ms      11s 580ms      42.56%   4913 MB
    tpcb-like        20s 590ms      13s 640ms      33.75%   4913 MB
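
    The % perf column is not defined in the post. The figures are
    consistent with it being the relative reduction in elapsed time,
    (p0 - p1) / p0. The sketch below checks that reading against the
    table; the helper names `parse_elapsed` and `percent_improvement`
    are made up for this illustration and are not part of the benchmark
    scripts.

```python
import re


def parse_elapsed(text):
    """Convert a string such as '272s 10ms' into seconds as a float."""
    match = re.fullmatch(r"(\d+)s (\d+)ms", text.strip())
    if match is None:
        raise ValueError(f"unrecognised elapsed time: {text!r}")
    seconds, millis = match.groups()
    return int(seconds) + int(millis) / 1000.0


def percent_improvement(baseline, pipelined):
    """Relative reduction in elapsed time, in percent (assumed formula)."""
    t0 = parse_elapsed(baseline)
    t1 = parse_elapsed(pipelined)
    return (t0 - t1) / t0 * 100.0


# (workload, elapsed p0, elapsed p1, reported % perf), copied from the table.
RESULTS = [
    ("inserts.sql", "272s 10ms", "197s 570ms", 27.37),
    ("updates.sql", "177s 420ms", "117s 80ms", 34.01),
    ("hot-updates.sql", "36s 940ms", "29s 240ms", 20.84),
    ("nonhot.sql", "36s 570ms", "28s 980ms", 20.75),
    ("simple-update", "20s 160ms", "11s 580ms", 42.56),
    ("tpcb-like", "20s 590ms", "13s 640ms", 33.75),
]


def max_deviation(results):
    """Largest gap between the recomputed and the reported percentage."""
    return max(
        abs(percent_improvement(p0, p1) - reported)
        for _name, p0, p1, reported in results
    )
```

    Under this assumed formula, every recomputed value agrees with the
    reported one to within rounding at two decimal places.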
    A similar approach was suggested earlier by Matthias van de Meent in a
    separate thread [1]. Right now I am using one bgw to decode the WAL
    and fill the shared message queue, and the redo apply loop simply
    receives the decoded records from the queue. After the redo is
    finished, the consumer (startup process) can request a shutdown from
    the producer (pipeline bgw) before exiting recovery.
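
    For illustration only, the producer/consumer handoff described
    above can be modelled as below. This is not the patch's code: a
    Python thread stands in for the pipeline bgw, `queue.Queue` stands
    in for the shared memory queue, and the record format, `decode`,
    `put_unless_shutdown`, `run_recovery` and the `END_OF_WAL` sentinel
    are all hypothetical names chosen for the sketch.

```python
import queue
import threading

END_OF_WAL = object()  # hypothetical sentinel: the producer ran out of input


def decode(raw):
    """Hypothetical decoder: turn 'lsn:payload' text into (lsn, payload)."""
    lsn, payload = raw.split(":", 1)
    return int(lsn), payload


def put_unless_shutdown(q, item, shutdown):
    """Queue the item, retrying while the queue is full.

    Returns False without queuing once shutdown has been requested.
    """
    while not shutdown.is_set():
        try:
            q.put(item, timeout=0.05)
            return True
        except queue.Full:
            continue
    return False


def producer(raw_wal, q, shutdown):
    """Decode ahead of the consumer until input ends or shutdown is set."""
    for raw in raw_wal:
        if not put_unless_shutdown(q, decode(raw), shutdown):
            return
    put_unless_shutdown(q, END_OF_WAL, shutdown)


def run_recovery(raw_wal, stop_after_lsn, queue_size=4):
    """Apply decoded records in order, then ask the producer to shut down."""
    q = queue.Queue(maxsize=queue_size)  # bounded, like a fixed-size ring
    shutdown = threading.Event()
    worker = threading.Thread(target=producer, args=(raw_wal, q, shutdown))
    worker.start()

    applied = []
    while True:
        item = q.get()
        if item is END_OF_WAL:
            break
        applied.append(item)  # stand-in for applying the record
        if item[0] >= stop_after_lsn:
            break  # redo finished before the producer ran out of WAL

    shutdown.set()  # consumer requests producer shutdown
    worker.join()   # and waits for it before "exiting recovery"
    return applied
```

    Calling `run_recovery` with a list of `"lsn:payload"` strings and a
    target LSN returns the records applied up to that LSN. The bounded
    queue keeps the decoder from running arbitrarily far ahead of
    replay, and the shutdown flag lets the consumer stop a producer
    that is blocked on a full queue.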
    This idea can be coupled with another one: pinning the buffers in
    parallel before the recovery process needs them. That would
    parallelize most of the work being done in
    `XLogReadBufferForRedoExtended`, and the redo could simply receive
    the already-pinned buffers from a queue. Implementing this still
    needs some R&D, as IPC and pinning/unpinning of buffers across two
    processes can be tricky.
    If someone wants to reproduce the benchmark, they can do so using
    these scripts [2].
    Looking forward to your reviews, comments, etc.
    [1]: https://www.postgresql.org/message-id/CAEze2Wh6C_QfxLii%2B%2BeZue5%3DKvbVXKkHyZW8PLmtLgyjmFzwCQ%40mail.gmail.com
    [2]: https://github.com/imranzaheer612/pg-recovery-testing
    --
    Regards,
    Imran Zaheer
    CYBERTEC PostgreSQL International GmbH