Separate catalog_xmin from xmin in walsender hot standby feedback

    Rui Zhao<zhaorui126@gmail.com>
    Apr 30, 2026, 1:43 PM UTC
    Hi hackers,
    I'd like to propose a fix for a long-standing issue: when no physical
    replication slot is used, the catalog_xmin sent in hot standby
    feedback incorrectly holds back vacuuming of user tables on the
    primary.
    == Problem ==
    When a standby sends hot standby feedback to a primary without a
    physical replication slot, ProcessStandbyHSFeedbackMessage() takes
    min(feedbackCatalogXmin, feedbackXmin) and stores it into
    MyProc->xmin:
    if (TransactionIdIsNormal(feedbackCatalogXmin)
        && TransactionIdPrecedes(feedbackCatalogXmin, feedbackXmin))
        MyProc->xmin = feedbackCatalogXmin;
    else
        MyProc->xmin = feedbackXmin;
    Since ComputeXidHorizons() applies proc->xmin uniformly to both the
    data and the catalog horizons, the standby's catalog_xmin ends up
    holding back data_oldest_nonremovable, preventing vacuum from removing
    dead tuples in regular user tables.
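    A minimal, self-contained model of the current no-slot behavior makes
    the effect concrete. It assumes xids are plain 32-bit integers and
    re-implements TransactionIdIsNormal()/TransactionIdPrecedes() in
    simplified form; fold_feedback_into_xmin() and data_horizon_from_procs()
    are hypothetical names, not PostgreSQL functions, and the horizon logic
    is reduced to a single minimum over proc xmins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's TransactionId helpers (assumption). */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
TransactionIdIsNormal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is older than b" for normal xids. */
static bool
TransactionIdPrecedes(TransactionId a, TransactionId b)
{
    if (!TransactionIdIsNormal(a) || !TransactionIdIsNormal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* Hypothetical: mirrors the quoted no-slot branch, returning what lands in MyProc->xmin. */
static TransactionId
fold_feedback_into_xmin(TransactionId feedbackXmin, TransactionId feedbackCatalogXmin)
{
    if (TransactionIdIsNormal(feedbackCatalogXmin)
        && TransactionIdPrecedes(feedbackCatalogXmin, feedbackXmin))
        return feedbackCatalogXmin;
    return feedbackXmin;
}

/*
 * Hypothetical, heavily reduced horizon computation: the oldest normal
 * proc xmin bounds the data horizon, so a folded catalog_xmin lowers it too.
 */
static TransactionId
data_horizon_from_procs(const TransactionId *proc_xmins, int nprocs,
                        TransactionId next_xid)
{
    TransactionId horizon = next_xid;

    assert(TransactionIdIsNormal(next_xid));
    for (int i = 0; i < nprocs; i++)
    {
        if (TransactionIdIsNormal(proc_xmins[i])
            && TransactionIdPrecedes(proc_xmins[i], horizon))
            horizon = proc_xmins[i];
    }
    return horizon;
}
```

    With a standby reporting xmin 1000 and catalog_xmin 500,
    fold_feedback_into_xmin(1000, 500) stores 500, and the data horizon
    computed over a regular backend at xmin 1200 plus that walsender drops
    to 500, even though no user-table snapshot needs anything older than 1000.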
    The existing code even acknowledges this limitation:
    "We can only track the catalog xmin separately when using a slot,
     so we store the least of the two provided when not using a slot."
    == Why this matters ==
    One might argue "just use a replication slot." However, many
    production HA deployments intentionally avoid physical replication
    slots because of their lifecycle management complexity:
    - When a primary fails, physical slots on the old primary are lost
    and cannot be automatically migrated to the promoted standby.
    - Other standbys that were using slots on the old primary must
    re-establish their slots on the new primary, potentially requiring
    a fresh base backup.
    - Dangling slots from disconnected standbys can cause unbounded WAL
    accumulation until manually dropped.
    These deployments use wal_keep_size or WAL archiving for WAL
    retention, combined with hot_standby_feedback for visibility horizon
    management. This is a legitimate production configuration; for
    example, some HA setups (Patroni in certain configurations, custom
    HA scripts) operate this way.
    The issue becomes severe when the standby also hosts a logical
    replication slot (e.g., for change data capture or logical replication
    to a downstream). The logical slot's catalog_xmin can be very old,
    since it is retained so that logical decoding can still read old
    catalog rows, and this old value is propagated to the primary's
    walsender via hot standby feedback, preventing vacuum from removing
    dead tuples in ALL user tables on the primary. The resulting table
    bloat is difficult to diagnose, since the DBA may not realize the
    connection between a standby's logical slot and the primary's vacuum
    behavior.
    == Fix ==
    The patch adds a catalog_xmin field to PGPROC (4 bytes), so the
    walsender can track catalog_xmin separately from xmin even without a
    replication slot. This mirrors how replication slots already separate
    slot->data.xmin from slot->data.catalog_xmin.
    In ComputeXidHorizons(), the new proc_catalog_xmin is accumulated
    from PGPROC entries and applied only to catalog_oldest_nonremovable
    and shared_oldest_nonremovable, exactly as slot_catalog_xmin is
    already handled. It does NOT affect data_oldest_nonremovable.
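    The accumulation can be sketched with a toy model. It assumes a single
    database (so the data and shared horizons start from the same procs,
    unlike the real function), omits slot_xmin and other inputs, and uses
    simplified xid helpers. ToyProc, ToyHorizons, older_xid() and
    toy_compute_horizons() are hypothetical names; only the horizon and
    xmin field names follow the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;            /* simplified stand-in (assumption) */
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Older of two xids; a non-normal xid means "no constraint". */
static TransactionId
older_xid(TransactionId a, TransactionId b)
{
    if (!xid_is_normal(a))
        return b;
    if (!xid_is_normal(b))
        return a;
    return ((int32_t) (a - b) < 0) ? a : b;
}

/* Hypothetical reduced PGPROC: only the two fields relevant here. */
typedef struct ToyProc
{
    TransactionId xmin;
    TransactionId catalog_xmin;            /* field proposed by the patch */
} ToyProc;

typedef struct ToyHorizons
{
    TransactionId data_oldest_nonremovable;
    TransactionId catalog_oldest_nonremovable;
    TransactionId shared_oldest_nonremovable;
} ToyHorizons;

static ToyHorizons
toy_compute_horizons(const ToyProc *procs, int nprocs, TransactionId next_xid,
                     TransactionId slot_catalog_xmin)
{
    ToyHorizons h;
    TransactionId proc_catalog_xmin = InvalidTransactionId;

    assert(xid_is_normal(next_xid));
    h.data_oldest_nonremovable = next_xid;

    for (int i = 0; i < nprocs; i++)
    {
        /* xmin still bounds every horizon, as before */
        h.data_oldest_nonremovable =
            older_xid(h.data_oldest_nonremovable, procs[i].xmin);
        /* catalog_xmin is collected separately and never touches the data horizon */
        proc_catalog_xmin = older_xid(proc_catalog_xmin, procs[i].catalog_xmin);
    }

    /* Catalog and shared horizons start from the data horizon ... */
    h.catalog_oldest_nonremovable = h.data_oldest_nonremovable;
    h.shared_oldest_nonremovable = h.data_oldest_nonremovable;

    /* ... and are then lowered by catalog-only limits, slot and proc alike. */
    h.catalog_oldest_nonremovable = older_xid(h.catalog_oldest_nonremovable, slot_catalog_xmin);
    h.catalog_oldest_nonremovable = older_xid(h.catalog_oldest_nonremovable, proc_catalog_xmin);
    h.shared_oldest_nonremovable = older_xid(h.shared_oldest_nonremovable, slot_catalog_xmin);
    h.shared_oldest_nonremovable = older_xid(h.shared_oldest_nonremovable, proc_catalog_xmin);
    return h;
}
```

    With a regular backend at xmin 1200 and a slot-less walsender holding
    xmin 1000 and catalog_xmin 500, this model keeps the data horizon at
    1000 while the catalog and shared horizons drop to 500.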
    GetReplicationHorizons() is updated to include proccatalogxmin in
    the catalog_xmin sent upstream, ensuring correct behavior in
    cascading standby configurations.
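    To see why this matters, consider a cascade primary -> B -> C with no
    physical slots, where C hosts a logical slot. Once B's walsender for C
    stops folding C's catalog_xmin into its xmin, B's own feedback would no
    longer carry that value upstream unless GetReplicationHorizons() picks
    it up. A minimal sketch, assuming the function reports some xmin
    horizon unaffected by catalog-only limits plus slot_catalog_xmin, and
    that the patch lowers the latter by proc_catalog_xmin; ToyFeedback,
    older_xid() and toy_replication_horizons() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;            /* simplified stand-in (assumption) */
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Older of two xids; a non-normal xid means "no constraint". */
static TransactionId
older_xid(TransactionId a, TransactionId b)
{
    if (!xid_is_normal(a))
        return b;
    if (!xid_is_normal(b))
        return a;
    return ((int32_t) (a - b) < 0) ? a : b;
}

/* Hypothetical: the (xmin, catalog_xmin) pair a standby sends upstream. */
typedef struct ToyFeedback
{
    TransactionId xmin;
    TransactionId catalog_xmin;
} ToyFeedback;

static ToyFeedback
toy_replication_horizons(TransactionId xmin_horizon,
                         TransactionId slot_catalog_xmin,
                         TransactionId proc_catalog_xmin)
{
    ToyFeedback fb;

    assert(xid_is_normal(xmin_horizon));
    fb.xmin = xmin_horizon;
    /* Patched behavior: slot-less walsenders' catalog_xmin also travels upstream. */
    fb.catalog_xmin = older_xid(slot_catalog_xmin, proc_catalog_xmin);
    return fb;
}
```

    On B with no local slots, toy_replication_horizons(1000,
    InvalidTransactionId, 500) yields catalog_xmin 500, so the primary
    keeps protecting the catalog rows that C's logical slot needs; without
    the proc_catalog_xmin term, B would send an invalid catalog_xmin.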
    Changes summary:
    - proc.h: add catalog_xmin to PGPROC
    - proc.c: initialize catalog_xmin in InitProcess/InitAuxiliaryProcess
    - procarray.c: accumulate and apply proc_catalog_xmin in
    ComputeXidHorizons(); include in GetReplicationHorizons()
    - walsender.c: set MyProc->xmin and MyProc->catalog_xmin separately
    in the no-slot path of ProcessStandbyHSFeedbackMessage()
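    The walsender side of the change is small. The following sketches the
    no-slot branch as described above, assuming a reduced PGPROC (ToyProc)
    and a hypothetical function name; locking, any validity checks the
    real ProcessStandbyHSFeedbackMessage() performs, and the exact code of
    the patch are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;            /* simplified stand-in (assumption) */
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical reduced PGPROC with the field the patch adds. */
typedef struct ToyProc
{
    TransactionId xmin;
    TransactionId catalog_xmin;
} ToyProc;

/*
 * Instead of storing the older of feedbackXmin and feedbackCatalogXmin in
 * xmin, each value goes into its own field, so only the catalog (and
 * shared) horizons ever see feedbackCatalogXmin.
 */
static void
toy_apply_feedback_without_slot(ToyProc *proc, TransactionId feedbackXmin,
                                TransactionId feedbackCatalogXmin)
{
    assert(proc != NULL);
    proc->xmin = feedbackXmin;
    proc->catalog_xmin = feedbackCatalogXmin;
}
```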
    == Alternatives considered ==
    1. Generalize the ephemeral slot concept (as suggested by the existing
    XXX comment), automatically creating a temporary slot for slot-less
    walsenders. This is more invasive: it requires allocating a slot
    (counted against max_replication_slots) and adds slot lifecycle
    management.
    2. Simply ignore catalog_xmin in the no-slot path: simpler but loses
    catalog protection for the standby's logical decoding.
    The proposed approach is minimal, correct, and consistent with how
    slots already handle the separation.
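    The trade-off between the current behavior, alternative 2 and the
    proposed approach can be summarized by the horizons a single slot-less
    walsender contributes. This is a toy model under the same simplifying
    assumptions (32-bit xids, one walsender, no other constraints);
    FeedbackStrategy, HorizonPair and horizons_for() are hypothetical
    names. Alternative 1 is not modeled separately because, like any slot,
    it would presumably keep the two values apart and give the same
    horizons as the proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;            /* simplified stand-in (assumption) */
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Older of xmin and a possibly-invalid catalog_xmin. */
static TransactionId
older_of(TransactionId xmin, TransactionId catalog_xmin)
{
    if (xid_is_normal(catalog_xmin) && (int32_t) (catalog_xmin - xmin) < 0)
        return catalog_xmin;
    return xmin;
}

typedef enum FeedbackStrategy
{
    FOLD_INTO_XMIN,         /* current no-slot behavior */
    IGNORE_CATALOG_XMIN,    /* alternative 2 */
    SEPARATE_FIELDS         /* proposed fix */
} FeedbackStrategy;

typedef struct HorizonPair
{
    TransactionId data_horizon;
    TransactionId catalog_horizon;
} HorizonPair;

static HorizonPair
horizons_for(FeedbackStrategy s, TransactionId xmin, TransactionId catalog_xmin)
{
    HorizonPair h;

    assert(xid_is_normal(xmin));
    switch (s)
    {
        case FOLD_INTO_XMIN:
            /* the older value bounds both horizons */
            h.data_horizon = older_of(xmin, catalog_xmin);
            h.catalog_horizon = h.data_horizon;
            break;
        case IGNORE_CATALOG_XMIN:
            /* catalog_xmin is dropped, so catalogs lose protection */
            h.data_horizon = xmin;
            h.catalog_horizon = xmin;
            break;
        case SEPARATE_FIELDS:
        default:
            /* each value bounds only the horizon it is meant for */
            h.data_horizon = xmin;
            h.catalog_horizon = older_of(xmin, catalog_xmin);
            break;
    }
    return h;
}
```

    For xmin 1000 and catalog_xmin 500, folding gives (500, 500), ignoring
    gives (1000, 1000) and loses catalog protection, and separating gives
    (1000, 500).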
    == Testing ==
    A new TAP test (053hsfeedbackcatalogxmin.pl) verifies:
    1. With hot_standby_feedback=on and no physical replication slot, when
    the standby has a logical slot with an old catalog_xmin, VACUUM on
    the primary can still clean dead tuples in user data tables.
    2. The catalog_xmin of the standby's logical slot remains properly
    set, confirming that catalog protection is preserved.
    Patch attached.
    Regards,
    Rui Zhao