pgsql: Prevent invalidation of newly synced replication slots.

• Amit Kapila <akapila@postgresql.org>
    Jan 27, 2026, 5:56 AM UTC
    Prevent invalidation of newly synced replication slots.
    A race condition could cause a newly synced replication slot to become
    invalidated between its initial sync and the checkpoint.
When syncing a replication slot to a standby, the slot's initial
restart_lsn is taken from the publisher's remote restart_lsn. Because slot
    sync happens asynchronously, this value can lag behind the standby's
    current redo pointer. Without any interlocking between WAL reservation and
    checkpoints, a checkpoint may remove WAL required by the newly synced
    slot, causing the slot to be invalidated.
    To fix this, we acquire ReplicationSlotAllocationLock before reserving WAL
    for a newly synced slot, similar to commit 006dd4b2e5. This ensures that
    if WAL reservation happens first, the checkpoint process must wait for
    slotsync to update the slot's restart_lsn before it computes the minimum
    required LSN.
    However, unlike in ReplicationSlotReserveWal(), this lock alone cannot
    protect a newly synced slot if a checkpoint has already run
    CheckPointReplicationSlots() before slotsync updates the slot. In such
    cases, the remote restart_lsn may be stale and earlier than the current
    redo pointer. To prevent relying on an outdated LSN, we use the oldest
    WAL location available if it is greater than the remote restart_lsn.
    This ensures that newly synced slots always start with a safe, non-stale
    restart_lsn and are not invalidated by concurrent checkpoints.
    Author: Zhijie Hou <houzj.fnst@fujitsu.com>
    Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
    Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
    Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru>
    Reviewed-by: Chao Li <li.evan.chao@gmail.com>
    Backpatch-through: 17
    Discussion: https://postgr.es/m/TY4PR01MB16907E744589B1AB2EE89A31F94D7A%40TY4PR01MB16907.jpnprd01.prod.outlook.com
    Branch
    ------
    master
    Details
    -------
    https://git.postgresql.org/pg/commitdiff/851f6649cc18c4b482fa2b6afddb65b35d035370
    Modified Files
    --------------
    src/backend/access/transam/xlog.c                  |  6 +-
    src/backend/replication/logical/slotsync.c         | 97 +++++++++++-----------
    src/include/access/xlog.h                          |  1 +
    src/test/recovery/t/046_checkpoint_logical_slot.pl | 84 ++++++++++++++++++-
    4 files changed, 136 insertions(+), 52 deletions(-)
• Robert Haas <robertmhaas@gmail.com>
      Jan 27, 2026, 2:59 PM UTC
      On Tue, Jan 27, 2026 at 12:56 AM Amit Kapila <akapila@postgresql.org> wrote:
      Prevent invalidation of newly synced replication slots.
      This commit has broken CI for me. On the "Windows - Server 2022, VS
      2019 - Meson & ninja" build, the following shows up in
046_checkpoint_logical_slot_standby.log:
2026-01-27 13:44:44.421 GMT startup[5172] FATAL: could not rename
file "backup_label" to "backup_label.old": Permission denied
      I imagine this is going to break CI for everybody else too, as well as cfbot.
      --
      Robert Haas
      EDB: http://www.enterprisedb.com
• Amit Kapila <amit.kapila16@gmail.com>
        Jan 28, 2026, 4:34 AM UTC
        On Tue, Jan 27, 2026 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
        On Tue, Jan 27, 2026 at 12:56 AM Amit Kapila <akapila@postgresql.org> wrote:
        Prevent invalidation of newly synced replication slots.

        This commit has broken CI for me. On the "Windows - Server 2022, VS
        2019 - Meson & ninja" build, the following shows up in
046_checkpoint_logical_slot_standby.log:

2026-01-27 13:44:44.421 GMT startup[5172] FATAL: could not rename
file "backup_label" to "backup_label.old": Permission denied

        I imagine this is going to break CI for everybody else too, as well as cfbot.
        I'll try to reproduce and look into it.
        --
        With Regards,
        Amit Kapila.
• Thomas Munro <thomas.munro@gmail.com>
        Jan 27, 2026, 4:16 PM UTC
        On Wed, Jan 28, 2026 at 3:59 AM Robert Haas <robertmhaas@gmail.com> wrote:
        I imagine this is going to break CI for everybody else too, as well as cfbot.
        Just by the way, on that last point, we trained cfbot to watch out for
        CI pass/fail in this account:
        https://github.com/postgres/postgres/commits/master/
        and then use the most recent pass as the base commit when applying
        patches to make test branches. So if master is broken for a while, it
        no longer takes all the cfbot runs with it. Mentioning just in case
        anyone is confused by that...
        As for what's happening... hmm, there are a few holes in the "shared
        locking" stuff you get with the flags we use. For example you can't
        unlink a directory that contains a file that has been unlinked but
        someone still holds open. Doesn't seem to be the case here. But I
        wonder if you can't rename("old", "new") where "new" is a file that
        has already been unlinked (or renamed over) that someone still holds
        open, or something like that...
• Andres Freund <andres@anarazel.de>
          Jan 27, 2026, 4:38 PM UTC
          Hi,
          On 2026-01-28 05:16:13 +1300, Thomas Munro wrote:
          On Wed, Jan 28, 2026 at 3:59 AM Robert Haas <robertmhaas@gmail.com> wrote:
          I imagine this is going to break CI for everybody else too, as well as cfbot.

          Just by the way, on that last point, we trained cfbot to watch out for
          CI pass/fail in this account:

          https://github.com/postgres/postgres/commits/master/

          and then use the most recent pass as the base commit when applying
          patches to make test branches. So if master is broken for a while, it
          no longer takes all the cfbot runs with it. Mentioning just in case
          anyone is confused by that...
          Ah. I was indeed confused by that for a bit.
          But I wonder if you can't rename("old", "new") where "new" is a file that
          has already been unlinked (or renamed over) that someone still holds open,
          or something like that...
I don't see a source of that which would be specific to this test, though :(. We
do wait for pg_basebackup to have shut down, which wrote backup_label (which
was "manufactured" during streaming by basebackup.c).
          Perhaps we should crank up log level in the test? No idea if it'll help, but
          right now I don't even know where to start looking.
          Greetings,
          Andres Freund
• Thomas Munro <thomas.munro@gmail.com>
            Jan 28, 2026, 7:23 AM UTC
            On Tue, Jan 27, 2026 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
            On 2026-01-28 05:16:13 +1300, Thomas Munro wrote:
            But I wonder if you can't rename("old", "new") where "new" is a file that
            has already been unlinked (or renamed over) that someone still holds open,
            or something like that...

I don't see a source of that which would be specific to this test, though :(. We
do wait for pg_basebackup to have shut down, which wrote backup_label (which
was "manufactured" during streaming by basebackup.c).
            I have no specific ideas, but just in case it's helpful for this
            discussion, I looked at my old test suite[1] where I tried to
            catalogue all the edge conditions around this sort of stuff
            empirically, and saw that rename() always fails like that if the file
            is open (that is, it doesn't require a more complicated sequence with
            an earlier unlink/rename of the new name):
+	/*
+	 * Windows can't rename over an open non-unlinked file, even with
+	 * have_posix_unlink_semantics.
+	 */
+	pg_win32_dirmod_loops = 2;	/* minimize looping to fail fast in testing */
+	PG_EXPECT_SYS(rename(path, path2) == -1,
+		"Windows: can't rename name1.txt -> name2.txt while name2.txt is open");
+	PG_EXPECT_EQ(errno, EACCES);
            [1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
• Robert Haas <robertmhaas@gmail.com>
            Jan 27, 2026, 5:43 PM UTC
            On Tue, Jan 27, 2026 at 11:37 AM Andres Freund <andres@anarazel.de> wrote:
            But I wonder if you can't rename("old", "new") where "new" is a file that
            has already been unlinked (or renamed over) that someone still holds open,
            or something like that...

            I don't see a source of that that would be specific to this test though :(. We
            do wait for pg_basebackup to have shut down, which wrote backup.label (which
            was "manifactured" during streaming by basebackup.c).

            Perhaps we should crank up log level in the test? No idea if it'll help, but
            right now I don't even know where to start looking.
            I tried sticking a pg_sleep(30) in just before starting the standby
            node, and that didn't help, so it doesn't seem like it's a race
            condition.
Here's what the standby log file looks like with log_min_messages=DEBUG2:
            2026-01-27 17:19:25.262 GMT postmaster[4932] DEBUG: registering
            background worker "logical replication launcher"
            2026-01-27 17:19:25.264 GMT postmaster[4932] DEBUG: dynamic shared
            memory system will support 229 segments
            2026-01-27 17:19:25.264 GMT postmaster[4932] DEBUG: created dynamic
            shared memory control segment 3769552926 (9176 bytes)
2026-01-27 17:19:25.266 GMT postmaster[4932] DEBUG: max_safe_fds =
990, usable_fds = 1000, already_open = 3
            2026-01-27 17:19:25.268 GMT postmaster[4932] LOG: starting PostgreSQL
            19devel on x86_64-windows, compiled by msvc-19.29.30159, 64-bit
            2026-01-27 17:19:25.271 GMT postmaster[4932] LOG: listening on Unix
            socket "C:/Windows/TEMP/3xesO1s4ba/.s.PGSQL.17575"
2026-01-27 17:19:25.273 GMT postmaster[4932] DEBUG: updating PMState
from PM_INIT to PM_STARTUP
            2026-01-27 17:19:25.273 GMT postmaster[4932] DEBUG: assigned pm child
            slot 57 for io worker
            2026-01-27 17:19:25.275 GMT postmaster[4932] DEBUG: assigned pm child
            slot 58 for io worker
            2026-01-27 17:19:25.277 GMT postmaster[4932] DEBUG: assigned pm child
            slot 59 for io worker
            2026-01-27 17:19:25.278 GMT postmaster[4932] DEBUG: assigned pm child
            slot 56 for checkpointer
            2026-01-27 17:19:25.280 GMT postmaster[4932] DEBUG: assigned pm child
            slot 55 for background writer
            2026-01-27 17:19:25.281 GMT postmaster[4932] DEBUG: assigned pm child
            slot 89 for startup
            2026-01-27 17:19:25.308 GMT checkpointer[6560] DEBUG: checkpointer
            updated shared memory configuration values
            2026-01-27 17:19:25.314 GMT startup[2488] LOG: database system was
            interrupted; last known up at 2026-01-27 17:19:21 GMT
            2026-01-27 17:19:25.317 GMT startup[2488] DEBUG: removing all
            temporary WAL segments
            The system cannot find the file specified.
            2026-01-27 17:19:25.336 GMT startup[2488] DEBUG: could not restore
            file "00000002.history" from archive: child process exited with exit
            code 1
            2026-01-27 17:19:25.337 GMT startup[2488] DEBUG: backup time
            2026-01-27 17:19:21 GMT in file "backup_label"
2026-01-27 17:19:25.337 GMT startup[2488] DEBUG: backup label
pg_basebackup base backup in file "backup_label"
            2026-01-27 17:19:25.337 GMT startup[2488] DEBUG: backup timeline 1 in
            file "backup_label"
            2026-01-27 17:19:25.337 GMT startup[2488] LOG: starting backup
            recovery with redo LSN 0/2A000028, checkpoint LSN 0/2A000080, on
            timeline ID 1
            The system cannot find the file specified.
            2026-01-27 17:19:25.352 GMT startup[2488] DEBUG: could not restore
            file "00000001000000000000002A" from archive: child process exited
            with exit code 1
            2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: checkpoint record is
            at 0/2A000080
            2026-01-27 17:19:25.353 GMT startup[2488] LOG: entering standby mode
            2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: redo record is at
            0/2A000028; shutdown false
            2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: next transaction ID:
            769; next OID: 24576
            2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: next MultiXactId: 1;
            next MultiXactOffset: 1
            2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: oldest unfrozen
            transaction ID: 760, in database 1
            2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: oldest MultiXactId:
            1, in database 1
            2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: commit timestamp Xid
            oldest/newest: 0/0
            2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: transaction ID wrap
            limit is 2147484407, limited by database with OID 1
            2026-01-27 17:19:25.353 GMT startup[2488] DEBUG: MultiXactId wrap
            limit is 2147483648, limited by database with OID 1
            2026-01-27 17:19:25.354 GMT startup[2488] DEBUG: starting up replication slots
            2026-01-27 17:19:25.354 GMT startup[2488] DEBUG: xmin required by
            slots: data 0, catalog 0
            2026-01-27 17:19:25.354 GMT startup[2488] DEBUG: starting up
            replication origin progress state
            2026-01-27 17:19:25.354 GMT startup[2488] DEBUG: didn't need to
            unlink permanent stats file "pg_stat/pgstat.stat" - didn't exist
2026-01-27 17:19:38.938 GMT startup[2488] FATAL: could not rename
file "backup_label" to "backup_label.old": Permission denied
            2026-01-27 17:19:38.983 GMT postmaster[4932] DEBUG: releasing pm child slot 89
            2026-01-27 17:19:38.983 GMT postmaster[4932] LOG: startup process
            (PID 2488) exited with exit code 1
            2026-01-27 17:19:38.983 GMT postmaster[4932] LOG: aborting startup
            due to startup process failure
            2026-01-27 17:19:38.983 GMT postmaster[4932] DEBUG: cleaning up
            dynamic shared memory control segment with ID 3769552926
            2026-01-27 17:19:38.985 GMT postmaster[4932] LOG: database system is shut down
            Unfortunately, I don't see any clues there. The "The system cannot
            find the file specified." messages look like they might be a clue, but
I think they are not, because they also occur in
040_standby_failover_slots_sync_standby1.log, and that test passes. At
            the point where this log file shows the FATAL error, that log file
            continues thus:
            2026-01-27 17:18:36.905 GMT startup[1420] DEBUG: resetting unlogged
            relations: cleanup 1 init 0
            2026-01-27 17:18:36.906 GMT startup[1420] DEBUG: initializing for hot standby
            2026-01-27 17:18:36.906 GMT startup[1420] LOG: redo starts at 0/02000028
            2026-01-27 17:18:36.906 GMT startup[1420] DEBUG: recovery snapshots
            are now enabled
            2026-01-27 17:18:36.906 GMT startup[1420] CONTEXT: WAL redo at
            0/02000048 for Standby/RUNNING_XACTS: nextXid 769 latestCompletedXid
            768 oldestRunningXid 769
            2026-01-27 17:18:36.907 GMT startup[1420] DEBUG: end of backup record reached
            2026-01-27 17:18:36.907 GMT startup[1420] CONTEXT: WAL redo at
            0/02000100 for XLOG/BACKUP_END: 0/02000028
            2026-01-27 17:18:36.907 GMT startup[1420] DEBUG: end of backup reached
            Which again seems totally normal.
            --
            Robert Haas
            EDB: http://www.enterprisedb.com
• Andres Freund <andres@anarazel.de>
              Jan 27, 2026, 6:17 PM UTC
              Hi,
              On 2026-01-27 12:42:51 -0500, Robert Haas wrote:
              I tried sticking a pg_sleep(30) in just before starting the standby
              node, and that didn't help, so it doesn't seem like it's a race
              condition.
              Interesting.
              It could be worth trying to run the test in isolation, without all the other
              concurrent tests.
              Greg, have you tried to repro it interactively?
Bryan, you seem to have become the resident Windows expert...
              2026-01-27 17:19:25.337 GMT startup[2488] LOG: starting backup
              recovery with redo LSN 0/2A000028, checkpoint LSN 0/2A000080, on
              timeline ID 1
              The system cannot find the file specified.
              2026-01-27 17:19:25.352 GMT startup[2488] DEBUG: could not restore
              file "00000001000000000000002A" from archive: child process exited
              with exit code 1
              I think that must be a message from "copy" (which we seem to be using for
              restore_command on windows).
              I don't know why the standby is created with has_restoring => 1. But it
              shouldn't be related to the issue, I think?
              Greetings,
              Andres Freund
• Greg Burd <greg@burd.me>
                Jan 28, 2026, 6:02 PM UTC
                On Tue, Jan 27, 2026, at 1:17 PM, Andres Freund wrote:
                Hi,

                On 2026-01-27 12:42:51 -0500, Robert Haas wrote:
                I tried sticking a pg_sleep(30) in just before starting the standby
                node, and that didn't help, so it doesn't seem like it's a race
                condition.

                Interesting.

                It could be worth trying to run the test in isolation, without all the other
                concurrent tests.

                Greg, have you tried to repro it interactively?
                Nope, not yet. I'm working on my ailing animals now and updated unicorn to include injection points.
                -greg
                Bryan, you seem to have become the resident windows expert...

                2026-01-27 17:19:25.337 GMT startup[2488] LOG: starting backup
                recovery with redo LSN 0/2A000028, checkpoint LSN 0/2A000080, on
                timeline ID 1
                The system cannot find the file specified.
                2026-01-27 17:19:25.352 GMT startup[2488] DEBUG: could not restore
                file "00000001000000000000002A" from archive: child process exited
                with exit code 1

                I think that must be a message from "copy" (which we seem to be using for
                restore_command on windows).
                I don't know why the standby is created with has_restoring => 1. But it
                shouldn't be related to the issue, I think?

                Greetings,

                Andres Freund
• Amit Kapila <amit.kapila16@gmail.com>
                Jan 28, 2026, 11:20 AM UTC
                On Tue, Jan 27, 2026 at 11:47 PM Andres Freund <andres@anarazel.de> wrote:
                I don't know why the standby is created with has_restoring => 1.
This is not required. I think it is a copy-paste oversight.
                But it
                shouldn't be related to the issue, I think?
                Yeah, tried without this as well apart from other experiments.
                --
                With Regards,
                Amit Kapila.
• Robert Haas <robertmhaas@gmail.com>
              Jan 27, 2026, 6:16 PM UTC
              On Tue, Jan 27, 2026 at 12:42 PM Robert Haas <robertmhaas@gmail.com> wrote:
              2026-01-27 17:19:25.354 GMT startup[2488] DEBUG: didn't need to
              unlink permanent stats file "pg_stat/pgstat.stat" - didn't exist
2026-01-27 17:19:38.938 GMT startup[2488] FATAL: could not rename
file "backup_label" to "backup_label.old": Permission denied
              Andrey Borodin pointed out to me off-list that there's a retry loop in
              pgrename(). The 13 second delay between the above two log messages
              almost certainly means that retry loop is iterating until it hits its
10 second timeout. This almost certainly means that the underlying
Windows error is ERROR_ACCESS_DENIED, ERROR_SHARING_VIOLATION, or
ERROR_LOCK_VIOLATION, and that somebody else has the file open. But
              nothing other than Perl touches that directory before we try to start
              the standby:
my $standby = PostgreSQL::Test::Cluster->new('standby');
$standby->init_from_backup(
    $primary, $backup_name,
    has_streaming => 1,
    has_restoring => 1);
$standby->append_conf(
    'postgresql.conf', qq(
hot_standby_feedback = on
primary_slot_name = 'phys_slot'
primary_conninfo = '$connstr1 dbname=postgres'
log_min_messages = 'debug2'
));
              $standby->start;
As far as I can see, only init_from_backup() touches the backup_label
              file, and that just copies the directory using RecursiveCopy.pm, which
              as far as I can tell is quite careful about closing file handles. So I
              still have no idea what's happening here.
              --
              Robert Haas
              EDB: http://www.enterprisedb.com
• Amit Kapila <amit.kapila16@gmail.com>
                Jan 28, 2026, 5:48 AM UTC
                On Tue, Jan 27, 2026 at 11:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
                On Tue, Jan 27, 2026 at 12:42 PM Robert Haas <robertmhaas@gmail.com> wrote:
                2026-01-27 17:19:25.354 GMT startup[2488] DEBUG: didn't need to
                unlink permanent stats file "pg_stat/pgstat.stat" - didn't exist
2026-01-27 17:19:38.938 GMT startup[2488] FATAL: could not rename
file "backup_label" to "backup_label.old": Permission denied

                Andrey Borodin pointed out to me off-list that there's a retry loop in
                pgrename(). The 13 second delay between the above two log messages
                almost certainly means that retry loop is iterating until it hits its
                10 second timeout.
Yes, this is correct. I am able to reproduce it. In pgrename(), we use
the MoveFileEx() Windows API, which fails with error code 32; _dosmaperr()
maps that to errno 13 (EACCES) via the ERROR_SHARING_VIOLATION -> EACCES
entry in the doserrors table.
This almost certainly means that the underlying
Windows error is ERROR_ACCESS_DENIED, ERROR_SHARING_VIOLATION, or
ERROR_LOCK_VIOLATION, and that somebody else has the file open.
It is ERROR_SHARING_VIOLATION.
                But
                nothing other than Perl touches that directory before we try to start
                the standby:
                my $standby = PostgreSQL::Test::Cluster->new('standby');
$standby->init_from_backup(
$primary, $backup_name,
has_streaming => 1,
has_restoring => 1);
$standby->append_conf(
'postgresql.conf', qq(
hot_standby_feedback = on
primary_slot_name = 'phys_slot'
primary_conninfo = '$connstr1 dbname=postgres'
log_min_messages = 'debug2'
                ));
                $standby->start;

As far as I can see, only init_from_backup() touches the backup_label
                file, and that just copies the directory using RecursiveCopy.pm, which
                as far as I can tell is quite careful about closing file handles. So I
                still have no idea what's happening here.
It is not clear to me either why a similar test,
040_standby_failover_slots_sync, succeeds while
046_checkpoint_logical_slot fails. I am still thinking about it,
but thought of sharing the information I could gather by debugging.
                Do let me know if you could think of gathering any other information
                which can be of help here.
                --
                With Regards,
                Amit Kapila.
                • Jump to comment-1
                  Amit Kapila<amit.kapila16@gmail.com>
                  Jan 28, 2026, 10:47 AM UTC
                  On Wed, Jan 28, 2026 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

It is not clear to me either why a similar test,
040_standby_failover_slots_sync, succeeds while
046_checkpoint_logical_slot fails. I am still thinking about it,
but thought of sharing the information I could gather by debugging.
It seems some interaction with the previous test in the same file is
causing this failure, as we reuse the primary node from that test. When I
commented out get_changes and its corresponding injection_point in the
previous test (as attached), the entire test passed. I think this test
would pass with a freshly created primary node, but I wanted to spend
some more time to see how/why the previous test causes this issue.
                  --
                  With Regards,
                  Amit Kapila.
• Amit Kapila <amit.kapila16@gmail.com>
                    Jan 28, 2026, 12:35 PM UTC
                    On Wed, Jan 28, 2026 at 4:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
                    On Wed, Jan 28, 2026 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

It is not clear to me either why a similar test,
040_standby_failover_slots_sync, succeeds while
046_checkpoint_logical_slot fails. I am still thinking about it,
but thought of sharing the information I could gather by debugging.

It seems some interaction with the previous test in the same file is
causing this failure, as we reuse the primary node from that test. When I
commented out get_changes and its corresponding injection_point in the
previous test (as attached), the entire test passed. I think this test
would pass with a freshly created primary node, but I wanted to spend
some more time to see how/why the previous test causes this issue.
I noticed that the previous test didn't quit the background psql
session used for the concurrent checkpoint. After quitting that background
session, the test passed for me consistently. See attached. It is
written in the comments atop background_psql: "Be sure to "quit" the
returned object when done with it.". Now, this background session
doesn't directly access the backup_label file, but it could be
accessing one of the parent directories where backup_label is present.
One gen-AI response says: "In Windows, MoveFileEx (Error 32:
ERROR_SHARING_VIOLATION) can fail if a process is accessing the file's
parent directory in a way that creates a lock. While the error message
usually points to the file itself, the parent folder is a critical
part of the operation.". I admit that I don't know the internals of
MoveFileEx, so I can't say with complete conviction, but the attached
sounds like a reasonable fix. Can anyone else who can reproduce the
issue test the attached patch and share the results?
Does this fix/theory sound plausible?
                    --
                    With Regards,
                    Amit Kapila.
• Andres Freund <andres@anarazel.de>
                      Jan 28, 2026, 4:54 PM UTC
                      Hi,
                      On 2026-01-28 18:05:10 +0530, Amit Kapila wrote:
I noticed that the previous test didn't quit the background psql
session used for the concurrent checkpoint. After quitting that background
session, the test passed for me consistently. See attached. It is
written in the comments atop background_psql: "Be sure to "quit" the
returned object when done with it.". Now, this background session
doesn't directly access the backup_label file, but it could be
accessing one of the parent directories where backup_label is present.
                      Hm. I've seen (and complained about [1]) weird errors when not shutting down
                      IPC::Run processes - mostly the test hanging at the end though.
One gen-AI response says: "In Windows, MoveFileEx (Error 32:
ERROR_SHARING_VIOLATION) can fail if a process is accessing the file's
parent directory in a way that creates a lock. While the error message
usually points to the file itself, the parent folder is a critical
part of the operation.".
I don't see how that could be the plausible reason - after all, we have a lot
of other files open in the relevant directories. But: it seems to fix
the problem for you, so it's worth going for it, as it's the right thing to do
anyway.
                      I think it'd be worth, separately from committing the workaround, trying to
                      figure out what's holding the file open. Andrey observed that the tests pass
                      for him with a much longer timeout. If you can reproduce it locally, I'd try
                      to use something like [2] to see what has handles open to the relevant files,
                      while waiting for the timeout.
                      Greetings,
                      Andres Freund
                      [1] https://postgr.es/m/20240619030727.ldp3mcrjbd5fqwj5%40awork3.anarazel.de
                      [2] https://learn.microsoft.com/en-us/sysinternals/downloads/handle
• Amit Kapila <amit.kapila16@gmail.com>
                        Jan 29, 2026, 1:36 PM UTC
                        On Wed, Jan 28, 2026 at 10:24 PM Andres Freund <andres@anarazel.de> wrote:

                        I think it'd be worth, separately from committing the workaround, trying to
                        figure out what's holding the file open. Andrey observed that the tests pass
                        for him with a much longer timeout. If you can reproduce it locally, I'd try
                        to use something like [2] to see what has handles open to the relevant files,
                        while waiting for the timeout.
                        Thanks for the suggestion. I did some experiments by using handle.exe
                        and below are the results. To get the results, I added a long sleep
                        before rename of backup_label file.
                        After Fix:
                        ==========
                         handle.exe D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                        Nthandle v5.0 - Handle viewer
                        Copyright (C) 1997-2022 Mark Russinovich
                        Sysinternals - www.sysinternals.com
                        No matching handles found.
                        Before Fix:
                        ==========
                         handle.exe D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                        Nthandle v5.0 - Handle viewer
                        Copyright (C) 1997-2022 Mark Russinovich
                        Sysinternals - www.sysinternals.com
                         perl.exe pid: 33784 type: File 30C:
                         D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                         pg_ctl.exe pid: 51236 type: File 30C:
                         D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                         cmd.exe pid: 35332 type: File 30C:
                         D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                         postgres.exe pid: 48200 type: File 30C:
                         D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                         postgres.exe pid: 7420 type: File 30C:
                         D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                         postgres.exe pid: 17160 type: File 30C:
                         D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                         postgres.exe pid: 56192 type: File 30C:
                         D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                         postgres.exe pid: 53892 type: File 30C:
                         D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                         postgres.exe pid: 44732 type: File 30C:
                         D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                         postgres.exe pid: 43488 type: File 30C:
                         D:\Workspace\Postgresql\head\postgresql\build\testrun\recovery\046_checkpoint_logical_slot\data\t_046_checkpoint_logical_slot_standby_data\pgdata\backup_label
                        All the shown postgres processes are various standby processes. Below
                        are details of each postgres process:
                         43488: startup process
                         XLogCtl->SharedRecoveryState RECOVERY_STATE_ARCHIVE (1)
                         44732: bgwriter:
                         XLogCtl->SharedRecoveryState RECOVERY_STATE_ARCHIVE (1)
                         53892: checkpointer
                         XLogCtl->SharedRecoveryState RECOVERY_STATE_ARCHIVE (1)
                         56192: aio-worker
                         XLogCtl->SharedRecoveryState RECOVERY_STATE_ARCHIVE (1)
                         17160: aio-worker
                         XLogCtl->SharedRecoveryState RECOVERY_STATE_ARCHIVE (1)
                         7420: aio-worker
                         XLogCtl->SharedRecoveryState RECOVERY_STATE_ARCHIVE (1)
                         48200: postmaster
                         XLogCtl->SharedRecoveryState RECOVERY_STATE_ARCHIVE (1)
                        I printed XLogCtl->SharedRecoveryState to show all are standby processes.
                         The results are a bit strange in the sense that some unfinished psql
                         sessions on the primary could lead to standby processes being shown
                         in the results of handle.exe.
                        Note: I have access to this environment till tomorrow noon, so I can
                        try to investigate a bit tomorrow if there are more questions related
                        to the above experiment.
                        --
                        With Regards,
                        Amit Kapila.
                      • Jump to comment-1
                        Andrey Borodin<x4mmm@yandex-team.ru>
                        Jan 28, 2026, 6:09 PM UTC
                        On 28 Jan 2026, at 21:53, Andres Freund <andres@anarazel.de> wrote:

                        Andrey observed that the tests pass
                        for him with a much longer timeout.
                        Unfortunately, I was wrong. The job "Windows - Server 2022, MinGW64 - Meson" which failed yesterday did not fail today.
                         But it did not succeed either. CirrusCI seems to have simply not run it. I do not understand why.
                         Anyway, I cannot prove that it is a race condition. On the contrary, the test fails deterministically with any big timeout (pg_ctl will bail out).
                        Best regards, Andrey Borodin.
                    • Jump to comment-1
                      Robert Haas<robertmhaas@gmail.com>
                      Jan 28, 2026, 12:58 PM UTC
                      On Wed, Jan 28, 2026 at 7:35 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
                      Does this fix/theory sound plausible?
                      I wondered about this yesterday, too. I didn't actually understand how
                      the existence of the background psql could be causing the failure, but
                      I thought it might be. However, I couldn't figure out the correct
                      incantation to get rid of it in my testing, as I thought I would need
                      to detach the injection point first or something.
                      If it fixes it for you, I would suggest committing promptly. I think
                      we are too dependent on CI now to leave it broken for any period of
                      time, and indeed I suggest getting set up so that you test your
                      commits against it before committing.
                      --
                      Robert Haas
                      EDB: http://www.enterprisedb.com
                      • Jump to comment-1
                        Amit Kapila<amit.kapila16@gmail.com>
                        Jan 28, 2026, 3:01 PM UTC
                        On Wed, Jan 28, 2026 at 6:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
                        On Wed, Jan 28, 2026 at 7:35 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
                        Does this fix/theory sound plausible?

                        I wondered about this yesterday, too. I didn't actually understand how
                        the existence of the background psql could be causing the failure, but
                        I thought it might be. However, I couldn't figure out the correct
                        incantation to get rid of it in my testing, as I thought I would need
                        to detach the injection point first or something.
                         Yeah, it would be better to quit these sessions after the test is
                         complete because there are two other background sessions as well. I
                         used the same method to quit these sessions as is used in
                         src\test\modules\test_misc\t\005_timeouts.pl. The attached passes for
                         me on both Linux and Windows (checked on HEAD only as of now). I'll do
                         some more testing on back branches as well and push tomorrow morning
                         if there are no more comments.
                        --
                        With Regards,
                        Amit Kapila.
                • Jump to comment-1
                  Andrey Borodin<x4mmm@yandex-team.ru>
                  Jan 28, 2026, 10:45 AM UTC
                  On 28 Jan 2026, at 10:47, Amit Kapila <amit.kapila16@gmail.com> wrote:

                  Do let me know if you could think of gathering any other information
                  which can be of help here.
                  Interestingly, increasing timeout in pgrename() to 500 seconds fixes "Windows - Server 2022, VS 2019 - Meson & ninja ", but does not fix "Windows - Server 2022, VS 2019 - Meson & ninja".
                  diff --git a/src/port/dirmod.c b/src/port/dirmod.c
                  index 467b50d6f09..da38e37aa45 100644
                  --- a/src/port/dirmod.c
                  +++ b/src/port/dirmod.c
                   @@ -88,7 +88,7 @@ pgrename(const char *from, const char *to)
                                      return -1;
                  #endif
                   - if (++loops > 100)		/* time out after 10 sec */
                   + if (++loops > 5000)		/* time out after 10 sec */
                                      return -1;
                              pg_usleep(100000);              /* us */
                      }
                  Best regards, Andrey Borodin.
      • Jump to comment-1
        Tom Lane<tgl@sss.pgh.pa.us>
        Jan 27, 2026, 3:11 PM UTC
        Robert Haas <robertmhaas@gmail.com> writes:
        On Tue, Jan 27, 2026 at 12:56 AM Amit Kapila <akapila@postgresql.org> wrote:
        Prevent invalidation of newly synced replication slots.
        This commit has broken CI for me.
        Hmm, I wonder why the buildfarm seems fine with it ... I'm prepared
        to believe a Windows-only problem, but at least hamerkop has run
        since 851f664.
        		regards, tom lane
        • Jump to comment-1
          Robert Haas<robertmhaas@gmail.com>
          Jan 27, 2026, 3:52 PM UTC
          On Tue, Jan 27, 2026 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
          Robert Haas <robertmhaas@gmail.com> writes:
          On Tue, Jan 27, 2026 at 12:56 AM Amit Kapila <akapila@postgresql.org> wrote:
          Prevent invalidation of newly synced replication slots.
          This commit has broken CI for me.

          Hmm, I wonder why the buildfarm seems fine with it ... I'm prepared
          to believe a Windows-only problem, but at least hamerkop has run
          since 851f664.
          I don't understand it, either. There's a bunch of error codes that we
          map to EACCES in _dosmaperr, but I don't know why any of those
          problems would have occurred here:
          ERROR_ACCESS_DENIED, EACCES
          ERROR_CURRENT_DIRECTORY, EACCES
          ERROR_LOCK_VIOLATION, EACCES
          ERROR_SHARING_VIOLATION, EACCES
          ERROR_NETWORK_ACCESS_DENIED, EACCES
          ERROR_CANNOT_MAKE, EACCES
          ERROR_FAIL_I24, EACCES
          ERROR_DRIVE_LOCKED, EACCES
          ERROR_SEEK_ON_DEVICE, EACCES
          ERROR_NOT_LOCKED, EACCES
          ERROR_LOCK_FAILED, EACCES
          (Side note: Wouldn't it make a lot of sense to go back and kill
          _dosmaperr in favor of displaying the actual Windows error code string?)
          What's also puzzling is that what this test is doing seems to be
          totally standard. 040_standby_failover_slots_sync.pl does this:
          my $standby1 = PostgreSQL::Test::Cluster->new('standby1');
          $standby1->init_from_backup(
              $primary, $backup_name,
              has_streaming => 1,
              has_restoring => 1);
          And 046_checkpoint_logical_slot.pl does this:
          my $standby = PostgreSQL::Test::Cluster->new('standby');
          $standby->init_from_backup(
          $primary, $backup_name,
          has_streaming => 1,
          has_restoring => 1);
          So why is 046 failing and 040 is fine? I have no idea.
          --
          Robert Haas
          EDB: http://www.enterprisedb.com
          • Jump to comment-1
            Tom Lane<tgl@sss.pgh.pa.us>
            Jan 27, 2026, 4:11 PM UTC
            Robert Haas <robertmhaas@gmail.com> writes:
            What's also puzzling is that what this test is doing seems to be
            totally standard.
            Yeah. I do notice something interesting when running it here:
             046_checkpoint_logical_slot_mike.log shows that we are triggering
             quite a few checkpoints (via pg_switch_wal()) in quick succession
            on the primary. I wonder if that is somehow tickling a Windows
            filesystem restriction.
            		regards, tom lane
            • Jump to comment-1
              Robert Haas<robertmhaas@gmail.com>
              Jan 27, 2026, 4:18 PM UTC
              On Tue, Jan 27, 2026 at 11:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
              Robert Haas <robertmhaas@gmail.com> writes:
              What's also puzzling is that what this test is doing seems to be
              totally standard.

              Yeah. I do notice something interesting when running it here:
               046_checkpoint_logical_slot_mike.log shows that we are triggering
               quite a few checkpoints (via pg_switch_wal()) in quick succession
              on the primary. I wonder if that is somehow tickling a Windows
              filesystem restriction.
              Maybe, but it seems unlikely to me that this would mess up the
              standby, since it's a totally different node. What I kind of wonder is
              if somehow there's still a process that has backup_label open, or has
              closed it but not recently enough for Windows to unlock it. However, I
              don't see why that would affect this test case and not others.
              --
              Robert Haas
              EDB: http://www.enterprisedb.com
          • Jump to comment-1
            Andres Freund<andres@anarazel.de>
            Jan 27, 2026, 4:17 PM UTC
            Hi,
            On 2026-01-27 10:51:58 -0500, Robert Haas wrote:
            I don't understand it, either. There's a bunch of error codes that we
            map to EACCES in _dosmaperr, but I don't know why any of those
            problems would have occurred here:

             ERROR_ACCESS_DENIED, EACCES
             ERROR_CURRENT_DIRECTORY, EACCES
             ERROR_LOCK_VIOLATION, EACCES
             ERROR_SHARING_VIOLATION, EACCES
             ERROR_NETWORK_ACCESS_DENIED, EACCES
             ERROR_CANNOT_MAKE, EACCES
             ERROR_FAIL_I24, EACCES
             ERROR_DRIVE_LOCKED, EACCES
             ERROR_SEEK_ON_DEVICE, EACCES
             ERROR_NOT_LOCKED, EACCES
             ERROR_LOCK_FAILED, EACCES

             (Side note: Wouldn't it make a lot of sense to go back and kill
             _dosmaperr in favor of displaying the actual Windows error code string?)
            It'd be great to somehow preserve the mapping to preserve the original error
            message, but I don't really see how we could just give up on our mapping. We
            rely on e.g. knowing that a read failed due to ENOENT, not
             ERROR_FILE_NOT_FOUND or whatnot.
            What's also puzzling is that what this test is doing seems to be
             totally standard. 040_standby_failover_slots_sync.pl does this:
            my $standby1 = PostgreSQL::Test::Cluster->new('standby1');
             $standby1->init_from_backup(
            $primary, $backup_name,
            has_streaming => 1,
            has_restoring => 1);

             And 046_checkpoint_logical_slot.pl does this:
            my $standby = PostgreSQL::Test::Cluster->new('standby');
             $standby->init_from_backup(
            $primary, $backup_name,
            has_streaming => 1,
            has_restoring => 1);

            So why is 046 failing and 040 is fine? I have no idea.
            046 does a fair bit of stuff before the base backup is being taken, I guess?
            But what that concretely could be, I have no idea.
            It'd be one thing if it failed while creating a base backup, but the fact that
            it allows the base backup being created, but then fails during startup is just
            plain odd. The typical sharing violation issue seems like it'd require that
            we somehow are not waiting for pg_basebackup to actually have terminated?
            Greetings,
            Andres Freund
        • Jump to comment-1
          Tom Lane<tgl@sss.pgh.pa.us>
          Jan 27, 2026, 3:49 PM UTC
          I wrote:
          Robert Haas <robertmhaas@gmail.com> writes:
          This commit has broken CI for me.
          Hmm, I wonder why the buildfarm seems fine with it ... I'm prepared
          to believe a Windows-only problem, but at least hamerkop has run
          since 851f664.
          D'oh: hamerkop doesn't run any TAP tests, let alone ones that require
          --enable-injection-points. So that success proves nothing.
          Our other Windows animals (drongo, fairywren, unicorn) seem to be
          configured with -Dtap_tests=enabled, but nothing about injection
          points, so they will also skip 046_checkpoint_logical_slot.
          Seems like a bit of a blind spot in the buildfarm.
          		regards, tom lane
          • Jump to comment-1
            Greg Burd<greg@burd.me>
            Jan 27, 2026, 4:53 PM UTC
            On Jan 27, 2026, at 10:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

            I wrote:
            Robert Haas <robertmhaas@gmail.com> writes:
            This commit has broken CI for me.
            Hmm, I wonder why the buildfarm seems fine with it ... I'm prepared
            to believe a Windows-only problem, but at least hamerkop has run
            since 851f664.

            D'oh: hamerkop doesn't run any TAP tests, let alone ones that require
            --enable-injection-points. So that success proves nothing.

            Our other Windows animals (drongo, fairywren, unicorn) seem to be
            configured with -Dtap_tests=enabled, but nothing about injection
            points, so they will also skip 046checkpointlogical_slot.
            Seems like a bit of a blind spot in the buildfarm.

            regards, tom lane
            I'll see if I can update unicorn today to enable injection points to add some coverage on Win11/ARM64/MSVC. No promises that will be diagnostic at all, but it seems like a good idea.
            -Dinjection_points=true
            -greg
            • Jump to comment-1
              Robert Haas<robertmhaas@gmail.com>
              Jan 27, 2026, 5:11 PM UTC
              On Tue, Jan 27, 2026 at 11:53 AM Greg Burd <greg@burd.me> wrote:
              I'll see if I can update unicorn today to enable injection points to add some coverage on Win11/ARM64/MSVC. No promises that will be diagnostic at all, but it seems like a good idea.
              -Dinjection_points=true
              Sounds good!
              Thanks,
              --
              Robert Haas
              EDB: http://www.enterprisedb.com