Add tests for concurrent DML retry paths in logical replication apply

  • Jump to comment-1
    Bharath Rupireddy<bharath.rupireddyforpostgres@gmail.com>
    Apr 27, 2026, 7:31 AM UTC
    Hi hackers,
    While reading the logical replication apply code in execReplication.c, I
    noticed that the retry paths in RelationFindReplTupleByIndex and
    RelationFindReplTupleSeq for concurrent updates and deletes have no test
    coverage [1]. Specifically, when the same tuple is being updated/deleted on
    the publisher and subscriber at the same time, the dirty snapshot finds the
    tuple being modified by another transaction, the apply worker waits and
    retries the index/sequential scan.
    The attached patch adds an injection point before tabletuplelock and a
    TAP test exercising these retry paths, hitting both TM_Updated and
    TM_Deleted.
    While working on this, I also noticed minor issues in the conflict handling
    code:
    1/ In RelationFindReplTupleByIndex, ExecMaterializeSlot was called before
    checking shouldrefetchtuple. If the tuple needs to be refetched due to a
    concurrent modification, this materialization is wasted work. Moved it
    after the retry check, so it only runs when we've successfully locked the
    tuple.
    2/ In RelationFindReplTupleSeq, ExecCopySlot and a separate TupleTableSlot
    allocation were unnecessary. Made this function consistent with
    RelationFindReplTupleByIndex by using outslot directly while scanning the
    heap, avoiding the extra TTS allocation and copy overhead.
    I'm aware that these are not major performance issues in practice, but it
    keeps the two functions consistent and avoids unnecessary TTS materialize
    and copy costs.
    I also think that we could deduplicate these two functions since the code
    looks mostly the same, but that would be an overkill IMHO.
    Please find the attached patch. Appreciate any feedback. Thank you!
    [1]
    https://coverage.postgresql.org/src/backend/executor/execReplication.c.gcov.html
    --
    Bharath Rupireddy
    Amazon Web Services: https://aws.amazon.com
    • Jump to comment-1
      Hayato Kuroda (Fujitsu)<kuroda.hayato@fujitsu.com>
      Apr 28, 2026, 7:48 AM UTC
      Dear Bharath,
      While reading the logical replication apply code in execReplication.c, I noticed
      that the retry paths in RelationFindReplTupleByIndex and RelationFindReplTupleSeq
      for concurrent updates and deletes have no test coverage [1]. Specifically,
      when the same tuple is being updated/deleted on the publisher and subscriber at
      the same time, the dirty snapshot finds the tuple being modified by another
      transaction, the apply worker waits and retries the index/sequential scan.
      Good catch.
      The attached patch adds an injection point before tabletuplelock and a TAP test
      exercising these retry paths, hitting both TMUpdated and TMDeleted.
      I read the code briefly. Here are questions:
      1.
      The test looks like to add the test for retry acquiring the lock. But there is
      another retry path, which waits till xwait finishes. Do you have a reason to
      skip testing?
      2.
      Is it OK to use the same injection point name twice? I cannot find the existing
      case.
      Best regards,
      Hayato Kuroda
      FUJITSU LIMITED
      • Jump to comment-1
        Bharath Rupireddy<bharath.rupireddyforpostgres@gmail.com>
        Apr 29, 2026, 1:46 AM UTC
        Hi,
        On Tue, Apr 28, 2026 at 12:47 AM Hayato Kuroda (Fujitsu)
        <kuroda.hayato@fujitsu.com> wrote:
        While reading the logical replication apply code in execReplication.c, I noticed
        that the retry paths in RelationFindReplTupleByIndex and RelationFindReplTupleSeq
        for concurrent updates and deletes have no test coverage [1]. Specifically,
        when the same tuple is being updated/deleted on the publisher and subscriber at
        the same time, the dirty snapshot finds the tuple being modified by another
        transaction, the apply worker waits and retries the index/sequential scan.

        Good catch.
        The attached patch adds an injection point before tabletuplelock and a TAP test
        exercising these retry paths, hitting both TMUpdated and TMDeleted.

        I read the code briefly. Here are questions:
        Thank you for reviewing!
        1.
        The test looks like to add the test for retry acquiring the lock. But there is
        another retry path, which waits till xwait finishes. Do you have a reason to
        skip testing?
        There are two retry paths, and I added TAP tests covering both. The
        XactLockTableWait path is tested by holding an in-progress transaction
        on the subscriber, which naturally blocks the apply worker until the
        transaction finishes, then it retries the scan. The tabletuplelock
        retry path (TMUpdated / TMDeleted) is tested using an injection
        point that pauses the worker between finding and locking the tuple,
        allowing concurrent DML to intervene.
        2.
        Is it OK to use the same injection point name twice? I cannot find the existing
        case.
        Good catch. I used separate injection points for each path.
        Please find the attached v2 patch for further review. Thank you!
        --
        Bharath Rupireddy
        Amazon Web Services: https://aws.amazon.com