pgsql-hackers
❮
Add tests for concurrent DML retry paths in logical replication apply
- Jump to comment-1Bharath Rupireddy<bharath.rupireddyforpostgres@gmail.com>Apr 27, 2026, 7:31 AM UTCHi hackers,
While reading the logical replication apply code in execReplication.c, I
noticed that the retry paths in RelationFindReplTupleByIndex and
RelationFindReplTupleSeq for concurrent updates and deletes have no test
coverage [1]. Specifically, when the same tuple is being updated/deleted on
the publisher and subscriber at the same time, the dirty snapshot finds the
tuple being modified by another transaction, the apply worker waits and
retries the index/sequential scan.
The attached patch adds an injection point before tabletuplelock and a
TAP test exercising these retry paths, hitting both TM_Updated and
TM_Deleted.
While working on this, I also noticed minor issues in the conflict handling
code:
1/ In RelationFindReplTupleByIndex, ExecMaterializeSlot was called before
checking shouldrefetchtuple. If the tuple needs to be refetched due to a
concurrent modification, this materialization is wasted work. Moved it
after the retry check, so it only runs when we've successfully locked the
tuple.
2/ In RelationFindReplTupleSeq, ExecCopySlot and a separate TupleTableSlot
allocation were unnecessary. Made this function consistent with
RelationFindReplTupleByIndex by using outslot directly while scanning the
heap, avoiding the extra TTS allocation and copy overhead.
I'm aware that these are not major performance issues in practice, but it
keeps the two functions consistent and avoids unnecessary TTS materialize
and copy costs.
I also think that we could deduplicate these two functions since the code
looks mostly the same, but that would be an overkill IMHO.
Please find the attached patch. Appreciate any feedback. Thank you!
[1]
https://coverage.postgresql.org/src/backend/executor/execReplication.c.gcov.html
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com- Jump to comment-1Hayato Kuroda (Fujitsu)<kuroda.hayato@fujitsu.com>Apr 28, 2026, 7:48 AM UTCDear Bharath,
While reading the logical replication apply code in execReplication.c, I noticed
Good catch.
that the retry paths in RelationFindReplTupleByIndex and RelationFindReplTupleSeq
for concurrent updates and deletes have no test coverage [1]. Specifically,
when the same tuple is being updated/deleted on the publisher and subscriber at
the same time, the dirty snapshot finds the tuple being modified by another
transaction, the apply worker waits and retries the index/sequential scan.The attached patch adds an injection point before tabletuplelock and a TAP test
I read the code briefly. Here are questions:
exercising these retry paths, hitting both TMUpdated and TMDeleted.
1.
The test looks like to add the test for retry acquiring the lock. But there is
another retry path, which waits till xwait finishes. Do you have a reason to
skip testing?
2.
Is it OK to use the same injection point name twice? I cannot find the existing
case.
Best regards,
Hayato Kuroda
FUJITSU LIMITED- Jump to comment-1Bharath Rupireddy<bharath.rupireddyforpostgres@gmail.com>Apr 29, 2026, 1:46 AM UTCHi,
On Tue, Apr 28, 2026 at 12:47 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:While reading the logical replication apply code in execReplication.c, I noticed
that the retry paths in RelationFindReplTupleByIndex and RelationFindReplTupleSeq
for concurrent updates and deletes have no test coverage [1]. Specifically,
when the same tuple is being updated/deleted on the publisher and subscriber at
the same time, the dirty snapshot finds the tuple being modified by another
transaction, the apply worker waits and retries the index/sequential scan.
Good catch.The attached patch adds an injection point before tabletuplelock and a TAP test
exercising these retry paths, hitting both TMUpdated and TMDeleted.
Thank you for reviewing!
I read the code briefly. Here are questions:1.
There are two retry paths, and I added TAP tests covering both. The
The test looks like to add the test for retry acquiring the lock. But there is
another retry path, which waits till xwait finishes. Do you have a reason to
skip testing?
XactLockTableWait path is tested by holding an in-progress transaction
on the subscriber, which naturally blocks the apply worker until the
transaction finishes, then it retries the scan. The tabletuplelock
retry path (TMUpdated / TMDeleted) is tested using an injection
point that pauses the worker between finding and locking the tuple,
allowing concurrent DML to intervene.2.
Good catch. I used separate injection points for each path.
Is it OK to use the same injection point name twice? I cannot find the existing
case.
Please find the attached v2 patch for further review. Thank you!
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com