pgsql-hackers
❮
[PATCH] Fix stale relation close in sequence synchronization
- Jump to comment-1Ayush Tiwari<ayushtiwari.slg01@gmail.com>Apr 28, 2026, 11:24 AM UTCHi,
I found a crash in the logical replication sequence synchronization worker
when the publisher returns NULL sequence data for a sequence after at least
one sequence in the same sync batch has already been processed.
One way to reproduce this is to use a subscription that connects to the
publisher as a replication role that can read one published sequence but
cannot read another one. pggetsequence_data() returns NULLs for the
inaccessible sequence. In getandvalidateseqinfo(), that path returns
COPYSEQ_SKIPPED before assigning a new value to the Relation output
argument.
copy_sequences() then still sees the Relation pointer left from the previous
row and calls table_close() on it again. On a cassert build, this trips:
The attached patch clears the output Relation pointer at the start ofTRAP: failed Assert("rel->rd_refcnt > 0"); "relcache.c" file
getandvalidateseqinfo() and clears the local pointer in copy_sequences()
after closing it. That keeps early returns from reusing a relation from a
previous row.
The patch also adds a TAP test to 036_sequences.pl.
Thoughts?
Regards,
Ayush- Jump to comment-1Hayato Kuroda (Fujitsu)<kuroda.hayato@fujitsu.com>Apr 28, 2026, 12:14 PM UTCDear Ayush,
I found a crash in the logical replication sequence synchronization worker
Good catch. I confirmed the HEAD can crash with your added test.
when the publisher returns NULL sequence data for a sequence after at least
one sequence in the same sync batch has already been processed.The attached patch clears the output Relation pointer at the start of
To confirm; can't we declare the sequence_rel in the inner-loop? My first
getandvalidateseqinfo() and clears the local pointer in copy_sequences()
after closing it. That keeps early returns from reusing a relation from a
previous row.
impression was the bug caused by the wrong lifetime. Are there any other
thoughts around here?
Added Vignesh in CC because he was a primary author.
Best regards,
Hayato Kuroda
FUJITSU LIMITED- Jump to comment-1Ayush Tiwari<ayushtiwari.slg01@gmail.com>Apr 28, 2026, 12:34 PM UTCHi,
On Tue, 28 Apr 2026 at 17:44, Hayato Kuroda (Fujitsu) <
kuroda.hayato@fujitsu.com> wrote:Dear Ayush,
I found a crash in the logical replication sequence synchronization
worker
when the publisher returns NULL sequence data for a sequence after at
least
one sequence in the same sync batch has already been processed.
Thanks for checking and confirming the crash.
Good catch. I confirmed the HEAD can crash with your added test.The attached patch clears the output Relation pointer at the start of
getandvalidateseqinfo() and clears the local pointer incopy_sequences()
after closing it. That keeps early returns from reusing a relation from a
previous row.
I agree that declaring sequence_rel in the tuple-processing loop is
To confirm; can't we declare the sequence_rel in the inner-loop? My first
impression was the bug caused by the wrong lifetime. Are there any other
thoughts around here?
cleaner. The relation belongs only to the current publisher result row,
so limiting the variable's lifetime makes the intended ownership clearer
and prevents any value from carrying over to the next row.
I have kept the initialization of the output argument in
getandvalidateseqinfo(), so every return path leaves it in a defined
state. In v2, the caller-side pointer is declared inside the inner loop,
and the explicit reset after table_close() is no longer needed.
Attached is v2 with that change.
Regards,
Ayush- Jump to comment-1vignesh C<vignesh21@gmail.com>Apr 28, 2026, 1:18 PM UTCOn Tue, 28 Apr 2026 at 18:04, Ayush Tiwari <ayushtiwari.slg01@gmail.com> wrote:
Hi,On Tue, 28 Apr 2026 at 17:44, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
Dear Ayush,I found a crash in the logical replication sequence synchronization worker
when the publisher returns NULL sequence data for a sequence after at least
one sequence in the same sync batch has already been processed.
Good catch. I confirmed the HEAD can crash with your added test.
Thanks for checking and confirming the crash.The attached patch clears the output Relation pointer at the start of
getandvalidateseqinfo() and clears the local pointer in copy_sequences()
after closing it. That keeps early returns from reusing a relation from a
previous row.
To confirm; can't we declare the sequence_rel in the inner-loop? My first
impression was the bug caused by the wrong lifetime. Are there any other
thoughts around here?
Thanks for finding and reporting the issue. I was able to reproduce
I agree that declaring sequence_rel in the tuple-processing loop is
cleaner. The relation belongs only to the current publisher result row,
so limiting the variable's lifetime makes the intended ownership clearer
and prevents any value from carrying over to the next row.
I have kept the initialization of the output argument in
getandvalidateseqinfo(), so every return path leaves it in a defined
state. In v2, the caller-side pointer is declared inside the inner loop,
and the explicit reset after table_close() is no longer needed.
Attached is v2 with that change.
the issue with the steps you provided.
Few comments:
1) Here "SELECT nextval('regressnoselect');" and "REVOKE ALL ON
SEQUENCE regressnoselect FROM PUBLIC;" is not required for this test
case:
+$nodepublisher->safepsql(
+ 'postgres', qq(
+ CREATE ROLE regressseqrepl LOGIN REPLICATION;
+ CREATE SEQUENCE regressnoselect;
+ SELECT nextval('regressnoselect');
+ GRANT USAGE ON SCHEMA public TO regressseqrepl;
+ GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO regressseqrepl;
+ REVOKE ALL ON SEQUENCE regressnoselect FROM PUBLIC;
+ REVOKE ALL ON SEQUENCE regressnoselect FROM regressseqrepl;
+));
2) Since the comment about “dropped concurrently” has been removed,
could you merge that context into the new wording:
- * last_value can be NULL if the sequence was dropped concurrently (see/*
- * pggetsequence_data()).
+ * The sequence data can be NULL if it is not accessible on the publisher
+ * (see pggetsequence_data()).
Something like:*/
/*
* The sequence data can be NULL due to insufficient privileges or if
* the sequence was dropped concurrently (see pggetsequence_data()).
*/
3) Can we change this:
##########
# A NULL sequence data row from the publisher must not make the subscriber
# close the previously synchronized sequence relation again.
##########
To something like:
##########
# Ensure that insufficient privileges on the publisher for a sequence
# do not disrupt the subscriber. The subscriber should log a warning
# and continue retrying.
##########
Regards,
Vignesh- Jump to comment-1Ayush Tiwari<ayushtiwari.slg01@gmail.com>Apr 28, 2026, 1:35 PM UTCHi,
Thanks for reviewing and confirming the issue.
On Tue, 28 Apr 2026 at 18:48, vignesh C <vignesh21@gmail.com> wrote:
Thanks for finding and reporting the issue. I was able to reproduce
the issue with the steps you provided.
Few comments:
1) Here "SELECT nextval('regressnoselect');" and "REVOKE ALL ON
SEQUENCE regressnoselect FROM PUBLIC;" is not required for this test
case:+$nodepublisher->safepsql(
+ 'postgres', qq(
Agreed. Removed both statements from the test.
+ CREATE ROLE regressseqrepl LOGIN REPLICATION;
+ CREATE SEQUENCE regressnoselect;
+ SELECT nextval('regressnoselect');
+ GRANT USAGE ON SCHEMA public TO regressseqrepl;
+ GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO regressseqrepl;
+ REVOKE ALL ON SEQUENCE regressnoselect FROM PUBLIC;
+ REVOKE ALL ON SEQUENCE regressnoselect FROM regressseqrepl;
+));2) Since the comment about “dropped concurrently” has been removed,
Done. The comment now mentions both insufficient privileges and
could you merge that context into the new wording:
concurrent drops.
Done. I used that wording for the TAP test block comment.
3) Can we change this:
##########
# A NULL sequence data row from the publisher must not make the subscriber
# close the previously synchronized sequence relation again.
##########
To something like:
##########
# Ensure that insufficient privileges on the publisher for a sequence
# do not disrupt the subscriber. The subscriber should log a warning
# and continue retrying.
##########
Attached is v3 with these changes.
Regards,
Ayush- Jump to comment-1vignesh C<vignesh21@gmail.com>Apr 30, 2026, 4:30 AM UTCOn Tue, 28 Apr 2026 at 19:05, Ayush Tiwari <ayushtiwari.slg01@gmail.com> wrote:
Few comments:
Hi,
Thanks for reviewing and confirming the issue.
Attached is v3 with these changes.
1) Since we are setting the variable to NULL for every sequence now,
this is not required:
@@ -246,6 +246,8 @@ getandvalidateseqinfo(TupleTableSlot *slot,
Relation *sequence_rel,
+ *sequence_rel = NULL;Form_pg_sequence local_seq; LogicalRepSequenceInfo *seqinfo_local;
+
2) Creating a subscription is costly as it has more work to do as it*seqidx = DatumGetInt32(slot_getattr(slot, ++col, &isnull)); Assert(!isnull);
has to sync all relations and requires more processes to be started,
instead shall we use "ALTER SUBSCRIPTION ... SET CONNECTION" followed
by "ALTER SUBSCRIPTION ... REFRESH SEQUENCES"
+$nodesubscriber->safepsql(
+ 'postgres',
+ "CREATE SUBSCRIPTION regressseqsubnoselect CONNECTION
'$publisherlimitedconnstr' PUBLICATION regressseqpub WITH
(disableonerror = true)"
+);
3) You can use sequence name as regress_s5 to be consistent with the
other sequence names nearby or alternatively you can change the privs
of an already existing sequence:
+ CREATE ROLE regressseqrepl LOGIN REPLICATION;
+ CREATE SEQUENCE regressnoselect;
+ GRANT USAGE ON SCHEMA public TO regressseqrepl;
4) I feel this is not required:
+$result = $nodesubscriber->safepsql('postgres', 'SELECT 1');
+is($result, '1',
+ 'subscriber remains running after publisher returns NULL
sequence data');
Regards,
Vignesh- Jump to comment-1Ayush Tiwari<ayushtiwari.slg01@gmail.com>Apr 30, 2026, 4:53 AM UTCHi,
Thanks for the review.
On Thu, 30 Apr 2026 at 10:00, vignesh C <vignesh21@gmail.com> wrote:
I had added it as defence-in-depth, but yeah can be removed.
Few comments:
1) Since we are setting the variable to NULL for every sequence now,
this is not required:
@@ -246,6 +246,8 @@ getandvalidateseqinfo(TupleTableSlot *slot,
Relation *sequence_rel,
Formpgsequence local_seq;
LogicalRepSequenceInfo *seqinfo_local;
+ *sequence_rel = NULL;
+
*seqidx = DatumGetInt32(slot_getattr(slot, ++col, &isnull));
Assert(!isnull);
2) Creating a subscription is costly as it has more work to do as it
has to sync all relations and requires more processes to be started,
instead shall we use "ALTER SUBSCRIPTION ... SET CONNECTION" followed
by "ALTER SUBSCRIPTION ... REFRESH SEQUENCES"+$nodesubscriber->safepsql(
+ 'postgres',
Makes sense.
+ "CREATE SUBSCRIPTION regressseqsubnoselect CONNECTION
'$publisherlimitedconnstr' PUBLICATION regressseqpub WITH
(disableonerror = true)"
+);
Sounds good.
3) You can use sequence name as regress_s5 to be consistent with the
other sequence names nearby or alternatively you can change the privs
of an already existing sequence:
+ CREATE ROLE regressseqrepl LOGIN REPLICATION;
+ CREATE SEQUENCE regressnoselect;
+ GRANT USAGE ON SCHEMA public TO regressseqrepl;
4) I feel this is not required:+$result = $nodesubscriber->safepsql('postgres', 'SELECT 1');
+is($result, '1',
Yeah, you are right.
+ 'subscriber remains running after publisher returns NULL
sequence data');
I've made these changes, attaching patch v4.
Please review and let me know.
Regards,
Ayush- Jump to comment-1vignesh C<vignesh21@gmail.com>Apr 30, 2026, 5:57 AM UTCOn Thu, 30 Apr 2026 at 10:23, Ayush Tiwari <ayushtiwari.slg01@gmail.com> wrote:
Hi,
Thanks for the review.On Thu, 30 Apr 2026 at 10:00, vignesh C <vignesh21@gmail.com> wrote:
Few comments:
1) Since we are setting the variable to NULL for every sequence now,
this is not required:
@@ -246,6 +246,8 @@ getandvalidateseqinfo(TupleTableSlot *slot,
Relation *sequence_rel,
Formpgsequence local_seq;
LogicalRepSequenceInfo *seqinfo_local;
+ *sequence_rel = NULL;
+
*seqidx = DatumGetInt32(slot_getattr(slot, ++col, &isnull));
Assert(!isnull);
I had added it as defence-in-depth, but yeah can be removed.
2) Creating a subscription is costly as it has more work to do as it
has to sync all relations and requires more processes to be started,
instead shall we use "ALTER SUBSCRIPTION ... SET CONNECTION" followed
by "ALTER SUBSCRIPTION ... REFRESH SEQUENCES"+$nodesubscriber->safepsql(
+ 'postgres',
+ "CREATE SUBSCRIPTION regressseqsubnoselect CONNECTION
'$publisherlimitedconnstr' PUBLICATION regressseqpub WITH
(disableonerror = true)"
+);
Makes sense.
3) You can use sequence name as regress_s5 to be consistent with the
other sequence names nearby or alternatively you can change the privs
of an already existing sequence:
+ CREATE ROLE regressseqrepl LOGIN REPLICATION;
+ CREATE SEQUENCE regressnoselect;
+ GRANT USAGE ON SCHEMA public TO regressseqrepl;
Sounds good.
4) I feel this is not required:+$result = $nodesubscriber->safepsql('postgres', 'SELECT 1');
+is($result, '1',
+ 'subscriber remains running after publisher returns NULL
sequence data');
Thanks for the updated patch. After this error occurs, it currently
Yeah, you are right.
I've made these changes, attaching patch v4.
Please review and let me know.
emits the message 'missing sequence on publisher
("public.regressnoselect")'. This can be misleading, as it suggests
the sequence does not exist, whereas the actual cause may be
insufficient privileges. As a result, users may find it difficult to
diagnose and resolve the issue.
Consider updating the error reported in reportsequenceerrors to
convey that the failure could be due to either a missing sequence or
insufficient privileges (e.g., "missing sequence or insufficient
privilege on publisher"). While making this change, also update the
missingseqsidx variable name and the associated comments to
accurately reflect the broader scope of the condition.
Regards,
Vignesh- Jump to comment-1Ayush Tiwari<ayushtiwari.slg01@gmail.com>Apr 30, 2026, 6:46 AM UTCHi,
On Thu, 30 Apr 2026 at 11:27, vignesh C <vignesh21@gmail.com> wrote:
Agreed. I have updated the warning to say "missing or inaccessible
Consider updating the error reported in reportsequenceerrors to
convey that the failure could be due to either a missing sequence or
insufficient privileges (e.g., "missing sequence or insufficient
privilege on publisher"). While making this change, also update the
missingseqsidx variable name and the associated comments to
accurately reflect the broader scope of the condition.
sequence on publisher", so it covers both cases where the sequence is
actually missing and cases where the publisher-side role cannot access
its sequence data.
I also renamed the related variables from missing* to unavailable*,
and updated the associated comments and DEBUG message to avoid implying
that the sequence is necessarily absent on the publisher.
The TAP expectations for both the dropped-sequence case and the
insufficient-privileges case have been updated to match the new wording.
Attached is v5 with these changes.
Thoughts?
Regards,
Ayush- Jump to comment-1Amit Kapila<amit.kapila16@gmail.com>Apr 30, 2026, 11:20 AM UTCOn Thu, Apr 30, 2026 at 12:16 PM Ayush Tiwari
<ayushtiwari.slg01@gmail.com> wrote:On Thu, 30 Apr 2026 at 11:27, vignesh C <vignesh21@gmail.com> wrote:
Consider updating the error reported in reportsequenceerrors to
convey that the failure could be due to either a missing sequence or
insufficient privileges (e.g., "missing sequence or insufficient
privilege on publisher"). While making this change, also update the
missingseqsidx variable name and the associated comments to
accurately reflect the broader scope of the condition.
Sequence sync can also be skipped if the same sequence is a temporary
Agreed.
sequence from some other session on publisher. So, the change in v5
was also incomplete if we want to make it more generic than now. I
have pushed v4 for now but if you or Vignesh would like to make the
message more generic, feel free to propose a new patch.
--
With Regards,
Amit Kapila.