pgsql-hackers
BUG: Cascading standby fails to reconnect after falling back to archive recovery
Marco Nenciarini <marco.nenciarini@enterprisedb.com>  Jan 28, 2026, 5:03 PM UTC

Hi hackers,
I've encountered a bug in PostgreSQL's streaming replication where cascading
standbys fail to reconnect after falling back to archive recovery. The issue
occurs when the upstream standby uses archive-only recovery.
The standby requests streaming from the wrong WAL position (next segment boundary
instead of the current position), causing connection failures with this error:

ERROR: requested starting point 0/A000000 is ahead of the WAL flush
position of this server 0/9000000

Attached are two shell scripts that reliably reproduce the issue on PostgreSQL
17.x and 18.x:
1. reproducerrestartupstream_portable.sh - triggers by restarting upstream
2. reproducercascaderestart_portable.sh - triggers by restarting the cascade
The scripts set up this topology:
- Primary with archiving enabled
- Standby using only archive recovery (no streaming from primary)
- Cascading standby streaming from the archive-only standby
When the cascade loses its streaming connection and falls back to archive recovery,
it cannot reconnect. The issue appears to be in xlogrecovery.c around line 3880,
where the position passed to RequestXLogStreaming() determines which segment
boundary is requested.
The cascade restart reproducer shows that even restarting the cascade itself
triggers the bug, which affects routine maintenance operations.
Scripts require PostgreSQL binaries in PATH and use ports 15432-15434.
Best regards,
Marco

Xuneng Zhou <xunengzhou@gmail.com>  Jan 29, 2026, 12:22 PM UTC

Hi Marco,
On Thu, Jan 29, 2026 at 1:03 AM Marco Nenciarini
<marco.nenciarini@enterprisedb.com> wrote:
> Hi hackers,
>
> I've encountered a bug in PostgreSQL's streaming replication where cascading
> standbys fail to reconnect after falling back to archive recovery. The issue
> occurs when the upstream standby uses archive-only recovery.
>
> The standby requests streaming from the wrong WAL position (next segment boundary
> instead of the current position), causing connection failures with this error:
>
> ERROR: requested starting point 0/A000000 is ahead of the WAL flush
> position of this server 0/9000000
>
> Attached are two shell scripts that reliably reproduce the issue on PostgreSQL
> 17.x and 18.x:
>
> 1. reproducerrestartupstream_portable.sh - triggers by restarting upstream
> 2. reproducercascaderestart_portable.sh - triggers by restarting the cascade
>
> The scripts set up this topology:
>
> - Primary with archiving enabled
> - Standby using only archive recovery (no streaming from primary)
> - Cascading standby streaming from the archive-only standby
>
> When the cascade loses its streaming connection and falls back to archive recovery,
> it cannot reconnect. The issue appears to be in xlogrecovery.c around line 3880,
> where the position passed to RequestXLogStreaming() determines which segment
> boundary is requested.
>
> The cascade restart reproducer shows that even restarting the cascade itself
> triggers the bug, which affects routine maintenance operations.
>
> Scripts require PostgreSQL binaries in PATH and use ports 15432-15434.
>
> Best regards,
> Marco

Thanks for your report. I can reliably reproduce the issue on HEAD
using your scripts. I’ve analyzed the problem and am proposing a patch
to fix it.
--- Analysis
When a cascading standby streams from an archive-only upstream:
1. The upstream's GetStandbyFlushRecPtr() returns only replay position
(no received-but-not-replayed buffer since there's no walreceiver)
2. When streaming ends and the cascade falls back to archive recovery,
it can restore WAL segments from its own archive access
3. The cascade's read position (RecPtr) advances beyond what the
upstream has replayed
4. On reconnect, the cascade requests streaming from RecPtr, which the
upstream rejects as "ahead of flush position"
--- Proposed Fix
Track the last confirmed flush position from streaming
(lastStreamedFlush) and clamp the streaming start request when it
exceeds that position:
- Same timeline: clamp to lastStreamedFlush if RecPtr > lastStreamedFlush
- Timeline switch: fall back to timeline switchpoint as safe boundary
This ensures the cascade requests from a position the upstream
definitely has, rather than assuming the upstream can serve whatever
the cascade restored locally from archive.
I’m not a fan of using sleep in TAP tests, but I haven’t found a
better way to reproduce this behavior yet.
--
Best,
Xuneng

Fujii Masao <masao.fujii@gmail.com>  Jan 30, 2026, 3:13 AM UTC

On Thu, Jan 29, 2026 at 9:22 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> Thanks for your report. I can reliably reproduce the issue on HEAD
> using your scripts. I’ve analyzed the problem and am proposing a patch
> to fix it.
>
> --- Analysis
>
> When a cascading standby streams from an archive-only upstream:
>
> 1. The upstream's GetStandbyFlushRecPtr() returns only replay position
>    (no received-but-not-replayed buffer since there's no walreceiver)
> 2. When streaming ends and the cascade falls back to archive recovery,
>    it can restore WAL segments from its own archive access
> 3. The cascade's read position (RecPtr) advances beyond what the
>    upstream has replayed
> 4. On reconnect, the cascade requests streaming from RecPtr, which the
>    upstream rejects as "ahead of flush position"
>
> --- Proposed Fix
>
> Track the last confirmed flush position from streaming
> (lastStreamedFlush) and clamp the streaming start request when it
> exceeds that position:

I haven't read the patch yet, but doesn't lastStreamedFlush represent
the same LSN as tliRecPtr or replayLSN (the arguments to
WaitForWALToBecomeAvailable())? If so, we may not need to introduce
a new variable to track this LSN.
The choice of which LSN is used as the replication start point has varied
over time to handle corner cases (for example, commit 06687198018).
That makes me wonder whether we should first better understand
why WaitForWALToBecomeAvailable() currently uses RecPtr as
the starting point.
BTW, with v1 patch, I was able to reproduce the issue using the following steps:

--------------------------------------------
initdb -D data
mkdir arch
cat <<EOF >> data/postgresql.conf
archive_mode = on
archive_command = 'cp %p ../arch/%f'
restore_command = 'cp ../arch/%f %p'
EOF
pg_ctl -D data start
pg_basebackup -D sby1 -c fast
cp -a sby1 sby2
cat <<EOF >> sby1/postgresql.conf
port = 5433
EOF
touch sby1/standby.signal
pg_ctl -D sby1 start
cat <<EOF >> sby2/postgresql.conf
port = 5434
primary_conninfo = 'port=5433'
EOF
touch sby2/standby.signal
pg_ctl -D sby2 start
pgbench -i -s2
pg_ctl -D sby2 restart
--------------------------------------------

In this case, after restarting the standby connecting to another
(cascading) standby, I observed the following error.
FATAL: could not receive data from WAL stream: ERROR: requested
starting point 0/04000000 is ahead of the WAL flush position of this
server 0/03FFE8D0
Regards,
--
Fujii Masao

Xuneng Zhou <xunengzhou@gmail.com>  Jan 30, 2026, 6:01 AM UTC

Hi Fujii-san,
Thanks for looking into this.
On Fri, Jan 30, 2026 at 11:12 AM Fujii Masao <masao.fujii@gmail.com> wrote:

> On Thu, Jan 29, 2026 at 9:22 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > Thanks for your report. I can reliably reproduce the issue on HEAD
> > using your scripts. I’ve analyzed the problem and am proposing a patch
> > to fix it.
> >
> > --- Analysis
> >
> > When a cascading standby streams from an archive-only upstream:
> >
> > 1. The upstream's GetStandbyFlushRecPtr() returns only replay position
> >    (no received-but-not-replayed buffer since there's no walreceiver)
> > 2. When streaming ends and the cascade falls back to archive recovery,
> >    it can restore WAL segments from its own archive access
> > 3. The cascade's read position (RecPtr) advances beyond what the
> >    upstream has replayed
> > 4. On reconnect, the cascade requests streaming from RecPtr, which the
> >    upstream rejects as "ahead of flush position"
> >
> > --- Proposed Fix
> >
> > Track the last confirmed flush position from streaming
> > (lastStreamedFlush) and clamp the streaming start request when it
> > exceeds that position:
>
> I haven't read the patch yet, but doesn't lastStreamedFlush represent
> the same LSN as tliRecPtr or replayLSN (the arguments to
> WaitForWALToBecomeAvailable())? If so, we may not need to introduce
> a new variable to track this LSN.

I think they refer to different types of LSNs. I don’t have access to my
computer at the moment, but I’ll look into it and get back to you shortly.

> The choice of which LSN is used as the replication start point has varied
> over time to handle corner cases (for example, commit 06687198018).
> That makes me wonder whether we should first better understand
> why WaitForWALToBecomeAvailable() currently uses RecPtr as
> the starting point.
>
> BTW, with v1 patch, I was able to reproduce the issue using the following
> steps:
>
> --------------------------------------------
> initdb -D data
> mkdir arch
>
> cat <<EOF >> data/postgresql.conf
> archive_mode = on
> archive_command = 'cp %p ../arch/%f'
> restore_command = 'cp ../arch/%f %p'
> EOF
>
> pg_ctl -D data start
> pg_basebackup -D sby1 -c fast
> cp -a sby1 sby2
>
> cat <<EOF >> sby1/postgresql.conf
> port = 5433
> EOF
> touch sby1/standby.signal
> pg_ctl -D sby1 start
>
> cat <<EOF >> sby2/postgresql.conf
> port = 5434
> primary_conninfo = 'port=5433'
> EOF
> touch sby2/standby.signal
> pg_ctl -D sby2 start
>
> pgbench -i -s2
> pg_ctl -D sby2 restart
> --------------------------------------------
> In this case, after restarting the standby connecting to another
> (cascading) standby, I observed the following error.
>
> FATAL: could not receive data from WAL stream: ERROR: requested
> starting point 0/04000000 is ahead of the WAL flush position of this
> server 0/03FFE8D0
>
> Regards,
>
> --
> Fujii Masao

Best,
Xuneng
Fujii Masao <masao.fujii@gmail.com>  Jan 29, 2026, 11:33 AM UTC

On Thu, Jan 29, 2026 at 2:03 AM Marco Nenciarini
<marco.nenciarini@enterprisedb.com> wrote:
> Hi hackers,
>
> I've encountered a bug in PostgreSQL's streaming replication where cascading
> standbys fail to reconnect after falling back to archive recovery. The issue
> occurs when the upstream standby uses archive-only recovery.
>
> The standby requests streaming from the wrong WAL position (next segment boundary
> instead of the current position), causing connection failures with this error:
>
> ERROR: requested starting point 0/A000000 is ahead of the WAL flush
> position of this server 0/9000000

Thanks for the report!
I was also able to reproduce this issue on the master branch.
Interestingly, I couldn't reproduce it on v11 using the same test case.
This makes me wonder whether the issue was introduced in v12 or later.
Do you see the same behavior in your environment?
Regards,
--
Fujii Masao