pgsql-hackers
incremental backup breakage in BlockRefTableEntryGetBlocks
Robert Haas <robertmhaas@gmail.com> Apr 4, 2024, 5:38 PM UTC

Hi,
Yesterday, Tomas Vondra reported to me off-list that he was seeing
what appeared to be data corruption after taking and restoring an
incremental backup. Overnight, Jakub Wartak further experimented with
Tomas's test case, did some initial analysis, and made it very easy to
reproduce. I spent this morning tracking down the problem, for which I
attach a patch.
It doesn't take a whole lot to hit this bug. I think all you need is a
relation that is more than 1GB in size, plus a modification to the
second half of some 1GB file between the full backup and the
incremental backup, plus some way of realizing after you do a restore
that you've gotten the wrong answer. It's obviously quite embarrassing
that this wasn't caught sooner, and to be honest I'm not sure why we
didn't. I think the test cases pass because we avoid committing test
cases that create large relations, but the failure to catch it during
manual testing must be because I never worked hard enough to verify
that the results were fully correct. Ouch.
The actual bug is here:
    if (chunkno == stop_chunkno - 1)
        stop_offset = stop_blkno % BLOCKS_PER_CHUNK;

Each chunk covers 64k block numbers. The caller specifies a range of
block numbers of interest via start_blkno (inclusive) and stop_blkno
(non-inclusive). We need to translate those start and stop values for
the overall function call into start and stop values for each
particular chunk. When we're on any chunk but the last one of
interest, the stop offset is BLOCKS_PER_CHUNK, i.e. we care about
blocks all the way through the end of the chunk. The existing code
handles that fine. If stop_blkno is somewhere in the middle of the
last chunk, the existing code also handles that fine. But the code is
wrong when stop_blkno is a multiple of BLOCKS_PER_CHUNK, because it
then calculates a stop_offset of 0 (which is never right, because that
would mean that we thought the chunk was relevant when it isn't)
rather than BLOCKS_PER_CHUNK. So this forgets about 64k * BLCKSZ =
512MB of potentially modified blocks whenever the size of a single
relation file is exactly 512MB or exactly 1GB, which our users would
probably find more than slightly distressing.
I'm not posting Jakub's reproducer here because it was sent to me
privately and, although I presume he would have no issue with me
posting it, I can't actually confirm that right at the moment.
Hopefully he'll reply with it, and double-check that this does indeed
fix the issue. My thanks to both Tomas and Jakub.
Regards,
--
Robert Haas
EDB: http://www.enterprisedb.com

Tomas Vondra <tomas.vondra@enterprisedb.com> Apr 4, 2024, 7:11 PM UTC

On 4/4/24 19:38, Robert Haas wrote:
> Hi,
>
> Yesterday, Tomas Vondra reported to me off-list that he was seeing
> what appeared to be data corruption after taking and restoring an
> incremental backup. Overnight, Jakub Wartak further experimented with
> Tomas's test case, did some initial analysis, and made it very easy to
> reproduce. I spent this morning tracking down the problem, for which I
> attach a patch.

Thanks, I can confirm this fixes the issue I've observed/reported. On
master 10 out of 10 runs failed, with the patch no failures.
The test is very simple:
1) init pgbench
2) full backup
3) run short pgbench
4) incremental backup
5) compare pg_dumpall on the instance vs. restored backup
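For the archives, the five steps above might be scripted roughly as follows. This is a sketch, not the reproducer from this thread: it assumes a PostgreSQL 17+ installation on PATH, a primary on port 5432 with summarize_wal = on (required for incremental backups), and illustrative paths and a spare port 5433 chosen by me.

```shell
# 1) init pgbench with a scale large enough to push relations past 1GB
pgbench -i -s 100 postgres

# 2) full backup (its manifest is the basis for the incremental)
pg_basebackup -D /tmp/full -c fast

# 3) run a short pgbench to modify blocks in the later chunks
pgbench -T 60 postgres

# 4) incremental backup relative to the full backup's manifest
pg_basebackup -D /tmp/incr -c fast \
    --incremental=/tmp/full/backup_manifest

# 5) reconstruct a synthetic full backup, start it on a spare port,
#    and compare pg_dumpall output against the primary
pg_combinebackup /tmp/full /tmp/incr -o /tmp/restored
pg_ctl -D /tmp/restored -o "-p 5433" start
pg_dumpall -p 5432 > /tmp/dump_primary.sql
pg_dumpall -p 5433 > /tmp/dump_restored.sql
diff /tmp/dump_primary.sql /tmp/dump_restored.sql && echo "no differences"
```

On a broken build the final diff shows rows diverging in the restored copy; with the patch it should print "no differences".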
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Jakub Wartak <jakub.wartak@enterprisedb.com> Apr 5, 2024, 6:59 AM UTC

On Thu, Apr 4, 2024 at 9:11 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 4/4/24 19:38, Robert Haas wrote:
>> Hi,
>>
>> Yesterday, Tomas Vondra reported to me off-list that he was seeing
>> what appeared to be data corruption after taking and restoring an
>> incremental backup. Overnight, Jakub Wartak further experimented with
>> Tomas's test case, did some initial analysis, and made it very easy to
>> reproduce. I spent this morning tracking down the problem, for which I
>> attach a patch.
>
> Thanks, I can confirm this fixes the issue I've observed/reported. On
> master 10 out of 10 runs failed, with the patch no failures.

Same here, the patch fixes it on recent master. I've also run pgbench for
~30 mins and compared master and incremental and got 0 differences, so it
should be good.

> The test is very simple:
>
> 1) init pgbench

Tomas had magic fingers here - he used pgbench -i -s 100, which creates
bigger relations (it wouldn't trigger for smaller -s values, as Robert
explained - now it makes full sense; in earlier tests I was using a much
smaller -s, then transitioned to other workloads (mostly append-only),
and the final 100GB+/24h+ tests used mostly INSERTs rather than UPDATEs,
AFAIR). The other interesting thing is that one of the animals runs with
configure --with-relsegsize=<somesmallvalue> (so relation files fill up
much earlier), and it was not caught there either - wouldn't it be a
good idea to test like that in src/test/recovery/?

And of course I'm attaching the reproducer with some braindump notes, in
case someone hits a similar issue in the future and wonders where to even
start looking (it's very primitive, but it might help).
-J.

Robert Haas <robertmhaas@gmail.com> Apr 5, 2024, 5:51 PM UTC

On Fri, Apr 5, 2024 at 2:59 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
> And of course I'm attaching the reproducer with some braindump notes, in
> case someone hits a similar issue in the future and wonders where to even
> start looking (it's very primitive, but it might help).

Thanks. I've committed the patch now.
--
Robert Haas
EDB: http://www.enterprisedb.com