incremental backup breakage in BlockRefTableEntryGetBlocks

  • robertmhaas@gmail.com · 2024-04-04T17:38:29+00:00
    Hi,

    Yesterday, Tomas Vondra reported to me off-list that he was seeing what appeared to be data corruption after taking and restoring an incremental backup. Overnight, Jakub Wartak further experimented with Tomas's test case, did some initial analysis, and made it very easy to reproduce. I spent this morning tracking down the problem, for which I attach a patch.

    It doesn't take a whole lot to hit this bug. I think all you need is a relation that is more than 1GB in size, plus a modification to the second half of some 1GB file between the full backup and the incremental backup, plus some way of realizing after you do a restore that you've gotten the wrong answer. It's obviously quite embarrassing that this wasn't caught sooner, and to be honest I'm not sure why we didn't. I think the test cases pass because we avoid committing test cases that create large relations, but the failure to catch it during manual testing must be because I never worked hard enough to verify that the results were fully correct. Ouch.

    The actual bug is here:

        if (chunkno == stop_chunkno - 1)
            stop_offset = stop_blkno % BLOCKS_PER_CHUNK;

    Each chunk covers 64k block numbers. The caller specifies a range of block numbers of interest via start_blkno (inclusive) and stop_blkno (non-inclusive). We need to translate those start and stop values for the overall function call into start and stop values for each particular chunk. When we're on any chunk but the last one of interest, the stop offset is BLOCKS_PER_CHUNK, i.e. we care about blocks all the way through the end of the chunk. The existing code handles that fine. If stop_blkno is somewhere in the middle of the last chunk, the existing code also handles that fine. But the code is wrong when stop_blkno is a multiple of BLOCKS_PER_CHUNK, because it then calculates a stop_offset of 0 (which is never right, because that would mean that we thought the chunk was relevant when it isn't) rather than BLOCKS_PER_CHUNK. So this forgets about 64k * BLCKSZ = 512MB of potentially modified blocks whenever the size of a single relation file is exactly 512MB or exactly 1GB, which our users would probably find more than slightly distressing.

    I'm not posting Jakub's reproducer here because it was sent to me privately and, although I presume he would have no issue with me posting it, I can't actually confirm that right at the moment. Hopefully he'll reply with it, and double-check that this does indeed fix the issue.

    My thanks to both Tomas and Jakub.

    Regards,

    --
    Robert Haas
    EDB: http://www.enterprisedb.com
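    To see the failure mode concretely, here is a small standalone sketch. It is only an illustration of the arithmetic described above, not the actual blkreftable.c code or the committed patch; the constants and helper names are made up for the example. It computes the stop offset for the last chunk of a 1GB segment (131072 blocks, exactly two 64k-block chunks), both with the broken modulo and with the kind of correction the fix needs, where a remainder of 0 on the last chunk has to mean "the whole chunk":

        #include <stdio.h>

        /* Illustrative constants matching the description above:
         * 8kB blocks, 64k block numbers per chunk (64k * 8kB = 512MB). */
        #define BLCKSZ           8192
        #define BLOCKS_PER_CHUNK (1 << 16)

        /* Broken per-chunk stop offset: yields 0 when stop_blkno is an
         * exact multiple of BLOCKS_PER_CHUNK. */
        static unsigned
        buggy_stop_offset(unsigned chunkno, unsigned stop_chunkno, unsigned stop_blkno)
        {
            unsigned stop_offset = BLOCKS_PER_CHUNK;

            if (chunkno == stop_chunkno - 1)
                stop_offset = stop_blkno % BLOCKS_PER_CHUNK;
            return stop_offset;
        }

        /* Corrected: a remainder of 0 on the last chunk means we care about
         * the whole chunk, not none of it. */
        static unsigned
        fixed_stop_offset(unsigned chunkno, unsigned stop_chunkno, unsigned stop_blkno)
        {
            unsigned stop_offset = BLOCKS_PER_CHUNK;

            if (chunkno == stop_chunkno - 1)
            {
                stop_offset = stop_blkno % BLOCKS_PER_CHUNK;
                if (stop_offset == 0)
                    stop_offset = BLOCKS_PER_CHUNK;
            }
            return stop_offset;
        }

        int
        main(void)
        {
            /* A 1GB relation segment: 1GB / 8kB = 131072 blocks = 2 chunks. */
            unsigned stop_blkno = (1024 * 1024 * 1024) / BLCKSZ;
            unsigned stop_chunkno = stop_blkno / BLOCKS_PER_CHUNK;
            unsigned last_chunkno;

            if (stop_blkno % BLOCKS_PER_CHUNK != 0)
                ++stop_chunkno;
            last_chunkno = stop_chunkno - 1;

            printf("buggy stop_offset = %u\n",
                   buggy_stop_offset(last_chunkno, stop_chunkno, stop_blkno));
            printf("fixed stop_offset = %u\n",
                   fixed_stop_offset(last_chunkno, stop_chunkno, stop_blkno));
            return 0;
        }

    The buggy version prints a stop offset of 0 for the second chunk, i.e. none of its 65536 blocks are considered even though all of them are in range; the corrected version prints 65536. That is exactly the 512MB of potentially modified blocks that got dropped.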
    • tomas.vondra@enterprisedb.com · 2024-04-04T19:11:38+00:00
      On 4/4/24 19:38, Robert Haas wrote:
      > Hi,
      >
      > Yesterday, Tomas Vondra reported to me off-list that he was seeing
      > what appeared to be data corruption after taking and restoring an
      > incremental backup. Overnight, Jakub Wartak further experimented with
      > Tomas's test case, did some initial analysis, and made it very easy to
      > reproduce. I spent this morning tracking down the problem, for which I
      > attach a patch.
      >

      Thanks, I can confirm this fixes the issue I've observed/reported. On master 10 out of 10 runs failed, with the patch no failures.

      The test is very simple:

      1) init pgbench
      2) full backup
      3) run short pgbench
      4) incremental backup
      5) compare pg_dumpall on the instance vs. restored backup

      regards

      --
      Tomas Vondra
      EnterpriseDB: http://www.enterprisedb.com
      The Enterprise PostgreSQL Company
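      In command form, those five steps correspond roughly to the sequence below. This is only a sketch, not the script Tomas actually ran: the directory names, the restored server's port, and the pgbench settings are illustrative, and the source server must have summarize_wal = on for pg_basebackup --incremental to be accepted.

          # 1) init pgbench (a scale large enough to push pgbench_accounts past 1GB)
          pgbench -i -s 100 postgres

          # 2) full backup (writes a backup_manifest the incremental step points at)
          pg_basebackup -D /tmp/full -c fast

          # 3) run a short pgbench to modify blocks between the two backups
          pgbench -T 60 postgres

          # 4) incremental backup, based on the full backup's manifest
          pg_basebackup -D /tmp/incr -c fast --incremental=/tmp/full/backup_manifest

          # reconstruct a restorable data directory from full + incremental
          pg_combinebackup /tmp/full /tmp/incr -o /tmp/restored
          pg_ctl -D /tmp/restored -o "-p 5433" start

          # 5) compare pg_dumpall on the instance vs. the restored backup
          pg_dumpall -p 5432 > /tmp/dump_primary.sql
          pg_dumpall -p 5433 > /tmp/dump_restored.sql
          diff /tmp/dump_primary.sql /tmp/dump_restored.sql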
      • jakub.wartak@enterprisedb.com · 2024-04-05T06:59:44+00:00
        On Thu, Apr 4, 2024 at 9:11 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
        >
        > On 4/4/24 19:38, Robert Haas wrote:
        > > Hi,
        > >
        > > Yesterday, Tomas Vondra reported to me off-list that he was seeing
        > > what appeared to be data corruption after taking and restoring an
        > > incremental backup. Overnight, Jakub Wartak further experimented with
        > > Tomas's test case, did some initial analysis, and made it very easy to
        > > reproduce. I spent this morning tracking down the problem, for which I
        > > attach a patch.
        > >
        >
        > Thanks, I can confirm this fixes the issue I've observed/reported. On
        > master 10 out of 10 runs failed, with the patch no failures.

        Same here, the patch fixes it on recent master. I've also run pgbench for ~30 mins and compared master and incremental, and got 0 differences - should be good.

        > The test is very simple:
        >
        > 1) init pgbench

        Tomas had magic fingers here - he used pgbench -i -s 100, which creates bigger relations. At that scale pgbench_accounts exceeds 1 GB, so its first segment file is exactly 1 GB, the case Robert explained, and it wouldn't trigger for smaller -s values; now it makes full sense. In my earlier tests I was using a much smaller -s, then transitioned to other workloads (mostly append-only), and the final 100GB+/24h+ tests used mostly INSERTs rather than UPDATEs, AFAIR.

        The other interesting thing is that one of the buildfarm animals runs with configure --with-relsegsize=<somesmallvalue> (so relation segment files fill up much earlier), and it was not caught there either. Wouldn't it be a good idea to add a test like that in src/test/recovery/?

        And of course I'm attaching the reproducer with some braindump notes, in case someone hits a similar issue in the future and wonders where to even start looking (it's very primitive, but might help).

        -J.
        • robertmhaas@gmail.com · 2024-04-05T17:51:54+00:00
          On Fri, Apr 5, 2024 at 2:59 AM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
          > And of course I'm attaching the reproducer with some braindump notes,
          > in case someone hits a similar issue in the future and wonders where
          > to even start looking (it's very primitive, but might help).

          Thanks. I've committed the patch now.

          --
          Robert Haas
          EDB: http://www.enterprisedb.com