incremental backup breakage in BlockRefTableEntryGetBlocks

    Robert Haas<robertmhaas@gmail.com>
    Apr 4, 2024, 5:38 PM UTC
    Hi,
    Yesterday, Tomas Vondra reported to me off-list that he was seeing
    what appeared to be data corruption after taking and restoring an
    incremental backup. Overnight, Jakub Wartak further experimented with
    Tomas's test case, did some initial analysis, and made it very easy to
    reproduce. I spent this morning tracking down the problem, for which I
    attach a patch.
    It doesn't take a whole lot to hit this bug. I think all you need is a
    relation that is more than 1GB in size, plus a modification to the
    second half of some 1GB file between the full backup and the
    incremental backup, plus some way of realizing after you do a restore
    that you've gotten the wrong answer. It's obviously quite embarrassing
    that this wasn't caught sooner, and to be honest I'm not sure why we
    didn't. I think the test cases pass because we avoid committing test
    cases that create large relations, but the failure to catch it during
    manual testing must be because I never worked hard enough to verify
    that the results were fully correct. Ouch.
    The actual bug is here:
    if (chunkno == stop_chunkno - 1)
        stop_offset = stop_blkno % BLOCKS_PER_CHUNK;
    Each chunk covers 64k block numbers. The caller specifies a range of
    block numbers of interest via start_blkno (inclusive) and stop_blkno
    (non-inclusive). We need to translate those start and stop values for
    the overall function call into start and stop values for each
    particular chunk. When we're on any chunk but the last one of
    interest, the stop offset is BLOCKS_PER_CHUNK, i.e. we care about
    blocks all the way through the end of the chunk. The existing code
    handles that fine. If stop_blkno is somewhere in the middle of the
    last chunk, the existing code also handles that fine. But the code is
    wrong when stop_blkno is a multiple of BLOCKS_PER_CHUNK, because it
    then calculates a stop_offset of 0 (which is never right, because that
    would mean that we thought the chunk was relevant when it isn't)
    rather than BLOCKS_PER_CHUNK. So this forgets about 64k * BLCKSZ =
    512MB of potentially modified blocks whenever the size of a single
    relation file is exactly 512MB or exactly 1GB, which our users would
    probably find more than slightly distressing.
    I'm not posting Jakub's reproducer here because it was sent to me
    privately and, although I presume he would have no issue with me
    posting it, I can't actually confirm that right at the moment.
    Hopefully he'll reply with it, and double-check that this does indeed
    fix the issue. My thanks to both Tomas and Jakub.
    Regards,
    --
    Robert Haas
    EDB: http://www.enterprisedb.com
      Tomas Vondra<tomas.vondra@enterprisedb.com>
      Apr 4, 2024, 7:11 PM UTC
      On 4/4/24 19:38, Robert Haas wrote:
      > Hi,
      >
      > Yesterday, Tomas Vondra reported to me off-list that he was seeing
      > what appeared to be data corruption after taking and restoring an
      > incremental backup. Overnight, Jakub Wartak further experimented with
      > Tomas's test case, did some initial analysis, and made it very easy to
      > reproduce. I spent this morning tracking down the problem, for which I
      > attach a patch.
      Thanks, I can confirm this fixes the issue I've observed/reported. On
      master 10 out of 10 runs failed, with the patch no failures.
      The test is very simple:
      1) init pgbench
      2) full backup
      3) run short pgbench
      4) incremental backup
      5) compare pg_dumpall on the instance vs. restored backup
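For reference, a hedged sketch of those five steps as commands (assuming PostgreSQL 17+ binaries on PATH and a running cluster; directories, scale, and ports below are placeholders, not values from the thread):

```shell
# 1) init pgbench; -s 100 makes relations large enough to cross the 1GB mark
pgbench -i -s 100 postgres

# 2) full backup
pg_basebackup -D /tmp/full

# 3) run short pgbench to dirty blocks in the later part of a segment
pgbench -T 60 postgres

# 4) incremental backup, based on the full backup's manifest
pg_basebackup -D /tmp/incr --incremental=/tmp/full/backup_manifest

# reconstruct a restorable data directory from full + incremental
pg_combinebackup /tmp/full /tmp/incr -o /tmp/restored

# 5) start the restored cluster on another port, then compare dumps:
# diff <(pg_dumpall -p 5432) <(pg_dumpall -p 5433)
```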
      regards
      --
      Tomas Vondra
      EnterpriseDB: http://www.enterprisedb.com
      The Enterprise PostgreSQL Company
        Jakub Wartak<jakub.wartak@enterprisedb.com>
        Apr 5, 2024, 6:59 AM UTC
        On Thu, Apr 4, 2024 at 9:11 PM Tomas Vondra
        <tomas.vondra@enterprisedb.com> wrote:
        > On 4/4/24 19:38, Robert Haas wrote:
        >> Hi,
        >>
        >> Yesterday, Tomas Vondra reported to me off-list that he was seeing
        >> what appeared to be data corruption after taking and restoring an
        >> incremental backup. Overnight, Jakub Wartak further experimented with
        >> Tomas's test case, did some initial analysis, and made it very easy to
        >> reproduce. I spent this morning tracking down the problem, for which I
        >> attach a patch.
        >
        > Thanks, I can confirm this fixes the issue I've observed/reported. On
        > master 10 out of 10 runs failed, with the patch no failures.
        Same here, the patch fixes it on recent master. I've also run pgbench
        for ~30 minutes and compared the primary against the restored
        incremental backup and got 0 differences, so it should be good.
        > The test is very simple:
        >
        > 1) init pgbench
        Tomas had magic fingers here - he used pgbench -i -s 100, which creates
        bigger relations (it wouldn't trigger for smaller -s values, as Robert
        explained - now it makes full sense; in earlier tests I was using a much
        smaller -s, then transitioned to other workloads (mostly append-only),
        and the final 100GB+/24h+ tests used mostly INSERTs rather than UPDATEs
        AFAIR). The other interesting thing is that one of the animals runs with
        configure --with-relsegsize=<somesmallvalue> (so new relation segments
        fill up much earlier) and it was not caught there either - wouldn't it
        be a good idea to add a test like that in src/test/recovery/?
        And of course I'm attaching the reproducer with some braindump notes in
        case someone hits a similar issue in the future and wonders where to
        even start looking (it's very primitive, but might help).
        -J.
          Robert Haas<robertmhaas@gmail.com>
          Apr 5, 2024, 5:51 PM UTC
          On Fri, Apr 5, 2024 at 2:59 AM Jakub Wartak
          <jakub.wartak@enterprisedb.com> wrote:
          > And of course I'm attaching the reproducer with some braindump
          > notes in case someone hits a similar issue in the future and
          > wonders where to even start looking (it's very primitive, but
          > might help).
          Thanks. I've committed the patch now.
          --
          Robert Haas
          EDB: http://www.enterprisedb.com