Disallow changing slot's failover option in transaction block

  • Jump to comment-1
    Zhijie Hou (Fujitsu)<houzj.fnst@fujitsu.com>
    Apr 16, 2024, 6:32 AM UTC
    Hi,
    Kuroda-San reported an issue off-list that:
    If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block
    and rollback, only the subscription option change can be rolled back, while the
    replication slot's failover change is preserved.
    This is because ALTER SUBSCRIPTION SET (failover) command internally executes
    the replication command ALTERREPLICATIONSLOT to change the replication slot's
    failover property, but this replication command execution cannot be
    rollback.
    To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set
    (failover) inside a txn block, which is also consistent to the ALTER
    SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small
    patch to address this.
    Best Regards,
    Hou Zhijie
    • Jump to comment-1
      Hayato Kuroda (Fujitsu)<kuroda.hayato@fujitsu.com>
      Apr 16, 2024, 8:15 AM UTC
      Dear Hou,
      Kuroda-San reported an issue off-list that:

      If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block
      and rollback, only the subscription option change can be rolled back, while the
      replication slot's failover change is preserved.

      This is because ALTER SUBSCRIPTION SET (failover) command internally
      executes
      the replication command ALTERREPLICATIONSLOT to change the replication
      slot's
      failover property, but this replication command execution cannot be
      rollback.

      To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set
      (failover) inside a txn block, which is also consistent to the ALTER
      SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small
      patch to address this.
      Thanks for posting the patch, the fix is same as my expectation.
      Also, should we add the restriction to the doc? I feel [1] can be updated.
      [1]:https://www.postgresql.org/docs/devel/sql-altersubscription.html#SQL-ALTERSUBSCRIPTION-PARAMS-SET
      Best Regards,
      Hayato Kuroda
      FUJITSU LIMITED
      https://www.fujitsu.com/
      • Jump to comment-1
        shveta malik<shveta.malik@gmail.com>
        Apr 16, 2024, 11:36 AM UTC
        On Tue, Apr 16, 2024 at 1:45 PM Hayato Kuroda (Fujitsu)
        <kuroda.hayato@fujitsu.com> wrote:

        Dear Hou,
        Kuroda-San reported an issue off-list that:

        If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block
        and rollback, only the subscription option change can be rolled back, while the
        replication slot's failover change is preserved.

        This is because ALTER SUBSCRIPTION SET (failover) command internally
        executes
        the replication command ALTERREPLICATIONSLOT to change the replication
        slot's
        failover property, but this replication command execution cannot be
        rollback.

        To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set
        (failover) inside a txn block, which is also consistent to the ALTER
        SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small
        patch to address this.

        Thanks for posting the patch, the fix is same as my expectation.
        Also, should we add the restriction to the doc? I feel [1] can be updated.
        +1.
        Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
        because we call alterreplicationslot in CREATE SUB as well, for the
        case when slotname is provided and createslot=false. But the tricky
        part is we call alterreplicationslot() when creating subscription
        for both failover=true and false. That means if we want to fix it on
        the similar line of ALTER SUB, we have to disallow user from executing
        the CREATE SUBSCRIPTION (slot_name = xx) in a txn block, which seems
        to break some existing use cases. (previously, user can execute such a
        command inside a txn block).
        So, we need to think if there are better ways to fix it. After
        discussion with Hou-San offlist, here are some ideas:
        1. do not alter replication slot's failover option when CREATE
        SUBSCRIPTION WITH failover=false. This means we alter replication
        slot only when failover is set to true. And thus we can disallow
        CREATE SUB WITH (slot_name =xx, failover=true, create_slot=false)
        inside a txn block.
        This option allows user to run CREATE-SUB(create_slot=false) with
        failover=false in txn block like earlier. But on the downside, it
        makes the behavior inconsistent for otherwise simpler option like
        failover, i.e. with failover=true, CREATE SUB is not allowed in txn
        block while with failover=false, it is allowed. It makes it slightly
        complex to be understood by user.
        2. let's not disallow CREATE SUB in txn block as earlier, just don't
        perform internal alter-failover during CREATE SUB for existing
        slots(createslot=false, slotname=xx) i.e. when create_slot is
        false, we will ignore failover parameter of CREATE SUB and it is
        user's responsibility to set it appropriately using ALTER SUB cmd. For
        create_slot=true, the behaviour of CREATE-SUB remains same as earlier.
        This option does not add new restriction for CREATE SUB wrt txn block.
        In context of failover with create_slot=false, we already have a
        similar restriction (documented one) for ALTER SUB, i.e. with 'ALTER
        SUBSCRIPTION SET(slot_name = new)', user needs to alter the new slot's
        failover by himself. CREAT SUB can also be documented in similar way.
        This seems simpler to be understood considering existing ALTER SUB's
        behavior as well. Plus, this will make CREATE-SUB code slightly
        simpler and thus easily manageable.
        3. add a alter_slot option for CREATE SUBSCRIPTION, we can only alter
        the slot's failover if alter_slot=true. And so we can disallow CREATE
        SUB WITH (slotname =xx, alterslot=true) inside a txn block.
        This seems a clean way, as everything will be as per user's consent
        based on alter_slot parameter. But on the downside, this will need
        introducing additional parameter and also adding new restriction of
        running CREATE-sub in txn block for a specific case.
        4. Don't alter replication in subscription DDLs. Instead, try to alter
        replication slot's failover in the apply worker. This means we need to
        execute alterreplicationslot each time before starting streaming in
        the apply worker.
        This does not seem appealing to execute alterreplicationslot
        everytime the apply worker starts. But if others think it as a better
        option, it can be further analyzed.
        Currently, our preference is option 2, as that looks a clean solution
        and also aligns with ALTER-SUB behavior which is already documented.
        Thoughts?
        --------------------------------
        Note that we could not refer to the design of two_phase here, because
        two_phase can be considered as a streaming option, so it's fine to
        change the twophase along with STARTREPLICATION command. (the
        two_phase is not changed in subscription DDLs, but get changed in
        START_REPLICATION command). But the failover is closely related to a
        replication slot itself.
        --------------------------------
        Thanks
        Shveta
        • Jump to comment-1
          Hayato Kuroda (Fujitsu)<kuroda.hayato@fujitsu.com>
          Apr 18, 2024, 6:13 AM UTC
          Dear Shveta,
          Sorry for delay response. I missed your post.
          +1.

          Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
          because we call alterreplicationslot in CREATE SUB as well, for the
          case when slotname is provided and createslot=false. But the tricky
          part is we call alterreplicationslot() when creating subscription
          for both failover=true and false. That means if we want to fix it on
          the similar line of ALTER SUB, we have to disallow user from executing
          the CREATE SUBSCRIPTION (slot_name = xx) in a txn block, which seems
          to break some existing use cases. (previously, user can execute such a
          command inside a txn block).

          So, we need to think if there are better ways to fix it. After
          discussion with Hou-San offlist, here are some ideas:
          1. do not alter replication slot's failover option when CREATE
          SUBSCRIPTION WITH failover=false. This means we alter replication
          slot only when failover is set to true. And thus we can disallow
          CREATE SUB WITH (slotname =xx, failover=true, createslot=false)
          inside a txn block.

          This option allows user to run CREATE-SUB(create_slot=false) with
          failover=false in txn block like earlier. But on the downside, it
          makes the behavior inconsistent for otherwise simpler option like
          failover, i.e. with failover=true, CREATE SUB is not allowed in txn
          block while with failover=false, it is allowed. It makes it slightly
          complex to be understood by user.

          2. let's not disallow CREATE SUB in txn block as earlier, just don't
          perform internal alter-failover during CREATE SUB for existing
          slots(createslot=false, slotname=xx) i.e. when create_slot is
          false, we will ignore failover parameter of CREATE SUB and it is
          user's responsibility to set it appropriately using ALTER SUB cmd. For
          create_slot=true, the behaviour of CREATE-SUB remains same as earlier.

          This option does not add new restriction for CREATE SUB wrt txn block.
          In context of failover with create_slot=false, we already have a
          similar restriction (documented one) for ALTER SUB, i.e. with 'ALTER
          SUBSCRIPTION SET(slot_name = new)', user needs to alter the new slot's
          failover by himself. CREAT SUB can also be documented in similar way.
          This seems simpler to be understood considering existing ALTER SUB's
          behavior as well. Plus, this will make CREATE-SUB code slightly
          simpler and thus easily manageable.

          3. add a alter_slot option for CREATE SUBSCRIPTION, we can only alter
          the slot's failover if alter_slot=true. And so we can disallow CREATE
          SUB WITH (slotname =xx, alterslot=true) inside a txn block.

          This seems a clean way, as everything will be as per user's consent
          based on alter_slot parameter. But on the downside, this will need
          introducing additional parameter and also adding new restriction of
          running CREATE-sub in txn block for a specific case.

          4. Don't alter replication in subscription DDLs. Instead, try to alter
          replication slot's failover in the apply worker. This means we need to
          execute alterreplicationslot each time before starting streaming in
          the apply worker.

          This does not seem appealing to execute alterreplicationslot
          everytime the apply worker starts. But if others think it as a better
          option, it can be further analyzed.
          Thanks for describing, I also prefer 2, because it seems bit strange that
          CREATE statement leads ALTER.
          Currently, our preference is option 2, as that looks a clean solution
          and also aligns with ALTER-SUB behavior which is already documented.
          Thoughts?

          --------------------------------
          Note that we could not refer to the design of two_phase here, because
          two_phase can be considered as a streaming option, so it's fine to
          change the twophase along with STARTREPLICATION command. (the
          two_phase is not changed in subscription DDLs, but get changed in
          START_REPLICATION command). But the failover is closely related to a
          replication slot itself.
          --------------------------------
          Sorry, I cannot find statements. Where did you refer?
          Best Regards,
          Hayato Kuroda
          FUJITSU LIMITED
          https://www.fujitsu.com/
          • Jump to comment-1
            shveta malik<shveta.malik@gmail.com>
            Apr 18, 2024, 9:36 AM UTC
            On Thu, Apr 18, 2024 at 11:40 AM Hayato Kuroda (Fujitsu)
            <kuroda.hayato@fujitsu.com> wrote:

            Dear Shveta,

            Sorry for delay response. I missed your post.
            +1.

            Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
            because we call alterreplicationslot in CREATE SUB as well, for the
            case when slotname is provided and createslot=false. But the tricky
            part is we call alterreplicationslot() when creating subscription
            for both failover=true and false. That means if we want to fix it on
            the similar line of ALTER SUB, we have to disallow user from executing
            the CREATE SUBSCRIPTION (slot_name = xx) in a txn block, which seems
            to break some existing use cases. (previously, user can execute such a
            command inside a txn block).

            So, we need to think if there are better ways to fix it. After
            discussion with Hou-San offlist, here are some ideas:
            1. do not alter replication slot's failover option when CREATE
            SUBSCRIPTION WITH failover=false. This means we alter replication
            slot only when failover is set to true. And thus we can disallow
            CREATE SUB WITH (slotname =xx, failover=true, createslot=false)
            inside a txn block.

            This option allows user to run CREATE-SUB(create_slot=false) with
            failover=false in txn block like earlier. But on the downside, it
            makes the behavior inconsistent for otherwise simpler option like
            failover, i.e. with failover=true, CREATE SUB is not allowed in txn
            block while with failover=false, it is allowed. It makes it slightly
            complex to be understood by user.

            2. let's not disallow CREATE SUB in txn block as earlier, just don't
            perform internal alter-failover during CREATE SUB for existing
            slots(createslot=false, slotname=xx) i.e. when create_slot is
            false, we will ignore failover parameter of CREATE SUB and it is
            user's responsibility to set it appropriately using ALTER SUB cmd. For
            create_slot=true, the behaviour of CREATE-SUB remains same as earlier.

            This option does not add new restriction for CREATE SUB wrt txn block.
            In context of failover with create_slot=false, we already have a
            similar restriction (documented one) for ALTER SUB, i.e. with 'ALTER
            SUBSCRIPTION SET(slot_name = new)', user needs to alter the new slot's
            failover by himself. CREAT SUB can also be documented in similar way.
            This seems simpler to be understood considering existing ALTER SUB's
            behavior as well. Plus, this will make CREATE-SUB code slightly
            simpler and thus easily manageable.

            3. add a alter_slot option for CREATE SUBSCRIPTION, we can only alter
            the slot's failover if alter_slot=true. And so we can disallow CREATE
            SUB WITH (slotname =xx, alterslot=true) inside a txn block.

            This seems a clean way, as everything will be as per user's consent
            based on alter_slot parameter. But on the downside, this will need
            introducing additional parameter and also adding new restriction of
            running CREATE-sub in txn block for a specific case.

            4. Don't alter replication in subscription DDLs. Instead, try to alter
            replication slot's failover in the apply worker. This means we need to
            execute alterreplicationslot each time before starting streaming in
            the apply worker.

            This does not seem appealing to execute alterreplicationslot
            everytime the apply worker starts. But if others think it as a better
            option, it can be further analyzed.

            Thanks for describing, I also prefer 2, because it seems bit strange that
            CREATE statement leads ALTER.
            Thanks for feedback.
            Currently, our preference is option 2, as that looks a clean solution
            and also aligns with ALTER-SUB behavior which is already documented.
            Thoughts?

            --------------------------------
            Note that we could not refer to the design of two_phase here, because
            two_phase can be considered as a streaming option, so it's fine to
            change the twophase along with STARTREPLICATION command. (the
            two_phase is not changed in subscription DDLs, but get changed in
            START_REPLICATION command). But the failover is closely related to a
            replication slot itself.
            --------------------------------
            Sorry for causing confusion. This is not the statement which is
            documented one, this was an additional note to support our analysis.
            Sorry, I cannot find statements. Where did you refer?
            When I said that option 2 aligns with ALTER-SUB documented behaviour,
            I meant the doc described in [1] stating "When altering the slot_name,
            the failover and two_phase property values of the named slot may
            differ from the counterpart failover and two_phase parameters
            specified in the subscription...."
            [1]: https://www.postgresql.org/docs/devel/sql-altersubscription.html#SQL-ALTERSUBSCRIPTION-PARAMS-SET
            thanks
            Shveta
        • Jump to comment-1
          Amit Kapila<amit.kapila16@gmail.com>
          Apr 18, 2024, 5:52 AM UTC
          On Tue, Apr 16, 2024 at 5:06 PM shveta malik <shveta.malik@gmail.com> wrote:

          On Tue, Apr 16, 2024 at 1:45 PM Hayato Kuroda (Fujitsu)
          <kuroda.hayato@fujitsu.com> wrote:

          Dear Hou,
          Kuroda-San reported an issue off-list that:

          If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block
          and rollback, only the subscription option change can be rolled back, while the
          replication slot's failover change is preserved.

          This is because ALTER SUBSCRIPTION SET (failover) command internally
          executes
          the replication command ALTERREPLICATIONSLOT to change the replication
          slot's
          failover property, but this replication command execution cannot be
          rollback.

          To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set
          (failover) inside a txn block, which is also consistent to the ALTER
          SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small
          patch to address this.

          Thanks for posting the patch, the fix is same as my expectation.
          Also, should we add the restriction to the doc? I feel [1] can be updated.

          +1.

          Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
          because we call alterreplicationslot in CREATE SUB as well, for the
          case when slotname is provided and createslot=false. But the tricky
          part is we call alterreplicationslot() when creating subscription
          for both failover=true and false. That means if we want to fix it on
          the similar line of ALTER SUB, we have to disallow user from executing
          the CREATE SUBSCRIPTION (slot_name = xx) in a txn block, which seems
          to break some existing use cases. (previously, user can execute such a
          command inside a txn block).

          So, we need to think if there are better ways to fix it. After
          discussion with Hou-San offlist, here are some ideas:

          1. do not alter replication slot's failover option when CREATE
          SUBSCRIPTION WITH failover=false. This means we alter replication
          slot only when failover is set to true. And thus we can disallow
          CREATE SUB WITH (slotname =xx, failover=true, createslot=false)
          inside a txn block.

          This option allows user to run CREATE-SUB(create_slot=false) with
          failover=false in txn block like earlier. But on the downside, it
          makes the behavior inconsistent for otherwise simpler option like
          failover, i.e. with failover=true, CREATE SUB is not allowed in txn
          block while with failover=false, it is allowed. It makes it slightly
          complex to be understood by user.

          2. let's not disallow CREATE SUB in txn block as earlier, just don't
          perform internal alter-failover during CREATE SUB for existing
          slots(createslot=false, slotname=xx) i.e. when create_slot is
          false, we will ignore failover parameter of CREATE SUB and it is
          user's responsibility to set it appropriately using ALTER SUB cmd. For
          create_slot=true, the behaviour of CREATE-SUB remains same as earlier.

          This option does not add new restriction for CREATE SUB wrt txn block.
          In context of failover with create_slot=false, we already have a
          similar restriction (documented one) for ALTER SUB, i.e. with 'ALTER
          SUBSCRIPTION SET(slot_name = new)', user needs to alter the new slot's
          failover by himself. CREAT SUB can also be documented in similar way.
          This seems simpler to be understood considering existing ALTER SUB's
          behavior as well. Plus, this will make CREATE-SUB code slightly
          simpler and thus easily manageable.
          +1 for option 2 as it sounds logical to me and consistent with ALTER
          SUBSCRIPTION. BTW, IIUC, you are referring to: "When altering the
          slotname, the failover and twophase property values of the named
          slot may differ from the counterpart failover and two_phase parameters
          specified in the subscription. When creating the slot, ensure the slot
          properties failover and two_phase match their counterpart parameters
          of the subscription." in docs [1], right?
          --
          With Regards,
          Amit Kapila.
          • Jump to comment-1
            Zhijie Hou (Fujitsu)<houzj.fnst@fujitsu.com>
            Apr 19, 2024, 12:39 AM UTC
            On Thursday, April 18, 2024 1:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
            On Tue, Apr 16, 2024 at 5:06 PM shveta malik <shveta.malik@gmail.com>
            wrote:

            On Tue, Apr 16, 2024 at 1:45 PM Hayato Kuroda (Fujitsu)
            <kuroda.hayato@fujitsu.com> wrote:

            Dear Hou,
            Kuroda-San reported an issue off-list that:

            If user execute ALTER SUBSCRIPTION SET (failover) command inside a
            txn block and rollback, only the subscription option change can be
            rolled back, while the replication slot's failover change is preserved.

            This is because ALTER SUBSCRIPTION SET (failover) command
            internally executes the replication command ALTERREPLICATIONSLOT
            to change the replication slot's failover property, but this
            replication command execution cannot be rollback.

            To fix it, I think we can prevent user from executing ALTER
            SUBSCRIPTION set
            (failover) inside a txn block, which is also consistent to the
            ALTER SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach
            a
            small patch to address this.

            Thanks for posting the patch, the fix is same as my expectation.
            Also, should we add the restriction to the doc? I feel [1] can be updated.

            +1.

            Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
            because we call alterreplicationslot in CREATE SUB as well, for the
            case when slotname is provided and createslot=false. But the tricky
            part is we call alterreplicationslot() when creating subscription
            for both failover=true and false. That means if we want to fix it on
            the similar line of ALTER SUB, we have to disallow user from executing
            the CREATE SUBSCRIPTION (slot_name = xx) in a txn block, which seems
            to break some existing use cases. (previously, user can execute such a
            command inside a txn block).

            So, we need to think if there are better ways to fix it. After
            discussion with Hou-San offlist, here are some ideas:

            1. do not alter replication slot's failover option when CREATE
            SUBSCRIPTION WITH failover=false. This means we alter replication
            slot only when failover is set to true. And thus we can disallow
            CREATE SUB WITH (slotname =xx, failover=true, createslot=false)
            inside a txn block.

            This option allows user to run CREATE-SUB(create_slot=false) with
            failover=false in txn block like earlier. But on the downside, it
            makes the behavior inconsistent for otherwise simpler option like
            failover, i.e. with failover=true, CREATE SUB is not allowed in txn
            block while with failover=false, it is allowed. It makes it slightly
            complex to be understood by user.

            2. let's not disallow CREATE SUB in txn block as earlier, just don't
            perform internal alter-failover during CREATE SUB for existing
            slots(createslot=false, slotname=xx) i.e. when create_slot is
            false, we will ignore failover parameter of CREATE SUB and it is
            user's responsibility to set it appropriately using ALTER SUB cmd. For
            create_slot=true, the behaviour of CREATE-SUB remains same as earlier.

            This option does not add new restriction for CREATE SUB wrt txn block.
            In context of failover with create_slot=false, we already have a
            similar restriction (documented one) for ALTER SUB, i.e. with 'ALTER
            SUBSCRIPTION SET(slot_name = new)', user needs to alter the new slot's
            failover by himself. CREAT SUB can also be documented in similar way.
            This seems simpler to be understood considering existing ALTER SUB's
            behavior as well. Plus, this will make CREATE-SUB code slightly
            simpler and thus easily manageable.

            +1 for option 2 as it sounds logical to me and consistent with ALTER
            SUBSCRIPTION.
            +1.
            Here is V2 patch which includes the changes for CREATE SUBSCRIPTION as
            suggested. Since we don't connect pub to alter slot when (create_slot=false)
            anymore, the restriction that disallows failover=true when connect=false is
            also removed.
            Best Regards,
            Hou zj
            • Jump to comment-1
              shveta malik<shveta.malik@gmail.com>
              Apr 19, 2024, 10:14 AM UTC
              On Fri, Apr 19, 2024 at 6:09 AM Zhijie Hou (Fujitsu)
              <houzj.fnst@fujitsu.com> wrote:

              Here is V2 patch which includes the changes for CREATE SUBSCRIPTION as
              suggested. Since we don't connect pub to alter slot when (create_slot=false)
              anymore, the restriction that disallows failover=true when connect=false is
              also removed.
              Thanks for the patch. I feel getSubscription() also needs to get
              'subfailover' option independent of dopt->binary_upgrade i.e. it needs
              similar changes as that of dumpSubscription(). I tested pg_dump,
              currently it is not dumping failover parameter for failover-enabled
              subscriptions, perhaps due to the same bug. Create-sub and Alter-sub
              changes look good and work well.
              thanks
              Shveta
            • Jump to comment-1
              Bertrand Drouvot<bertranddrouvot.pg@gmail.com>
              Apr 19, 2024, 8:57 AM UTC
              Hi,
              On Fri, Apr 19, 2024 at 12:39:40AM +0000, Zhijie Hou (Fujitsu) wrote:
              Here is V2 patch which includes the changes for CREATE SUBSCRIPTION as
              suggested. Since we don't connect pub to alter slot when (create_slot=false)
              anymore, the restriction that disallows failover=true when connect=false is
              also removed.
              Thanks!
              + specified in the subscription. When creating the slot, ensure the slot
              + property <literal>failover</literal> matches the counterpart parameter
              + of the subscription.
              The slot could be created before or after the subscription is created, so I think
              it needs a bit of rewording (as here it sounds like the sub is already created),
              , something like?
              "Always ensure the slot property <literal>failover</literal> matches the
              counterpart parameter of the subscription and vice versa."
              Regards,
              --
              Bertrand Drouvot
              PostgreSQL Contributors Team
              RDS Open Source Databases
              Amazon Web Services: https://aws.amazon.com
            • Jump to comment-1
              Hayato Kuroda (Fujitsu)<kuroda.hayato@fujitsu.com>
              Apr 19, 2024, 2:55 AM UTC
              Dear Hou,
              Thanks for updating the patch! Let me confirm the content.
              In your patch, the pg_dump.c was updated. IIUC the main reason was that
              pg_restore executes some queries as single transactions so that ALTER
              SUBSCRIPTION cannot be done, right?
              Also, failover was synchronized only when we were in the upgrade mode, but
              your patch seems to remove the condition. Can you clarify the reason?
              Other than that, the patch LGTM.
              Best Regards,
              Hayato Kuroda
              FUJITSU LIMITED
              https://www.fujitsu.com/
              • Jump to comment-1
                Zhijie Hou (Fujitsu)<houzj.fnst@fujitsu.com>
                Apr 22, 2024, 12:28 AM UTC
                On Friday, April 19, 2024 10:54 AM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
                In your patch, the pg_dump.c was updated. IIUC the main reason was that
                pg_restore executes some queries as single transactions so that ALTER
                SUBSCRIPTION cannot be done, right?
                Yes, and please see below for other reasons.
                Also, failover was synchronized only when we were in the upgrade mode, but
                your patch seems to remove the condition. Can you clarify the reason?
                We used ALTER SUBSCRIPTION in upgrade mode because it was not allowed to use
                connect=false and failover=true together when CREATE SUBSCRIPTION. But since we
                don't have this restriction anymore(we don't alter slot when creating sub
                anymore), we can directly specify failover in CREATE SUBSCRIPTION and do that
                in non-upgrade mode as well.
                Attach the V3 patch which also addressed Shveta[1] and Bertrand[2]'s comments.
                [1] https://www.postgresql.org/message-id/CAJpy0uD3YOeDg-tTCUi3EZ8vznRDfDqO_k6LepJpXUV1Z_%3DgkA%40mail.gmail.com
                [2] https://www.postgresql.org/message-id/ZiIxuaiINsuaWuDK%40ip-10-97-1-34.eu-west-3.compute.internal
                Best Regards,
                Hou zj
                • Jump to comment-1
                  shveta malik<shveta.malik@gmail.com>
                  Apr 22, 2024, 6:02 AM UTC
                  On Mon, Apr 22, 2024 at 5:57 AM Zhijie Hou (Fujitsu)
                  <houzj.fnst@fujitsu.com> wrote:
                  On Friday, April 19, 2024 10:54 AM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
                  In your patch, the pg_dump.c was updated. IIUC the main reason was that
                  pg_restore executes some queries as single transactions so that ALTER
                  SUBSCRIPTION cannot be done, right?

                  Yes, and please see below for other reasons.
                  Also, failover was synchronized only when we were in the upgrade mode, but
                  your patch seems to remove the condition. Can you clarify the reason?

                  We used ALTER SUBSCRIPTION in upgrade mode because it was not allowed to use
                  connect=false and failover=true together when CREATE SUBSCRIPTION. But since we
                  don't have this restriction anymore(we don't alter slot when creating sub
                  anymore), we can directly specify failover in CREATE SUBSCRIPTION and do that
                  in non-upgrade mode as well.

                  Attach the V3 patch which also addressed Shveta[1] and Bertrand[2]'s comments.
                  Tested the patch, works well.
                  thanks
                  Shveta
                  • Jump to comment-1
                    Bertrand Drouvot<bertranddrouvot.pg@gmail.com>
                    Apr 22, 2024, 9:01 AM UTC
                    Hi,
                    On Mon, Apr 22, 2024 at 11:32:14AM +0530, shveta malik wrote:
                    On Mon, Apr 22, 2024 at 5:57 AM Zhijie Hou (Fujitsu)
                    <houzj.fnst@fujitsu.com> wrote:
                    Attach the V3 patch which also addressed Shveta[1] and Bertrand[2]'s comments.
                    Thanks!
                    Tested the patch, works well.
                    Same here.
                    Regards,
                    --
                    Bertrand Drouvot
                    PostgreSQL Contributors Team
                    RDS Open Source Databases
                    Amazon Web Services: https://aws.amazon.com
                    • Jump to comment-1
                      Amit Kapila<amit.kapila16@gmail.com>
                      Apr 24, 2024, 3:27 AM UTC
                      On Mon, Apr 22, 2024 at 2:31 PM Bertrand Drouvot
                      <bertranddrouvot.pg@gmail.com> wrote:

                      On Mon, Apr 22, 2024 at 11:32:14AM +0530, shveta malik wrote:
                      On Mon, Apr 22, 2024 at 5:57 AM Zhijie Hou (Fujitsu)
                      <houzj.fnst@fujitsu.com> wrote:
                      Attach the V3 patch which also addressed Shveta[1] and Bertrand[2]'s comments.

                      Thanks!
                      Tested the patch, works well.

                      Same here.
                      Pushed!
                      --
                      With Regards,
                      Amit Kapila.
          • Jump to comment-1
            Michael Paquier<michael@paquier.xyz>
            Apr 18, 2024, 11:35 PM UTC
            On Thu, Apr 18, 2024 at 11:22:24AM +0530, Amit Kapila wrote:
            +1 for option 2 as it sounds logical to me and consistent with ALTER
            SUBSCRIPTION. BTW, IIUC, you are referring to: "When altering the
            slotname, the failover and twophase property values of the named
            slot may differ from the counterpart failover and two_phase parameters
            specified in the subscription. When creating the slot, ensure the slot
            properties failover and two_phase match their counterpart parameters
            of the subscription." in docs [1], right?
            FWIW, I'd also favor option 2, mostly on consistency ground as it
            would offer a better user-experience. On top of that, you're saying
            that may lead to some simplifications in the CREATE path. Without a
            patch, it's hard to tell, though.
            As far as I can see, this is not tracked as an open item and it should
            be. So I have added one.
            --
            Michael
          • Jump to comment-1
            shveta malik<shveta.malik@gmail.com>
            Apr 18, 2024, 6:09 AM UTC
            On Thu, Apr 18, 2024 at 11:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
            On Tue, Apr 16, 2024 at 5:06 PM shveta malik <shveta.malik@gmail.com> wrote:

            On Tue, Apr 16, 2024 at 1:45 PM Hayato Kuroda (Fujitsu)
            <kuroda.hayato@fujitsu.com> wrote:

            Dear Hou,
            Kuroda-San reported an issue off-list that:

            If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block
            and rollback, only the subscription option change can be rolled back, while the
            replication slot's failover change is preserved.

            This is because ALTER SUBSCRIPTION SET (failover) command internally
            executes
            the replication command ALTERREPLICATIONSLOT to change the replication
            slot's
            failover property, but this replication command execution cannot be
            rollback.

            To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set
            (failover) inside a txn block, which is also consistent to the ALTER
            SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small
            patch to address this.

            Thanks for posting the patch, the fix is same as my expectation.
            Also, should we add the restriction to the doc? I feel [1] can be updated.

            +1.

            Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
            because we call alterreplicationslot in CREATE SUB as well, for the
            case when slotname is provided and createslot=false. But the tricky
            part is we call alterreplicationslot() when creating subscription
            for both failover=true and false. That means if we want to fix it on
            the similar line of ALTER SUB, we have to disallow user from executing
            the CREATE SUBSCRIPTION (slot_name = xx) in a txn block, which seems
            to break some existing use cases. (previously, user can execute such a
            command inside a txn block).

            So, we need to think if there are better ways to fix it. After
            discussion with Hou-San offlist, here are some ideas:

            1. do not alter replication slot's failover option when CREATE
            SUBSCRIPTION WITH failover=false. This means we alter replication
            slot only when failover is set to true. And thus we can disallow
            CREATE SUB WITH (slotname =xx, failover=true, createslot=false)
            inside a txn block.

            This option allows user to run CREATE-SUB(create_slot=false) with
            failover=false in txn block like earlier. But on the downside, it
            makes the behavior inconsistent for otherwise simpler option like
            failover, i.e. with failover=true, CREATE SUB is not allowed in txn
            block while with failover=false, it is allowed. It makes it slightly
            complex to be understood by user.

            2. let's not disallow CREATE SUB in txn block as earlier, just don't
            perform internal alter-failover during CREATE SUB for existing
            slots(createslot=false, slotname=xx) i.e. when create_slot is
            false, we will ignore failover parameter of CREATE SUB and it is
            user's responsibility to set it appropriately using ALTER SUB cmd. For
            create_slot=true, the behaviour of CREATE-SUB remains same as earlier.

            This option does not add new restriction for CREATE SUB wrt txn block.
            In context of failover with create_slot=false, we already have a
            similar restriction (documented one) for ALTER SUB, i.e. with 'ALTER
            SUBSCRIPTION SET(slot_name = new)', user needs to alter the new slot's
            failover by himself. CREAT SUB can also be documented in similar way.
            This seems simpler to be understood considering existing ALTER SUB's
            behavior as well. Plus, this will make CREATE-SUB code slightly
            simpler and thus easily manageable.

            +1 for option 2 as it sounds logical to me and consistent with ALTER
            SUBSCRIPTION. BTW, IIUC, you are referring to: "When altering the
            slotname, the failover and twophase property values of the named
            slot may differ from the counterpart failover and two_phase parameters
            specified in the subscription. When creating the slot, ensure the slot
            properties failover and two_phase match their counterpart parameters
            of the subscription." in docs [1], right?
            Yes. Here:
            https://www.postgresql.org/docs/devel/sql-altersubscription.html#SQL-ALTERSUBSCRIPTION-PARAMS-SET
            thanks
            Shveta
    • Jump to comment-1
      Bertrand Drouvot<bertranddrouvot.pg@gmail.com>
      Apr 16, 2024, 7:29 AM UTC
      Hi,
      On Tue, Apr 16, 2024 at 06:32:11AM +0000, Zhijie Hou (Fujitsu) wrote:
      Hi,

      Kuroda-San reported an issue off-list that:

      If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block
      and rollback, only the subscription option change can be rolled back, while the
      replication slot's failover change is preserved.
      Nice catch, thanks!
      To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set
      (failover) inside a txn block, which is also consistent to the ALTER
      SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small
      patch to address this.
      Agree. The patch looks pretty straightforward to me. Worth to add this
      case in the doc? (where we already mention that "Commands ALTER SUBSCRIPTION ...
      REFRESH PUBLICATION and ALTER SUBSCRIPTION ... {SET|ADD|DROP} PUBLICATION ...
      with refresh option as true cannot be executed inside a transaction block"
      Regards,
      --
      Bertrand Drouvot
      PostgreSQL Contributors Team
      RDS Open Source Databases
      Amazon Web Services: https://aws.amazon.com