Cleanup: remove unused fields from nodes

  • Jump to comment-1
    Matthias van de Meent<boekewurm+postgres@gmail.com>
    Apr 22, 2024, 3:21 PM UTC
    Hi,
    While working on the 'reduce nodeToString output' patch, I noticed
    that my IDE marked one field in the TidScanState node as 'unused'.
    After checking this seemed to be accurate, and I found a few more such
    fields in Node structs.
    PFA some patches that clean this up: 0001 is plain removal of fields
    that are not accessed anywhere anymore, 0002 and up clean up fields
    that are effectively write-only, with no effective use inside
    PostgreSQL's own code, and no effective usage found on Debian code
    search, nor Github code search.
    I'm quite confident about the correctness of patches 1 and 3 (no usage
    at all, and newly introduced with no meaningful uses), while patches
    2, 4, and 5 could be considered 'as designed'.
    For those last ones I have no strong opinion for removal or against
    keeping them around, this is just to point out we can remove the
    fields, as nobody seems to be using them.
    /cc Tom Lane and Etsuro Fujita: 2 and 4 were introduced with your
    commit afb9249d in 2015.
    /cc Amit Kapila: 0003 was introduced with your spearheaded commit
    6185c973 this year.
    Kind regards,
    Matthias van de Meent
    0001 removes two old fields that are not in use anywhere anymore, but
    at some point these were used.
    0002/0004 remove fields in ExecRowMark which were added for FDWs to
    use, but there are no FDWs which use this: I could only find two FDWs
    who implement RefetchForeignRow, one being BlackHoleFDW, and the other
    a no-op implementation in kafka_fdw [0]. We also don't seem to have
    any testing on this feature.
    0003 drops the input_finfo field on the new JsonExprState struct. It
    wasn't read anywhere, so keeping it around makes little sense IMO.
    0005 drops field DeallocateStmt.isall: the value of the field is
    implied by !name, and that field was used as such.
    [0] https://github.com/cohenjo/kafka_fdw/blob/master/src/kafka_fdw.c#L1793
    • Jump to comment-1
      Tom Lane<tgl@sss.pgh.pa.us>
      Apr 22, 2024, 3:41 PM UTC
      Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
      0001 removes two old fields that are not in use anywhere anymore, but
      at some point these were used.
      +1. They're not being initialized, so they're useless and confusing.
      0002/0004 remove fields in ExecRowMark which were added for FDWs to
      use, but there are no FDWs which use this: I could only find two FDWs
      who implement RefetchForeignRow, one being BlackHoleFDW, and the other
      a no-op implementation in kafka_fdw [0]. We also don't seem to have
      any testing on this feature.
      I'm kind of down on removing either of these. ermExtra is explicitly
      intended for extensions to use, and just because we haven't found any
      users doesn't mean there aren't any, or might not be next week.
      Similarly, ermActive seems like it's at least potentially useful:
      is there another way for onlookers to discover that state?
      0003 drops the input_finfo field on the new JsonExprState struct. It
      wasn't read anywhere, so keeping it around makes little sense IMO.
      +1. The adjacent input_fcinfo field has this info if anyone needs it.
      0005 drops field DeallocateStmt.isall: the value of the field is
      implied by !name, and that field was used as such.
      Seems reasonable.
      I think it would be a good idea to push 0003 for v17, just so nobody
      grows an unnecessary dependency on that field. 0001 and 0005 could
      be left for v18, but on the other hand they're so trivial that it
      could also be sensible to just push them to get them out of the way.
      		regards, tom lane
      • Jump to comment-1
        Matthias van de Meent<boekewurm+postgres@gmail.com>
        Apr 22, 2024, 4:46 PM UTC
        On Mon, 22 Apr 2024 at 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
        Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
        0002/0004 remove fields in ExecRowMark which were added for FDWs to
        use, but there are no FDWs which use this: I could only find two FDWs
        who implement RefetchForeignRow, one being BlackHoleFDW, and the other
        a no-op implementation in kafka_fdw [0]. We also don't seem to have
        any testing on this feature.

        I'm kind of down on removing either of these. ermExtra is explicitly
        intended for extensions to use, and just because we haven't found any
        users doesn't mean there aren't any, or might not be next week.
        That's a good point, and also why I wasn't 100% sure removing it was a
        good idea. I'm not quite sure why this would be used (rather than the
        internal state of the FDW, or no state at all), but haven't looked
        very deep into it either, so I'm quite fine with not channging that.
        Similarly, ermActive seems like it's at least potentially useful:
        is there another way for onlookers to discover that state?
        The ermActive field is always true when RefetchForeignRow is called
        (in ExecLockRows(), in nodeLockRows.c), and we don't seem to care
        about the value of the field afterwards. Additionally, we always set
        erm->curCtid to a valid value when ermActive is also first set in that
        code path.
        In all, it feels like a duplicative field with no real uses inside
        PostgreSQL itself. If an extension (FDW) needs it, it should probably
        use ermExtra instead, as ermActive seemingly doesn't carry any
        meaningful value into the FDW call.
        I think it would be a good idea to push 0003 for v17, just so nobody
        grows an unnecessary dependency on that field. 0001 and 0005 could
        be left for v18, but on the other hand they're so trivial that it
        could also be sensible to just push them to get them out of the way.
        Beta 1 scheduled to be released for quite some time, so I don't think
        there are any problems with fixing these kinds of minor issues in the
        provisional ABI for v17.
        Kind regards,
        Matthias van de Meent
        • Jump to comment-1
          Michael Paquier<michael@paquier.xyz>
          Apr 23, 2024, 1:15 AM UTC
          On Mon, Apr 22, 2024 at 06:46:27PM +0200, Matthias van de Meent wrote:
          On Mon, 22 Apr 2024 at 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
          Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
          0002/0004 remove fields in ExecRowMark which were added for FDWs to
          use, but there are no FDWs which use this: I could only find two FDWs
          who implement RefetchForeignRow, one being BlackHoleFDW, and the other
          a no-op implementation in kafka_fdw [0]. We also don't seem to have
          any testing on this feature.

          I'm kind of down on removing either of these. ermExtra is explicitly
          intended for extensions to use, and just because we haven't found any
          users doesn't mean there aren't any, or might not be next week.

          That's a good point, and also why I wasn't 100% sure removing it was a
          good idea. I'm not quite sure why this would be used (rather than the
          internal state of the FDW, or no state at all), but haven't looked
          very deep into it either, so I'm quite fine with not channging that.
          Custom nodes are one extra possibility? I'd leave ermActive and
          ermExtra be.
          I think it would be a good idea to push 0003 for v17, just so nobody
          grows an unnecessary dependency on that field. 0001 and 0005 could
          be left for v18, but on the other hand they're so trivial that it
          could also be sensible to just push them to get them out of the way.

          Beta 1 scheduled to be released for quite some time, so I don't think
          there are any problems with fixing these kinds of minor issues in the
          provisional ABI for v17.
          Tweaking the APIs should be OK until GA, as long as we agree that the
          current interfaces can be improved.
          0003 is new in v17, so let's apply it now. I don't see much a strong
          argument in waiting for the removal of 0001 and 0005, either, to keep
          the interfaces cleaner moving on. However, this is not a regression
          and these have been around for years, so I'd suggest for v18 to open
          before moving on with the removal.
          I was wondering for a bit about how tss_htup could be abused in the
          open, and the only references I can see come from forks of the
          pre-2019 area, where this uses TidNext(). As a whole, ripping it out
          does not stress me much.
          --
          Michael
          • Jump to comment-1
            Tom Lane<tgl@sss.pgh.pa.us>
            Apr 23, 2024, 5:01 PM UTC
            Michael Paquier <michael@paquier.xyz> writes:
            On Mon, Apr 22, 2024 at 06:46:27PM +0200, Matthias van de Meent wrote:
            On Mon, 22 Apr 2024 at 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
            I think it would be a good idea to push 0003 for v17, just so nobody
            grows an unnecessary dependency on that field. 0001 and 0005 could
            be left for v18, but on the other hand they're so trivial that it
            could also be sensible to just push them to get them out of the way.
            Tweaking the APIs should be OK until GA, as long as we agree that the
            current interfaces can be improved.
            0003 is new in v17, so let's apply it now. I don't see much a strong
            argument in waiting for the removal of 0001 and 0005, either, to keep
            the interfaces cleaner moving on. However, this is not a regression
            and these have been around for years, so I'd suggest for v18 to open
            before moving on with the removal.
            I went ahead and pushed 0001 and 0003, figuring there was little
            point in waiting on 0001. I'd intended to push 0005 (remove "isall")
            as well, but it failed check-world:
            diff -U3 /home/postgres/pgsql/contrib/pgstatstatements/expected/utility.out /home/postgres/pgsql/contrib/pgstatstatements/results/utility.out
            --- /home/postgres/pgsql/contrib/pgstatstatements/expected/utility.out 2023-12-08 15:14:55.689347888 -0500
            +++ /home/postgres/pgsql/contrib/pgstatstatements/results/utility.out 2024-04-23 12:17:22.187721947 -0400
            @@ -536,12 +536,11 @@
             SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
              calls | rows |                       query                        
             -------+------+----------------------------------------------------
            -     2 |    0 | DEALLOCATE $1
            -     2 |    0 | DEALLOCATE ALL
            +     4 |    0 | DEALLOCATE $1
                  2 |    2 | PREPARE stat_select AS SELECT $1 AS a
                  1 |    1 | SELECT $1 as a
                  1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
            -(5 rows)
            +(4 rows)
            
             SELECT pg_stat_statements_reset() IS NOT NULL AS t;
            That is, query jumbling no longer distinguishes "DEALLOCATE x" from
            "DEALLOCATE ALL", because the DeallocateStmt.name field is marked
            queryjumbleignore. Now maybe that's fine, but it's a point
            we'd not considered so far in this thread. Thoughts?
            		regards, tom lane
            • Jump to comment-1
              Michael Paquier<michael@paquier.xyz>
              Apr 24, 2024, 2:57 AM UTC
              On Tue, Apr 23, 2024 at 01:01:04PM -0400, Tom Lane wrote:
              That is, query jumbling no longer distinguishes "DEALLOCATE x" from
              "DEALLOCATE ALL", because the DeallocateStmt.name field is marked
              queryjumbleignore. Now maybe that's fine, but it's a point
              we'd not considered so far in this thread. Thoughts?
              And of course, I've managed to forget about bb45156f342c and the
              reason behind the addition of the field is to be able to make the
              difference between the named and ALL cases for DEALLOCATE, around
              here:
              https://www.postgresql.org/message-id/ZNq9kRwWbKzvR%2B2a%40paquier.xyz
              This is new in v17, so perhaps it could be changed, but I think that's
              important to make the difference here for monitoring purposes as
              DEALLOCATE ALL could be used as a way to clean up prepared statements
              in connection poolers (for example, pgbouncer's serverresetquery).
              And doing this tweak in the Node structure of DeallocateStmt is
              simpler than having to maintain a new pgnodeattr for query jumbling.
              --
              Michael
              • Jump to comment-1
                Tom Lane<tgl@sss.pgh.pa.us>
                Apr 24, 2024, 3:03 AM UTC
                Michael Paquier <michael@paquier.xyz> writes:
                On Tue, Apr 23, 2024 at 01:01:04PM -0400, Tom Lane wrote:
                That is, query jumbling no longer distinguishes "DEALLOCATE x" from
                "DEALLOCATE ALL", because the DeallocateStmt.name field is marked
                queryjumbleignore. Now maybe that's fine, but it's a point
                we'd not considered so far in this thread. Thoughts?
                And of course, I've managed to forget about bb45156f342c and the
                reason behind the addition of the field is to be able to make the
                difference between the named and ALL cases for DEALLOCATE, around
                here:
                https://www.postgresql.org/message-id/ZNq9kRwWbKzvR%2B2a%40paquier.xyz
                Hah. Seems like the comment for isall needs to explain that it
                exists for this purpose, so we don't make this mistake again.
                		regards, tom lane
                • Jump to comment-1
                  Michael Paquier<michael@paquier.xyz>
                  Apr 24, 2024, 7:28 AM UTC
                  On Tue, Apr 23, 2024 at 11:03:40PM -0400, Tom Lane wrote:
                  Hah. Seems like the comment for isall needs to explain that it
                  exists for this purpose, so we don't make this mistake again.
                  How about something like the attached?
                  --
                  Michael
                  • Jump to comment-1
                    Tom Lane<tgl@sss.pgh.pa.us>
                    Apr 24, 2024, 12:32 PM UTC
                    Michael Paquier <michael@paquier.xyz> writes:
                    On Tue, Apr 23, 2024 at 11:03:40PM -0400, Tom Lane wrote:
                    Hah. Seems like the comment for isall needs to explain that it
                    exists for this purpose, so we don't make this mistake again.
                    How about something like the attached?
                    I was thinking about wording like
                     * True if DEALLOCATE ALL.  This is redundant with "name == NULL",
                     * but we make it a separate field so that exactly this condition
                     * (and not the precise name) will be accounted for in query jumbling.
                    
                    		regards, tom lane
                    • Jump to comment-1
                      Michael Paquier<michael@paquier.xyz>
                      Apr 25, 2024, 1:26 AM UTC
                      On Wed, Apr 24, 2024 at 08:31:57AM -0400, Tom Lane wrote:
                      I was thinking about wording like

                      * True if DEALLOCATE ALL. This is redundant with "name == NULL",
                      * but we make it a separate field so that exactly this condition
                      * (and not the precise name) will be accounted for in query jumbling.
                      Fine by me. I've just used that and applied a patch to doing so.
                      Thanks.
                      --
                      Michael
                  • Jump to comment-1
                    Matthias van de Meent<boekewurm+postgres@gmail.com>
                    Apr 24, 2024, 9:51 AM UTC
                    On Wed, 24 Apr 2024 at 09:28, Michael Paquier <michael@paquier.xyz> wrote:

                    On Tue, Apr 23, 2024 at 11:03:40PM -0400, Tom Lane wrote:
                    Hah. Seems like the comment for isall needs to explain that it
                    exists for this purpose, so we don't make this mistake again.

                    How about something like the attached?
                    LGTM.
                    Thanks,
                    Matthias