DROP OWNED BY fails to clean out pg_init_privs grants

  • Jump to comment-1
    tgl@sss.pgh.pa.us2024-04-05T23:11:05+00:00
    I wondered why buildfarm member copperhead has started to fail xversion-upgrade-HEAD-HEAD tests. I soon reproduced the problem here: pg_restore: creating ACL "regress_pg_dump_schema.TYPE "test_type"" pg_restore: while PROCESSING TOC: pg_restore: from TOC entry 4355; 0 0 ACL TYPE "test_type" buildfarm pg_restore: error: could not execute query: ERROR: role "74603" does not exist Command was: SELECT pg_catalog.binary_upgrade_set_record_init_privs(true); GRANT ALL ON TYPE "regress_pg_dump_schema"."test_type" TO "74603"; SELECT pg_catalog.binary_upgrade_set_record_init_privs(false); REVOKE ALL ON TYPE "regress_pg_dump_schema"."test_type" FROM "74603"; (So now I'm wondering why *only* copperhead has shown this so far. Are our other cross-version-upgrade testing animals AWOL?) I believe this is a longstanding problem that was exposed by accident by commit 936e3fa37. If you run "make installcheck" in HEAD's src/test/modules/test_pg_dump, and then poke around in the leftover contrib_regression database, you can find dangling grants in pg_init_privs: contrib_regression=# table pg_init_privs; objoid | classoid | objsubid | privtype | initprivs --------+----------+----------+----------+-------------------------------------- --------------------------- ... es} 43134 | 1259 | 0 | e | {postgres=rwU/postgres,43125=U/postgr es} 43128 | 1259 | 0 | e | {postgres=arwdDxtm/postgres,43125=r/p ostgres} ... The fact that the DROP ROLE added by 936e3fa37 succeeded indicates that these role references weren't captured in pg_shdepend. I imagine that we also lack code that would allow DROP OWNED BY to follow up on such entries if they existed, but I've not checked that for sure. In any case, there's probably a nontrivial amount of code to be written to make this work. Given the lack of field complaints, I suspect that extension scripts simply don't grant privileges to random roles that aren't the extension's owner. So I wonder a little bit if this is even worth fixing, as opposed to blocking off somehow. But probably we should first try to fix it. I doubt this is something we'll have fixed by Monday, so I will go add an open item for it. regards, tom lane
    • Jump to comment-1
      noah@leadboat.com2024-04-06T01:46:39+00:00
      On Fri, Apr 05, 2024 at 07:10:59PM -0400, Tom Lane wrote: > I wondered why buildfarm member copperhead has started to fail > xversion-upgrade-HEAD-HEAD tests. I soon reproduced the problem here: > > pg_restore: creating ACL "regress_pg_dump_schema.TYPE "test_type"" > pg_restore: while PROCESSING TOC: > pg_restore: from TOC entry 4355; 0 0 ACL TYPE "test_type" buildfarm > pg_restore: error: could not execute query: ERROR: role "74603" does not exist > Command was: SELECT pg_catalog.binary_upgrade_set_record_init_privs(true); > GRANT ALL ON TYPE "regress_pg_dump_schema"."test_type" TO "74603"; > SELECT pg_catalog.binary_upgrade_set_record_init_privs(false); > REVOKE ALL ON TYPE "regress_pg_dump_schema"."test_type" FROM "74603"; > > (So now I'm wondering why *only* copperhead has shown this so far. > Are our other cross-version-upgrade testing animals AWOL?) > > I believe this is a longstanding problem that was exposed by accident > by commit 936e3fa37. If you run "make installcheck" in HEAD's > src/test/modules/test_pg_dump, and then poke around in the leftover > contrib_regression database, you can find dangling grants in > pg_init_privs: > > contrib_regression=# table pg_init_privs; > objoid | classoid | objsubid | privtype | initprivs > > --------+----------+----------+----------+-------------------------------------- > --------------------------- > ... > es} > 43134 | 1259 | 0 | e | {postgres=rwU/postgres,43125=U/postgr > es} > 43128 | 1259 | 0 | e | {postgres=arwdDxtm/postgres,43125=r/p > ostgres} > ... > > The fact that the DROP ROLE added by 936e3fa37 succeeded indicates > that these role references weren't captured in pg_shdepend. > I imagine that we also lack code that would allow DROP OWNED BY to > follow up on such entries if they existed, but I've not checked that > for sure. In any case, there's probably a nontrivial amount of code > to be written to make this work. > > Given the lack of field complaints, I suspect that extension scripts > simply don't grant privileges to random roles that aren't the > extension's owner. So I wonder a little bit if this is even worth > fixing, as opposed to blocking off somehow. But probably we should > first try to fix it. This sounds closely-related to the following thread: https://www.postgresql.org/message-id/flat/1573808483712.96817%40Optiver.com
      • Jump to comment-1
        tgl@sss.pgh.pa.us2024-04-06T02:10:15+00:00
        Noah Misch <noah@leadboat.com> writes: > On Fri, Apr 05, 2024 at 07:10:59PM -0400, Tom Lane wrote: >> The fact that the DROP ROLE added by 936e3fa37 succeeded indicates >> that these role references weren't captured in pg_shdepend. >> I imagine that we also lack code that would allow DROP OWNED BY to >> follow up on such entries if they existed, but I've not checked that >> for sure. In any case, there's probably a nontrivial amount of code >> to be written to make this work. >> >> Given the lack of field complaints, I suspect that extension scripts >> simply don't grant privileges to random roles that aren't the >> extension's owner. So I wonder a little bit if this is even worth >> fixing, as opposed to blocking off somehow. But probably we should >> first try to fix it. > This sounds closely-related to the following thread: > https://www.postgresql.org/message-id/flat/1573808483712.96817%40Optiver.com Oh, interesting, I'd forgotten that thread completely. So Stephen was pushing back against dealing with the case because he thought that the SQL commands issued in that example should not have produced pg_init_privs entries in the first place. Which nobody else wanted to opine on, so the thread stalled. However, in the case of the test_pg_dump extension, the test_pg_dump--1.0.sql script absolutely did grant those privileges so it's very hard for me to think that they shouldn't be listed in pg_init_privs. Hence, I think we've accidentally stumbled across a case where we do need all that mechanism --- unless somebody wants to argue that what test_pg_dump--1.0.sql is doing should be disallowed. regards, tom lane
    • Jump to comment-1
      daniel@yesql.se2024-04-06T07:22:10+00:00
      > On 6 Apr 2024, at 01:10, Tom Lane <tgl@sss.pgh.pa.us> wrote: > (So now I'm wondering why *only* copperhead has shown this so far. > Are our other cross-version-upgrade testing animals AWOL?) Clicking around searching for Xversion animals I didn't spot any, but without access to the database it's nontrivial to know which animal does what. > I doubt this is something we'll have fixed by Monday, so I will > go add an open item for it. +1. Having opened the can of worms I'll have a look at it next week. -- Daniel Gustafsson
      • Jump to comment-1
        tgl@sss.pgh.pa.us2024-04-21T21:08:19+00:00
        Daniel Gustafsson <daniel@yesql.se> writes: >> On 6 Apr 2024, at 01:10, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (So now I'm wondering why *only* copperhead has shown this so far. >> Are our other cross-version-upgrade testing animals AWOL?) > Clicking around searching for Xversion animals I didn't spot any, but without > access to the database it's nontrivial to know which animal does what. I believe I see why this is (or isn't) happening. The animals currently running xversion tests are copperhead, crake, drongo, and fairywren. copperhead is using the makefiles while the others are using meson. And I find this in src/test/modules/test_pg_dump/meson.build (from 3f0e786cc): # doesn't delete its user 'runningcheck': false, So the meson animals are not running the test that sets up the problematic data. I think we should remove the above, since (a) the reason to have it is gone, and (b) it seems really bad that the set of tests run by meson is different from that run by the makefiles. However, once we do that, those other three animals will presumably go red, greatly complicating detection of any Windows-specific problems. So I'm inclined to not do it till just before we intend to commit a fix for the underlying problem. (Enough before that we can confirm that they do go red.) Speaking of which ... >> I doubt this is something we'll have fixed by Monday, so I will >> go add an open item for it. > +1. Having opened the can of worms I'll have a look at it next week. ... were you going to look at it? I can take a whack if it's too far down your priority list. regards, tom lane
        • Jump to comment-1
          daniel@yesql.se2024-04-22T21:10:03+00:00
          > On 21 Apr 2024, at 23:08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > ... were you going to look at it? I can take a whack if it's too far down your priority list. Yeah, I’m working on a patchset right now. ./daniel
        • Jump to comment-1
          daniel@yesql.se2024-04-26T13:42:17+00:00
          > On 21 Apr 2024, at 23:08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Daniel Gustafsson <daniel@yesql.se> writes: >>> On 6 Apr 2024, at 01:10, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> (So now I'm wondering why *only* copperhead has shown this so far. >>> Are our other cross-version-upgrade testing animals AWOL?) > >> Clicking around searching for Xversion animals I didn't spot any, but without >> access to the database it's nontrivial to know which animal does what. > > I believe I see why this is (or isn't) happening. The animals > currently running xversion tests are copperhead, crake, drongo, > and fairywren. copperhead is using the makefiles while the others > are using meson. And I find this in > src/test/modules/test_pg_dump/meson.build (from 3f0e786cc): > > # doesn't delete its user > 'runningcheck': false, > > So the meson animals are not running the test that sets up the > problematic data. ugh =/ > I think we should remove the above, since (a) the reason to have > it is gone, and (b) it seems really bad that the set of tests > run by meson is different from that run by the makefiles. Agreed. > However, once we do that, those other three animals will presumably go > red, greatly complicating detection of any Windows-specific problems. > So I'm inclined to not do it till just before we intend to commit > a fix for the underlying problem. (Enough before that we can confirm > that they do go red.) Agreed, we definitely want that but compromising the ability to find Windows issues at this point in the cycle seems bad. > ... were you going to look at it? I can take a whack if it's > too far down your priority list. I took a look at this, reading code and the linked thread. My gut feeling is that Stephen is right in that the underlying bug is these privileges ending up in pg_init_privs to begin with. That being said, I wasn't able to fix that in a way that doesn't seem like a terrible hack. The attached POC hack fixes it for me but I'm not sure how to fix it properly. Your wisdom would be much appreciated. Clusters which already has such entries aren't helped by a fix for this though, fixing that would either require pg_dump to skip them, or pg_upgrade to have a check along with instructions for fixing the issue. Not sure what's the best strategy here, the lack of complaints could indicate this isn't terribly common so spending cycles on it for every pg_dump might be excessive compared to a pg_upgrade check? -- Daniel Gustafsson
          • Jump to comment-1
            tgl@sss.pgh.pa.us2024-04-26T21:03:57+00:00
            Daniel Gustafsson <daniel@yesql.se> writes: > On 21 Apr 2024, at 23:08, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So the meson animals are not running the test that sets up the >> problematic data. > I took a look at this, reading code and the linked thread. My gut feeling is > that Stephen is right in that the underlying bug is these privileges ending up > in pg_init_privs to begin with. That being said, I wasn't able to fix that in > a way that doesn't seem like a terrible hack. Hmm, can't we put the duplicate logic inside recordExtensionInitPriv? Even if these calls need a different result from others, adding a flag parameter seems superior to having N copies of the logic. A bigger problem though is that I think you are addressing the original complaint from the older thread, which while it's a fine thing to fix seems orthogonal to the failure we're seeing in the buildfarm. The buildfarm's problem is not that we're recording incorrect pg_init_privs entries, it's that when we do create such entries we're failing to show their dependency on the grantee role in pg_shdepend. We've missed spotting that so far because it's so seldom that pg_init_privs entries reference any but built-in roles (or at least roles that'd likely outlive the extension). regards, tom lane
            • Jump to comment-1
              tgl@sss.pgh.pa.us2024-04-27T22:45:49+00:00
              I wrote: > A bigger problem though is that I think you are addressing the > original complaint from the older thread, which while it's a fine > thing to fix seems orthogonal to the failure we're seeing in the > buildfarm. The buildfarm's problem is not that we're recording > incorrect pg_init_privs entries, it's that when we do create such > entries we're failing to show their dependency on the grantee role > in pg_shdepend. We've missed spotting that so far because it's > so seldom that pg_init_privs entries reference any but built-in > roles (or at least roles that'd likely outlive the extension). Here's a draft patch that attacks that. It seems to fix the problem with test_pg_dump: no dangling pg_init_privs grants are left behind. A lot of the changes here are just involved with needing to pass the object's owner OID to recordExtensionInitPriv so that it can be passed to updateAclDependencies. One thing I'm a bit worried about is that some of the new code assumes that all object types that are of interest here will have catcaches on OID, so that it's possible to fetch the owner OID for a generic object-with-privileges using the catcache and objectaddress.c's tables of object properties. That assumption seems to exist already, eg ExecGrant_common also assumes it, but it's not obvious that it must be so. regards, tom lane
              • Jump to comment-1
                tgl@sss.pgh.pa.us2024-04-28T18:52:55+00:00
                I wrote: > Here's a draft patch that attacks that. It seems to fix the > problem with test_pg_dump: no dangling pg_init_privs grants > are left behind. Here's a v2 that attempts to add some queries to test_pg_dump.sql to provide visual verification that pg_shdepend and pg_init_privs are updated correctly during DROP OWNED BY. It's a little bit nasty to look at the ACL column of pg_init_privs, because that text involves the bootstrap superuser's name which is site-dependent. What I did to try to make the test stable is replace(initprivs::text, current_user, 'postgres') AS initprivs This is of course not bulletproof: with a sufficiently weird bootstrap superuser name, we could get false matches to parts of "regress_dump_test_role" or to privilege strings. That seems unlikely enough to live with, but I wonder if anybody has a better idea. regards, tom lane
                • Jump to comment-1
                  daniel@yesql.se2024-04-29T18:29:58+00:00
                  > On 28 Apr 2024, at 20:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I wrote: >> Here's a draft patch that attacks that. It seems to fix the >> problem with test_pg_dump: no dangling pg_init_privs grants >> are left behind. Reading this I can't find any sharp edges, and I prefer your changes to recordExtensionInitPriv over the version I had half-baked by the time you had this finished. Trying to break it with the testcases I had devised also failed, so +1. > Here's a v2 that attempts to add some queries to test_pg_dump.sql > to provide visual verification that pg_shdepend and pg_init_privs > are updated correctly during DROP OWNED BY. It's a little bit > nasty to look at the ACL column of pg_init_privs, because that text > involves the bootstrap superuser's name which is site-dependent. > What I did to try to make the test stable is > > replace(initprivs::text, current_user, 'postgres') AS initprivs Maybe that part warrants a small comment in the testfile to keep it from sending future readers into rabbitholes? > This is of course not bulletproof: with a sufficiently weird > bootstrap superuser name, we could get false matches to parts > of "regress_dump_test_role" or to privilege strings. That > seems unlikely enough to live with, but I wonder if anybody has > a better idea. I think that will be bulletproof enough to keep it working in the buildfarm and among 99% of hackers. -- Daniel Gustafsson
                  • Jump to comment-1
                    tgl@sss.pgh.pa.us2024-04-29T19:15:17+00:00
                    Daniel Gustafsson <daniel@yesql.se> writes: >> On 28 Apr 2024, at 20:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... It's a little bit >> nasty to look at the ACL column of pg_init_privs, because that text >> involves the bootstrap superuser's name which is site-dependent. >> What I did to try to make the test stable is >> replace(initprivs::text, current_user, 'postgres') AS initprivs > Maybe that part warrants a small comment in the testfile to keep it from > sending future readers into rabbitholes? Agreed. >> This is of course not bulletproof: with a sufficiently weird >> bootstrap superuser name, we could get false matches to parts >> of "regress_dump_test_role" or to privilege strings. That >> seems unlikely enough to live with, but I wonder if anybody has >> a better idea. > I think that will be bulletproof enough to keep it working in the buildfarm and > among 99% of hackers. It occurred to me to use "aclexplode" to expand the initprivs, and then we can substitute names with simple equality tests. The test query is a bit more complicated, but I feel better about it. v3 attached also has a bit more work on code comments. regards, tom lane
                    • Jump to comment-1
                      daniel@yesql.se2024-04-29T20:53:06+00:00
                      > On 29 Apr 2024, at 21:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It occurred to me to use "aclexplode" to expand the initprivs, and > then we can substitute names with simple equality tests. The test > query is a bit more complicated, but I feel better about it. Nice, I didn't even remember that function existed. I agree that it's an improvement even at the increased query complexity. > v3 attached also has a bit more work on code comments. LGTM. -- Daniel Gustafsson
                      • Jump to comment-1
                        tgl@sss.pgh.pa.us2024-04-29T23:27:01+00:00
                        Daniel Gustafsson <daniel@yesql.se> writes: > On 29 Apr 2024, at 21:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> v3 attached also has a bit more work on code comments. > LGTM. Pushed, thanks for reviewing! regards, tom lane
                        • Jump to comment-1
                          tgl@sss.pgh.pa.us2024-04-29T23:50:23+00:00
                          I wrote: > Pushed, thanks for reviewing! Argh, I forgot I'd meant to push b0c5b215d first not second. Oh well, it was only neatnik-ism that made me want to see those other animals fail --- and a lot of the buildfarm is red right now for $other_reasons anyway. regards, tom lane
                    • Jump to comment-1
                      david.g.johnston@gmail.com2024-04-30T04:00:28+00:00
                      On Monday, April 29, 2024, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Daniel Gustafsson <daniel@yesql.se> writes: > >> On 28 Apr 2024, at 20:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > >> This is of course not bulletproof: with a sufficiently weird > >> bootstrap superuser name, we could get false matches to parts > >> of "regress_dump_test_role" or to privilege strings. That > >> seems unlikely enough to live with, but I wonder if anybody has > >> a better idea. > > > I think that will be bulletproof enough to keep it working in the > buildfarm and > > among 99% of hackers. > > It occurred to me to use "aclexplode" to expand the initprivs, and > then we can substitute names with simple equality tests. The test > query is a bit more complicated, but I feel better about it. > My solution to this was to rely on the fact that the bootstrap superuser is assigned OID 10 regardless of its name. David J.
                      • Jump to comment-1
                        tgl@sss.pgh.pa.us2024-04-30T04:10:09+00:00
                        "David G. Johnston" <david.g.johnston@gmail.com> writes: > My solution to this was to rely on the fact that the bootstrap superuser is > assigned OID 10 regardless of its name. Yeah, I wrote it that way to start with too, but reconsidered because (1) I don't like hard-coding numeric OIDs. We can avoid that in C code but it's harder to do in SQL. (2) It's not clear to me that this test couldn't be run by a non-bootstrap superuser. I think "current_user" is actually the correct thing for the role executing the test. regards, tom lane
                        • Jump to comment-1
                          david.g.johnston@gmail.com2024-04-30T04:40:54+00:00
                          On Monday, April 29, 2024, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "David G. Johnston" <david.g.johnston@gmail.com> writes: > > My solution to this was to rely on the fact that the bootstrap superuser > is > > assigned OID 10 regardless of its name. > > Yeah, I wrote it that way to start with too, but reconsidered > because > > (1) I don't like hard-coding numeric OIDs. We can avoid that in C > code but it's harder to do in SQL. If the tests don’t involve, e.g., the predefined role pg_monitor and its grantor of the memberships in the other predefined roles, this indeed can be avoided. So I think my test still needs to check for 10 even if some other superuser is allowed to produce the test output since a key output in my case was the bootstrap superuser and the initdb roles. > (2) It's not clear to me that this test couldn't be run by a > non-bootstrap superuser. I think "current_user" is actually > the correct thing for the role executing the test. > Agreed, testing against current_role is correct if the things being queried were created while executing the test. I would need to do this as well to remove the current requirement that my tests be run by the bootstrap superuser. David J.