Tarball builds in the new world order

  • Jump to comment-1
    tgl@sss.pgh.pa.us2024-04-23T22:05:59+00:00
    With one eye on the beta-release calendar, I thought it'd be a good idea to test whether our tarball build script works for the new plan where we'll use "git archive" instead of the traditional process. It doesn't. It makes tarballs all right, but whatever commit ID you specify is semi-ignored, and you get a tarball corresponding to HEAD of master. (The PDFs come from the right version, though!) The reason for that is that the mk-one-release script does this (shorn of not-relevant-here details): export BASE=/home/pgsql export GIT_DIR=$BASE/postgresql.git mkdir pgsql # Export the selected git ref git archive ${gitref} | tar xf - -C pgsql cd pgsql ./configure # Produce .tar.gz and .tar.bz2 files make dist Since 619bc23a1, what "make dist" does is $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@ $(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@ Since GIT_DIR is set, git consults that repo not the current working directory, so HEAD means whatever it means in a just-fetched repo, and mk-one-release's efforts to select the ${gitref} commit mean nothing. (If git had tried to consult the current working directory, it would've failed for lack of any .git subdir therein.) I really really don't want to put version-specific coding into mk-one-release, but fortunately I think we don't have to. What I suggest is doing this in mk-one-release: -make dist +make dist PG_COMMIT_HASH=${gitref} and changing the "make dist" rules to write $(PG_COMMIT_HASH) not HEAD. The extra make variable will have no effect in the back branches, while it should cause the right thing to happen with the new implementation of "make dist". This change seems like a good thing anyway for anyone who's tempted to use "make dist" manually, since they wouldn't necessarily want to package HEAD either. Now, if we just do it exactly like that then trying to "make dist" without setting PG_COMMIT_HASH will fail, since "git archive" has no default for its <tree-ish> argument. I can't quite decide if that's a good thing, or if we should hack the makefile a little further to allow PG_COMMIT_HASH to default to HEAD. Thoughts, better ideas? regards, tom lane
    • Jump to comment-1
      htamfids@gmail.com2024-04-24T14:39:18+00:00
      On Tue, Apr 23, 2024 at 6:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > This change seems like a good thing anyway for anyone who's tempted > to use "make dist" manually, since they wouldn't necessarily want > to package HEAD either. Now, if we just do it exactly like that > then trying to "make dist" without setting PG_COMMIT_HASH will > fail, since "git archive" has no default for its <tree-ish> > argument. I can't quite decide if that's a good thing, or if we > should hack the makefile a little further to allow PG_COMMIT_HASH > to default to HEAD. > Just having it fail seems harsh. What if we had plain "make dist" at least output a friendly hint about "please specify a hash"? That seems better than an implicit HEAD default, as they can manually set it to HEAD themselves per the hint. Cheers, Greg
      • Jump to comment-1
        tgl@sss.pgh.pa.us2024-04-24T15:21:22+00:00
        Greg Sabino Mullane <htamfids@gmail.com> writes: > On Tue, Apr 23, 2024 at 6:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Now, if we just do it exactly like that >> then trying to "make dist" without setting PG_COMMIT_HASH will >> fail, since "git archive" has no default for its <tree-ish> >> argument. I can't quite decide if that's a good thing, or if we >> should hack the makefile a little further to allow PG_COMMIT_HASH >> to default to HEAD. > Just having it fail seems harsh. What if we had plain "make dist" at least > output a friendly hint about "please specify a hash"? That seems better > than an implicit HEAD default, as they can manually set it to HEAD > themselves per the hint. Yeah, it would be easy to do something like ifneq ($(PG_COMMIT_HASH),) $(GIT) ... else @echo "Please specify PG_COMMIT_HASH." && exit 1 endif I'm just debating whether that's better than inserting a default value. regards, tom lane
        • Jump to comment-1
          tgl@sss.pgh.pa.us2024-04-26T19:24:16+00:00
          Concretely, I'm proposing the attached. Peter didn't like PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not wedded to that if a better name is proposed. regards, tom lane
          • Jump to comment-1
            alvherre@alvh.no-ip.org2024-04-28T19:16:58+00:00
            On 2024-Apr-26, Tom Lane wrote: > --- mk-one-release.orig 2024-04-23 17:30:08.983226671 -0400 > +++ mk-one-release 2024-04-26 15:17:29.713669677 -0400 > @@ -39,13 +39,17 @@ mkdir pgsql > git archive ${gitref} | tar xf - -C pgsql > > # Include the git ref in the output tarballs > +# (This has no effect with v17 and up; instead we rely on "git archive" > +# to include the commit hash in the tar header) > echo ${gitref} >pgsql/.gitrevision Why is it that the .gitrevision file is only created here, instead of being added to the tarball that "git archive" produces? Adding an argument like --add-virtual-file $(distdir)/.gitrevision:$(GIT_REFSPEC) to the git archive call should suffice. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "I can't go to a restaurant and order food because I keep looking at the fonts on the menu. Five minutes later I realize that it's also talking about food" (Donald Knuth)
            • Jump to comment-1
              tgl@sss.pgh.pa.us2024-04-28T19:44:21+00:00
              Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Why is it that the .gitrevision file is only created here, instead of > being added to the tarball that "git archive" produces? Adding an > argument like > --add-virtual-file $(distdir)/.gitrevision:$(GIT_REFSPEC) > to the git archive call should suffice. I think we don't want to do that. In the first place, it's redundant because "git archive" includes the commit hash in the tar header, and in the second place it gets away from the concept that the tarball contains exactly what is in our git tree. Now admittedly, if anyone's built tooling that relies on the presence of the .gitrevision file, they might prefer that we keep on including it. But I'm not sure anyone has, and in any case I think switching to the git-approved way of incorporating the hash is the best thing in the long run. What I'm thinking of doing, as soon as we've sorted the tarball creation process, is to make a test tarball available to the packagers group so that anyone interested can start working on updating their packaging process for the new approach. Hopefully, if anyone's especially unhappy about omitting .gitrevision, they'll speak up. regards, tom lane
          • Jump to comment-1
            peter@eisentraut.org2024-04-29T12:36:47+00:00
            On 26.04.24 21:24, Tom Lane wrote: > Concretely, I'm proposing the attached. Peter didn't like > PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not > wedded to that if a better name is proposed. This seems ok to me, but note that we do have an equivalent implementation in meson. If we don't want to update that in a similar way, maybe we should disable it.
            • Jump to comment-1
              tgl@sss.pgh.pa.us2024-04-29T16:14:23+00:00
              Peter Eisentraut <peter@eisentraut.org> writes: > On 26.04.24 21:24, Tom Lane wrote: >> Concretely, I'm proposing the attached. Peter didn't like >> PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not >> wedded to that if a better name is proposed. > This seems ok to me, but note that we do have an equivalent > implementation in meson. If we don't want to update that in a similar > way, maybe we should disable it. OK. After poking at that for awhile, it seemed like "default to HEAD" fits into meson a lot better than "throw an error if the variable isn't set", so I switched to doing it like that. One reason is that AFAICT you can only set the variable during "meson setup" not during "ninja". This won't matter to the tarball build script, which does a one-off configuration run anyway. But for manual use, a movable target like HEAD might be more convenient given that behavior. I tested this by building tarballs using the makefiles on a RHEL8 box, and using meson on my MacBook (with recent MacPorts tools). I got bit-for-bit identical files, which I found rather impressive given the gap between the platforms. Maybe this "reproducible builds" wheeze will actually work. I also changed the variable name to PG_GIT_REVISION per your other suggestion. regards, tom lane
          • Jump to comment-1
            peter@eisentraut.org2024-04-29T12:43:18+00:00
            On 26.04.24 21:24, Tom Lane wrote: > Concretely, I'm proposing the attached. Peter didn't like > PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not > wedded to that if a better name is proposed. Um, "refspec" leads me here <https://git-scm.com/book/en/v2/Git-Internals-The-Refspec>, which seems like the wrong concept. I think the more correct concept is "revision" (https://git-scm.com/docs/gitrevisions), so something like PG_GIT_REVISION?
    • Jump to comment-1
      peter@eisentraut.org2024-04-24T14:47:00+00:00
      On 24.04.24 00:05, Tom Lane wrote: > It makes tarballs all right, but whatever commit ID you specify > is semi-ignored, and you get a tarball corresponding to HEAD > of master. (The PDFs come from the right version, though!) > > The reason for that is that the mk-one-release script does this > (shorn of not-relevant-here details): > > export BASE=/home/pgsql > export GIT_DIR=$BASE/postgresql.git > > mkdir pgsql > > # Export the selected git ref > git archive ${gitref} | tar xf - -C pgsql Where does ${gitref} come from? Why doesn't this line use git archive HEAD | ... ? > What I suggest is doing this in mk-one-release: > > -make dist > +make dist PG_COMMIT_HASH=${gitref} > > and changing the "make dist" rules to write $(PG_COMMIT_HASH) not > HEAD. The extra make variable will have no effect in the back > branches, while it should cause the right thing to happen with > the new implementation of "make dist". I suppose we could do something like that, but we'd also need to come up with a meson version. (Let's not use "hash" though, since other ways to commit specify a commit can be used.) > This change seems like a good thing anyway for anyone who's tempted > to use "make dist" manually, since they wouldn't necessarily want > to package HEAD either. A tin-foil-hat argument is that we might not want to encourage that, because for reproducibility, we need a known git commit and also a known implementation of make dist. If in the future someone uses the make dist implementation of PG19 to build a tarball for PG17, it might not come out the same way as using the make dist implementation of PG17.
      • Jump to comment-1
        tgl@sss.pgh.pa.us2024-04-24T15:03:44+00:00
        Peter Eisentraut <peter@eisentraut.org> writes: > On 24.04.24 00:05, Tom Lane wrote: >> # Export the selected git ref >> git archive ${gitref} | tar xf - -C pgsql > Where does ${gitref} come from? Why doesn't this line use git archive > HEAD | ... ? ${gitref} is an argument to the script, specifying the commit to be packaged. HEAD would certainly not work when trying to package a back-branch release, and in general it would hardly ever be what you want if your goal is to make a reproducible package. >> What I suggest is doing this in mk-one-release: >> -make dist >> +make dist PG_COMMIT_HASH=${gitref} > I suppose we could do something like that, but we'd also need to come up > with a meson version. Packaging via meson is years away yet IMO, so I'm unconcerned about that for now. See below. > (Let's not use "hash" though, since other ways to commit specify a > commit can be used.) OK, do you have a different term in mind? >> This change seems like a good thing anyway for anyone who's tempted >> to use "make dist" manually, since they wouldn't necessarily want >> to package HEAD either. > A tin-foil-hat argument is that we might not want to encourage that, > because for reproducibility, we need a known git commit and also a known > implementation of make dist. If in the future someone uses the make > dist implementation of PG19 to build a tarball for PG17, it might not > come out the same way as using the make dist implementation of PG17. Of course. The entire reason why this script invokes "make dist", rather than implementing the behavior for itself, is so that branch-specific behaviors can be accounted for in the branches not here. (To be clear, the script has no idea which branch it's packaging --- that's implicit in the commit ID.) Because of that, I really don't want to rely on some equivalent meson infrastructure until it's available in all the live branches. v15 will be EOL in 3.5 years, and that's more or less the time frame that we've spoken of for dropping the makefile infrastructure, so I don't think that's an unreasonable plan. regards, tom lane