Tarball builds in the new world order

  • Jump to comment-1
    Tom Lane<tgl@sss.pgh.pa.us>
    Apr 23, 2024, 10:05 PM UTC
    With one eye on the beta-release calendar, I thought it'd be a
    good idea to test whether our tarball build script works for
    the new plan where we'll use "git archive" instead of the
    traditional process.
    It doesn't.
    It makes tarballs all right, but whatever commit ID you specify
    is semi-ignored, and you get a tarball corresponding to HEAD
    of master. (The PDFs come from the right version, though!)
    The reason for that is that the mk-one-release script does this
    (shorn of not-relevant-here details):
    export BASE=/home/pgsql
    export GIT_DIR=$BASE/postgresql.git
    
    mkdir pgsql
    
    # Export the selected git ref
    	git archive ${gitref} | tar xf - -C pgsql
    cd pgsql
    ./configure
    
    # Produce .tar.gz and .tar.bz2 files
    make dist
    Since 619bc23a1, what "make dist" does is
    $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
    $(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
    Since GIT_DIR is set, git consults that repo not the current working
    directory, so HEAD means whatever it means in a just-fetched repo,
    and mk-one-release's efforts to select the ${gitref} commit mean
    nothing. (If git had tried to consult the current working directory,
    it would've failed for lack of any .git subdir therein.)
    I really really don't want to put version-specific coding into
    mk-one-release, but fortunately I think we don't have to.
    What I suggest is doing this in mk-one-release:
    -make dist
    +make dist PGCOMMITHASH=${gitref}
    and changing the "make dist" rules to write $(PGCOMMITHASH) not
    HEAD. The extra make variable will have no effect in the back
    branches, while it should cause the right thing to happen with
    the new implementation of "make dist".
    This change seems like a good thing anyway for anyone who's tempted
    to use "make dist" manually, since they wouldn't necessarily want
    to package HEAD either. Now, if we just do it exactly like that
    then trying to "make dist" without setting PGCOMMITHASH will
    fail, since "git archive" has no default for its <tree-ish>
    argument. I can't quite decide if that's a good thing, or if we
    should hack the makefile a little further to allow PGCOMMITHASH
    to default to HEAD.
    Thoughts, better ideas?
    		regards, tom lane
    • Jump to comment-1
      Peter Eisentraut<peter@eisentraut.org>
      Apr 24, 2024, 2:47 PM UTC
      On 24.04.24 00:05, Tom Lane wrote:
      It makes tarballs all right, but whatever commit ID you specify
      is semi-ignored, and you get a tarball corresponding to HEAD
      of master. (The PDFs come from the right version, though!)
      The reason for that is that the mk-one-release script does this
      (shorn of not-relevant-here details):
      export BASE=/home/pgsql
      export GIT_DIR=$BASE/postgresql.git
      mkdir pgsql
      # Export the selected git ref
      git archive ${gitref} | tar xf - -C pgsql
      Where does ${gitref} come from? Why doesn't this line use git archive HEAD | ... ?
      What I suggest is doing this in mk-one-release:
      -make dist
      +make dist PGCOMMITHASH=${gitref}
      and changing the "make dist" rules to write $(PGCOMMITHASH) not
      HEAD. The extra make variable will have no effect in the back
      branches, while it should cause the right thing to happen with
      the new implementation of "make dist".
      I suppose we could do something like that, but we'd also need to come up with a meson version.
      (Let's not use "hash" though, since other ways to commit specify a commit can be used.)
      This change seems like a good thing anyway for anyone who's tempted
      to use "make dist" manually, since they wouldn't necessarily want
      to package HEAD either.
      A tin-foil-hat argument is that we might not want to encourage that, because for reproducibility, we need a known git commit and also a known implementation of make dist. If in the future someone uses the make dist implementation of PG19 to build a tarball for PG17, it might not come out the same way as using the make dist implementation of PG17.
      • Jump to comment-1
        Tom Lane<tgl@sss.pgh.pa.us>
        Apr 24, 2024, 3:03 PM UTC
        Peter Eisentraut <peter@eisentraut.org> writes:
        On 24.04.24 00:05, Tom Lane wrote:
        # Export the selected git ref
        git archive ${gitref} | tar xf - -C pgsql
        Where does ${gitref} come from? Why doesn't this line use git archive
        HEAD | ... ?
        ${gitref} is an argument to the script, specifying the commit
        to be packaged. HEAD would certainly not work when trying to
        package a back-branch release, and in general it would hardly
        ever be what you want if your goal is to make a reproducible
        package.
        What I suggest is doing this in mk-one-release:
        -make dist
        +make dist PGCOMMITHASH=${gitref}
        I suppose we could do something like that, but we'd also need to come up
        with a meson version.
        Packaging via meson is years away yet IMO, so I'm unconcerned
        about that for now. See below.
        (Let's not use "hash" though, since other ways to commit specify a
        commit can be used.)
        OK, do you have a different term in mind?
        This change seems like a good thing anyway for anyone who's tempted
        to use "make dist" manually, since they wouldn't necessarily want
        to package HEAD either.
        A tin-foil-hat argument is that we might not want to encourage that,
        because for reproducibility, we need a known git commit and also a known
        implementation of make dist. If in the future someone uses the make
        dist implementation of PG19 to build a tarball for PG17, it might not
        come out the same way as using the make dist implementation of PG17.
        Of course. The entire reason why this script invokes "make dist",
        rather than implementing the behavior for itself, is so that
        branch-specific behaviors can be accounted for in the branches
        not here. (To be clear, the script has no idea which branch
        it's packaging --- that's implicit in the commit ID.)
        Because of that, I really don't want to rely on some equivalent
        meson infrastructure until it's available in all the live branches.
        v15 will be EOL in 3.5 years, and that's more or less the time frame
        that we've spoken of for dropping the makefile infrastructure, so
        I don't think that's an unreasonable plan.
        		regards, tom lane
    • Jump to comment-1
      Greg Sabino Mullane<htamfids@gmail.com>
      Apr 24, 2024, 2:39 PM UTC
      On Tue, Apr 23, 2024 at 6:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
      This change seems like a good thing anyway for anyone who's tempted
      to use "make dist" manually, since they wouldn't necessarily want
      to package HEAD either. Now, if we just do it exactly like that
      then trying to "make dist" without setting PGCOMMITHASH will
      fail, since "git archive" has no default for its <tree-ish>
      argument. I can't quite decide if that's a good thing, or if we
      should hack the makefile a little further to allow PGCOMMITHASH
      to default to HEAD.
      Just having it fail seems harsh. What if we had plain "make dist" at least
      output a friendly hint about "please specify a hash"? That seems better
      than an implicit HEAD default, as they can manually set it to HEAD
      themselves per the hint.
      Cheers,
      Greg
      • Jump to comment-1
        Tom Lane<tgl@sss.pgh.pa.us>
        Apr 24, 2024, 3:21 PM UTC
        Greg Sabino Mullane <htamfids@gmail.com> writes:
        On Tue, Apr 23, 2024 at 6:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
        Now, if we just do it exactly like that
        then trying to "make dist" without setting PGCOMMITHASH will
        fail, since "git archive" has no default for its <tree-ish>
        argument. I can't quite decide if that's a good thing, or if we
        should hack the makefile a little further to allow PGCOMMITHASH
        to default to HEAD.
        Just having it fail seems harsh. What if we had plain "make dist" at least
        output a friendly hint about "please specify a hash"? That seems better
        than an implicit HEAD default, as they can manually set it to HEAD
        themselves per the hint.
        Yeah, it would be easy to do something like
        ifneq ($(PGCOMMITHASH),)
        $(GIT) ...
        else
        @echo "Please specify PG_COMMIT_HASH." && exit 1
        endif
        I'm just debating whether that's better than inserting a default
        value.
        		regards, tom lane
        • Jump to comment-1
          Tom Lane<tgl@sss.pgh.pa.us>
          Apr 26, 2024, 7:24 PM UTC
          Concretely, I'm proposing the attached. Peter didn't like
          PGCOMMITHASH, so I have PGCOMMITREFSPEC below, but I'm not
          wedded to that if a better name is proposed.
          		regards, tom lane
          • Jump to comment-1
            Peter Eisentraut<peter@eisentraut.org>
            Apr 29, 2024, 12:43 PM UTC
            On 26.04.24 21:24, Tom Lane wrote:
            Concretely, I'm proposing the attached. Peter didn't like
            PGCOMMITHASH, so I have PGCOMMITREFSPEC below, but I'm not
            wedded to that if a better name is proposed.
            Um, "refspec" leads me here https://git-scm.com/book/en/v2/Git-Internals-The-Refspec, which seems like the wrong concept. I think the more correct concept is "revision" (https://git-scm.com/docs/gitrevisions), so something like PGGITREVISION?
          • Jump to comment-1
            Peter Eisentraut<peter@eisentraut.org>
            Apr 29, 2024, 12:36 PM UTC
            On 26.04.24 21:24, Tom Lane wrote:
            Concretely, I'm proposing the attached. Peter didn't like
            PGCOMMITHASH, so I have PGCOMMITREFSPEC below, but I'm not
            wedded to that if a better name is proposed.
            This seems ok to me, but note that we do have an equivalent implementation in meson. If we don't want to update that in a similar way, maybe we should disable it.
            • Jump to comment-1
              Tom Lane<tgl@sss.pgh.pa.us>
              Apr 29, 2024, 4:14 PM UTC
              Peter Eisentraut <peter@eisentraut.org> writes:
              On 26.04.24 21:24, Tom Lane wrote:
              Concretely, I'm proposing the attached. Peter didn't like
              PGCOMMITHASH, so I have PGCOMMITREFSPEC below, but I'm not
              wedded to that if a better name is proposed.
              This seems ok to me, but note that we do have an equivalent
              implementation in meson. If we don't want to update that in a similar
              way, maybe we should disable it.
              OK. After poking at that for awhile, it seemed like "default to
              HEAD" fits into meson a lot better than "throw an error if the
              variable isn't set", so I switched to doing it like that.
              One reason is that AFAICT you can only set the variable during
              "meson setup" not during "ninja". This won't matter to the
              tarball build script, which does a one-off configuration run
              anyway. But for manual use, a movable target like HEAD might be
              more convenient given that behavior.
              I tested this by building tarballs using the makefiles on a RHEL8
              box, and using meson on my MacBook (with recent MacPorts tools).
              I got bit-for-bit identical files, which I found rather impressive
              given the gap between the platforms. Maybe this "reproducible builds"
              wheeze will actually work.
              I also changed the variable name to PGGITREVISION per your
              other suggestion.
              		regards, tom lane
          • Jump to comment-1
            Alvaro Herrera<alvherre@alvh.no-ip.org>
            Apr 28, 2024, 7:16 PM UTC
            On 2024-Apr-26, Tom Lane wrote:
            --- mk-one-release.orig 2024-04-23 17:30:08.983226671 -0400
            +++ mk-one-release 2024-04-26 15:17:29.713669677 -0400
            @@ -39,13 +39,17 @@ mkdir pgsql
            git archive ${gitref} | tar xf - -C pgsql

            # Include the git ref in the output tarballs
            +# (This has no effect with v17 and up; instead we rely on "git archive"
            +# to include the commit hash in the tar header)
            echo ${gitref} >pgsql/.gitrevision
            Why is it that the .gitrevision file is only created here, instead of
            being added to the tarball that "git archive" produces? Adding an
            argument like
            --add-virtual-file $(distdir)/.gitrevision:$(GIT_REFSPEC)
            to the git archive call should suffice.
            --
            Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
            "I can't go to a restaurant and order food because I keep looking at the
            fonts on the menu. Five minutes later I realize that it's also talking
            about food" (Donald Knuth)
            • Jump to comment-1
              Tom Lane<tgl@sss.pgh.pa.us>
              Apr 28, 2024, 7:44 PM UTC
              Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
              Why is it that the .gitrevision file is only created here, instead of
              being added to the tarball that "git archive" produces? Adding an
              argument like
              --add-virtual-file $(distdir)/.gitrevision:$(GIT_REFSPEC)
              to the git archive call should suffice.
              I think we don't want to do that. In the first place, it's redundant
              because "git archive" includes the commit hash in the tar header,
              and in the second place it gets away from the concept that the tarball
              contains exactly what is in our git tree.
              Now admittedly, if anyone's built tooling that relies on the presence
              of the .gitrevision file, they might prefer that we keep on including
              it. But I'm not sure anyone has, and in any case I think switching
              to the git-approved way of incorporating the hash is the best thing
              in the long run.
              What I'm thinking of doing, as soon as we've sorted the tarball
              creation process, is to make a test tarball available to the
              packagers group so that anyone interested can start working on
              updating their packaging process for the new approach. Hopefully,
              if anyone's especially unhappy about omitting .gitrevision, they'll
              speak up.
              		regards, tom lane