Selective tuple deformation

  • Jump to comment-1
    David Rowley<dgrowleyml@gmail.com>
    Apr 24, 2026, 12:07 PM UTC
    (v20 material)
    Last month, in [1], I mentioned that I was working on more tuple
    deforming speedups. This time it's "selective" tuple deforming. Let
    me explain...
    As of today, if we need, say, column 10 in a table and nothing else,
    we deform columns 1-10. This is wasteful. Since c456e3911,
    CompactAttribute.attcacheoff is always populated, so in many cases, we
    know the byte offset where the required attribute's value is stored
    within the tuple. When we know this, it makes sense to only deform the
    attributes that we need for the query, rather than all attributes up
    and including the one we need.
    What we can't really do is just change the deforming code to always
    selectively deform the tuple. There are some extra overheads to
    tracking which columns we need, so for queries that require all
    attributes from a table, we save nothing and only add additional
    overhead for the attribute tracking. To get around this issue, I've
    made it so we support both the existing method and this new selective
    method, this has been added via a new operator named
    EEOPSCANSELECTSOME. The deformer that's used is selected based on
    which attributes and how many attributes are being selected. In
    theory, there should be some heuristic that we can come up with that
    picks the best one for the job.
    Let's say we have a table with 41 INT NOT NULL columns, and we only
    want the 41st column. A query like SELECT col41 FROM t WHERE col41=0;
    where t contains 1 million rows, none with col41=0, currently profiles
    as:
    66.72%  postgres   [.] tts_buffer_heap_getsomeattrs <---
    9.73%  postgres   [.] ExecInterpExpr
    3.87%  postgres   [.] heap_prepare_pagescan
    2.96%  postgres   [.] ExecSeqScanWithQualProject
    2.82%  postgres   [.] heapgettup_pagemode
    1.84%  postgres   [.] ExecStoreBufferHeapTuple
    1.45%  postgres   [.] hash_bytes
    1.26%  postgres   [.] heap_getnextslot
    Tuple deformation is taking over 66% of the query time.
    If we swap to selective deforming, this goes to:
    21.32%  postgres          [.] heapgettup_pagemode
    12.71%  postgres          [.] ExecInterpExpr
    11.31%  postgres          [.] heap_prepare_pagescan
    8.76%  postgres          [.] ExecStoreBufferHeapTuple
    8.54%  postgres          [.] heap_getnextslot
    7.15%  postgres          [.] ExecSeqScanWithQualProject
    6.83%  postgres          [.] tts_buffer_heap_selectattrs <---
    2.45%  postgres          [.] hash_bytes
    1.66%  postgres          [.] LWLockRelease
    1.16%  postgres          [.] hash_search_with_hash_value
    1.06%  postgres          [.] ReservePrivateRefCountEntry
    The new deforming function is taking around 7% of the time here.
    Today, I was experimenting with the heuristic that selects the
    deformer. For now, I've made it so it checks the maximum attribute
    needed, and if we need more than half of the attributes before that
    one, then deform everything up to the max. Otherwise, only deform the
    required attributes. I also added a GUC to override the heuristic of
    which deformer is selected; debugtupledeform supports "incremental",
    "selective", and "auto". "incremental" is what I named the existing
    method. The GUC is only for testing. I don't anticipate keeping it.
    I tried all 3 of those GUC settings on the 41 column table and
    effectively did: SELECT col41 FROM t WHERE col41=0; and then
    progressively added the columns one by one in descending order. The
    "auto" setting will switch methods when there are 20 columns in the
    select list. See graph1.png for the performance results. This shows
    that when only a few columns are needed, the query time improves. With
    1 column selected, patched 128% faster than master. Looking at the red
    line, you can see that as more columns are selected, the increase is
    less and less, and when all 41 columns are needed, selective deforming
    causes a performance regression. The black line shows the "auto"
    setting, which follows the red line (with noise) from 1-20, then the
    blue line from 21-41, from where the incremental method is used.
    graph2.png shows the performance of running the same queries as
    graph1, but the table has a varlena column as the first column. This
    means there's no use of cached offsets, and the tuple is always walked
    to find the offset for the first required column. This shows that the
    speedup for the 1 column case isn't as good, and the performance
    increase is just 99% faster than master. The gains still exist through
    not having to store Datums in tts_values that we don't need.
    The patch is tagged with "WIP", so not finished yet. There is no JIT
    support. I expect if you JIT any expression, bad things will happen.
    I've also only added support for Seq Scan. Other scan node types could
    be supported.
    The attached script can be run with "./selectdeformbench.sh setup"
    to create the table, then "./selectdeformbench.sh run" to run the
    test. Currently, varlena_first is set to 0. Set that to 1 to have a
    text column added as the first column.
    I'll happily take comments and further ideas here, but I won't be
    working on it with much priority until June.
    WIP patch is attached.
    David
    [1] https://postgr.es/m/CAApHDvpdB1t7LCgH8=KOKC6VBb2rsEbaas0FiXo5awsRgCsDxQ@mail.gmail.com