Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: postgresql-cfbot/postgresql
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: cf/5805~1
Choose a base ref
...
head repository: postgresql-cfbot/postgresql
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: cf/5805
Choose a head ref
  • 2 commits
  • 7 files changed
  • 2 contributors

Commits on Sep 17, 2025

  1. Track the maximum possible frequency of non-MCE array elements.

    The lossy-counting algorithm that ANALYZE uses to identify most-common
    array elements has a notion of cutoff frequency: elements with
    frequency greater than that are guaranteed to be collected, elements
    with smaller frequencies are not.  In cases where we find fewer MCEs
    than the stats target would permit us to store, the cutoff frequency
    provides valuable additional information, to wit that there are no
    non-MCEs with frequency greater than that.  What the selectivity
    estimation functions actually use the "minfreq" entry for is as a
    ceiling on the possible frequency of non-MCEs, so using the cutoff
    rather than the lowest stored MCE frequency provides a tighter bound
    and more accurate estimates.
    
    Therefore, instead of redundantly storing the minimum observed MCE
    frequency, store the cutoff frequency when there are fewer tracked
    values than we want.  (When there are more, then of course we cannot
    assert that no non-stored elements are above the cutoff frequency,
    since we're throwing away some that are; so we still use the
    minimum stored frequency in that case.)
    
    Notably, this works even when none of the values are common enough
    to be called MCEs.  In such cases we stored nothing in the
    STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
    selectivity functions falling back to default estimates.  So in that
    case we want to construct a STATISTIC_KIND_MCELEM entry that contains
    no "values" but does have "numbers", to wit the three extra numbers
    that the MCELEM entry type defines.  A small obstacle is that
    update_attstats() has traditionally stored a null, not an empty array,
    when passed zero "values" for a slot.  That gives rise to an MCELEM
    entry that get_attstatsslot() will spit up on.  The least risky
    solution seems to be to adjust update_attstats() so that it will emit
    a non-null (but possibly empty) array when the passed stavalues array
    pointer isn't NULL, rather than conditioning that on numvalues > 0.
    In other existing cases I don't believe that that changes anything.
    For consistency, handle the stanumbers array the same way.
    
    In addition to adjusting the typanalyze routines that collect
    STATISTIC_KIND_MCELEM data, adjust the routines that use the data
    to assume that minfreq is the ceiling on the frequency of non-MCE
    values.  Previously they assumed that the ceiling is minfreq / 2,
    which seems obviously wrong.  (Perhaps it was correct with the
    original implementation of MCE collection, but surely it is not
    with the lossy-counting algorithm.)
    
    Thanks to Matt Long for the suggestion that we could apply this
    idea even when there are more than zero MCEs.
    
    Reported-by: Mark Frost <[email protected]>
    Reported-by: Matt Long <[email protected]>
    Author: Tom Lane <[email protected]>
    Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
    tglsfdc authored and Commitfest Bot committed Sep 17, 2025
    Configuration menu
    Copy the full SHA
    184474c View commit details
    Browse the repository at this point in the history
  2. [CF 5805] v2 - Improve array-element-test estimation when no array el…

    …ements qualify as common
    
    This branch was automatically generated by a robot using patches from an
    email thread registered at:
    
    https://commitfest.postgresql.org/patch/5805
    
    The branch will be overwritten each time a new patch version is posted to
    the thread, and also periodically to check for bitrot caused by changes
    on the master branch.
    
    Patch(es): https://www.postgresql.org/message-id/[email protected]
    Author(s): Tom Lane
    Commitfest Bot committed Sep 17, 2025
    Configuration menu
    Copy the full SHA
    bc06441 View commit details
    Browse the repository at this point in the history
Loading