Commit 5bf748b

Enhance nbtree ScalarArrayOp execution.

Commit 9e8da0f taught nbtree to handle ScalarArrayOpExpr quals natively. This works by pushing down the full context (the array keys) to the nbtree index AM, enabling it to execute multiple primitive index scans that the planner treats as one continuous index scan/index path. This earlier enhancement enabled nbtree ScalarArrayOp index-only scans. It also allowed scans with ScalarArrayOp quals to return ordered results (with some notable restrictions, described further down).

Take this general approach a lot further: teach nbtree SAOP index scans to decide how to execute ScalarArrayOp scans (when and where to start the next primitive index scan) based on physical index characteristics. This can be far more efficient. All SAOP scans will now reliably avoid duplicative leaf page accesses (just like any other nbtree index scan). SAOP scans whose array keys are naturally clustered together now require far fewer index descents, since we'll reliably avoid starting a new primitive scan just to get to a later offset from the same leaf page.

The scan's arrays now advance using binary searches for the array element that best matches the next tuple's attribute value. Required scan key arrays (i.e. arrays from scan keys that can terminate the scan) ratchet forward in lockstep with the index scan. Non-required arrays (i.e. arrays from scan keys that can only exclude non-matching tuples) "advance" without the process ever rolling over to a higher-order array.

Naturally, only required SAOP scan keys trigger skipping over leaf pages (non-required arrays cannot safely end or start primitive index scans). Consequently, even index scans of a composite index with a high-order inequality scan key (which we'll mark required) and a low-order SAOP scan key (which we won't mark required) now avoid repeating leaf page accesses -- that benefit isn't limited to simpler equality-only cases.

In general, all nbtree index scans now output tuples as if they were one continuous index scan -- even scans that mix a high-order inequality with lower-order SAOP equalities reliably output tuples in index order. This allows us to remove a couple of special cases that were applied when building index paths with SAOP clauses during planning.

Bugfix commit 807a40c taught the planner to avoid generating unsafe path keys: path keys on a multicolumn index path, with a SAOP clause on any attribute beyond the first/most significant attribute. These cases are now all safe, so we go back to generating path keys without regard for the presence of SAOP clauses (just like with any other clause type). Affected queries can now exploit scan output order in all the usual ways (e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early).

Also undo changes from follow-up bugfix commit a4523c5, which taught the planner to produce alternative index paths, with path keys, but without low-order SAOP index quals (filter quals were used instead). We'll no longer generate these alternative paths, since they can no longer offer any meaningful advantages over standard index qual paths. Affected queries thereby avoid all of the disadvantages that come from using filter quals within index scan nodes. They can avoid extra heap page accesses from using filter quals to exclude non-matching tuples (index quals will never have that problem). They can also skip over irrelevant sections of the index in more cases (though only when nbtree determines that starting another primitive scan actually makes sense).

There is a theoretical risk that removing restrictions on SAOP index paths from the planner will break compatibility with amcanorder-based index AMs maintained as extensions. Such an index AM could have the same limitations around ordered SAOP scans as nbtree had up until now. Adding a pro forma incompatibility item about the issue to the Postgres 17 release notes seems like a good idea.

Author: Peter Geoghegan <[email protected]>
Author: Matthias van de Meent <[email protected]>
Reviewed-By: Heikki Linnakangas <[email protected]>
Reviewed-By: Matthias van de Meent <[email protected]>
Reviewed-By: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
1 parent ddd9e43 commit 5bf748b
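
The commit message says that the scan's arrays advance by binary searching for the array element that best matches the next tuple's attribute value. The following is a minimal sketch of that idea for a single integer column in a forward scan, where "best match" is assumed to mean the first array element that is not smaller than the tuple's value. The function name `toy_advance_array` and its parameters are hypothetical; this is not the nbtree implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not PostgreSQL code.
 *
 * elems[] holds the SAOP array keys, sorted ascending and deduplicated.
 * cur is the array's current position.  Return the position of the first
 * element in elems[cur..nelems-1] that is >= tupval, or nelems when every
 * remaining element is smaller (the array is exhausted).  Searching only
 * from cur onward mirrors the "ratchet forward" behavior: the array never
 * moves backward during a forward scan.
 */
size_t
toy_advance_array(const int *elems, size_t nelems, size_t cur, int tupval)
{
    size_t lo = cur;
    size_t hi = nelems;

    assert(cur <= nelems);

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (elems[mid] < tupval)
            lo = mid + 1;       /* element too small, look right */
        else
            hi = mid;           /* candidate match, keep looking left */
    }

    return lo;
}
```

With keys `{10, 20, 30, 40}` and the array positioned at index 1, a tuple value of 25 moves the array to index 2 (the element 30); a tuple value of 41 exhausts the array, which for a required array is the point where a real scan could terminate.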

22 files changed: +3487 -579 lines changed

doc/src/sgml/indexam.sgml

+9-1
@@ -809,14 +809,22 @@ amrestrpos (IndexScanDesc scan);
   <para>
 <programlisting>
 Size
-amestimateparallelscan (void);
+amestimateparallelscan (int nkeys,
+                        int norderbys);
 </programlisting>
    Estimate and return the number of bytes of dynamic shared memory which
    the access method will be needed to perform a parallel scan. (This number
    is in addition to, not in lieu of, the amount of space needed for
    AM-independent data in <structname>ParallelIndexScanDescData</structname>.)
   </para>

+  <para>
+   The <literal>nkeys</literal> and <literal>norderbys</literal>
+   parameters indicate the number of quals and ordering operators that will be
+   used in the scan; the same values will be passed to <function>amrescan</function>.
+   Note that the actual values of the scan keys aren't provided yet.
+  </para>
+
   <para>
    It is not necessary to implement this function for access methods which
    do not support parallel scans or for which the number of additional bytes

doc/src/sgml/monitoring.sgml

+13
@@ -4064,6 +4064,19 @@ description | Waiting for a newly initialized WAL file to reach durable storage
     </para>
    </note>

+   <note>
+    <para>
+     Queries that use certain <acronym>SQL</acronym> constructs to search for
+     rows matching any value out of a list or array of multiple scalar values
+     (see <xref linkend="functions-comparisons"/>) perform multiple
+     <quote>primitive</quote> index scans (up to one primitive scan per scalar
+     value) during query execution. Each internal primitive index scan
+     increments <structname>pg_stat_all_indexes</structname>.<structfield>idx_scan</structfield>,
+     so it's possible for the count of index scans to significantly exceed the
+     total number of index scan executor node executions.
+    </para>
+   </note>
+
   </sect2>

   <sect2 id="monitoring-pg-statio-all-tables-view">
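
The note added above says that one executor-level index scan can account for anywhere between one and "one per scalar value" increments of `idx_scan`. A rough way to see why is the toy model below. It assumes each leaf page is described only by its highest value, and it counts a new primitive scan every time the next array key lands on a different leaf page than the previous one. That rule is a deliberate simplification: the real nbtree logic decides when to start a primitive scan using its own heuristics and may simply step to a neighboring page instead. The name `toy_count_primitive_scans` and all parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not PostgreSQL code.
 *
 * page_high[i] is the largest value stored on leaf page i; pages and keys
 * are both sorted ascending.  Count one "primitive scan" for the first key
 * and one more each time a later key falls on a different leaf page than
 * the key before it.  Keys beyond the last page are ignored.
 */
size_t
toy_count_primitive_scans(const int *page_high, size_t npages,
                          const int *keys, size_t nkeys)
{
    size_t scans = 0;
    size_t page = 0;
    int    have_page = 0;

    for (size_t k = 0; k < nkeys; k++)
    {
        size_t p = page;

        /* Find the first page that can contain keys[k]. */
        while (p < npages && page_high[p] < keys[k])
            p++;

        if (p == npages)
            break;              /* no page can hold this or any later key */

        if (!have_page || p != page)
        {
            scans++;            /* key lives on a page we are not yet on */
            page = p;
            have_page = 1;
        }
    }

    return scans;
}
```

In this model, three keys that all fall on the first leaf page cost one primitive scan, while three keys spread across three pages cost three, which matches the "up to one primitive scan per scalar value" wording.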

src/backend/access/index/indexam.c

+4-6
@@ -449,13 +449,10 @@ index_restrpos(IndexScanDesc scan)

 /*
  * index_parallelscan_estimate - estimate shared memory for parallel scan
- *
- * Currently, we don't pass any information to the AM-specific estimator,
- * so it can probably only return a constant. In the future, we might need
- * to pass more information.
  */
 Size
-index_parallelscan_estimate(Relation indexRelation, Snapshot snapshot)
+index_parallelscan_estimate(Relation indexRelation, int nkeys, int norderbys,
+                            Snapshot snapshot)
 {
     Size nbytes;


@@ -474,7 +471,8 @@ index_parallelscan_estimate(Relation indexRelation, Snapshot snapshot)
      */
     if (indexRelation->rd_indam->amestimateparallelscan != NULL)
         nbytes = add_size(nbytes,
-                          indexRelation->rd_indam->amestimateparallelscan());
+                          indexRelation->rd_indam->amestimateparallelscan(nkeys,
+                                                                          norderbys));

     return nbytes;
 }
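
The hunks above change the estimator callback from taking no arguments to taking `nkeys` and `norderbys`, so an access method is no longer limited to returning a constant. The standalone sketch below mimics that calling pattern outside of PostgreSQL. Every type and function in it (`ToyIndexAm`, `ToyParallelKeySlot`, `toy_am_estimate`, `toy_parallelscan_estimate`) is hypothetical, the per-key slot layout is only one plausible reason an AM might want `nkeys`, and plain `+` stands in for the overflow-checked `add_size` used by the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the AM callback with the new signature. */
typedef size_t (*toy_estimate_fn) (int nkeys, int norderbys);

/* Hypothetical stand-in for the AM's callback table. */
typedef struct ToyIndexAm
{
    toy_estimate_fn amestimateparallelscan;    /* may be NULL */
} ToyIndexAm;

/* Hypothetical per-scan-key state an AM might keep in shared memory. */
typedef struct ToyParallelKeySlot
{
    int cur_elem;               /* e.g. current array element for this key */
} ToyParallelKeySlot;

/* AM-side estimator: a fixed header plus one slot per scan key. */
size_t
toy_am_estimate(int nkeys, int norderbys)
{
    (void) norderbys;           /* unused in this sketch */
    assert(nkeys >= 0);

    return sizeof(int) + (size_t) nkeys * sizeof(ToyParallelKeySlot);
}

/*
 * Caller side: start from the AM-independent size and add the AM-specific
 * amount only when the AM provides an estimator.
 */
size_t
toy_parallelscan_estimate(const ToyIndexAm *am, size_t am_independent,
                          int nkeys, int norderbys)
{
    size_t nbytes = am_independent;

    if (am->amestimateparallelscan != NULL)
        nbytes += am->amestimateparallelscan(nkeys, norderbys);

    return nbytes;
}
```

Because only the counts are passed, and not the scan key values themselves, an estimator written this way can size per-key storage but cannot size anything that depends on the contents of the arrays.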
