summaryrefslogtreecommitdiff
path: root/src/backend
diff options
context:
space:
mode:
authorTom Lane2015-02-28 17:43:04 +0000
committerTom Lane2015-02-28 17:43:04 +0000
commitb514a7460d9127ddda6598307272c701cbb133b7 (patch)
tree2a04ac7147a24fa7a2f939b79b097b1ece98a6be /src/backend
parentc4f4c7ca99169ac609df67c2d0eb78162484420c (diff)
Fix planning of star-schema-style queries.
Part of the intent of the parameterized-path mechanism was to handle star-schema queries efficiently, but some overly-restrictive search limiting logic added in commit e2fa76d80ba571d4de8992de6386536867250474 prevented such cases from working as desired. Fix that and add a regression test about it. Per gripe from Marc Cousin. This is arguably a bug rather than a new feature, so back-patch to 9.2 where parameterized paths were introduced.
Diffstat (limited to 'src/backend')
-rw-r--r--src/backend/optimizer/README34
-rw-r--r--src/backend/optimizer/path/joinpath.c41
2 files changed, 58 insertions, 17 deletions
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 88cdbc32947..6cae9e8b489 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -702,17 +702,37 @@ intermediate layers of joins, for example:
-> Index Scan using C_Z_IDX on C
Index Condition: C.Z = A.X
-If all joins are plain inner joins then this is unnecessary, because
-it's always possible to reorder the joins so that a parameter is used
+If all joins are plain inner joins then this is usually unnecessary,
+because it's possible to reorder the joins so that a parameter is used
immediately below the nestloop node that provides it. But in the
-presence of outer joins, join reordering may not be possible, and then
-this option can be critical. Before version 9.2, Postgres used ad-hoc
-methods for planning and executing such queries, and those methods could
-not handle passing parameters down through multiple join levels.
+presence of outer joins, such join reordering may not be possible.
+
+Also, the bottom-level scan might require parameters from more than one
+other relation. In principle we could join the other relations first
+so that all the parameters are supplied from a single nestloop level.
+But if those other relations have no join clause in common (which is
+common in star-schema queries for instance), the planner won't consider
+joining them directly to each other. In such a case we need to be able
+to create a plan like
+
+ NestLoop
+ -> Seq Scan on SmallTable1 A
+ NestLoop
+ -> Seq Scan on SmallTable2 B
+ NestLoop
+ -> Index Scan using XYIndex on LargeTable C
+ Index Condition: C.X = A.AID and C.Y = B.BID
+
+so we should be willing to pass down A.AID through a join even though
+there is no join order constraint forcing the plan to look like this.
+
+Before version 9.2, Postgres used ad-hoc methods for planning and
+executing nestloop queries of this kind, and those methods could not
+handle passing parameters down through multiple join levels.
To plan such queries, we now use a notion of a "parameterized path",
which is a path that makes use of a join clause to a relation that's not
-scanned by the path. In the example just above, we would construct a
+scanned by the path. In the example two above, we would construct a
path representing the possibility of doing this:
-> Index Scan using C_Z_IDX on C
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index e6aa21c45f5..1da953f6d33 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -117,13 +117,14 @@ add_paths_to_joinrel(PlannerInfo *root,
/*
* Decide whether it's sensible to generate parameterized paths for this
* joinrel, and if so, which relations such paths should require. There
- * is no need to create a parameterized result path unless there is a join
- * order restriction that prevents joining one of our input rels directly
- * to the parameter source rel instead of joining to the other input rel.
- * This restriction reduces the number of parameterized paths we have to
- * deal with at higher join levels, without compromising the quality of
- * the resulting plan. We express the restriction as a Relids set that
- * must overlap the parameterization of any proposed join path.
+ * is usually no need to create a parameterized result path unless there
+ * is a join order restriction that prevents joining one of our input rels
+ * directly to the parameter source rel instead of joining to the other
+ * input rel. (But see exception in try_nestloop_path.) This restriction
+ * reduces the number of parameterized paths we have to deal with at
+ * higher join levels, without compromising the quality of the resulting
+ * plan. We express the restriction as a Relids set that must overlap the
+ * parameterization of any proposed join path.
*/
foreach(lc, root->join_info_list)
{
@@ -291,9 +292,29 @@ try_nestloop_path(PlannerInfo *root,
if (required_outer &&
!bms_overlap(required_outer, param_source_rels))
{
- /* Waste no memory when we reject a path here */
- bms_free(required_outer);
- return;
+ /*
+ * We override the param_source_rels heuristic to accept nestloop
+ * paths in which the outer rel satisfies some but not all of the
+ * inner path's parameterization. This is necessary to get good plans
+ * for star-schema scenarios, in which a parameterized path for a
+ * large table may require parameters from multiple small tables that
+ * will not get joined directly to each other. We can handle that by
+ * stacking nestloops that have the small tables on the outside; but
+ * this breaks the rule the param_source_rels heuristic is based on,
+ * namely that parameters should not be passed down across joins
+ * unless there's a join-order-constraint-based reason to do so. So
+ * ignore the param_source_rels restriction when this case applies.
+ */
+ Relids outerrelids = outer_path->parent->relids;
+ Relids innerparams = PATH_REQ_OUTER(inner_path);
+
+ if (!(bms_overlap(innerparams, outerrelids) &&
+ bms_nonempty_difference(innerparams, outerrelids)))
+ {
+ /* Waste no memory when we reject a path here */
+ bms_free(required_outer);
+ return;
+ }
}
/*