summaryrefslogtreecommitdiff
path: root/src/test/regress/expected/select_distinct.out
diff options
context:
space:
mode:
authorDavid Rowley2022-10-28 10:04:38 +0000
committerDavid Rowley2022-10-28 10:04:38 +0000
commit5543677ec90a15c73dab5ed4f0902b3b920f0b87 (patch)
tree002a4bc852b2c04d10e48e09cbce0d23e89a7c3d /src/test/regress/expected/select_distinct.out
parentb1099eca8f38ff5cfaf0901bb91cb6a22f909bc6 (diff)
Use Limit instead of Unique to implement DISTINCT, when possible
When all of the query's DISTINCT pathkeys have been marked as redundant due to EquivalenceClasses existing which contain constants, we can just implement the DISTINCT operation on a query by just limiting the number of returned rows to 1 instead of performing a Unique on all of the matching (duplicate) rows. This applies in cases such as: SELECT DISTINCT col,col2 FROM tab WHERE col = 1 AND col2 = 10; If there are any matching rows, then they must all be {1,10}. There's no point in fetching all of those and running a Unique operator on them to leave only a single row. Here we effectively just find the first row and then stop. We are obviously unable to apply this optimization if either the col = 1 or col2 = 10 were missing from the WHERE clause or if there were any additional columns in the SELECT clause. Such queries are probably not all that common, but detecting when we can apply this optimization amounts to checking if the distinct_pathkeys are NULL, which is very cheap indeed. Nothing is done here to check if the query already has a LIMIT clause. If it does then the plan may end up with 2 Limits nodes. There's no harm in that and it's probably not worth the complexity to unify them into a single Limit node. Author: David Rowley Reviewed-by: Richard Guo Discussion: https://postgr.es/m/CAApHDvqS0j8RUWRUSgCAXxOqnYjHUXmKwspRj4GzVfOO25ByHA@mail.gmail.com Discussion: https://postgr.es/m/MEYPR01MB7101CD5DA0A07C9DE2B74850A4239@MEYPR01MB7101.ausprd01.prod.outlook.com
Diffstat (limited to 'src/test/regress/expected/select_distinct.out')
-rw-r--r--src/test/regress/expected/select_distinct.out54
1 files changed, 54 insertions, 0 deletions
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 748419cee05..6ce889d87c1 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -280,6 +280,60 @@ RESET min_parallel_table_scan_size;
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
--
+-- Test the planner's ability to use a LIMIT 1 instead of a Unique node when
+-- all of the distinct_pathkeys have been marked as redundant
+--
+-- Ensure we get a plan with a Limit 1
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT four FROM tenk1 WHERE four = 0;
+ QUERY PLAN
+----------------------------
+ Limit
+ -> Seq Scan on tenk1
+ Filter: (four = 0)
+(3 rows)
+
+-- Ensure the above gives us the correct result
+SELECT DISTINCT four FROM tenk1 WHERE four = 0;
+ four
+------
+ 0
+(1 row)
+
+-- Ensure we get a plan with a Limit 1
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT four FROM tenk1 WHERE four = 0 AND two <> 0;
+ QUERY PLAN
+---------------------------------------------
+ Limit
+ -> Seq Scan on tenk1
+ Filter: ((two <> 0) AND (four = 0))
+(3 rows)
+
+-- Ensure no rows are returned
+SELECT DISTINCT four FROM tenk1 WHERE four = 0 AND two <> 0;
+ four
+------
+(0 rows)
+
+-- Ensure we get a plan with a Limit 1 when the SELECT list contains constants
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT four,1,2,3 FROM tenk1 WHERE four = 0;
+ QUERY PLAN
+----------------------------
+ Limit
+ -> Seq Scan on tenk1
+ Filter: (four = 0)
+(3 rows)
+
+-- Ensure we only get 1 row
+SELECT DISTINCT four,1,2,3 FROM tenk1 WHERE four = 0;
+ four | ?column? | ?column? | ?column?
+------+----------+----------+----------
+ 0 | 1 | 2 | 3
+(1 row)
+
+--
-- Also, some tests of IS DISTINCT FROM, which doesn't quite deserve its
-- very own regression file.
--