Lists: | pgsql-hackers |
---|
From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Exists pull-up application with JoinExpr |
Date: | 2024-12-24 04:44:38 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi, hackers!
I found one pull-up that works if the inner join condition is written
through the where condition,
|create temp table ta (id int primary key, val int); insert into ta
values(1,1); insert into ta values(2,2); ||insert into ta values(3,3);|
|create temp table tb (id int primary key, aval int); insert into tb
values(4,1); insert into tb values(5,1); insert into tb values(1,2);
create temp table tc (id int primary key, aid int); insert into tc
values(6,1); insert into tc values(7,2);|
|EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta
WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id);|
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=1 loops=1)
Buffers: local hit=6
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
Index Cond: (id = ta.id)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Buffers: local hit=1
Planning:
Buffers: shared hit=67 read=12
(14 rows)
but it doesn't work if it is written through the outside condition.
|alena(at)postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id =
tb.id); QUERY PLAN
------------------------------------------------------ Seq Scan on ta
(actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed by
Filter: 2 Buffers: local hit=5 SubPlan 1 -> Nested Loop (actual rows=0
loops=3) Buffers: local hit=4 -> Seq Scan on tb (actual rows=0 loops=3)
Filter: (ta.id = id) Rows Removed by Filter: 3 Buffers: local hit=3 ->
Seq Scan on tc (actual rows=1 loops=1) Buffers: local hit=1 Planning:
Buffers: shared hit=16 read=9 (15 rows) |
|I have written a patch to add this functionality and now it gives an
query plan: |
|alena(at)postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT *
FROM ta
WHERE EXISTS (SELECT *
FROM tb JOIN tc
ON ta.id = tb.id);
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=1 loops=1)
Buffers: local hit=6
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
Index Cond: (id = ta.id)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Buffers: local hit=1
(12 rows)|
tb and tc form a Cartesian product, but in the case of the intersection
condition with tuples from the table ta (ta.id = tb.id). So, according
to the join condition, tb intersects only with 1, and only it gets into
the result, but at the same time they appear twice - this is because of
the Cartesian product of tb with tc
|*How it works:*
|
I rewrote the code a bit so that it considers not only the quals in
jointree->quals, but also those in join expression
(subselect->jointree->fromlist). If they satisfy the conditions for
using pull up, I add them to the list of clauses and form a "Bool"
expression from them, joined by an "AND" operation.
--
Regards, Alena Rybakina Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
0001-Add-EXISTS-pull-up-if-subquery-join-expressions.patch | text/x-patch | 12.1 KB |
From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Exists pull-up application with JoinExpr |
Date: | 2024-12-24 10:25:37 |
Message-ID: | CAEudQAoD707uh5Pjpg5NMyF-QO=fzajA+BmtcoqQAeXN1C+TkQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Alena.
Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina <
a(dot)rybakina(at)postgrespro(dot)ru> escreveu:
> Hi, hackers!
>
> I found one pull-up that works if the inner join condition is written
> through the where condition,
>
> create temp table ta (id int primary key, val int);
> insert into ta values(1,1);
> insert into ta values(2,2);insert into ta values(3,3);
> create temp table tb (id int primary key, aval int);
> insert into tb values(4,1);
> insert into tb values(5,1);
> insert into tb values(1,2);
>
> create temp table tc (id int primary key, aid int);
> insert into tc values(6,1);
> insert into tc values(7,2);
>
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> SELECT *
> FROM ta
> WHERE EXISTS (SELECT *
> FROM tb, tc
> WHERE ta.id = tb.id);
> QUERY PLAN
> -------------------------------------------------------------------------
> Nested Loop Semi Join (actual rows=1 loops=1)
> Buffers: local hit=6
> -> Seq Scan on ta (actual rows=3 loops=1)
> Buffers: local hit=1
> -> Nested Loop (actual rows=0 loops=3)
> Buffers: local hit=5
> -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
> Index Cond: (id = ta.id)
> Heap Fetches: 1
> Buffers: local hit=4
> -> Seq Scan on tc (actual rows=1 loops=1)
> Buffers: local hit=1
> Planning:
> Buffers: shared hit=67 read=12
> (14 rows)
>
> but it doesn't work if it is written through the outside condition.
>
> alena(at)postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> SELECT *
> FROM ta
> WHERE EXISTS (SELECT *
> FROM tb JOIN tc
> ON ta.id = tb.id);
> QUERY PLAN
> ------------------------------------------------------
> Seq Scan on ta (actual rows=1 loops=1)
> Filter: EXISTS(SubPlan 1)
> Rows Removed by Filter: 2
> Buffers: local hit=5
> SubPlan 1
> -> Nested Loop (actual rows=0 loops=3)
> Buffers: local hit=4
> -> Seq Scan on tb (actual rows=0 loops=3)
> Filter: (ta.id = id)
> Rows Removed by Filter: 3
> Buffers: local hit=3
> -> Seq Scan on tc (actual rows=1 loops=1)
> Buffers: local hit=1
> Planning:
> Buffers: shared hit=16 read=9
> (15 rows)
>
>
> I have written a patch to add this functionality and now it gives an query
> plan:
>
> alena(at)postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> SELECT *
> FROM ta
> WHERE EXISTS (SELECT *
> FROM tb JOIN tc
> ON ta.id = tb.id);
> QUERY PLAN
> -------------------------------------------------------------------------
> Nested Loop Semi Join (actual rows=1 loops=1)
> Buffers: local hit=6
> -> Seq Scan on ta (actual rows=3 loops=1)
> Buffers: local hit=1
> -> Nested Loop (actual rows=0 loops=3)
> Buffers: local hit=5
> -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
> Index Cond: (id = ta.id)
> Heap Fetches: 1
> Buffers: local hit=4
> -> Seq Scan on tc (actual rows=1 loops=1)
> Buffers: local hit=1
> (12 rows)
>
> tb and tc form a Cartesian product, but in the case of the intersection
> condition with tuples from the table ta (ta.id = tb.id). So, according to
> the join condition, tb intersects only with 1, and only it gets into the
> result, but at the same time they appear twice - this is because of the
> Cartesian product of tb with tc
> *How it works:*
>
> I rewrote the code a bit so that it considers not only the quals in
> jointree->quals, but also those in join expression
> (subselect->jointree->fromlist). If they satisfy the conditions for using
> pull up, I add them to the list of clauses and form a "Bool" expression
> from them, joined by an "AND" operation.
>
I took a look at this patch and I did a little polishing on it.
And I believe that in testing, you need to set it to BUFFERS OFF,
because of the recent change made to ANALYZE.
The tests are failing, like this:
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=2 loops=1)
+ Buffers: local hit=7
-> Seq Scan on ta (actual rows=2 loops=1)
+ Buffers: local hit=1
-> Nested Loop (actual rows=1 loops=2)
+ Buffers: local hit=6
-> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
Index Cond: (id = ta.id)
Heap Fetches: 2
+ Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=2)
-(7 rows)
+ Buffers: local hit=2
+(12 rows)
best regards,
Ranier Vilela
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Add-EXISTS-pull-up-if-subquery-join-expressions.patch | application/octet-stream | 11.2 KB |
From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Exists pull-up application with JoinExpr |
Date: | 2024-12-25 11:20:42 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 24.12.2024 13:25, Ranier Vilela wrote:
> Hi Alena.
>
> Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> escreveu:
>
> Hi, hackers!
>
> I found one pull-up that works if the inner join condition is
> written through the where condition,
>
> |create temp table ta (id int primary key, val int); insert into
> ta values(1,1); insert into ta values(2,2); ||insert into ta values(3,3);|
>
> |create temp table tb (id int primary key, aval int); insert into
> tb values(4,1); insert into tb values(5,1); insert into tb
> values(1,2); create temp table tc (id int primary key, aid int);
> insert into tc values(6,1); insert into tc values(7,2);|
>
> |EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT *
> FROM ta WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id
> <http://ta.id> = tb.id <http://tb.id>);|
> QUERY PLAN
> -------------------------------------------------------------------------
> Nested Loop Semi Join (actual rows=1 loops=1)
> Buffers: local hit=6
> -> Seq Scan on ta (actual rows=3 loops=1)
> Buffers: local hit=1
> -> Nested Loop (actual rows=0 loops=3)
> Buffers: local hit=5
> -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
> Index Cond: (id =ta.id <http://ta.id>)
> Heap Fetches: 1
> Buffers: local hit=4
> -> Seq Scan on tc (actual rows=1 loops=1)
> Buffers: local hit=1
> Planning:
> Buffers: shared hit=67 read=12
> (14 rows)
>
> but it doesn't work if it is written through the outside condition.
>
> |alena(at)postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING
> OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON
> ta.id <http://ta.id> = tb.id <http://tb.id>); QUERY PLAN
> ------------------------------------------------------ Seq Scan on
> ta (actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed
> by Filter: 2 Buffers: local hit=5 SubPlan 1 -> Nested Loop (actual
> rows=0 loops=3) Buffers: local hit=4 -> Seq Scan on tb (actual
> rows=0 loops=3) Filter: (ta.id <http://ta.id> = id) Rows Removed
> by Filter: 3 Buffers: local hit=3 -> Seq Scan on tc (actual rows=1
> loops=1) Buffers: local hit=1 Planning: Buffers: shared hit=16
> read=9 (15 rows) |
>
> |I have written a patch to add this functionality and now it gives
> an query plan: |
>
> |alena(at)postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING
> OFF)
> SELECT *
> FROM ta
> WHERE EXISTS (SELECT *
> FROM tb JOIN tc
> ON ta.id <http://ta.id> = tb.id <http://tb.id>);
> QUERY PLAN
> -------------------------------------------------------------------------
> Nested Loop Semi Join (actual rows=1 loops=1)
> Buffers: local hit=6
> -> Seq Scan on ta (actual rows=3 loops=1)
> Buffers: local hit=1
> -> Nested Loop (actual rows=0 loops=3)
> Buffers: local hit=5
> -> Index Only Scan using tb_pkey on tb (actual rows=0
> loops=3)
> Index Cond: (id = ta.id <http://ta.id>)
> Heap Fetches: 1
> Buffers: local hit=4
> -> Seq Scan on tc (actual rows=1 loops=1)
> Buffers: local hit=1
> (12 rows)|
>
> tb and tc form a Cartesian product, but in the case of the
> intersection condition with tuples from the table ta (ta.id
> <http://ta.id> = tb.id <http://tb.id>). So, according to the join
> condition, tb intersects only with 1, and only it gets into the
> result, but at the same time they appear twice - this is because
> of the Cartesian product of tb with tc
>
> |*How it works:*
> |
>
> I rewrote the code a bit so that it considers not only the quals
> in jointree->quals, but also those in join expression
> (subselect->jointree->fromlist). If they satisfy the conditions
> for using pull up, I add them to the list of clauses and form a
> "Bool" expression from them, joined by an "AND" operation.
>
> I took a look at this patch and I did a little polishing on it.
>
> And I believe that in testing, you need to set it to BUFFERS OFF,
> because of the recent change made to ANALYZE.
>
> The tests are failing, like this:
> QUERY PLAN
> -------------------------------------------------------------------------
> Nested Loop Semi Join (actual rows=2 loops=1)
> + Buffers: local hit=7
> -> Seq Scan on ta (actual rows=2 loops=1)
> + Buffers: local hit=1
> -> Nested Loop (actual rows=1 loops=2)
> + Buffers: local hit=6
> -> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
> Index Cond: (id = ta.id <http://ta.id>)
> Heap Fetches: 2
> + Buffers: local hit=4
> -> Seq Scan on tc (actual rows=1 loops=2)
> -(7 rows)
> + Buffers: local hit=2
> +(12 rows)
>
Yes, you are right) Thank you for your interest to this thread)
--
Regards,
Alena Rybakina
Postgres Professional
From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Subject: | Re: Exists pull-up application with JoinExpr |
Date: | 2024-12-27 12:53:35 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Alena,
Thank you for your work on subqueries with JOIN.
Have you considered the scenario where in subquery includes a qual like
(tc.aid = 1)? When I tried executing those queries I receive different
results. In my opinion, to prevent this, we should add filters for such
quals within the loop 'foreach (lc, all_clauses)'
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta
WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1);
QUERY PLAN
----------------------------------------------------------------------
Hash Join (actual rows=1 loops=1)
Hash Cond: (ta.id = tb.id)
Buffers: local hit=3
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Hash (actual rows=3 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 33kB
Buffers: local hit=2
-> HashAggregate (actual rows=3 loops=1)
Group Key: tb.id
Batches: 1 Memory Usage: 121kB
Buffers: local hit=2
-> Nested Loop (actual rows=3 loops=1)
Buffers: local hit=2
-> Seq Scan on tb (actual rows=3 loops=1)
Buffers: local hit=1
-> Materialize (actual rows=1 loops=3)
Storage: Memory Maximum Storage: 17kB
Buffers: local hit=1
-> Seq Scan on tc (actual rows=1 loops=1)
Filter: (aid = 1)
Rows Removed by Filter: 1
Buffers: local hit=1
(23 rows)
============================
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id =
tb.id WHERE tc.aid = 1);
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on ta (actual rows=1 loops=1)
Filter: EXISTS(SubPlan 1)
Rows Removed by Filter: 2
Buffers: local hit=6
SubPlan 1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
Index Cond: (id = ta.id)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Filter: (aid = 1)
Buffers: local hit=1
(14 rows)
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Subject: | Re: Exists pull-up application with JoinExpr |
Date: | 2024-12-30 08:24:25 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi! Thank you for your interest to this subject!
On 27.12.2024 15:53, Ilia Evdokimov wrote:
> Hi Alena,
>
> Thank you for your work on subqueries with JOIN.
>
> Have you considered the scenario where in subquery includes a qual
> like (tc.aid = 1)? When I tried executing those queries I receive
> different results. In my opinion, to prevent this, we should add
> filters for such quals within the loop 'foreach (lc, all_clauses)'
>
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta
> WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1);
> QUERY PLAN
> ----------------------------------------------------------------------
> Hash Join (actual rows=1 loops=1)
> Hash Cond: (ta.id = tb.id)
> Buffers: local hit=3
> -> Seq Scan on ta (actual rows=3 loops=1)
> Buffers: local hit=1
> -> Hash (actual rows=3 loops=1)
> Buckets: 4096 Batches: 1 Memory Usage: 33kB
> Buffers: local hit=2
> -> HashAggregate (actual rows=3 loops=1)
> Group Key: tb.id
> Batches: 1 Memory Usage: 121kB
> Buffers: local hit=2
> -> Nested Loop (actual rows=3 loops=1)
> Buffers: local hit=2
> -> Seq Scan on tb (actual rows=3 loops=1)
> Buffers: local hit=1
> -> Materialize (actual rows=1 loops=3)
> Storage: Memory Maximum Storage: 17kB
> Buffers: local hit=1
> -> Seq Scan on tc (actual rows=1 loops=1)
> Filter: (aid = 1)
> Rows Removed by Filter: 1
> Buffers: local hit=1
> (23 rows)
>
> ============================
>
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id =
> tb.id WHERE tc.aid = 1);
> QUERY PLAN
> ---------------------------------------------------------------------------
>
> Seq Scan on ta (actual rows=1 loops=1)
> Filter: EXISTS(SubPlan 1)
> Rows Removed by Filter: 2
> Buffers: local hit=6
> SubPlan 1
> -> Nested Loop (actual rows=0 loops=3)
> Buffers: local hit=5
> -> Index Only Scan using tb_pkey on tb (actual rows=0
> loops=3)
> Index Cond: (id = ta.id)
> Heap Fetches: 1
> Buffers: local hit=4
> -> Seq Scan on tc (actual rows=1 loops=1)
> Filter: (aid = 1)
> Buffers: local hit=1
> (14 rows)
>
>
You are right, at the moment the code is not processed if there is a
constant qual in the subquery (like t1.x1=1 in the example below) and
this problem is not only related to the current patch.
For example you can get such a query plan if you complete this request
to the master:
create table t (xint);
create table t1 (x1int);
create table t2 (x2int);
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT 1
FROM t
WHERE EXISTS (SELECT 1
FROM t1
where t1.x1 = 1);
QUERY PLAN
------------------------------------------------
Result (actual rows=0 loops=1)
One-Time Filter: (InitPlan 1).col1
InitPlan 1
-> Seq Scan on t1 (actual rows=0 loops=1)
Filter: (x1 = 1)
-> Seq Scan on t (never executed)
(6 rows)
It's all because of the check in this function - this qual has levelsoup
= 0, not 1 (see (!contain_vars_of_level(whereClause, 1)), but I already
found out that by changing this, the logic of correction there is
required a little more complicated. At the moment, I'm working to add
this processing to the patch.
Thanks for the case!
--
Regards,
Alena Rybakina
Postgres Professional
From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Subject: | Re: Exists pull-up application with JoinExpr |
Date: | 2025-01-13 17:19:42 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi! I have solved it.
On 30.12.2024 11:24, Alena Rybakina wrote:
>
> Hi! Thank you for your interest to this subject!
>
> On 27.12.2024 15:53, Ilia Evdokimov wrote:
>> Hi Alena,
>>
>> Thank you for your work on subqueries with JOIN.
>>
>> Have you considered the scenario where in subquery includes a qual
>> like (tc.aid = 1)? When I tried executing those queries I receive
>> different results. In my opinion, to prevent this, we should add
>> filters for such quals within the loop 'foreach (lc, all_clauses)'
>>
>> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta
>> WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1);
>> QUERY PLAN
>> ----------------------------------------------------------------------
>> Hash Join (actual rows=1 loops=1)
>> Hash Cond: (ta.id = tb.id)
>> Buffers: local hit=3
>> -> Seq Scan on ta (actual rows=3 loops=1)
>> Buffers: local hit=1
>> -> Hash (actual rows=3 loops=1)
>> Buckets: 4096 Batches: 1 Memory Usage: 33kB
>> Buffers: local hit=2
>> -> HashAggregate (actual rows=3 loops=1)
>> Group Key: tb.id
>> Batches: 1 Memory Usage: 121kB
>> Buffers: local hit=2
>> -> Nested Loop (actual rows=3 loops=1)
>> Buffers: local hit=2
>> -> Seq Scan on tb (actual rows=3 loops=1)
>> Buffers: local hit=1
>> -> Materialize (actual rows=1 loops=3)
>> Storage: Memory Maximum Storage: 17kB
>> Buffers: local hit=1
>> -> Seq Scan on tc (actual rows=1 loops=1)
>> Filter: (aid = 1)
>> Rows Removed by Filter: 1
>> Buffers: local hit=1
>> (23 rows)
>>
>> ============================
>>
>> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
>> SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id =
>> tb.id WHERE tc.aid = 1);
>> QUERY PLAN
>> ---------------------------------------------------------------------------
>>
>> Seq Scan on ta (actual rows=1 loops=1)
>> Filter: EXISTS(SubPlan 1)
>> Rows Removed by Filter: 2
>> Buffers: local hit=6
>> SubPlan 1
>> -> Nested Loop (actual rows=0 loops=3)
>> Buffers: local hit=5
>> -> Index Only Scan using tb_pkey on tb (actual rows=0
>> loops=3)
>> Index Cond: (id = ta.id)
>> Heap Fetches: 1
>> Buffers: local hit=4
>> -> Seq Scan on tc (actual rows=1 loops=1)
>> Filter: (aid = 1)
>> Buffers: local hit=1
>> (14 rows)
>>
>>
> You are right, at the moment the code is not processed if there is a
> constant qual in the subquery (like t1.x1=1 in the example below) and
> this problem is not only related to the current patch.
>
> For example you can get such a query plan if you complete this request
> to the master:
>
> create table t (xint);
> create table t1 (x1int);
> create table t2 (x2int);
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> SELECT 1
> FROM t
> WHERE EXISTS (SELECT 1
> FROM t1
> where t1.x1 = 1);
> QUERY PLAN
> ------------------------------------------------
> Result (actual rows=0 loops=1)
> One-Time Filter: (InitPlan 1).col1
> InitPlan 1
> -> Seq Scan on t1 (actual rows=0 loops=1)
> Filter: (x1 = 1)
> -> Seq Scan on t (never executed)
> (6 rows)
>
> It's all because of the check in this function - this qual has
> levelsoup = 0, not 1 (see (!contain_vars_of_level(whereClause, 1)),
> but I already found out that by changing this, the logic of correction
> there is required a little more complicated. At the moment, I'm
> working to add this processing to the patch.
>
> Thanks for the case!
>
The logic is the same, but extended to constants. I added a few more
tests that not only cover this case, but also NOT EXISTS, which will be
converted to ANTI JOIN.
--
Regards,
Alena Rybakina
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Teach-the-planner-to-convert-EXISTS-and-NOT-EXISTS-s.patch | text/x-patch | 18.7 KB |
From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
Subject: | Re: pull-up subquery if JOIN-ON contains refs to upper-query |
Date: | 2025-02-09 15:14:09 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi! I found another example where the transformation worked incorrectly
and reconsidered the idea.
As for conversion of exists_sublink_to_ANY, we need to get the flattened
implicit-AND list of clauses and pull out the chunks of the WHERE clause
that belong to the parent query,
since we are called halfway through the parent's preprocess_expression()
and earlier steps of preprocess_expression() wouldn't get applied to the
pulled-up stuff unless we do them here.
We also do some processing for vars depending on which side the var is
on - if it's in a subquery, we only need to lower its level (varlevel)
because subquery will be flatted, while
for other vars that belong to the parent query, we need to do
preparation to pull up the sub-select into top range table.
For those expressions that we couldn't assign to either list, we define
newWhere and apply both cases.
--
Regards,
Alena Rybakina
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Add-appling-the-pull-up-transformation-for-expressio.patch | text/x-patch | 28.2 KB |
From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Subject: | Re: pull-up subquery if JOIN-ON contains refs to upper-query |
Date: | 2025-02-10 20:51:29 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 09.02.2025 18:14, Alena Rybakina wrote:
> Hi! I found another example where the transformation worked
> incorrectly and reconsidered the idea.
>
> As for conversion of exists_sublink_to_ANY, we need to get the
> flattened implicit-AND list of clauses and pull out the chunks of the
> WHERE clause that belong to the parent query,
> since we are called halfway through the parent's
> preprocess_expression() and earlier steps of preprocess_expression()
> wouldn't get applied to the pulled-up stuff unless we do them here.
> We also do some processing for vars depending on which side the var is
> on - if it's in a subquery, we only need to lower its level (varlevel)
> because subquery will be flatted, while
> for other vars that belong to the parent query, we need to do
> preparation to pull up the sub-select into top range table.
>
> For those expressions that we couldn't assign to either list, we
> define newWhere and apply both cases.
>
When I run 'make -C contrib/ check', tests of postgres_fdw extension
failed. I might be wrong, but you should be careful with LIMIT.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Attachment | Content-Type | Size |
---|---|---|
regression.diffs | text/plain | 5.0 KB |
From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Subject: | Re: pull-up subquery if JOIN-ON contains refs to upper-query |
Date: | 2025-02-11 15:59:55 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 10.02.2025 23:51, Ilia Evdokimov wrote:
>
> On 09.02.2025 18:14, Alena Rybakina wrote:
>> Hi! I found another example where the transformation worked
>> incorrectly and reconsidered the idea.
>>
>> As for conversion of exists_sublink_to_ANY, we need to get the
>> flattened implicit-AND list of clauses and pull out the chunks of the
>> WHERE clause that belong to the parent query,
>> since we are called halfway through the parent's
>> preprocess_expression() and earlier steps of preprocess_expression()
>> wouldn't get applied to the pulled-up stuff unless we do them here.
>> We also do some processing for vars depending on which side the var
>> is on - if it's in a subquery, we only need to lower its level
>> (varlevel) because subquery will be flatted, while
>> for other vars that belong to the parent query, we need to do
>> preparation to pull up the sub-select into top range table.
>>
>> For those expressions that we couldn't assign to either list, we
>> define newWhere and apply both cases.
>>
>
> When I run 'make -C contrib/ check', tests of postgres_fdw extension
> failed. I might be wrong, but you should be careful with LIMIT.
>
Thank you for the review, I'm working on it.
--
Regards,
Alena Rybakina
Postgres Professional
From: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Petrov <p(dot)petrov(at)postgrespro(dot)ru>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
Cc: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Subject: | Re: pull-up subquery if JOIN-ON contains refs to upper-query |
Date: | 2025-04-02 16:39:58 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi!
My colleague reviewed my patch and gave feedback on how to improve it -
for some queries with data types that I did not consider, pull-up is not
applied, although it should. Some of them:
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
SELECT 1
FROM ta
WHERE EXISTS (SELECT 1
FROM tb
JOIN tc
ON ta.id = tb.id
AND tb.id = ANY('{1}'::int[])
);
QUERY PLAN
--------------------------------------------------------------------------
Seq Scan on ta (actual rows=1.00 loops=1)
Filter: EXISTS(SubPlan 1)
Rows Removed by Filter: 1
SubPlan 1
-> Nested Loop (actual rows=0.50 loops=2)
-> Seq Scan on tb (actual rows=0.50 loops=2)
Filter: ((id = ANY ('{1}'::integer[])) AND (ta.id = id))
Rows Removed by Filter: 2
-> Seq Scan on tc (actual rows=1.00 loops=1)
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
SELECT 1
FROM ta
WHERE EXISTS (SELECT 1
FROM tb
JOIN tc
ON ta.id = tb.id
AND tb.is_active
);
QUERY PLAN
---------------------------------------------------------
Seq Scan on ta (actual rows=2.00 loops=1)
Filter: EXISTS(SubPlan 1)
SubPlan 1
-> Nested Loop (actual rows=1.00 loops=2)
-> Seq Scan on tb (actual rows=1.00 loops=2)
Filter: (is_active AND (ta.id = id))
Rows Removed by Filter: 0
-> Seq Scan on tc (actual rows=1.00 loops=2)
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
SELECT 1
FROM ta
WHERE EXISTS (SELECT 1
FROM tb
JOIN tc
ON ta.id = tb.id
AND tb.is_active IS NOT NULL
);
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on ta (actual rows=2.00 loops=1)
Filter: EXISTS(SubPlan 1)
SubPlan 1
-> Nested Loop (actual rows=1.00 loops=2)
-> Seq Scan on tb (actual rows=1.00 loops=2)
Filter: ((is_active IS NOT NULL) AND (ta.id = id))
Rows Removed by Filter: 0
-> Seq Scan on tc (actual rows=1.00 loops=2)
UPDATE tb SET is_active = NULL WHERE id = 2;
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
SELECT 1
FROM ta
WHERE EXISTS (SELECT 1
FROM tb
JOIN tc
ON ta.id = tb.id
AND tb.is_active IS NULL
);
QUERY PLAN
----------------------------------------------------------------
Seq Scan on ta (actual rows=1.00 loops=1)
Filter: EXISTS(SubPlan 1)
Rows Removed by Filter: 1
SubPlan 1
-> Nested Loop (actual rows=0.50 loops=2)
-> Seq Scan on tb (actual rows=0.50 loops=2)
Filter: ((is_active IS NULL) AND (ta.id = id))
Rows Removed by Filter: 4
-> Seq Scan on tc (actual rows=1.00 loops=1)
I see that I need to add a walker that, when traversing the tree,
determines whether there are conditions under which pull-up is
impossible - the presence of
volatility of functions and other restrictions, and leave the
transformation for the var objects that I added before, I described it
here.
Unfortunately, I need a few days to implement this and need time for a
review, and I think I will not have time to do this before the code
freeze, so
I am moving this to the next commitfest and not changing the status
"awaiting the author".
On 11.02.2025 18:59, Alena Rybakina wrote:
> On 10.02.2025 23:51, Ilia Evdokimov wrote:
>>
>> On 09.02.2025 18:14, Alena Rybakina wrote:
>>> Hi! I found another example where the transformation worked
>>> incorrectly and reconsidered the idea.
>>>
>>> As for conversion of exists_sublink_to_ANY, we need to get the
>>> flattened implicit-AND list of clauses and pull out the chunks of
>>> the WHERE clause that belong to the parent query,
>>> since we are called halfway through the parent's
>>> preprocess_expression() and earlier steps of preprocess_expression()
>>> wouldn't get applied to the pulled-up stuff unless we do them here.
>>> We also do some processing for vars depending on which side the var
>>> is on - if it's in a subquery, we only need to lower its level
>>> (varlevel) because subquery will be flatted, while
>>> for other vars that belong to the parent query, we need to do
>>> preparation to pull up the sub-select into top range table.
>>>
>>> For those expressions that we couldn't assign to either list, we
>>> define newWhere and apply both cases.
>>>
>>
>> When I run 'make -C contrib/ check', tests of postgres_fdw extension
>> failed. I might be wrong, but you should be careful with LIMIT.
>>
> Thank you for the review, I'm working on it.
>
Sorry for not responding, but I will fix this bug after I update the
code based on the comments above. Thank you for noticing and writing to
me, your feedback is very important.
--
Regards,
Alena Rybakina
Postgres Professional
From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Petrov <p(dot)petrov(at)postgrespro(dot)ru> |
Cc: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Subject: | Re: pull-up subquery if JOIN-ON contains refs to upper-query |
Date: | 2025-04-03 13:56:52 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 02.04.2025 19:39, Alena Rybakina wrote:
>
> I see that I need to add a walker that, when traversing the tree,
> determines whether there are conditions under which pull-up is
> impossible - the presence of
> volatility of functions and other restrictions, and leave the
> transformation for the var objects that I added before, I described it
> here.
>
I have some concerns about pulling up every clause from the subquery
with one column. In particular, not every clause is safe or beneficial
to pull up: OR-clauses, CASE expressions, nested sublinks could
significantly change how the planner estimates the number of rows or
applies filters, especially when they are not true join predicates.
Pulling them up might lead to worse plans, or even change the semantics
in subtle ways. I think before applying such transformations, we should
make sure they are not only safe but actually improve the resulting plan.