Partitioning and postgres_fdw optimisations for multi-tenancy

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-13 19:18:00
Message-ID: [email protected]
Lists: pgsql-hackers

Hi Hackers,

The idea of achieving Postgres scaling via sharding using postgres_fdw +
partitioning has received a lot of attention in recent years. Many
optimisations have been done in this direction: partition pruning,
partition-wise aggregates / joins, postgres_fdw push-down of LIMIT,
GROUP BY, etc. In many cases they work really well.

However, there is still an important class of workloads where
postgres_fdw + native partitioning doesn't perform so well:
multi-tenant architectures. From the database perspective this is
presented well in the Citus tutorial [1]. The main idea is that there
are a number of tables, all of them sharded / partitioned by the same
key, e.g. company_id. That way, if every company mostly works within
its own data, then every query can be executed efficiently on a single
node without any internode communication.

I built a simple two-node multi-tenant schema for tests, which can be
easily set up with the attached scripts. It creates three tables
(companies, users, documents) distributed over two nodes. Everything
can be found in this Gist [2] as well.
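
For reference, the shape of the setup looks roughly like this (a
hand-written sketch only; the exact DDL, server options, and partition
layout live in the attached scripts and the Gist, so the
hash-partitioning scheme, option values, and partition names here are
assumptions):

CREATE EXTENSION postgres_fdw;
CREATE SERVER node2 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'node2', dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER node2;

-- Each table is hash-partitioned by company_id; one partition is
-- local, the other lives on node 2 behind postgres_fdw.
CREATE TABLE documents (
    id         int NOT NULL,
    user_id    int NOT NULL,
    company_id int NOT NULL
) PARTITION BY HASH (company_id);

CREATE TABLE documents_node1 PARTITION OF documents
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE FOREIGN TABLE documents_node2 PARTITION OF documents
    FOR VALUES WITH (MODULUS 2, REMAINDER 1)
    SERVER node2 OPTIONS (table_name 'documents');

-- companies and users are declared analogously, partitioned by the
-- same key.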

Some real-life test queries show that such single-node queries aren't
pushed down to the required node. For example:

SELECT *
FROM
    documents
    INNER JOIN users ON documents.user_id = users.id
WHERE
    documents.company_id = 5
    AND users.company_id = 5;

is executed as follows:

                      QUERY PLAN
-------------------------------------------------------
 Nested Loop
   Join Filter: (documents.user_id = users.id)
   ->  Foreign Scan on users_node2 users
   ->  Materialize
         ->  Foreign Scan on documents_node2 documents

i.e. it uses two foreign scans and does the final join locally.
However, once I specify the target partitions explicitly, the entire
query is pushed down to the foreign node:

                       QUERY PLAN
---------------------------------------------------------
 Foreign Scan
   Relations: (documents_node2) INNER JOIN (users_node2)
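
The explicitly-targeted variant that produces this plan may look as
follows (a sketch using the partition names visible in the plans; both
partitions live on the same foreign server, so postgres_fdw can ship
the whole join):

SELECT *
FROM
    documents_node2
    INNER JOIN users_node2 ON documents_node2.user_id = users_node2.id
WHERE
    documents_node2.company_id = 5
    AND users_node2.company_id = 5;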

Execution time drops significantly as well: by more than 3 times even
for this small test database. The situation is the same for simple
queries with aggregates, or joins plus aggregates, filtered by the
sharding key. Something similar was briefly discussed in this thread [3].

IIUC, this means that push-down of queries through postgres_fdw works
perfectly well; the problem is with partition-wise operation detection
at planning time. Currently, the partition-wise aggregate routines, for
example, look at the GROUP BY clause and check whether the sharding key
appears there or not. Based on that, a PARTITIONWISE_AGGREGATE_* flag
is set. However, they don't look at the content of the WHERE clause, so
frankly speaking it isn't a bug: this functionality is simply not
implemented yet.

Actually, sometimes I was able to push down queries with aggregates
simply by adding an additional GROUP BY on the sharding key, like this:

SELECT count(*)
FROM documents
WHERE company_id = 5
GROUP BY company_id;

where this GROUP BY obviously doesn't change the result; it just allows
the planner to choose from more possible paths.

Also, I have tried to hack it a bit and forcibly set
PARTITIONWISE_AGGREGATE_FULL for this particular query. Everything
executed fine and the returned result was correct, which means that all
the underlying machinery is ready.

That way, I propose a change to the planner that will check whether the
partitioning key exists in the WHERE clause and set the
PARTITIONWISE_AGGREGATE_* flags if appropriate. The whole logic may
look like this:

1. If only a single condition on the partitioning key is used (like
above), then it is PARTITIONWISE_AGGREGATE_FULL.
2. If several conditions are used, then it should be
PARTITIONWISE_AGGREGATE_PARTIAL.

I'm aware that a WHERE clause may be extremely complex in general, but
we could narrow this possible optimisation down to the same
restrictions as postgres_fdw push-down: "only WHERE clauses using
built-in operators and functions will be considered for execution on
the remote server".
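
For illustration (my_custom_mod() below is a hypothetical user-defined
function, used only as an example of a non-shippable clause):

-- Built-in operator on the partitioning key: a candidate for
-- remote execution.
SELECT count(*) FROM documents WHERE company_id = 5;

-- A user-defined function is not shippable by default, so this
-- filter would be evaluated locally after fetching the rows.
SELECT count(*) FROM documents WHERE company_id = my_custom_mod(5);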

Although it seems that it would be easier to start with aggregates,
perhaps we should plan a more general solution from the beginning? For
example, check that all involved tables are filtered by the
partitioning key and push down the entire query if all of them target
the same foreign server.

Any thoughts?

[1]
https://docs.citusdata.com/en/v9.3/get_started/tutorial_multi_tenant.html
[2] https://gist.github.com/ololobus/8fba33241f68be2e3765d27bf04882a3
[3]
https://www.postgresql.org/message-id/flat/CAFT%2BaqL1Tt0qfYqjHH%2BshwPoW8qdFjpJ8vBR5ABoXJDUcHyN1w%40mail.gmail.com

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
init1.sql text/plain 1.9 KB
init2.sql text/plain 1.9 KB
load.sql text/plain 535 bytes
queries.sql text/plain 2.1 KB
setup.sh text/plain 825 bytes

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-14 12:27:49
Message-ID: CAExHW5tBReBcLYtgX0CeTcu61vSXCwBzKC+TozmpPFkF+F_kVw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
<a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>
> Hi Hackers,
>
> The idea of achieving Postgres scaling via sharding using postgres_fdw +
> partitioning got a lot of attention last years. Many optimisations have
> been done in this direction: partition pruning, partition-wise
> aggregates / joins, postgres_fdw push-down of LIMIT, GROUP BY, etc. In
> many cases they work really nice.
>
> However, still there is a vast case, where postgres_fdw + native
> partitioning doesn't perform so good — Multi-tenant architecture. From
> the database perspective it is presented well in this Citus tutorial
> [1]. The main idea is that there is a number of tables and all of them
> are sharded / partitioned by the same key, e.g. company_id. That way, if
> every company mostly works within its own data, then every query may be
> effectively executed on a single node without a need for an internode
> communication.
>
> I built a simple two node multi-tenant schema for tests, which can be
> easily set up with attached scripts. It creates three tables (companies,
> users, documents) distributed over two nodes. Everything can be found in
> this Gist [2] as well.
>
> Some real-life test queries show, that all single-node queries aren't
> pushed-down to the required node. For example:
>
> SELECT
> *
> FROM
> documents
> INNER JOIN users ON documents.user_id = users.id
> WHERE
> documents.company_id = 5
> AND users.company_id = 5;

There are a couple of things happening here:
1. The clauses on company_id in the WHERE clause are causing partition
pruning. Partition-wise join is disabled together with partition
pruning before PG13. In PG13 we added an advanced partition matching
algorithm which allows partition-wise join together with partition
pruning.
2. The query has no equality condition on the partition key of the
tables being joined. Partition-wise join is possible only when there is
an equality condition on the partition keys (company_id) of the joining
tables. PostgreSQL's optimizer is not smart enough to convert the
equality conditions in the WHERE clause into equality conditions on the
partition keys, so having those conditions just in the WHERE clause
does not help. Instead, please add equality conditions on the partition
keys in the JOIN .. ON clause or the WHERE clause (the latter only for
INNER joins).

>
> executed as following
>
> QUERY PLAN
> -------------------------------------------------------
> Nested Loop
> Join Filter: (documents.user_id = users.id)
> -> Foreign Scan on users_node2 users
> -> Materialize
> -> Foreign Scan on documents_node2 documents
>
> i.e. it uses two foreign scans and does the final join locally. However,
> once I specify target partitions explicitly, then the entire query is
> pushed down to the foreign node:
>
> QUERY PLAN
> ---------------------------------------------------------
> Foreign Scan
> Relations: (documents_node2) INNER JOIN (users_node2)
>
> Execution time is dropped significantly as well — by more than 3 times
> even for this small test database. Situation for simple queries with
> aggregates or joins and aggregates followed by the sharding key filter
> is the same. Something similar was briefly discussed in this thread [3].
>
> IIUC, it means that push-down of queries through the postgres_fdw works
> perfectly well, the problem is with partition-wise operation detection
> at the planning time. Currently, partition-wise aggregate routines,
> e.g., looks for a GROUP BY and checks whether sharding key exists there
> or not. After that PARTITIONWISE_AGGREGATE_* flag is set. However, it
> doesn't look for a content of WHERE clause, so frankly speaking it isn't
> a problem, this functionality is not yet implemented.
>
> Actually, sometimes I was able to push down queries with aggregate
> simply by adding an additional GROUP BY with sharding key, like this:
>
> SELECT
> count(*)
> FROM
> documents
> WHERE
> company_id = 5
> GROUP BY company_id;

This gets pushed down since GROUP BY clause is on the partition key.

>
> where this GROUP BY obviously doesn't change a results, it just allows
> planner to choose from more possible paths.
>
> Also, I have tried to hack it a bit and forcedly set
> PARTITIONWISE_AGGREGATE_FULL for this particular query. Everything
> executed fine and returned result was correct, which means that all
> underlying machinery is ready.
>
> That way, I propose a change to the planner, which will check whether
> partitioning key exist in the WHERE clause and will set
> PARTITIONWISE_AGGREGATE_* flags if appropriate. The whole logic may look
> like:
>
> 1. If the only one condition by partitioning key is used (like above),
> then it is PARTITIONWISE_AGGREGATE_FULL.
> 2. If several conditions are used, then it should be
> PARTITIONWISE_AGGREGATE_PARTIAL.
>
> I'm aware that WHERE clause may be extremely complex in general, but we
> could narrow this possible optimisation to the same restrictions as
> postgres_fdw push-down "only WHERE clauses using built-in operators and
> functions will be considered for execution on the remote server".
>
> Although it seems that it will be easier to start with aggregates,
> probably we should initially plan a more general solution? For example,
> check that all involved tables are filtered by partitioning key and push
> down the entire query if all of them target the same foreign server.
>
> Any thoughts?

I think adding just equality conditions on the partition key will be
enough. No need for any code change.

--
Best Wishes,
Ashutosh Bapat


From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-14 15:12:09
Message-ID: [email protected]
Lists: pgsql-hackers

On 2020-07-14 15:27, Ashutosh Bapat wrote:
> On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>> I built a simple two node multi-tenant schema for tests, which can be
>> easily set up with attached scripts. It creates three tables
>> (companies,
>> users, documents) distributed over two nodes. Everything can be found
>> in
>> this Gist [2] as well.
>>
>> Some real-life test queries show, that all single-node queries aren't
>> pushed-down to the required node. For example:
>>
>> SELECT
>> *
>> FROM
>> documents
>> INNER JOIN users ON documents.user_id = users.id
>> WHERE
>> documents.company_id = 5
>> AND users.company_id = 5;
>
> There are a couple of things happening here
> 1. the clauses on company_id in WHERE clause are causing partition
> pruning. Partition-wise join is disabled with partition pruning before
> PG13. In PG13 we have added advanced partition matching algorithm
> which will allow partition-wise join with partition pruning.
>

I forgot to mention that I use a recent master (991c444e7a) for these
tests, with

enable_partitionwise_join = 'on'
enable_partitionwise_aggregate = 'on'

of course. I've also tried postgres_fdw.use_remote_estimate = true
followed by ANALYZE on both nodes (it is still used in the setup.sh
script).

BTW, could you please share a link to the commit / thread about
allowing partition-wise join and partition pruning to work together in
PG13?

>
> 2. the query has no equality condition on the partition key of the
> tables being joined. Partitionwise join is possible only when there's
> an equality condition on the partition keys (company_id) of the
> joining tables. PostgreSQL's optimizer is not smart enough to convert
> the equality conditions in WHERE clause into equality conditions on
> partition keys. So having those conditions just in WHERE clause does
> not help. Instead please add equality conditions on partition keys in
> JOIN .. ON clause or WHERE clause (only for INNER join).
>

After adding documents.company_id = users.company_id:

SELECT *
FROM
    documents
    INNER JOIN users ON (documents.company_id = users.company_id
        AND documents.user_id = users.id)
WHERE
    documents.company_id = 5
    AND users.company_id = 5;

the query plan remains the same.

>>
>> executed as following
>>
>> QUERY PLAN
>> -------------------------------------------------------
>> Nested Loop
>> Join Filter: (documents.user_id = users.id)
>> -> Foreign Scan on users_node2 users
>> -> Materialize
>> -> Foreign Scan on documents_node2 documents
>>
>> i.e. it uses two foreign scans and does the final join locally.
>> However,
>> once I specify target partitions explicitly, then the entire query is
>> pushed down to the foreign node:
>>
>> QUERY PLAN
>> ---------------------------------------------------------
>> Foreign Scan
>> Relations: (documents_node2) INNER JOIN (users_node2)
>>
>> Execution time is dropped significantly as well — by more than 3 times
>> even for this small test database. Situation for simple queries with
>> aggregates or joins and aggregates followed by the sharding key filter
>> is the same. Something similar was briefly discussed in this thread
>> [3].
>>
>> IIUC, it means that push-down of queries through the postgres_fdw
>> works
>> perfectly well, the problem is with partition-wise operation detection
>> at the planning time. Currently, partition-wise aggregate routines,
>> e.g., looks for a GROUP BY and checks whether sharding key exists
>> there
>> or not. After that PARTITIONWISE_AGGREGATE_* flag is set. However, it
>> doesn't look for a content of WHERE clause, so frankly speaking it
>> isn't
>> a problem, this functionality is not yet implemented.
>>
>> Actually, sometimes I was able to push down queries with aggregate
>> simply by adding an additional GROUP BY with sharding key, like this:
>>
>> SELECT
>> count(*)
>> FROM
>> documents
>> WHERE
>> company_id = 5
>> GROUP BY company_id;
>
> This gets pushed down since GROUP BY clause is on the partition key.
>

Sure, but it only works *sometimes*; I've never seen most of such
simple queries with aggregates being pushed down, e.g.:

SELECT sum(id)
FROM documents_node2
WHERE company_id = 5
GROUP BY company_id;

regardless of whether the 'GROUP BY company_id' is used or not.

>>
>> Although it seems that it will be easier to start with aggregates,
>> probably we should initially plan a more general solution? For
>> example,
>> check that all involved tables are filtered by partitioning key and
>> push
>> down the entire query if all of them target the same foreign server.
>>
>> Any thoughts?
>
> I think adding just equality conditions on the partition key will be
> enough. No need for any code change.

So, it hasn't helped. Maybe I could tweak some costs to verify whether
a push-down of such joins is possible at all?

Anyway, what about aggregates? Partition-wise aggregates work fine for
queries like

SELECT count(*)
FROM documents
GROUP BY company_id;

but once I narrow it down to a single partition with 'WHERE company_id
= 5', it is executed in a very inefficient way: all rows are fetched
from the remote partition / node and the aggregate is performed
locally. It doesn't seem like a problem with the query itself.

In my experience, both partition-wise joins and aggregates work well
with a simple GROUP BY or JOIN on the partitioning key, which
corresponds to massive multi-partition OLAP queries. However, both stop
working for single-partition queries with a WHERE clause when
postgres_fdw and partitioning are used. I'd be glad if you could share
any further ideas on how to make them work without code modifications.

Thanks
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-15 12:02:09
Message-ID: CAPmGK163KgoTn5GcqYArh7W6-3Tk+KznrZOu0vdV0L-5yEd4vg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 15, 2020 at 12:12 AM Alexey Kondratov
<a(dot)kondratov(at)postgrespro(dot)ru> wrote:
> On 2020-07-14 15:27, Ashutosh Bapat wrote:
> > On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
> > <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
> >> Some real-life test queries show, that all single-node queries aren't
> >> pushed-down to the required node. For example:
> >>
> >> SELECT
> >> *
> >> FROM
> >> documents
> >> INNER JOIN users ON documents.user_id = users.id
> >> WHERE
> >> documents.company_id = 5
> >> AND users.company_id = 5;
> >
> > There are a couple of things happening here
> > 1. the clauses on company_id in WHERE clause are causing partition
> > pruning. Partition-wise join is disabled with partition pruning before
> > PG13.

More precisely, PWJ cannot be applied when there are no matched
partitions on the nullable side due to partition pruning before PG13.
But the join is an inner join, so I think PWJ can still be applied for
the join.

> > In PG13 we have added advanced partition matching algorithm
> > which will allow partition-wise join with partition pruning.

> BTW, can you, please, share a link to commit / thread about allowing
> partition-wise join and partition pruning to work together in PG13?

I think the link would be this:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c8434d64ce03c32e0029417a82ae937f2055268f

Unfortunately, advanced PWJ added by the commit only allows PWJ and
partition pruning to work together for list/range partitioned tables,
not for hash partitioned tables. However, I think the commit would
have nothing to do with the issue here, because 1) the tables involved
in the join have the same partition bounds, and 2) the commit doesn't
change the behavior of such a join.

Best regards,
Etsuro Fujita


From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-16 04:55:48
Message-ID: CAPmGK15dTPxjL___pnxVfkMNfVTf91nKdHAkVVHu7KJey8OBWA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 15, 2020 at 9:02 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> On Wed, Jul 15, 2020 at 12:12 AM Alexey Kondratov
> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
> > On 2020-07-14 15:27, Ashutosh Bapat wrote:
> > > On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
> > > <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
> > >> Some real-life test queries show, that all single-node queries aren't
> > >> pushed-down to the required node. For example:
> > >>
> > >> SELECT
> > >> *
> > >> FROM
> > >> documents
> > >> INNER JOIN users ON documents.user_id = users.id
> > >> WHERE
> > >> documents.company_id = 5
> > >> AND users.company_id = 5;
> > >
> > > There are a couple of things happening here
> > > 1. the clauses on company_id in WHERE clause are causing partition
> > > pruning. Partition-wise join is disabled with partition pruning before
> > > PG13.
>
> More precisely, PWJ cannot be applied when there are no matched
> partitions on the nullable side due to partition pruning before PG13.

On reflection, I think I was wrong: the limitation applies to PG13,
even with advanced PWJ.

> But the join is an inner join, so I think PWJ can still be applied for
> the join.

I think I was wrong in this point as well :-(. PWJ cannot be applied
to the join due to the limitation of the PWJ matching logic. See the
discussion started in [1]. I think the patch in [2] would address
this issue as well, though the patch is under review.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/CAN_9JTzo_2F5dKLqXVtDX5V6dwqB0Xk%2BihstpKEt3a1LT6X78A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/[email protected]


From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-16 11:56:54
Message-ID: [email protected]
Lists: pgsql-hackers

On 7/16/20 9:55 AM, Etsuro Fujita wrote:
> On Wed, Jul 15, 2020 at 9:02 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
>> On Wed, Jul 15, 2020 at 12:12 AM Alexey Kondratov
>> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>>> On 2020-07-14 15:27, Ashutosh Bapat wrote:
>>>> On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
>>>> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>>>>> Some real-life test queries show, that all single-node queries aren't
>>>>> pushed-down to the required node. For example:
>>>>>
>>>>> SELECT
>>>>> *
>>>>> FROM
>>>>> documents
>>>>> INNER JOIN users ON documents.user_id = users.id
>>>>> WHERE
>>>>> documents.company_id = 5
>>>>> AND users.company_id = 5;
>>>>
>>>> There are a couple of things happening here
>>>> 1. the clauses on company_id in WHERE clause are causing partition
>>>> pruning. Partition-wise join is disabled with partition pruning before
>>>> PG13.
>>
>> More precisely, PWJ cannot be applied when there are no matched
>> partitions on the nullable side due to partition pruning before PG13.
>
> On reflection, I think I was wrong: the limitation applies to PG13,
> even with advanced PWJ.
>
>> But the join is an inner join, so I think PWJ can still be applied for
>> the join.
>
> I think I was wrong in this point as well :-(. PWJ cannot be applied
> to the join due to the limitation of the PWJ matching logic. See the
> discussion started in [1]. I think the patch in [2] would address
> this issue as well, though the patch is under review.
>

I think discussion [1] is of little relevance to the current task. Here
we do not join on a partition attribute, so PWJ can't be used at all.
Instead we can use a pushed-down join of the two foreign relations.
We can analyze the baserestrictinfos of the outer and inner RelOptInfos
and may detect that only one partition from each of the outer and inner
sides needs to be joined.
Next, we would create a joinrel from the RelOptInfos of these
partitions and substitute it for the joinrel of the partitioned tables.
But this is only a rough outline of a possible solution...

>
> [1] https://www.postgresql.org/message-id/CAN_9JTzo_2F5dKLqXVtDX5V6dwqB0Xk%2BihstpKEt3a1LT6X78A%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/[email protected]
>
>

--
regards,
Andrey Lepikhov
Postgres Professional


From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-16 16:35:11
Message-ID: CAPmGK15EoA_yHycO9bXgNRze0WbiSLKjz_5jd0Xwg4PVycHmzA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 16, 2020 at 8:56 PM Andrey Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 7/16/20 9:55 AM, Etsuro Fujita wrote:

> >>>> On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
> >>>> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
> >>>>> Some real-life test queries show, that all single-node queries aren't
> >>>>> pushed-down to the required node. For example:
> >>>>>
> >>>>> SELECT
> >>>>> *
> >>>>> FROM
> >>>>> documents
> >>>>> INNER JOIN users ON documents.user_id = users.id
> >>>>> WHERE
> >>>>> documents.company_id = 5
> >>>>> AND users.company_id = 5;

> > PWJ cannot be applied
> > to the join due to the limitation of the PWJ matching logic. See the
> > discussion started in [1]. I think the patch in [2] would address
> > this issue as well, though the patch is under review.

> I think, discussion [1] is little relevant to the current task. Here we
> join not on partition attribute and PWJ can't be used at all.

The main point of the discussion is to determine whether PWJ can be
used for a join between partitioned tables, based on
EquivalenceClasses, not just join clauses created by
build_joinrel_restrictlist(). For the above join, for example, the
patch in [2] would derive a join clause "documents.company_id =
users.company_id" from an EquivalenceClass that recorded the knowledge
"documents.company_id = 5" and "users.company_id = 5", and then the
planner would consider from it that PWJ can be used for the join.

Best regards,
Etsuro Fujita


From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-16 16:40:05
Message-ID: [email protected]
Lists: pgsql-hackers

On 2020-07-16 14:56, Andrey Lepikhov wrote:
> On 7/16/20 9:55 AM, Etsuro Fujita wrote:
>> On Wed, Jul 15, 2020 at 9:02 PM Etsuro Fujita
>> <etsuro(dot)fujita(at)gmail(dot)com> wrote:
>>> On Wed, Jul 15, 2020 at 12:12 AM Alexey Kondratov
>>> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>>>> On 2020-07-14 15:27, Ashutosh Bapat wrote:
>>>>> On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
>>>>> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>>>>>> Some real-life test queries show, that all single-node queries
>>>>>> aren't
>>>>>> pushed-down to the required node. For example:
>>>>>>
>>>>>> SELECT
>>>>>> *
>>>>>> FROM
>>>>>> documents
>>>>>> INNER JOIN users ON documents.user_id = users.id
>>>>>> WHERE
>>>>>> documents.company_id = 5
>>>>>> AND users.company_id = 5;
>>>>>
>>>>> There are a couple of things happening here
>>>>> 1. the clauses on company_id in WHERE clause are causing partition
>>>>> pruning. Partition-wise join is disabled with partition pruning
>>>>> before
>>>>> PG13.
>>>
>>> More precisely, PWJ cannot be applied when there are no matched
>>> partitions on the nullable side due to partition pruning before PG13.
>>
>> On reflection, I think I was wrong: the limitation applies to PG13,
>> even with advanced PWJ.
>>
>>> But the join is an inner join, so I think PWJ can still be applied
>>> for
>>> the join.
>>
>> I think I was wrong in this point as well :-(. PWJ cannot be applied
>> to the join due to the limitation of the PWJ matching logic. See the
>> discussion started in [1]. I think the patch in [2] would address
>> this issue as well, though the patch is under review.
>>

Thanks for sharing the links, Fujita-san.

>
> I think, discussion [1] is little relevant to the current task. Here
> we join not on partition attribute and PWJ can't be used at all. Here
> we can use push-down join of two foreign relations.
> We can analyze baserestrictinfo's of outer and inner RelOptInfo's and
> may detect that only one partition from outer and inner need to be
> joined.
> Next, we will create joinrel from RelOptInfo's of these partitions and
> replace joinrel of partitioned tables. But it is only rough outline of
> a possible solution...
>

I was a bit skeptical after eyeballing the thread [1], but still tried
the v3 patch with the current master and my test setup. Surprisingly,
it just worked, though it isn't clear to me how. With this patch the
aforementioned simple join is completely pushed down to the foreign
server, and the speedup is approximately the same (~3 times) as when
the required partitions are used explicitly in the query.

As a side effect, it also improved join + aggregate queries like:

SELECT user_id, count(*) AS documents_count
FROM
    documents
    INNER JOIN users ON documents.user_id = users.id
WHERE
    documents.company_id = 5
    AND users.company_id = 5
GROUP BY user_id;

With the patch it is executed as:

 GroupAggregate
   Group Key: documents.user_id
   ->  Sort
         Sort Key: documents.user_id
         ->  Foreign Scan
               Relations: (documents_node2 documents)
                          INNER JOIN (users_node2 users)

Without the patch the plan was:

 GroupAggregate
   Group Key: documents.user_id
   ->  Sort
         Sort Key: documents.user_id
         ->  Hash Join
               Hash Cond: (documents.user_id = users.id)
               ->  Foreign Scan on documents_node2 documents
               ->  Hash
                     ->  Foreign Scan on users_node2 users

I cannot say that it is the most efficient plan in this case, since the
entire query could be pushed down to the foreign server, but it still
gives a 5-10% speedup on my setup.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-16 16:56:35
Message-ID: [email protected]
Lists: pgsql-hackers

On 2020-07-16 19:35, Etsuro Fujita wrote:
> On Thu, Jul 16, 2020 at 8:56 PM Andrey Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 7/16/20 9:55 AM, Etsuro Fujita wrote:
>
>> >>>> On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
>> >>>> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>> >>>>> Some real-life test queries show, that all single-node queries aren't
>> >>>>> pushed-down to the required node. For example:
>> >>>>>
>> >>>>> SELECT
>> >>>>> *
>> >>>>> FROM
>> >>>>> documents
>> >>>>> INNER JOIN users ON documents.user_id = users.id
>> >>>>> WHERE
>> >>>>> documents.company_id = 5
>> >>>>> AND users.company_id = 5;
>
>> > PWJ cannot be applied
>> > to the join due to the limitation of the PWJ matching logic. See the
>> > discussion started in [1]. I think the patch in [2] would address
>> > this issue as well, though the patch is under review.
>
>> I think, discussion [1] is little relevant to the current task. Here
>> we
>> join not on partition attribute and PWJ can't be used at all.
>
> The main point of the discussion is to determine whether PWJ can be
> used for a join between partitioned tables, based on
> EquivalenceClasses, not just join clauses created by
> build_joinrel_restrictlist(). For the above join, for example, the
> patch in [2] would derive a join clause "documents.company_id =
> users.company_id" from an EquivalenceClass that recorded the knowledge
> "documents.company_id = 5" and "users.company_id = 5", and then the
> planner would consider from it that PWJ can be used for the join.
>

Yes, it really worked well. Thank you for the explanation; it wasn't so
obvious to me either. So I think that the patch from [1] covers
many cases of joins targeting a single partition / foreign server.

However, there is an issue with aggregates as well. For a query like:

SELECT
count(*)
FROM
documents
WHERE
company_id = 5;

It would be great to teach the planner to understand that this is a
partition-wise aggregate as well, even without GROUP BY company_id
(which does not always help anyway). I'll try to look closer at this
problem, but if you have any thoughts about it, I'd be glad to
know.
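To make the GROUP BY remark concrete: the usual workaround is to add the
partition key to the GROUP BY clause, which makes partition-wise
aggregation applicable but changes the result shape, and even then does
not always fire:

```sql
-- Workaround sketch: adding the partition key to GROUP BY lets the
-- planner consider a partition-wise aggregate, at the cost of an
-- extra output column:
SELECT company_id, count(*)
FROM documents
WHERE company_id = 5
GROUP BY company_id;
```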

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


From: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-17 04:23:19
Message-ID: [email protected]

On 7/16/20 9:35 PM, Etsuro Fujita wrote:
> On Thu, Jul 16, 2020 at 8:56 PM Andrey Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 7/16/20 9:55 AM, Etsuro Fujita wrote:
>
>>>>>> On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
>>>>>> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>>>>>>> Some real-life test queries show, that all single-node queries aren't
>>>>>>> pushed-down to the required node. For example:
>>>>>>>
>>>>>>> SELECT
>>>>>>> *
>>>>>>> FROM
>>>>>>> documents
>>>>>>> INNER JOIN users ON documents.user_id = users.id
>>>>>>> WHERE
>>>>>>> documents.company_id = 5
>>>>>>> AND users.company_id = 5;
>
>>> PWJ cannot be applied
>>> to the join due to the limitation of the PWJ matching logic. See the
>>> discussion started in [1]. I think the patch in [2] would address
>>> this issue as well, though the patch is under review.
>
>> I think, discussion [1] is little relevant to the current task. Here we
>> join not on partition attribute and PWJ can't be used at all.
>
> The main point of the discussion is to determine whether PWJ can be
> used for a join between partitioned tables, based on
> EquivalenceClasses, not just join clauses created by
> build_joinrel_restrictlist(). For the above join, for example, the
> patch in [2] would derive a join clause "documents.company_id =
> users.company_id" from an EquivalenceClass that recorded the knowledge
> "documents.company_id = 5" and "users.company_id = 5", and then the
> planner would consider from it that PWJ can be used for the join.
>
Ok, this patch works, and it solves part of the problem with this
interesting approach.
But note that a modified version of the query:

SELECT * FROM documents, users WHERE documents.company_id = 5 AND
users.company_id = 7;

could also be pushed down to node2 and joined there, but it is not.
My point is that we can try to solve the whole problem.

--
regards,
Andrey Lepikhov
Postgres Professional


From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
Cc: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-17 14:55:09
Message-ID: CAPmGK17pd91953VHhCTDMGo+FRaRYYeqpsN4Lu40qo171ofAHg@mail.gmail.com

On Fri, Jul 17, 2020 at 1:56 AM Alexey Kondratov
<a(dot)kondratov(at)postgrespro(dot)ru> wrote:
> However, there is an issue with aggregates as well. For a query like:
>
> SELECT
> count(*)
> FROM
> documents
> WHERE
> company_id = 5;
>
> It would be great to teach planner to understand, that it's a
> partition-wise aggregate as well, even without GROUP BY company_id,
> which doesn't always help as well. I'll try to look closer on this
> problem, but if you have any thoughts about it, then I'd be glad to
> know.

The reason why the aggregation count(*) isn't pushed down to the
remote side is: 1) we allow the FDW to push the aggregation down only
when the input relation to the aggregation is a foreign (base or join)
relation (see create_grouping_paths()), but 2) for your case the input
relation would be an append relation that contains the foreign
partition as its only child relation, NOT just the foreign partition.
The resulting Append path would be removed in the postprocessing (see
[1]), but that would be too late for the FDW to do the push-down work.
I have no idea what to do about this issue.

Best regards,
Etsuro Fujita

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8edd0e79460b414b1d971895312e549e95e12e4f;hp=f21668f328c864c6b9290f39d41774cb2422f98e


From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-17 15:44:19
Message-ID: CAExHW5vAzbcwT7bvWKTsrZti_Sp1iMXDZhbRDb-oD91LSqkURg@mail.gmail.com

On Fri, Jul 17, 2020 at 8:24 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
>
> On Fri, Jul 17, 2020 at 1:56 AM Alexey Kondratov
> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
> > However, there is an issue with aggregates as well. For a query like:
> >
> > SELECT
> > count(*)
> > FROM
> > documents
> > WHERE
> > company_id = 5;
> >
> > It would be great to teach planner to understand, that it's a
> > partition-wise aggregate as well, even without GROUP BY company_id,
> > which doesn't always help as well. I'll try to look closer on this
> > problem, but if you have any thoughts about it, then I'd be glad to
> > know.
>
> The reason why the aggregation count(*) isn't pushed down to the
> remote side is: 1) we allow the FDW to push the aggregation down only
> when the input relation to the aggregation is a foreign (base or join)
> relation (see create_grouping_paths()), but 2) for your case the input
> relation would be an append relation that contains the foreign
> partition as only one child relation, NOT just the foreign partition.
> The resulting Append path would be removed in the postprocessing (see
> [1]), but that would be too late for the FDW to do the push-down work.
> I have no idea what to do about this issue.

Won't partitionwise aggregate push aggregate down to partition and
then from there to the foreign server through FDW? Something else must
be stopping it. Maybe a whole-row Var expression?

--
Best Wishes,
Ashutosh Bapat


From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-17 16:30:21
Message-ID: CAPmGK17HXqVx3SWYmOY+seCW_=VEDi-WUxmSCUGxt-2VfZBfFg@mail.gmail.com

On Sat, Jul 18, 2020 at 12:44 AM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> On Fri, Jul 17, 2020 at 8:24 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> > On Fri, Jul 17, 2020 at 1:56 AM Alexey Kondratov
> > <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
> > > However, there is an issue with aggregates as well. For a query like:
> > >
> > > SELECT
> > > count(*)
> > > FROM
> > > documents
> > > WHERE
> > > company_id = 5;
> > >
> > > It would be great to teach planner to understand, that it's a
> > > partition-wise aggregate as well, even without GROUP BY company_id,
> > > which doesn't always help as well. I'll try to look closer on this
> > > problem, but if you have any thoughts about it, then I'd be glad to
> > > know.
> >
> > The reason why the aggregation count(*) isn't pushed down to the
> > remote side is: 1) we allow the FDW to push the aggregation down only
> > when the input relation to the aggregation is a foreign (base or join)
> > relation (see create_grouping_paths()), but 2) for your case the input
> > relation would be an append relation that contains the foreign
> > partition as only one child relation, NOT just the foreign partition.
> > The resulting Append path would be removed in the postprocessing (see
> > [1]), but that would be too late for the FDW to do the push-down work.
> > I have no idea what to do about this issue.
>
> Won't partitionwise aggregate push aggregate down to partition and
> then from there to the foreign server through FDW?

Sorry, my words were not clear. The aggregation above is count(*)
*without GROUP BY*, so we can’t apply PWA to it.

Best regards,
Etsuro Fujita


From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Partitioning and postgres_fdw optimisations for multi-tenancy
Date: 2020-07-20 12:03:26
Message-ID: CAExHW5uV7We2tiVSSAh+94dc3HaAzGaDMtLnWYicnsX5+08u9g@mail.gmail.com

On Fri, Jul 17, 2020 at 10:00 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
>
> On Sat, Jul 18, 2020 at 12:44 AM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > On Fri, Jul 17, 2020 at 8:24 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> > > On Fri, Jul 17, 2020 at 1:56 AM Alexey Kondratov
> > > <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
> > > > However, there is an issue with aggregates as well. For a query like:
> > > >
> > > > SELECT
> > > > count(*)
> > > > FROM
> > > > documents
> > > > WHERE
> > > > company_id = 5;
> > > >
> > > > It would be great to teach planner to understand, that it's a
> > > > partition-wise aggregate as well, even without GROUP BY company_id,
> > > > which doesn't always help as well. I'll try to look closer on this
> > > > problem, but if you have any thoughts about it, then I'd be glad to
> > > > know.
> > >
> > > The reason why the aggregation count(*) isn't pushed down to the
> > > remote side is: 1) we allow the FDW to push the aggregation down only
> > > when the input relation to the aggregation is a foreign (base or join)
> > > relation (see create_grouping_paths()), but 2) for your case the input
> > > relation would be an append relation that contains the foreign
> > > partition as only one child relation, NOT just the foreign partition.
> > > The resulting Append path would be removed in the postprocessing (see
> > > [1]), but that would be too late for the FDW to do the push-down work.
> > > I have no idea what to do about this issue.
> >
> > Won't partitionwise aggregate push aggregate down to partition and
> > then from there to the foreign server through FDW?
>
> Sorry, my words were not clear. The aggregation above is count(*)
> *without GROUP BY*, so we can’t apply PWA to it.

Ok. Thanks for the clarification.

IIRC, if GROUP BY does not contain the partition key, partition-wise
aggregate will collect partial aggregates from each partition and then
combine those to form the final aggregate. However, we do not have
infrastructure to request partial aggregates from a foreign server (we
lack SQL-level support for it). Hence it's not pushed down to the
foreign server. For count(*) there is no difference between full and
partial aggregates, so it appears as if we could change PARTIAL to FULL
to push the aggregate down to the foreign server, but that's not true
in general.
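To illustrate that last point with a sketch (table t, column x and
part_key are hypothetical): combining per-partition partial results works
for count(*) but not for an aggregate like avg():

```sql
-- count(*): the combine step is just a sum of per-partition counts,
-- so full and partial aggregation coincide in effect:
SELECT sum(partial_count) FROM (
    SELECT count(*) AS partial_count FROM t GROUP BY part_key
) s;

-- avg(x): averaging per-partition averages is wrong in general;
-- the partial state must carry sum and count separately and be
-- combined as below (cast as needed for integer columns):
SELECT sum(s) / sum(n) FROM (
    SELECT sum(x) AS s, count(x) AS n FROM t GROUP BY part_key
) p;
```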

--
Best Wishes,
Ashutosh Bapat