From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Asynchronous MergeAppend
Date: 2024-07-17 13:24:28
Message-ID: [email protected]
Lists: pgsql-hackers
Hello.
I'd like to make the MergeAppend node async-capable, like the Append
node. Currently, when the planner chooses a MergeAppend plan,
asynchronous execution is not possible. With the attached patches you
can see plans like
EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Merge Append
   Sort Key: async_pt.b, async_pt.a
   ->  Async Foreign Scan on public.async_p1 async_pt_1
         Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
         Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
   ->  Async Foreign Scan on public.async_p2 async_pt_2
         Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
         Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
This can be quite profitable (in our test cases, async MergeAppend
execution over remote servers gave up to a twofold speedup).
Code for asynchronous execution in MergeAppend was mostly borrowed from
the Append node.
What significantly differs is that ExecMergeAppendAsyncGetNext() must
return a tuple from the specified slot: the subplan number determines
the tuple slot into which data should be retrieved. When a subplan is
ready to provide a tuple, it is cached in ms_asyncresults. When we get a
tuple for the subplan specified in ExecMergeAppendAsyncGetNext(),
ExecMergeAppendAsyncRequest() returns true and the loop in
ExecMergeAppendAsyncGetNext() ends. We can fetch data only for subplans
which either don't have a cached result ready or have already returned
it to the upper node; this state is tracked in ms_has_asyncresults. As
we can get data for a subplan either before or after the loop in
ExecMergeAppendAsyncRequest(), we check this flag twice in that
function.
Unlike ExecAppendAsyncEventWait(), it seems
ExecMergeAppendAsyncEventWait() doesn't need a timeout, as there's no
need to get a result from a synchronous subplan when a tuple from an
async one was explicitly requested.
Also, we had to fix postgres_fdw to avoid looking directly at Append
fields. Perhaps the accessors to Append fields look strange, but they
allow us to avoid some code duplication. I suppose duplication could be
reduced even further if we reworked the async Append implementation,
but so far I haven't tried to do this, to avoid a big diff from master.
Also, mark_async_capable() assumes that the path corresponds to the
plan. This may not be true when create_[merge_]append_plan() inserts a
Sort node; in that case mark_async_capable() can treat the Sort plan
node as some other node type and crash, so there's a small fix for this.
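For illustration, the problematic plan shape looks something like this
(my sketch with hypothetical table names, not output from the patch):

-- Hypothetical: one foreign child's path is not suitably sorted, so
-- create_merge_append_plan() adds a Sort the path knows nothing about:
--   Merge Append
--     Sort Key: parent.b
--     ->  Sort
--           Sort Key: p1.b
--           ->  Foreign Scan on p1
--     ->  Async Foreign Scan on p2
-- The fix makes mark_async_capable() refuse to mark the subplan async
-- when the plan node doesn't match the path, instead of misreading the
-- Sort node:
EXPLAIN (COSTS OFF) SELECT * FROM parent ORDER BY b;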
--
Best regards,
Alexander Pyhalov,
Postgres Professional
Attachments:
  v1-0001-mark_async_capable-subpath-should-match-subplan.patch (text/x-diff, 1.9 KB)
  v1-0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch (text/x-diff, 34.7 KB)
From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2024-08-10 20:24:43
Message-ID: [email protected]
Lists: pgsql-hackers
Hi! Thank you for your work on this subject! I think this is a very
useful optimization.
While looking through your code, I noticed some points that I think
should be taken into account. First, I noticed only two tests verifying
the functionality of this feature, and I think that is not enough.
Are you thinking about adding some tests with queries involving, for
example, joins between different tables and unusual operators?
In addition, I have a question about testing your feature on a
benchmark. Are you planning to do this?
On 17.07.2024 16:24, Alexander Pyhalov wrote:
> Hello.
>
> I'd like to make the MergeAppend node async-capable, like the Append
> node. Currently, when the planner chooses a MergeAppend plan,
> asynchronous execution is not possible. With the attached patches you
> can see plans like
>
> EXPLAIN (VERBOSE, COSTS OFF)
> SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
>                                                           QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------
>
>  Merge Append
>    Sort Key: async_pt.b, async_pt.a
>    ->  Async Foreign Scan on public.async_p1 async_pt_1
>          Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
>          Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
>    ->  Async Foreign Scan on public.async_p2 async_pt_2
>          Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
>          Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
>
> This can be quite profitable (in our test cases, async MergeAppend
> execution over remote servers gave up to a twofold speedup).
>
> Code for asynchronous execution in MergeAppend was mostly borrowed
> from the Append node.
>
> What significantly differs is that ExecMergeAppendAsyncGetNext() must
> return a tuple from the specified slot: the subplan number determines
> the tuple slot into which data should be retrieved. When a subplan is
> ready to provide a tuple, it is cached in ms_asyncresults. When we get
> a tuple for the subplan specified in ExecMergeAppendAsyncGetNext(),
> ExecMergeAppendAsyncRequest() returns true and the loop in
> ExecMergeAppendAsyncGetNext() ends. We can fetch data only for
> subplans which either don't have a cached result ready or have already
> returned it to the upper node; this state is tracked in
> ms_has_asyncresults. As we can get data for a subplan either before or
> after the loop in ExecMergeAppendAsyncRequest(), we check this flag
> twice in that function.
> Unlike ExecAppendAsyncEventWait(), it seems
> ExecMergeAppendAsyncEventWait() doesn't need a timeout, as there's no
> need to get a result from a synchronous subplan when a tuple from an
> async one was explicitly requested.
>
> Also, we had to fix postgres_fdw to avoid looking directly at Append
> fields. Perhaps the accessors to Append fields look strange, but they
> allow us to avoid some code duplication. I suppose duplication could
> be reduced even further if we reworked the async Append
> implementation, but so far I haven't tried to do this, to avoid a big
> diff from master.
>
> Also, mark_async_capable() assumes that the path corresponds to the
> plan. This may not be true when create_[merge_]append_plan() inserts a
> Sort node; in that case mark_async_capable() can treat the Sort plan
> node as some other node type and crash, so there's a small fix for
> this.
I think you should add this explanation to the commit message, because
without it, it's hard to get the full picture of how your code works.
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Asynchronous MergeAppend
Date: 2024-08-20 09:14:44
Message-ID: [email protected]
Lists: pgsql-hackers
Hi.
Alena Rybakina wrote on 2024-08-10 23:24:
> Hi! Thank you for your work on this subject! I think this is a very
> useful optimization.
>
> While looking through your code, I noticed some points that I think
> should be taken into account. First, I noticed only two tests
> verifying the functionality of this feature, and I think that is not
> enough.
> Are you thinking about adding some tests with queries involving, for
> example, joins between different tables and unusual operators?
I've added some more tests: tests for joins and pruning.
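For reference, a join test of the kind discussed might look something
like this (a sketch reusing async_pt from the first message, not the
actual added test):

-- Both sides of the join are ordered partitioned foreign tables, so
-- each side can be scanned through an async MergeAppend:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, t2.c
FROM async_pt t1
     JOIN async_pt t2 ON t1.b = t2.b
WHERE t1.b % 100 = 0
ORDER BY t1.b, t1.a;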
>
> In addition, I have a question about testing your feature on a
> benchmark. Are you planning to do this?
>
The main reason for this work is a dramatic performance degradation
when Append plans with async foreign scan nodes are switched to
MergeAppend plans with synchronous foreign scans.
I've performed some synthetic tests to show the benefits of async
MergeAppend. So far the tests have been performed on a single physical
host.
For the tests I've deployed 3 PostgreSQL instances on ports 5432-5434.
The first instance:

create server s2 foreign data wrapper postgres_fdw
    OPTIONS (port '5433', dbname 'postgres', async_capable 'on');
create server s3 foreign data wrapper postgres_fdw
    OPTIONS (port '5434', dbname 'postgres', async_capable 'on');
create foreign table players_p1 partition of players
    for values with (modulus 4, remainder 0) server s2;
create foreign table players_p2 partition of players
    for values with (modulus 4, remainder 1) server s2;
create foreign table players_p3 partition of players
    for values with (modulus 4, remainder 2) server s3;
create foreign table players_p4 partition of players
    for values with (modulus 4, remainder 3) server s3;
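The parent table definition is omitted above; it would be something
like the following sketch (the hash partition key here is an
assumption, as are the user mappings postgres_fdw needs):

-- Assumed parent table on the first instance (not shown above);
-- the hash partition key (id) is a guess:
create table players (id int, name text, score int) partition by hash (id);
-- postgres_fdw also needs user mappings for the foreign servers, e.g.:
create user mapping for current_user server s2;
create user mapping for current_user server s3;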
s2 instance:
create table players_p1 (id int, name text, score int);
create table players_p2 (id int, name text, score int);
create index on players_p1(score);
create index on players_p2(score);
s3 instance:
create table players_p3 (id int, name text, score int);
create table players_p4 (id int, name text, score int);
create index on players_p3(score);
create index on players_p4(score);
s1 instance:

insert into players
    select i, 'player_' || i, random() * 100
    from generate_series(1, 100000) i;
pgbench script:

\set rnd_offset random(0, 200)
\set rnd_limit random(10, 20)
select * from players order by score desc offset :rnd_offset limit :rnd_limit;
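As a sanity check, one can confirm on the first instance that the
patched server actually produces an async MergeAppend plan for this
kind of query, e.g. (my example, not from the original tests):

-- With the patch, the foreign partitions should show up as
-- "Async Foreign Scan" under "Merge Append":
EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM players ORDER BY score DESC OFFSET 100 LIMIT 15;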
pgbench was run as:
pgbench -n -f 1.sql postgres -T 100 -c 16 -j 16
CPU idle was about 5-10%.
pgbench results:
Without patch, async_capable on:
pgbench (14.13, server 18devel)
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 16
duration: 100 s
number of transactions actually processed: 130523
latency average = 12.257 ms
initial connection time = 29.824 ms
tps = 1305.363500 (without initial connection time)
Without patch, async_capable off:
pgbench (14.13, server 18devel)
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 16
duration: 100 s
number of transactions actually processed: 130075
latency average = 12.299 ms
initial connection time = 26.931 ms
tps = 1300.877993 (without initial connection time)
As expected, we see no difference.
Patched, async_capable on:
pgbench (14.13, server 18devel)
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 16
duration: 100 s
number of transactions actually processed: 135616
latency average = 11.796 ms
initial connection time = 28.619 ms
tps = 1356.341587 (without initial connection time)
Patched, async_capable off:
pgbench (14.13, server 18devel)
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 16
duration: 100 s
number of transactions actually processed: 131300
latency average = 12.185 ms
initial connection time = 29.573 ms
tps = 1313.138405 (without initial connection time)
Here we can see that async MergeAppend behaves a bit better. One could
argue that the benefit is not that big and is perhaps due to random
factors. However, if we set the number of clients and threads to 1, so
that the CPU has idle cores, we see more evident improvements:
Patched, async_capable on:
pgbench (14.13, server 18devel)
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 100 s
number of transactions actually processed: 20221
latency average = 4.945 ms
initial connection time = 7.035 ms
tps = 202.221816 (without initial connection time)
Patched, async_capable off:
transaction type: 1.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 100 s
number of transactions actually processed: 14941
latency average = 6.693 ms
initial connection time = 7.037 ms
tps = 149.415688 (without initial connection time)
--
Best regards,
Alexander Pyhalov,
Postgres Professional
Attachments:
  v2-0002-MergeAppend-should-support-Async-Foreign-Scan-subpla.patch (text/x-diff, 45.2 KB)
  v2-0001-mark_async_capable-subpath-should-match-subplan.patch (text/x-diff, 1.9 KB)