Vectored IO in XLogWrite()

Lists: pgsql-hackers
From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Vectored IO in XLogWrite()
Date: 2024-08-06 09:35:54
Message-ID: CAGPVpCQr1yk6p1ZJguacosuqP+OLjXR2EX9yH3tmhM8VQK=smA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi hackers,

I was looking into XLogWrite() and saw the below comment. It cannot really
circle back in wal buffers without needing to call pg_write() since next
pages wouldn't be contiguous in memory. So it needs to write whenever it
reaches the end of wal buffers.

/*
> * Dump the set if this will be the last loop iteration, or if we are
> * at the last page of the cache area (since the next page won't be
> * contiguous in memory), or if we are at the end of the logfile
> * segment.
> */

I think that we don't have the "contiguous pages" constraint when writing
anymore as we can do vectored IO. It seems unnecessary to write just
because XLogWrite() is at the end of wal buffers.
Attached patch uses pg_pwritev() instead of pg_pwrite() and tries to write
pages in one call even if they're not contiguous in memory, until it
reaches the page at startidx.

After quickly experimenting the patch and comparing the number of write
calls, the patch's affect can be more visible when wal_buffers is quite low
as it's more likely to circle back to the beginning. When wal_buffers is
set to a decent amount, the patch only saves a few write calls. But I
wouldn't expect any regression introduced by the patch (I may be wrong
here), so I thought it may be worth to consider.

I appreciate any feedback on the proposed change. I'd also be glad to
benchmark the patch if you want to see how it performs in some specific
cases since I've been struggling with coming up a good test case.

Regards,
--
Melih Mutlu
Microsoft

Attachment Content-Type Size
v1-0001-Use-pg_pwritev-in-XlogWrite.patch application/octet-stream 3.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Vectored IO in XLogWrite()
Date: 2024-08-06 17:42:52
Message-ID: CA+Tgmoana3QBsZa-2BTAJP5cMg-gd6gmhgQGXf74zd+8uUGdbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 6, 2024 at 5:36 AM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
> I think that we don't have the "contiguous pages" constraint when writing anymore as we can do vectored IO. It seems unnecessary to write just because XLogWrite() is at the end of wal buffers.
> Attached patch uses pg_pwritev() instead of pg_pwrite() and tries to write pages in one call even if they're not contiguous in memory, until it reaches the page at startidx.

Here are a few notes on this patch:

- It's not pgindent-clean. In fact, it doesn't even pass git diff --check.

- You added a new comment (/* Reaching the buffer... */) in the middle
of a chunk of lines that were already covered by an existing comment
(/* Dump the set ... */). This makes it look like the /* Dump the
set... */ comment only covers the 3 lines of code that immediately
follow it rather than everything in the "if" statement. You could fix
this in a variety of ways, but in this case the easiest solution, to
me, looks like just skipping the new comment. It seems like the point
is pretty self-explanatory.

- The patch removes the initialization of "from" but not the variable
itself. You still increment the variable you haven't initialized.

- I don't think the logic is correct after a partial write. Pre-patch,
"from" advances while "nleft" goes down, but post-patch, what gets
written is dependent on the contents of "iov" which is initialized
outside the loop and never updated. Perhaps compute_remaining_iovec
would be useful here?

- I assume this is probably a good idea in principle, because fewer
system calls are presumably better than more. The impact will probably
be very small, though.

--
Robert Haas
EDB: http://www.enterprisedb.com


From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Vectored IO in XLogWrite()
Date: 2024-08-06 22:30:17
Message-ID: CAGPVpCTZHhc779zuFcdxObuYtJZn4MLLYt4BgsqRgRj1Ufb73g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Robert,

Thanks for reviewing.

Robert Haas <robertmhaas(at)gmail(dot)com>, 6 Ağu 2024 Sal, 20:43 tarihinde şunu
yazdı:

> On Tue, Aug 6, 2024 at 5:36 AM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
> > I think that we don't have the "contiguous pages" constraint when
> writing anymore as we can do vectored IO. It seems unnecessary to write
> just because XLogWrite() is at the end of wal buffers.
> > Attached patch uses pg_pwritev() instead of pg_pwrite() and tries to
> write pages in one call even if they're not contiguous in memory, until it
> reaches the page at startidx.
>
> Here are a few notes on this patch:
>
> - It's not pgindent-clean. In fact, it doesn't even pass git diff --check.
>

Fixed.

> - You added a new comment (/* Reaching the buffer... */) in the middle
> of a chunk of lines that were already covered by an existing comment
> (/* Dump the set ... */). This makes it look like the /* Dump the
> set... */ comment only covers the 3 lines of code that immediately
> follow it rather than everything in the "if" statement. You could fix
> this in a variety of ways, but in this case the easiest solution, to
> me, looks like just skipping the new comment. It seems like the point
> is pretty self-explanatory.
>

Removed the new comment. Only keeping the updated version of the /* Dump
the set... */ comment.

> - The patch removes the initialization of "from" but not the variable
> itself. You still increment the variable you haven't initialized.
>
> - I don't think the logic is correct after a partial write. Pre-patch,
> "from" advances while "nleft" goes down, but post-patch, what gets
> written is dependent on the contents of "iov" which is initialized
> outside the loop and never updated. Perhaps compute_remaining_iovec
> would be useful here?
>

You're right. I should have thought about the partial write case. I now
fixed it by looping and trying to write until compute_remaining_iovec()
returns 0.

Thanks,
--
Melih Mutlu
Microsoft

Attachment Content-Type Size
v2-0001-Use-pg_pwritev-in-XlogWrite.patch application/octet-stream 3.6 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Vectored IO in XLogWrite()
Date: 2024-08-08 17:25:47
Message-ID: CA+TgmoYBcLvOcEW7CsfapWmwkzYAXDeX9HBT6X7xxRoWbQ=Rrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 6, 2024 at 6:30 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
> Fixed.

+ iov[0].iov_base = XLogCtl->pages + startidx * (Size)
XLOG_BLCKSZ;;

Double semicolon.

Aside from that, this looks correct now, so the next question is
whether we want it. To me, it seems like this isn't likely to buy very
much, but it also doesn't really seem to have any kind of downside, so
I'd be somewhat inclined to go ahead with it. On the other hand, one
could argue that it's better not to change working code without a good
reason.

I wondered whether the regression tests actually hit the iovcnt == 2
case, and it turns out that they do, rather frequently actually.
Making that case a FATAL causes ~260 regression test failure. However,
on larger systems, we'll often end up with wal_segment_size=16MB and
wal_buffers=16MB and then it seems like we don't hit the iovcnt==2
case. Which I guess just reinforces the point that this is
theoretically better but practically not much different.

Any other votes on what to do here?

--
Robert Haas
EDB: http://www.enterprisedb.com


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Vectored IO in XLogWrite()
Date: 2024-12-11 00:41:11
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 08, 2024 at 01:25:47PM -0400, Robert Haas wrote:
> I wondered whether the regression tests actually hit the iovcnt == 2
> case, and it turns out that they do, rather frequently actually.
> Making that case a FATAL causes ~260 regression test failure. However,
> on larger systems, we'll often end up with wal_segment_size=16MB and
> wal_buffers=16MB and then it seems like we don't hit the iovcnt==2
> case. Which I guess just reinforces the point that this is
> theoretically better but practically not much different.
>
> Any other votes on what to do here?

Perhaps some micro-benchmarking to prove that the patch shows benefits
when this code path is taken alone? I could fall behind that.

I'd also document the maths of the patch a bit more, it is always kind
of hard to figure out if iovec is set up right or not. And the
formulas used in the patch look magical.
--
Michael