POC: make mxidoff 64 bits

Lists: pgsql-hackers
From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: POC: make mxidoff 64 bits
Date: 2024-04-23 08:23:41
Message-ID: CACG=ezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

I've been trying to introduce 64-bit transaction identifications to
Postgres for quite a while [0]. All this implies,
of course, an enormous amount of change that will have to take place in
various modules. Consider this, the
patch set become too big to be committed “at once”.

The obvious solutions was to try to split the patch set into smaller ones.
But here it comes a new challenge,
not every one of these parts, make Postgres better at the moment. Actually,
even switching to a
FullTransactionId in PGPROC will have disadvantage in increasing of WAL
size [1].

In fact, I believe, we're dealing with the chicken and the egg problem
here. Not able to commit full patch set
since it is too big to handle and not able to commit parts of it, since
they make sense all together and do not
help improve Postgres at the moment.

But it's not that bad. Since the commit 4ed8f0913bfdb5f, added in [3], we
are capable to use 64 bits to
indexing SLRUs.

PROPOSAL
Make multixact offsets 64 bit.

RATIONALE
It is not very difficult to overflow 32-bit mxidoff. Since, it is created
for every unique combination of the
transaction for each tuple, including XIDs and respective flags. And when a
transaction is added to a
specific multixact, it is rewritten with a new offset. In other words, it
is possible to overflow the offsets of
multixacts without overflowing the multixacts themselves and/or ordinary
transactions. I believe, there
was something about in the hackers mail lists, but I could not find links
now.

PFA, patch. Here is a WIP version. Upgrade machinery should be added later.

As always, any opinions on a subject a very welcome!

[0]
https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
[1]
https://www.postgresql.org/message-id/flat/CACG%3DezY7msw%2Bjip%3Drtfvnfz051dRqz4s-diuO46v3rAoAE0T0g%40mail.gmail.com
[3]
https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
0001-WIP-mxidoff-to-64bit.patch application/octet-stream 23.7 KB

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-04-23 09:37:40
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 23/04/2024 11:23, Maxim Orlov wrote:
> PROPOSAL
> Make multixact offsets 64 bit.

+1, this is a good next step and useful regardless of 64-bit XIDs.

> @@ -156,7 +148,7 @@
> ((uint32) ((0xFFFFFFFF % MULTIXACT_MEMBERS_PER_PAGE) + 1))
>
> /* page in which a member is to be found */
> -#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
> +#define MXOffsetToMemberPage(xid) ((xid) / (MultiXactOffset) MULTIXACT_MEMBERS_PER_PAGE)
> #define MXOffsetToMemberSegment(xid) (MXOffsetToMemberPage(xid) / SLRU_PAGES_PER_SEGMENT)
>
> /* Location (byte offset within page) of flag word for a given member */

This is really a bug fix. It didn't matter when TransactionId and
MultiXactOffset were both typedefs of uint32, but it was always wrong.
The argument name 'xid' is also misleading.

I think there are some more like that, MXOffsetToFlagsBitShift for example.

--
Heikki Linnakangas
Neon (https://neon.tech)


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-04-23 13:02:56
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 23 Apr 2024, at 11:23, Maxim Orlov <orlovmg(at)gmail(dot)com> wrote:
>
> Make multixact offsets 64 bit.

- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("multixact \"members\" limit exceeded"),
Personally, I'd be happy with this! We had some incidents where the only mitigation was vacuum settings tweaking.

BTW as a side note... I see lot's of casts to (unsigned long long), can't we just cast to MultiXactOffset?

Best regards, Andrey Borodin.


From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-04-23 13:03:06
Message-ID: CAGjGUAKO1GCzG5wBMt5RosWo0PatgFpYY=Gjgt77tN2brNe=Bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Maxim Orlov
Thank you so much for your tireless work on this. Increasing the WAL
size by a few bytes should have very little impact with today's disk
performance(Logical replication of this feature wal log is also increased a
lot, logical replication is a milestone new feature, and the community has
been improving the logical replication of functions),I believe removing
troubled postgresql Transaction ID Wraparound was also a milestone new
feature adding a few bytes is worth it!

Best regards

On Tue, 23 Apr 2024 at 17:37, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 23/04/2024 11:23, Maxim Orlov wrote:
> > PROPOSAL
> > Make multixact offsets 64 bit.
>
> +1, this is a good next step and useful regardless of 64-bit XIDs.
>
> > @@ -156,7 +148,7 @@
> > ((uint32) ((0xFFFFFFFF % MULTIXACT_MEMBERS_PER_PAGE) + 1))
> >
> > /* page in which a member is to be found */
> > -#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId)
> MULTIXACT_MEMBERS_PER_PAGE)
> > +#define MXOffsetToMemberPage(xid) ((xid) / (MultiXactOffset)
> MULTIXACT_MEMBERS_PER_PAGE)
> > #define MXOffsetToMemberSegment(xid) (MXOffsetToMemberPage(xid) /
> SLRU_PAGES_PER_SEGMENT)
> >
> > /* Location (byte offset within page) of flag word for a given member */
>
> This is really a bug fix. It didn't matter when TransactionId and
> MultiXactOffset were both typedefs of uint32, but it was always wrong.
> The argument name 'xid' is also misleading.
>
> I think there are some more like that, MXOffsetToFlagsBitShift for example.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)
>
>
>
>


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-04-25 14:20:53
Message-ID: CACG=ezYokoiumOFnqUfg_ffHD5s8T+6iHYfzKLfa=QQ-1pNrBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 23 Apr 2024 at 12:37, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> This is really a bug fix. It didn't matter when TransactionId and
> MultiXactOffset were both typedefs of uint32, but it was always wrong.
> The argument name 'xid' is also misleading.
>
> I think there are some more like that, MXOffsetToFlagsBitShift for example.

Yeah, I always thought so too. I believe, this is just a copy-paste. You
mean, it is worth creating a separate CF
entry for these fixes?

On Tue, 23 Apr 2024 at 16:03, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru>
wrote:

> BTW as a side note... I see lot's of casts to (unsigned long long), can't
> we just cast to MultiXactOffset?
>
Actually, first versions of the 64xid patch set have such a cast to types
TransactionID, MultiXact and so on. But,
after some discussions, we are switched to unsigned long long cast.
Unfortunately, I could not find an exact link
for that discussion. On the other hand, such a casting is already used
throughout the code. So, just for the
sake of the consistency, I would like to stay with these casts.

On Tue, 23 Apr 2024 at 16:03, wenhui qiu <qiuwenhuifx(at)gmail(dot)com> wrote:

> Hi Maxim Orlov
> Thank you so much for your tireless work on this. Increasing the WAL
> size by a few bytes should have very little impact with today's disk
> performance(Logical replication of this feature wal log is also increased a
> lot, logical replication is a milestone new feature, and the community has
> been improving the logical replication of functions),I believe removing
> troubled postgresql Transaction ID Wraparound was also a milestone new
> feature adding a few bytes is worth it!
>
I'm 100% agree. Maybe, I should return to this approach and find some
benefits for having FXIDs in WAL.


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-08-14 15:30:15
Message-ID: CACG=ezY9xq73jcX_EjVqx5-f90nbQ9PyhFCTW2fwFCS2wmNiFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

Sorry for delay. I was a bit busy last month. Anyway, here is my
proposal for making multioffsets 64 bit.
The patch set consists of three parts:
0001 - making user output of offsets 64-bit ready;
0002 - making offsets 64-bit;
0003 - provide 32 to 64 bit conversion in pg_upgarde.

I'm pretty sure this is just a beginning of the conversation, so any
opinions and reviews, as always, are very welcome!

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v1-0002-Use-64-bit-multixact-offsets.patch application/x-patch 14.3 KB
v1-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/x-patch 9.0 KB
v1-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/x-patch 12.9 KB

From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-09-03 13:30:10
Message-ID: CACG=eza+27CfLBobJJccRhXrA3He6c1irAnoyTtSC1-z9UXLrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is rebase. Apparently I'll have to do it often, since the
CATALOG_VERSION_NO changed in the patch.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v2-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 9.0 KB
v2-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 12.9 KB
v2-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 13.3 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-09-03 13:32:46
Message-ID: CAPpHfduczcop9s6gKUpLGgFUe2y4ERGMJx6SS6Kp+s-kQPwMjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 3, 2024 at 4:30 PM Maxim Orlov <orlovmg(at)gmail(dot)com> wrote:
> Here is rebase. Apparently I'll have to do it often, since the CATALOG_VERSION_NO changed in the patch.

I don't think you need to maintain CATALOG_VERSION_NO change in your
patch for the exact reason you have mentioned: patch will get conflict
each time CATALOG_VERSION_NO is advanced. It's responsibility of
committer to advance CATALOG_VERSION_NO when needed.

------
Regards,
Alexander Korotkov
Supabase


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-09-04 08:49:32
Message-ID: CACG=ezbye4g_ERNqE=gBcvQ0YypRaVENhNUu8xrs4PL12UdnUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 3 Sept 2024 at 16:32, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
wrote:

> I don't think you need to maintain CATALOG_VERSION_NO change in your
> patch for the exact reason you have mentioned: patch will get conflict
> each time CATALOG_VERSION_NO is advanced. It's responsibility of
> committer to advance CATALOG_VERSION_NO when needed.
>

OK, I got it. My intention here was to help to test the patch. If someone
wants to have a
look at the patch, he won't need to make changes in the code. In the next
iteration, I'll
remove CATALOG_VERSION_NO version change.

--
Best regards,
Maxim Orlov.


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-09-07 04:36:52
Message-ID: CACG=ezaMncd0-BcGHBgsSR2eqHfrz9WznHGLKX8biz6zu-azGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is v3. I removed CATALOG_VERSION_NO change, so this should be done by
the actual commiter.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v3-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 13.3 KB
v3-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 9.0 KB
v3-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 12.6 KB

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-09-12 12:09:08
Message-ID: CALT9ZEHm138m2rTi6-N_Xcm0ibfbK0b7yysdPHj1v144YUOFQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Maxim!

Previously we accessed offsets in shared MultiXactState without locks as
32-bit read is always atomic. But I'm not sure it's so when offset become
64-bit.
E.g. GetNewMultiXactId():

nextOffset = MultiXactState->nextOffset;
is outside lock.

There might be other places we do the same as well.

Regards,
Pavel Borisov
Supabase


From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-09-12 12:25:53
Message-ID: CALT9ZEG4OoQk8OUBCNY8cT3GuUvcQ0NrphHZ9H8--VjwM5VWUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 12 Sept 2024 at 16:09, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com> wrote:

> Hi, Maxim!
>
> Previously we accessed offsets in shared MultiXactState without locks as
> 32-bit read is always atomic. But I'm not sure it's so when offset become
> 64-bit.
> E.g. GetNewMultiXactId():
>
> nextOffset = MultiXactState->nextOffset;
> is outside lock.
>
> There might be other places we do the same as well.
>
I think the replacement of plain assignments by
pg_atomic_read_u64/pg_atomic_write_u64 would be sufficient.

(The same I think is needed for the patchset [1])
[1]
https://www.postgresql.org/message-id/flat/CAJ7c6TMvPz8q+nC=JoKniy7yxPzQYcCTnNFYmsDP-nnWsAOJ2g(at)mail(dot)gmail(dot)com

Regards,
Pavel Borisov


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc: Maxim Orlov <orlovmg(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-09-12 13:14:08
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2024-Sep-12, Pavel Borisov wrote:

> Hi, Maxim!
>
> Previously we accessed offsets in shared MultiXactState without locks as
> 32-bit read is always atomic. But I'm not sure it's so when offset become
> 64-bit.
> E.g. GetNewMultiXactId():
>
> nextOffset = MultiXactState->nextOffset;
> is outside lock.

Good though. But fortunately I think it's not a problem. The one you
say is with MultiXactGetLock held in shared mode -- and that works OK,
as the assignment (in line 1263 at the bottom of the same routine) is
done with exclusive lock held.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-10-22 09:43:34
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/09/2024 07:36, Maxim Orlov wrote:
> Here is v3.

MultiXactMemberFreezeThreshold looks quite bogus now. Now that
MaxMultiXactOffset==2^64-1, you cannot get anywhere near the
MULTIXACT_MEMBER_SAFE_THRESHOLD and MULTIXACT_MEMBER_DANGER_THRESHOLD
values anymore. Can we just get rid of MultiXactMemberFreezeThreshold? I
guess it would still be useful to trigger autovacuum if multixacts
members grows large though, to release the disk space, even if you can't
run out of members as such anymore. What should the logic for that look
like?

I'd love to see some tests for the pg_upgrade code. Something like a
little perl script to generate test clusters with different wraparound
scenarios etc. using the old version, and a TAP test to run pg_upgrade
on them and verify that queries on the upgraded cluster works correctly.
We don't have tests like that in the repository today, and I don't know
if we'd want to commit these permanently either, but it would be highly
useful now as a one-off thing, to show that the code works.

On upgrade, are there really no changes required to
pg_multixact/members? I imagined that the segment files would need to be
renamed around wraparound, so that if you previously had files like this:

pg_multixact/members/FFFE
pg_multixact/members/FFFF
pg_multixact/members/0000
pg_multixact/members/0001

after upgrade you would need to have:

pg_multixact/members/00000000FFFE
pg_multixact/members/00000000FFFF
pg_multixact/members/000000010000
pg_multixact/members/000000010001

Thanks for working on this!

--
Heikki Linnakangas
Neon (https://neon.tech)


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-10-22 16:33:58
Message-ID: CACG=ezb9XTvd3ZmS0y8gUunx_wBBdJO7ou+BfCOnnA5jE-11vg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 22 Oct 2024 at 12:43, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> MultiXactMemberFreezeThreshold looks quite bogus now. Now that
> MaxMultiXactOffset==2^64-1, you cannot get anywhere near the
> MULTIXACT_MEMBER_SAFE_THRESHOLD and MULTIXACT_MEMBER_DANGER_THRESHOLD
> values anymore. Can we just get rid of MultiXactMemberFreezeThreshold? I
> guess it would still be useful to trigger autovacuum if multixacts
> members grows large though, to release the disk space, even if you can't
> run out of members as such anymore. What should the logic for that look
> like?
>
Yep, you're totally correct. The MultiXactMemberFreezeThreshold call is not
necessary any more and can be safely removed.
I made this as a separate commit in v4. But, as you rightly say, it will be
useful to trigger autovacuum in some cases. The obvious
place for this machinery is in the GetNewMultiXactId. I imagine this like
"if nextOff - oldestOff > threshold kick autovac". So, the
question is: what kind of threshold we want here? Is it a hard coded define
or GUC? If it is a GUC (32–bit), what values should it be?

And the other issue I feel a little regretful about. We still must be
holding MultiXactGenLock in order to track oldestOffset to do
"nextOff - oldestOff" calculation.

>
> I'd love to see some tests for the pg_upgrade code. Something like a
> little perl script to generate test clusters with different wraparound
> scenarios etc.

Agree. I'll address this as soon as I can.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v4-0004-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 12.4 KB
v4-0003-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch application/octet-stream 13.4 KB
v4-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 9.0 KB
v4-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 13.3 KB

From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-10-23 15:55:06
Message-ID: CACG=ezYFNqGjsxF6Vb2CHF6JzKcjhAFauaFm9js0nu_3Ngcdkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

After a bit of thought, I've realized that to be conservative here is the
way to go.

We can reuse a maximum of existing logic. I mean, we can remove offset
wraparound "error logic" and reuse "warning logic". But set the threshold
for "warning logic" to a much higher value. For now, I choose 2^32-1. In
other world, legit logic, in my view, here would be to trigger autovacuum
if the number of offsets (i.e. difference nextOffset - oldestOffset)
exceeds 2^32-1. PFA patch set.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v5-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 13.3 KB
v5-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 12.4 KB
v5-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch application/octet-stream 8.8 KB
v5-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 9.0 KB

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-10-25 03:38:57
Message-ID: CAGjGUA+dcV7veaCV1H65vCNsbS++nT8=ho772gDvsXUW9H7eXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

HI Maxim Orlov
> After a bit of thought, I've realized that to be conservative here is the
way to go.
>We can reuse a maximum of existing logic. I mean, we can remove offset
wraparound "error logic" and reuse "warning logic". But set the threshold
for "warning >logic" to a much higher value. For now, I choose 2^32-1. In
other world, legit logic, in my view, here would be to trigger autovacuum
if the number of offsets (i.e. >difference nextOffset - oldestOffset)
exceeds 2^32-1. PFA patch set.
good point ,Couldn't agree with you more. xid64 is the solution to the
wraparound problem,The previous error log is no longer meaningful ,But we
might want to refine the output waring log a little(For example, checking
the underlying reasons why age has been increasing),Though we don't have to
worry about xid wraparound

+/*
+ * Multixact members warning threshold.
+ *
+ * If difference bettween nextOffset and oldestOffset exceed this value, we
+ * trigger autovacuum in order to release the disk space if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
Can we refine this annotation a bit? for example
+/*
+ * Multixact members warning threshold.
+ *
+ * If difference bettween nextOffset and oldestOffset exceed this value, we
+ * trigger autovacuum in order to release the disk space ,reduce table
bloat if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)

Thanks

Maxim Orlov <orlovmg(at)gmail(dot)com> 于2024年10月23日周三 23:55写道:

> After a bit of thought, I've realized that to be conservative here is the
> way to go.
>
> We can reuse a maximum of existing logic. I mean, we can remove offset
> wraparound "error logic" and reuse "warning logic". But set the threshold
> for "warning logic" to a much higher value. For now, I choose 2^32-1. In
> other world, legit logic, in my view, here would be to trigger autovacuum
> if the number of offsets (i.e. difference nextOffset - oldestOffset)
> exceeds 2^32-1. PFA patch set.
>
> --
> Best regards,
> Maxim Orlov.
>


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-11-08 18:10:28
Message-ID: CACG=ezYThNkf8QsDA-aQfEFEkqn2L=_uUL83z0vJstPRasbZqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 25 Oct 2024 at 06:39, wenhui qiu <qiuwenhuifx(at)gmail(dot)com> wrote:

>
> + * Multixact members warning threshold.
> + *
> + * If difference bettween nextOffset and oldestOffset exceed this value,
> we
> + * trigger autovacuum in order to release the disk space if possible.
> + */
> +#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0xFFFFFFFF)
> Can we refine this annotation a bit? for example
>
Thank you, fixed.

Sorry for a late reply. There was a problem in upgrade with offset
wraparound. Here is a fixed version. Test also added. I decide to use my
old patch to set a non-standard multixacts for the old cluster, fill it
with data and do pg_upgrade.

Here is how to test. All the patches are for 14e87ffa5c543b5f3 master
branch.
1) Get the 14e87ffa5c543b5f3 master branch apply patches
0001-Add-initdb-option-to-initialize-cluster-with-non-sta.patch and
0002-TEST-lower-SLRU_PAGES_PER_SEGMENT.patch
2) Get the 14e87ffa5c543b5f3 master branch in a separate directory and
apply v6 patch set.
3) Build two branches.
4) Use ENV oldinstall to run the test: PROVE_TESTS=t/005_mxidoff.pl
oldinstall=/home/orlov/proj/pgsql-new PG_TEST_NOCLEAN=1 make check -C
src/bin/pg_upgrade/

Maybe, I'll make a shell script to automate this steps if required.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v6-0001-Use-64-bit-multixact-offsets.patch application/octet-stream 13.3 KB
v6-0002-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 18.3 KB
v6-0004-TEST-lower-SLRU_PAGES_PER_SEGMENT-set-bump-catver.patch application/octet-stream 2.1 KB
v6-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch application/octet-stream 25.4 KB
v6-0003-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch application/octet-stream 8.8 KB
v6-0006-TEST-add-basic-mxidoff64-tests-005_mxidoff.pl.patch application/octet-stream 10.6 KB
0002-TEST-lower-SLRU_PAGES_PER_SEGMENT.patch application/octet-stream 784 bytes
0001-Add-initdb-option-to-initialize-cluster-with-non-sta.patch application/octet-stream 33.0 KB

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-11-11 23:31:40
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/11/2024 20:10, Maxim Orlov wrote:
> Sorry for a late reply. There was a problem in upgrade with offset
> wraparound. Here is a fixed version. Test also added. I decide to use my
> old patch to set a non-standard multixacts for the old cluster, fill it
> with data and do pg_upgrade.

The wraparound logic is still not correct. To test, I created a cluster
where multixids have wrapped around, so that:

$ ls -l data-old/pg_multixact/offsets/
total 720
-rw------- 1 heikki heikki 212992 Nov 12 01:11 0000
-rw-r--r-- 1 heikki heikki 262144 Nov 12 00:55 FFFE
-rw------- 1 heikki heikki 262144 Nov 12 00:56 FFFF

After running pg_upgrade:

$ ls -l data-new/pg_multixact/offsets/
total 1184
-rw------- 1 heikki heikki 155648 Nov 12 01:12 0001
-rw------- 1 heikki heikki 262144 Nov 12 01:11 1FFFD
-rw------- 1 heikki heikki 262144 Nov 12 01:11 1FFFE
-rw------- 1 heikki heikki 262144 Nov 12 01:11 1FFFF
-rw------- 1 heikki heikki 262144 Nov 12 01:11 20000
-rw------- 1 heikki heikki 155648 Nov 12 01:11 20001

That's not right. The segments 20000 and 20001 were created by the new
pg_upgrade conversion code from old segment '0000'. But multixids are
still 32-bit values, so after segment 1FFFF, you should still wrap
around to 0000. The new segments should be '0000' and '0001'. The
segment '0001' is created when postgres is started after upgrade, but
it's created from scratch and doesn't contain the upgraded values.

When I try to select from a table after upgrade that contains
post-wraparound multixids:

TRAP: failed Assert("offset != 0"), File:
"../src/backend/access/transam/multixact.c", Line: 1353, PID: 63386

On a different note, I'm surprised you're rewriting member segments from
scratch, parsing all the individual member groups and writing them out
again. There's no change to the members file format, except for the
numbering of the files, so you could just copy the files under the new
names without paying attention to the contents. It's not wrong to parse
them in detail, but I'd assume that it would be simpler not to.

> Here is how to test. All the patches are for 14e87ffa5c543b5f3 master
> branch.
> 1) Get the 14e87ffa5c543b5f3 master branch apply patches 0001-Add-
> initdb-option-to-initialize-cluster-with-non-sta.patch and 0002-TEST-
> lower-SLRU_PAGES_PER_SEGMENT.patch
> 2) Get the 14e87ffa5c543b5f3 master branch in a separate directory and
> apply v6 patch set.
> 3) Build two branches.
> 4) Use ENV oldinstall to run the test: PROVE_TESTS=t/005_mxidoff.pl
> <http://005_mxidoff.pl> oldinstall=/home/orlov/proj/pgsql-new
> PG_TEST_NOCLEAN=1 make check -C src/bin/pg_upgrade/
>
> Maybe, I'll make a shell script to automate this steps if required.

Yeah, I think we need something to automate this. I did the testing
manually. I used the attached python script to consume multixids faster,
but it's still tedious.

I used pg_resetwal to quickly create a cluster that's close to multixid
wrapround:

initdb -D data
pg_resetwal -D data -m 4294900001,4294900000
dd if=/dev/zero of=data/pg_multixact/offsets/FFFE bs=8192 count=32

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
multixids.py text/x-python 1.8 KB

From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-11-13 15:44:44
Message-ID: CACG=ezYtCatcRODS-ZkwhcxuqBKCuhEsZGBruw=dGCLoepF+ZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 12 Nov 2024 at 02:31, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> The wraparound logic is still not correct.

Yep, my fault. I forget to reset segment counter if wraparound is happened.
Fixed.

When I try to select from a table after upgrade that contains
> post-wraparound multixids:
>
> TRAP: failed Assert("offset != 0"), File:
> "../src/backend/access/transam/multixact.c", Line: 1353, PID: 63386
>
The problem was in converting offset segments. The new_entry index should
also bypass the invalid offset (0) value. Fixed.

>
> On a different note, I'm surprised you're rewriting member segments from
> scratch, parsing all the individual member groups and writing them out
> again. There's no change to the members file format, except for the
> numbering of the files, so you could just copy the files under the new
> names without paying attention to the contents. It's not wrong to parse
> them in detail, but I'd assume that it would be simpler not to.
>
Yes, at the beginning I also thought that it would be possible to get by
with simple copying. But in case of wraparound, we must "bypass" invalid
zero offset value. See, old 32 bit offsets a wrapped at 2^32, thus 0 values
appears in multixact.c So, they must be handled. Bypass, in fact. When we
are switched to the 64-bit offsets, we have two options:
1). Bypass every ((uint32) offset == 0) value in multixact.c;
2). Convert members and bypass invalid value once.

The first options seem too weird for me. So, we have to repack members and
bypass invalid value.

All patches are for master(at)38c18710b37a2d

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v7-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 9.0 KB
v7-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch application/octet-stream 8.8 KB
v7-0005-TEST-bump-catver.patch application/octet-stream 1.1 KB
v7-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 18.6 KB
v7-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 13.3 KB

From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-11-15 08:41:46
Message-ID: CACG=ezac431kXL+9jfUJSD6u15Ybmfzvq0Pf7nP51b3D_uc73g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is the test scripts.
The generate.sh script is used to generate data dir with multimple clusters
in it. This script will call multixids.py in order to generate data. If you
are not use system psql consider using LD_LIBRARY_PATH env to specify path
to the lib directory.
OLDBIN=/.../pgsql-new ./generate.sh

Then the test.sh is used to run various upgrades.
OLDBIN=/.../pgsql-old NEWBIN=/.../pgsql-new ./test.sh

I hope that helps!

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
multixids.py text/x-python-script 1.6 KB
generate.sh application/x-sh 1.9 KB
test.sh application/x-sh 1.8 KB

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-11-15 11:06:00
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13/11/2024 17:44, Maxim Orlov wrote:
> On Tue, 12 Nov 2024 at 02:31, Heikki Linnakangas <hlinnaka(at)iki(dot)fi
> <mailto:hlinnaka(at)iki(dot)fi>> wrote:
> On a different note, I'm surprised you're rewriting member segments
> from
> scratch, parsing all the individual member groups and writing them out
> again. There's no change to the members file format, except for the
> numbering of the files, so you could just copy the files under the new
> names without paying attention to the contents. It's not wrong to parse
> them in detail, but I'd assume that it would be simpler not to.
>
> Yes, at the beginning I also thought that it would be possible to get by
> with simple copying. But in case of wraparound, we must "bypass" invalid
> zero offset value. See, old 32 bit offsets a wrapped at 2^32, thus 0
> values appears in multixact.c So, they must be handled. Bypass, in fact.
> When we are switched to the 64-bit offsets, we have two options:
> 1). Bypass every ((uint32) offset == 0) value in multixact.c;
> 2). Convert members and bypass invalid value once.
>
> The first options seem too weird for me. So, we have to repack members
> and bypass invalid value.

Hmm, so if I understand correctly, this is related to how we determine
the length of the members array, by looking at the next multixid's
offset. This is explained in GetMultiXactIdMembers:

> /*
> * Find out the offset at which we need to start reading MultiXactMembers
> * and the number of members in the multixact. We determine the latter as
> * the difference between this multixact's starting offset and the next
> * one's. However, there are some corner cases to worry about:
> *
> * 1. This multixact may be the latest one created, in which case there is
> * no next one to look at. In this case the nextOffset value we just
> * saved is the correct endpoint.
> *
> * 2. The next multixact may still be in process of being filled in: that
> * is, another process may have done GetNewMultiXactId but not yet written
> * the offset entry for that ID. In that scenario, it is guaranteed that
> * the offset entry for that multixact exists (because GetNewMultiXactId
> * won't release MultiXactGenLock until it does) but contains zero
> * (because we are careful to pre-zero offset pages). Because
> * GetNewMultiXactId will never return zero as the starting offset for a
> * multixact, when we read zero as the next multixact's offset, we know we
> * have this case. We handle this by sleeping on the condition variable
> * we have just for this; the process in charge will signal the CV as soon
> * as it has finished writing the multixact offset.
> *
> * 3. Because GetNewMultiXactId increments offset zero to offset one to
> * handle case #2, there is an ambiguity near the point of offset
> * wraparound. If we see next multixact's offset is one, is that our
> * multixact's actual endpoint, or did it end at zero with a subsequent
> * increment? We handle this using the knowledge that if the zero'th
> * member slot wasn't filled, it'll contain zero, and zero isn't a valid
> * transaction ID so it can't be a multixact member. Therefore, if we
> * read a zero from the members array, just ignore it.
> *
> * This is all pretty messy, but the mess occurs only in infrequent corner
> * cases, so it seems better than holding the MultiXactGenLock for a long
> * time on every multixact creation.
> */

With 64-bit offsets, can we assume that it never wraps around? We often
treat 2^64 as "large enough that we'll never run out", e.g. LSNs are
also assumed to never wrap around. I think that would be a safe
assumption here too.

If we accept that, we don't need to worry about case 3 anymore. But if
we upgrade wrapped-around members files by just renaming them, there
could still be a members array where we had skipped offset 0, and
reading that after the upgrade might get confused. We could continue to
ignore a 0 XID in the members array like the comment says; I think that
would be enough. But yeah, maybe it's better to bite the bullet in
pg_upgrade and squeeze those out.

Does your upgrade test suite include case 3, where the next multixact's
offset is 1?

Can we remove MaybeExtendOffsetSlru() now? There are a bunch of other
comments and checks that talk about binary-upgraded values too that we
can hopefully clean up now.

If we are to parse the member segments in detail in upgrade anyway, I'd
be tempted to make some further changes / optimizations:

- You could leave out all locking XID members in upgrade, because
they're not relevant after upgrade any more (all the XIDs will be
committed or aborted and have released the locks; we require prepared
transactions to be completed before upgrading too). It'd be enough to
include actual UPDATE/DELETE XIDs.

- The way we determine the length of the members array by looking at the
next multixid's offset is a bit complicated. We could have one extra
flag per XID in the members to indicate "this is the last member of this
multixid". That could either to replace the current mechanism of looking
at the next offset, or be just an additional cross-check.

- Do we still like the "group" representation, with 4 bytes of flags
followed by 4 XIDs? I wonder if it'd be better to just store 5 bytes per
XID unaligned.

- A more radical idea: There can be only one updating XID in one
multixid. We could store that directly in the offsets SLRU, and keep
only the locking XIDs in members. That way, the members SLRU would
become less critical; it could be safely reset on crash for example
(except for prepared transactions, which could still be holding locks,
but it'd still be less serious). Separating correctness-critical data
from more ephemeral state is generally a good idea.

I'm not insisting on any of these changes, just some things that might
be worth considering if we're rewriting the SLRUs on upgrade anyway.

--
Heikki Linnakangas
Neon (https://neon.tech)


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-11-15 16:19:07
Message-ID: CACG=ezb680eb=JXh1ns=t5eGH3h9y-uTfT4tf3Xc8t2UH2q6tQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 15 Nov 2024 at 14:06, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> Hmm, so if I understand correctly, this is related to how we determine
> the length of the members array, by looking at the next multixid's
> offset. This is explained in GetMultiXactIdMembers:
>
Correct.

> If we accept that, we don't need to worry about case 3 anymore. But if
> we upgrade wrapped-around members files by just renaming them, there
> could still be a members array where we had skipped offset 0, and
> reading that after the upgrade might get confused. We could continue to
> ignore a 0 XID in the members array like the comment says; I think that
> would be enough. But yeah, maybe it's better to bite the bullet in
> pg_upgrade and squeeze those out.
>
Correct. I couldn't explain this better. I'm more for the squeeze those
out. Overwise, we're ending up in adding another hack in multixact, but one
of the benefits from switching to 64-bits, it should make XID's logic more
straight forward. After all, mxact juggling in pg_upgrade is one time
inconvenience.

>
> Does your upgrade test suite include case 3, where the next multixact's
> offset is 1?
>
Not exactly.

simple
Latest checkpoint's NextMultiXactId: 119441
Latest checkpoint's NextMultiOffset: 5927049

offset-wrap
Latest checkpoint's NextMultiXactId: 119441
Latest checkpoint's NextMultiOffset: 5591183

multi-wrap
Latest checkpoint's NextMultiXactId: 82006
Latest checkpoint's NextMultiOffset: 7408811

offset-multi-wrap
Latest checkpoint's NextMultiXactId: 52146
Latest checkpoint's NextMultiOffset: 5591183

You want test case where NextMultiOffset will be 1?

> Can we remove MaybeExtendOffsetSlru() now? There are a bunch of other
> comments and checks that talk about binary-upgraded values too that we
> can hopefully clean up now.
>
Yes, technically we can. But this is kinda unrelated to the offsets and
will make the patch set significantly complicated, thus more complicated to
review and less likely to be committed. Again, I'm not opposing the idea,
I'm not sure if it is worth to do it right now.

>
> If we are to parse the member segments in detail in upgrade anyway, I'd
> be tempted to make some further changes / optimizations:
>
> - You could leave out all locking XID members in upgrade, because
> they're not relevant after upgrade any more (all the XIDs will be
> committed or aborted and have released the locks; we require prepared
> transactions to be completed before upgrading too). It'd be enough to
> include actual UPDATE/DELETE XIDs.
>
> - The way we determine the length of the members array by looking at the
> next multixid's offset is a bit complicated. We could have one extra
> flag per XID in the members to indicate "this is the last member of this
> multixid". That could either to replace the current mechanism of looking
> at the next offset, or be just an additional cross-check.
>
> - Do we still like the "group" representation, with 4 bytes of flags
> followed by 4 XIDs? I wonder if it'd be better to just store 5 bytes per
> XID unaligned.
>
Not really. But I would leave it for next iteration - switching multi to 64
bit. I already have some drafts for this. In any case, we'll must do
adjustments in pg_upgrade again. My goal is to move towards 64 XIDs, but
with the small steps, and I plan changes in "group" representation in
combination with switching multi to 64 bit. This seems a bit more
appropriate in my view.

As for your optimization suggestions, I like them. I don’t against them,
but I’m afraid to disrupt the clarity of thought, especially since the
algorithm is not the simplest.

--
Best regards,
Maxim Orlov.


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-11-18 13:22:38
Message-ID: CACG=ezZGQFBb0yepka8hU2BmJ48ujt3xa+aYLNL0BQPx0vqwZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Shame on me! I've sent an erroneous patch set. Version 7 is defective. Here
is the proper version v8 with minor refactoring in segresize.c.

Also, I rename bump cat version patch into txt in order not to break cfbot.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v8-0005-TEST-bump-catver.patch.txt text/plain 1.1 KB
v8-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 13.3 KB
generate.sh application/x-sh 2.1 KB
test.sh application/x-sh 2.1 KB
multixids.py text/x-python-script 1.6 KB
v8-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 9.0 KB
v8-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch application/octet-stream 8.8 KB
v8-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 18.3 KB

From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-11-19 17:53:49
Message-ID: CACG=ezajc_Pcqmy6fcq-N8+LzCRMzOzJzez2_BgHEu-6RVJtKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Oops! Sorry for the noise. I've must have been overworking yesterday and
messed up the working branches. v7 was a correct set and v8 don't. Here is
the correction with extended Perl test.

The test itself is in src/bin/pg_upgrade/t/005_offset.pl It is rather heavy
and took about 45 minutes on my i5 with 2.7 Gb data generated. Basically,
each test here is creating a cluster and fill it with multixacts. Thus,
dozens of segments are created using two methods. One is with prepared
transactions, and it creates, roughly, the same amount of segments for
members and for offsets. The other one is based on Heikki's multixids.py
and creates more members than offsets. I've used both of these methods to
generate as much diverse data as possible.

Here is how I test this patch set:

1. You need two pg clusters: the "old" one, i.e. without patch set, and
the "new" with patch set v9 applied.
2. Apply v9-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch.txt
to the "old" and "new" clusters. Note, this is only patch required for
"old" cluster. This will allow you to create a cluster with non-standard
initial multixact and multixact offset. Unfortunately, this patch was not
did not arouse public interest since it is assumed that there is similar
functionality to the pg_resetwal utility. But similar is not mean equal.
See, pg_resetwal must be used after cluster init, thus, we step into some
problems with vacuum and some SLRU segments must be filled with zeroes.
Also, template0 datminmxid must be manually updated. So, in me view,
using this patch is justified and very handy here.
3. Also, apply all the "TEST" (0006 and 0007) patches to the "new"
cluster.
4. Build "old" and "new" pg clusters.
5. Run the test with: PROVE_TESTS=t/005_offset.pl PG_TEST_NOCLEAN=1
oldinstall=/home/orlov/proj/OFFSET3/pgsql-old make check -s -C
src/bin/pg_upgrade/
6. In my case, it took around 45 minutes and generate roughly 2.7 Gb of
data.

"TEST" patches, of course, are for the test purposes and not to be
committed.

In src/bin/pg_upgrade/t/005_offset.pl I try to consider next cases:

- Basic sanity checks.
Here I test various initial multi and offset values (including
wraparound) and see how appropriate segments are generated.
- pg_upgarde tests.
Here is oldinstall ENV is for. Run pg_upgrade for old cluster with multi
and offset values just like in previous step. i.e. with various
combinations.
- Self pg_upgarde.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v9-0005-TEST-initdb-option-to-initialize-cluster-with-non.patch.txt text/plain 25.4 KB
v9-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch application/octet-stream 8.8 KB
v9-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 12.9 KB
v9-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 9.0 KB
v9-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 17.9 KB
v9-0007-TEST-bump-catver.patch.txt text/plain 1.1 KB
v9-0006-TEST-add-src-bin-pg_upgrade-t-005_offset.pl.patch.txt text/plain 13.5 KB

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-12-18 10:21:35
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for working on this!

On 19/11/2024 19:53, Maxim Orlov wrote:
> The test itself is in src/bin/pg_upgrade/t/005_offset.pl
> <http://005_offset.pl> It is rather heavy and took about 45 minutes on
> my i5 with 2.7 Gb data generated. Basically, each test here is creating
> a cluster and fill it with multixacts. Thus, dozens of segments are
> created using two methods. One is with prepared transactions, and it
> creates, roughly, the same amount of segments for members and for
> offsets. The other one is based on Heikki's multixids.py and creates
> more members than offsets. I've used both of these methods to generate
> as much diverse data as possible.
>
> Here is how I test this patch set:
>
> 1. You need two pg clusters: the "old" one, i.e. without patch set, and
> the "new" with patch set v9 applied.
> 2. Apply v9-0005-TEST-initdb-option-to-initialize-cluster-with-
> non.patch.txt to the "old" and "new" clusters. Note, this is only
> patch required for "old" cluster. This will allow you to create a
> cluster with non-standard initial multixact and multixact offset.
> Unfortunately, this patch was not did not arouse public interest
> since it is assumed that there is similar functionality to the
> pg_resetwal utility. But similar is not mean equal. See, pg_resetwal
> must be used after cluster init, thus, we step into some problems
> with vacuum and some SLRU segments must be filled with zeroes. Also,
> template0 datminmxidmust be manually updated. So, in me view, using
> this patch is justifiedand very handy here.
> 3. Also, apply all the "TEST" (0006 and 0007) patches to the "new" cluster.
> 4. Build "old" and "new" pg clusters.
> 5. Run the test with: PROVE_TESTS=t/005_offset.pl
> <http://005_offset.pl> PG_TEST_NOCLEAN=1 oldinstall=/home/orlov/
> proj/OFFSET3/pgsql-old make check -s -C src/bin/pg_upgrade/
> 6. In my case, it took around 45 minutes and generate roughly 2.7 Gb of
> data.
>
> "TEST" patches, of course, are for the test purposes and not to be
> committed.
>
> In src/bin/pg_upgrade/t/005_offset.pl <http://005_offset.pl> I try to
> consider next cases:
>
> * Basic sanity checks.
> Here I test various initial multi and offset values (including
> wraparound) and see how appropriate segments are generated.
> * pg_upgarde tests.
> Here is oldinstall ENV is for. Run pg_upgrade for old cluster with
> multi and offset values just like in previous step. i.e. with
> various combinations.
> * Self pg_upgarde.

Attached is some more cleanup on top of patch set v9, removing more dead
stuff related to wraparound. I also removed the oldestOffsetKnown
variable and related code. It was needed to deal with clusters upgraded
from buggy 9.3 and 9.4 era versions, but now that pg_upgrade will
rewrite the SLRUs, it's no longer needed.

Does the pg_upgrade code work though, if you have that buggy situation
where oldestOffsetKnown == false ?

>
> if (!TransactionIdIsValid(*xactptr))
> {
> /* Corner case 3: we must be looking at unused slot zero */
> Assert(offset == 0);
> continue;
> }

After upgrade, this corner case 3 would *not* happen on offset == 0. So
looks like we're still missing test coverage for this upgrade corner case.

I'm still felt pretty uneasy about the pg_upgrade code. It's
complicated, and the way it rewrites offsets and members separately and
page at a time is quite different from the normal codepaths in
multixact.c, so it's a bit hard to see if it's handling all those corner
cases the same way. I rewrito that so that it's easier to understand,
IMHO anyway. The code for reading the old format and writing the new
format is now more decoupled. The code for reading the old format is in
a separate source file, multixact_old.c. The interface to that is the
GetOldMultiXactIdSingleMember() that returns the updating member of a
given multixid, much like the GetMultiXactIdMembers() backend function.
The conversion routine calls that for each multixid, and write it back
out in the new format, one multixid at a time.

The new code now "squeezes out" locking-only XIDs, keeping only the
updating ones. Not that important, but with this new code structure, it
was trivial and even easier to do than retaining all the XIDs.

Now that the offsets are rewritten one by one, we don't need the
"special case 3" in GetMultiXactIdMembers. The upgrade process removes
that special case.

I tried to keep my changes sepearate from your patches in the attached
patch series. This needs some more cleanup and squashing before
committing, but I think we're getting close.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
v10-0001-Use-64-bit-format-output-for-multixact-offsets.patch text/x-patch 9.0 KB
v10-0002-Use-64-bit-multixact-offsets.patch text/x-patch 13.0 KB
v10-0003-Make-pg_upgrade-convert-multixact-offsets.patch text/x-patch 17.9 KB
v10-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch text/x-patch 8.8 KB
v10-0005-Fixup-setting-oldestOffset.patch text/x-patch 1.4 KB
v10-0006-Remove-some-dead-code-related-to-handling-member.patch text/x-patch 3.9 KB
v10-0007-add-fixme-comment-about-pg_upgrade.patch text/x-patch 1.0 KB
v10-0008-Remove-code-to-deal-with-old-9.3-and-9.3-era-bro.patch text/x-patch 8.1 KB
v10-0009-Move-some-macros-to-deal-with-pg_multixact-on-di.patch text/x-patch 8.2 KB
v10-0010-rewrite-pg_upgrade-code.patch text/x-patch 44.3 KB
v10-0011-TEST-initdb-option-to-initialize-cluster-with-no.patch text/x-patch 25.5 KB
v10-0012-TEST-add-src-bin-pg_upgrade-t-005_offset.pl.patch text/x-patch 13.5 KB
v10-0013-Initialize-old-test-clusters-with-checksums.patch text/x-patch 749 bytes
v10-0014-TEST-bump-catver.patch text/x-patch 1.1 KB

From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2024-12-27 17:09:56
Message-ID: CACG=ezbKwypBp=14q9+hMQApus3=1hKxJ9x1+KinUhtT48570Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 18 Dec 2024 at 13:21, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

>
> Attached is some more cleanup on top of patch set v9, removing more dead
> stuff related to wraparound. I also removed the oldestOffsetKnown
> variable and related code. It was needed to deal with clusters upgraded
> from buggy 9.3 and 9.4 era versions, but now that pg_upgrade will
> rewrite the SLRUs, it's no longer needed.
>
Yep, multixact.c looks correct to me. As for "XXX could use
SimpleLruTruncate()", yes, for sure.
Actually, xl_multixact_truncate.startTruncMemb is also no longer needed, is
it?

> Does the pg_upgrade code work though, if you have that buggy situation
> where oldestOffsetKnown == false ?
>
> >
> > if (!TransactionIdIsValid(*xactptr))
> > {
> > /* Corner case 3: we must be looking at unused
> slot zero */
> > Assert(offset == 0);
> > continue;
> > }
>
> After upgrade, this corner case 3 would *not* happen on offset == 0. So
> looks like we're still missing test coverage for this upgrade corner case.
>
Am I understanding correctly that you want to have a test corresponding to
the buggy 9.3 and 9.4 era versions?
Do you think we could imitate this scenario on a current master branch like
that:
1) generate a couple of offsets segments for the first table;
2) generate more segments for a second table;
3) drop first table;
4) stop pg cluster;
5) remove pg_multixact/offsets/0000
6) upgrade?

PFA, v10-0016-TEST-try-to-replicate-buggy-oldest-offset.patch
This test will fail now, for an obvious reason, but is this case a relevant
one?

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v10-64-bit-mxoff.zip application/zip 51.3 KB

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-01-01 22:12:36
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 27/12/2024 19:09, Maxim Orlov wrote:
>
> On Wed, 18 Dec 2024 at 13:21, Heikki Linnakangas <hlinnaka(at)iki(dot)fi
> <mailto:hlinnaka(at)iki(dot)fi>> wrote:
> Does the pg_upgrade code work though, if you have that buggy situation
> where oldestOffsetKnown == false ?

...

> >
> >               if (!TransactionIdIsValid(*xactptr))
> >               {
> >                       /* Corner case 3: we must be looking at
> unused slot zero */
> >                       Assert(offset == 0);
> >                       continue;
> >               }
>
> After upgrade, this corner case 3 would *not* happen on offset == 0. So
> looks like we're still missing test coverage for this upgrade corner
> case.
>
> Am I understanding correctly that you want to have a test corresponding
> to the buggy 9.3 and 9.4 era versions?

No, those were two different things. I think there might be two things
wrong here:

1. I suspect pg_upgrade might not correctly handle the situation where
oldestOffsetKnown==false, and

2. The above assertion in "corner case 3" would not hold. It seems that
we don't have a test case for it, or it would've hit the assertion.

Now that I think about it, yes, a test case for 1. would be good too.
But I was talking about 2.

> Do you think we could imitate this scenario on a current master branch
> like that:
> 1) generate a couple of offsets segments for the first table;
> 2) generate more segments for a second table;
> 3) drop first table;
> 4) stop pg cluster;
> 5) remove pg_multixact/offsets/0000
> 6) upgrade?

I don't remember off the top of my head.

It might be best to just refuse the upgrade if oldestOffsetKnown==false.
It's a very ancient corner case. It seems reasonable to require you to
upgrade to a newer minor version and run VACUUM before upgrading. IIRC
that sets oldestOffsetKnown.

--
Heikki Linnakangas
Neon (https://neon.tech)


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-01-07 10:23:59
Message-ID: CACG=ezZwdvsijzuXE3hex3xHcoz75EQYBXRTsQJVwbo5J5sS3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2 Jan 2025 at 01:12, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

>
> It might be best to just refuse the upgrade if oldestOffsetKnown==false.
> It's a very ancient corner case. It seems reasonable to require you to
> upgrade to a newer minor version and run VACUUM before upgrading. IIRC
> that sets oldestOffsetKnown.
>

I agree. After all, we do already have a ready-made solution in the form
of a vacuum, do we?

If I understand all this multixact_old.c machinery correctly, in case of
oldestOffsetKnown==false
we should fail with "could not open file" or offset will be 0 in
GetOldMultiXactIdSingleMember.
So, I suppose we can put an analogue of SimpleLruDoesPhysicalPageExist call
in the beginning
of GetOldMultiXactIdSingleMember. And if either
SimpleLruDoesPhysicalPageExist return false
or a corresponding offset will be 0 we have to bail out with "oldest offset
does not exist, consider
running vacuum before pg_upgrdade" or smth. Please, correct me if I'm wrong.

--
Best regards,
Maxim Orlov.


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-01-16 13:32:09
Message-ID: CACG=ezbs912S58=uR17b4w8uuWv1=OcCRaTW_OWdFm4+tXZA6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Looks like there is a bit of a pause in the discussion. Here is a small
update. Consider v12.
No major changes, rebase to the actual master and a squash of multiple
commits to make a
patch set easy to reviewer.

AFAICs, we are reached a consensus on a core patch for switching to 64 bits
offsets. The
only concern is about more comprehensive test coverage for pg_upgrade, is
it?

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v12-0007-TEST-bump-catver.patch.txt text/plain 1.1 KB
v12-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 9.0 KB
v12-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 31.3 KB
v12-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 36.2 KB
v12-0005-TEST-add-src-bin-pg_upgrade-t-005_offset.pl.patch application/octet-stream 13.5 KB
v12-0004-TEST-initdb-option-to-initialize-cluster-with-no.patch application/octet-stream 25.4 KB
v12-0006-TEST-try-to-replicate-buggy-oldest-offset.patch application/octet-stream 2.1 KB

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-01-21 03:35:33
Message-ID: CAGjGUA+BfcWyccNN4=tHsW_E-koRxbg8h8ut6hjvPsHMgmek6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

HI Maxim
> Looks like there is a bit of a pause in the discussion. Here is a small
update. Consider v12.
> No major changes, rebase to the actual master and a squash of multiple
commits to make a
> patch set easy to reviewer.

> AFAICs, we are reached a consensus on a core patch for switching to 64
bits offsets. The
> only concern is about more comprehensive test coverage for pg_upgrade, is
it?
Agree ,When upgrading meets extremes (oldestOffsetKnown==false.) Just
follow the solution mentioned by Heikki Linnakangas.

Thanks

On Thu, Jan 16, 2025 at 9:32 PM Maxim Orlov <orlovmg(at)gmail(dot)com> wrote:

> Looks like there is a bit of a pause in the discussion. Here is a small
> update. Consider v12.
> No major changes, rebase to the actual master and a squash of multiple
> commits to make a
> patch set easy to reviewer.
>
> AFAICs, we are reached a consensus on a core patch for switching to 64
> bits offsets. The
> only concern is about more comprehensive test coverage for pg_upgrade, is
> it?
>
> --
> Best regards,
> Maxim Orlov.
>


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-01-29 14:04:28
Message-ID: CACG=ezYbYO_KHWdeDedbDcY0tOS0JfaqBxG3=bG5+DdsDK4MpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is a v13 version with small changes to make cf bot happy.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v13-0005-TEST-add-src-bin-pg_upgrade-t-005_offset.pl.patch.txt text/plain 13.5 KB
v13-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 31.3 KB
v13-0004-TEST-initdb-option-to-initialize-cluster-with-no.patch.txt text/plain 25.4 KB
v13-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 9.0 KB
v13-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 36.2 KB
v13-0006-TEST-try-to-replicate-buggy-oldest-offset.patch.txt text/plain 2.1 KB
v13-0007-TEST-bump-catver.patch.txt text/plain 1.1 KB

From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-03-07 11:30:17
Message-ID: CACG=ezYpZRPwoRCz_h3Qerd3XJNdpTHCpwGbZphNdy26tA4_qQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is a rebase, v14.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v14-0005-TEST-initdb-option-to-initialize-cluster-with-no.patch.txt text/plain 25.4 KB
v14-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 9.0 KB
v14-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch application/octet-stream 8.8 KB
v14-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 13.0 KB
v14-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 17.9 KB
v14-0006-TEST-add-src-bin-pg_upgrade-t-006_offset.pl.patch.txt text/plain 13.5 KB
v14-0007-TEST-bump-catver.patch.txt text/plain 1.1 KB

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-03-27 02:03:22
Message-ID: CAGjGUA+qrPeotW-LFLLQ3u6eFv0kbB7fcGWOmuSay_xB9mBc6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

HI Maxim Orlov Heikki Linnakangas
Thank you for working on it,A few more days is a code freeze.It seems
like things have been quiet for a while, but I believe implementing xid64
is absolutely necessary. For instance, there’s often concern about
performance jitter caused by frequent freezes. If xid64 is implemented, the
freeze process can be smoother.

Thanks

On Fri, Mar 7, 2025 at 7:30 PM Maxim Orlov <orlovmg(at)gmail(dot)com> wrote:

> Here is a rebase, v14.
>
> --
> Best regards,
> Maxim Orlov.
>


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-04-01 18:25:16
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/03/2025 13:30, Maxim Orlov wrote:
> Here is a rebase, v14.

Thanks! I did some manual testing of this. I created a little helper
function to consume multixids, to test the autovacuum behavior, and
found one issue:

If you consume a lot of multixid members space, by creating lots of
multixids with huge number of members in each, you can end up with a
very bloated members SLRU, and autovacuum is in no hurry to clean it up.
Here's what I did:

1. Installed attached test module
2. Ran "select consume_multixids(10000, 100000);" many times
3. ran:

$ du -h data/pg_multixact/members/
26G data/pg_multixact/members/

When I run "vacuum freeze; select * from pg_database;", I can see that
'datminmxid' for the current database is advanced. However, autovacuum
is in no hurry to vacuum 'template0' and 'template1', so
pg_multixact/members/ does not get truncated. Eventually, when
autovacuum_multixact_freeze_max_age is reached, it presumably will, but
you will run out of disk space before that.

There is this check for members size at the end of SetOffsetVacuumLimit():

>
> /*
> * Do we need autovacuum? If we're not sure, assume yes.
> */
> return !oldestOffsetKnown ||
> (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);

And the caller (SetMultiXactIdLimit()) will in fact signal the
autovacuum launcher after "vacuum freeze" because of that. But
autovacuum launcher will look at the datminmxid / relminmxid values, see
that they are well within autovacuum_multixact_freeze_max_age, and do
nothing.

This is a very extreme case, but clearly the code to signal autovacuum
launcher, and the freeze age cutoff that autovacuum then uses, are not
in sync.

This patch removed MultiXactMemberFreezeThreshold(), per my suggestion,
but we threw this baby with the bathwater. We discussed that in this
thread, but didn't come up with any solution. But ISTM we still need
something like MultiXactMemberFreezeThreshold() to trigger autovacuum
freezing if the members have grown too large.

--
Heikki Linnakangas
Neon (https://neon.tech)


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-04-01 18:26:27
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/04/2025 21:25, Heikki Linnakangas wrote:
> On 07/03/2025 13:30, Maxim Orlov wrote:
>> Here is a rebase, v14.
>
> Thanks! I did some manual testing of this. I created a little helper
> function to consume multixids, to test the autovacuum behavior, and
> found one issue:

Forgot to attach the test function I used, here it is.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
0001-TEST-add-consume_multixids-function.patch text/x-patch 5.0 KB

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Maxim Orlov <orlovmg(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-04-14 09:04:40
Message-ID: CAGjGUAK1Cpa5Sm0cJVX7iG-uKXpNC26BjWq8yJS40ZpCdenC4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

HI Heikki
Pls Kindly help to create task in https://commitfest.postgresql.org/53/
,I can not found this path in
Commitfest 2025-07

Thanks

On Wed, Apr 2, 2025 at 2:26 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 01/04/2025 21:25, Heikki Linnakangas wrote:
> > On 07/03/2025 13:30, Maxim Orlov wrote:
> >> Here is a rebase, v14.
> >
> > Thanks! I did some manual testing of this. I created a little helper
> > function to consume multixids, to test the autovacuum behavior, and
> > found one issue:
>
> Forgot to attach the test function I used, here it is.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-04-16 08:11:46
Message-ID: CACG=ezYiLzCSo43uTPzAeq8ZCnGSkAsw061=oMyw5J1NUZ9Jwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I moved the topic to the next commitfest.

--
Best regards,
Maxim Orlov.


From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-04-29 12:01:09
Message-ID: CAExHW5soKc9mhLhroi__yrPD-ymkFbz=e5hyZ34iqjM-cdK9_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Maxim,

On Wed, Apr 16, 2025 at 1:42 PM Maxim Orlov <orlovmg(at)gmail(dot)com> wrote:

> I moved the topic to the next commitfest.
>

I am reviewing these patches.

I notice that transam/README does not mention multixact except one place in
section "Transaction Emulation during Recovery". I expected it to document
how pg_multixact/members and pg_multixact/offset are used and what is their
layout. It's not the responsibility of this patchset to document it, but it
will be good if we add a section about multixacts in transam/README. It
will make reviews easier.

--
Best Wishes,
Ashutosh Bapat


From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: make mxidoff 64 bits
Date: 2025-05-27 15:18:26
Message-ID: CACG=ezaLdhpFY9_qr7B6s7kkg_=s5S_ZD2=dsSeBNgdWWWuKbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 29 Apr 2025 at 15:01, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

>
> I notice that transam/README does not mention multixact except one place
> in section "Transaction Emulation during Recovery". I expected it to
> document how pg_multixact/members and pg_multixact/offset are used and what
> is their layout. It's not the responsibility of this patchset to document
> it, but it will be good if we add a section about multixacts in
> transam/README. It will make reviews easier.
>

Yeah, I agree, this is a big overlook, I think. Anyone who tries to
understand how pg_multixact works has to deal with the code.
Certainly, we need to address this issue.

But for now, here is a new rebase @ 70a13c528b6e382a381f.
The only change is that following commits 15a79c7 and a0ed19e, we must also
switch to PRIu64 format.

--
Best regards,
Maxim Orlov.

Attachment Content-Type Size
v15-0005-TEST-initdb-option-to-initialize-cluster-with-no.patch.txt text/plain 25.3 KB
v15-0003-Make-pg_upgrade-convert-multixact-offsets.patch application/octet-stream 17.7 KB
v15-0004-Get-rid-of-MultiXactMemberFreezeThreshold-call.patch application/octet-stream 8.8 KB
v15-0002-Use-64-bit-multixact-offsets.patch application/octet-stream 13.0 KB
v15-0001-Use-64-bit-format-output-for-multixact-offsets.patch application/octet-stream 7.2 KB
v15-0007-TEST-bump-catver.patch.txt text/plain 1.1 KB
v15-0006-TEST-add-src-bin-pg_upgrade-t-006_offset.pl.patch.txt text/plain 14.0 KB