From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: MultiXact\SLRU buffers configuration
Date: 2020-05-08 16:36:40
Message-ID: [email protected]
Lists: pgsql-hackers

Hi, hackers!

*** The problem ***
I'm investigating some cases of reduced database performance due to MultiXactOffsetLock contention (80% MultiXactOffsetLock, 20% IO DataFileRead).
The problem manifested itself during index repack and constraint validation, both of which are effectively full table scans.
The database workload contains a lot of select for share\select for update queries. I've tried to construct a synthetic workload generator and could not achieve a similar lock configuration: I see a lot of different locks in wait events, particularly a lot more MultiXactMemberLocks. But from my experiments with the synthetic workload, contention on MultiXactOffsetLock can be reduced by increasing NUM_MXACTOFFSET_BUFFERS from the default of 8 to bigger numbers.
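
For reference, the kind of change I mean is just a bump of the compile-time constants, roughly as below (a sketch against src/include/access/multixact.h; 32 and 64 are values I picked for experiments, not a recommendation):

    /* SLRU buffer pools for the MultiXact subsystem (defaults: 8 and 16). */
    #define NUM_MXACTOFFSET_BUFFERS    32
    #define NUM_MXACTMEMBER_BUFFERS    64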

*** Question 1 ***
Is it safe to increase the number of buffers of MultiXact\all SLRUs, recompile, and run the database as usual?
I cannot experiment much with production, but I'm mostly sure that bigger buffers will solve the problem.

*** Question 2 ***
Perhaps we could add GUCs for the SLRU sizes? Are there any reasons not to make them configurable? I think multis, clog, subtransactions and others would benefit from bigger buffers. But, then again, too many knobs can be confusing.
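
To make the question concrete: such a knob would be an ordinary config_int entry in guc.c, along the lines of the sketch below (the GUC name, bounds and description here are illustrative only, not a worked-out proposal):

    {
        {"multixact_offsets_slru_buffers", PGC_POSTMASTER, RESOURCES_MEM,
            gettext_noop("Sets the number of SLRU buffer pages used for MultiXact offsets."),
            NULL
        },
        &multixact_offsets_slru_buffers,
        8, 1, 1024,             /* boot value matches today's hardcoded 8 */
        NULL, NULL, NULL
    },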

*** Question 3 ***
The MultiXact offset lock is always taken as an exclusive lock, which effectively makes the MultiXact offset subsystem single-threaded. If someone has a good idea how to make it more concurrency-friendly, I'm willing to put some effort into this.
Perhaps I could just add an LWLock for each offset buffer page. Is that worth doing? Or are there hidden caveats and difficulties?

Thanks!

Best regards, Andrey Borodin.


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-11 11:17:58
Message-ID: [email protected]
Lists: pgsql-hackers

> On 8 May 2020, at 21:36, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> *** The problem ***
> I'm investigating some cases of reduced database performance due to MultiXactOffsetLock contention (80% MultiXactOffsetLock, 20% IO DataFileRead).
> The problem manifested itself during index repack and constraint validation, both of which are effectively full table scans.
> The database workload contains a lot of select for share\select for update queries. I've tried to construct a synthetic workload generator and could not achieve a similar lock configuration: I see a lot of different locks in wait events, particularly a lot more MultiXactMemberLocks. But from my experiments with the synthetic workload, contention on MultiXactOffsetLock can be reduced by increasing NUM_MXACTOFFSET_BUFFERS from the default of 8 to bigger numbers.
>
> *** Question 1 ***
> Is it safe to increase the number of buffers of MultiXact\all SLRUs, recompile, and run the database as usual?
> I cannot experiment much with production, but I'm mostly sure that bigger buffers will solve the problem.
>
> *** Question 2 ***
> Perhaps we could add GUCs for the SLRU sizes? Are there any reasons not to make them configurable? I think multis, clog, subtransactions and others would benefit from bigger buffers. But, then again, too many knobs can be confusing.
>
> *** Question 3 ***
> The MultiXact offset lock is always taken as an exclusive lock, which effectively makes the MultiXact offset subsystem single-threaded. If someone has a good idea how to make it more concurrency-friendly, I'm willing to put some effort into this.
> Perhaps I could just add an LWLock for each offset buffer page. Is that worth doing? Or are there hidden caveats and difficulties?

I've created a benchmark[0] imitating MultiXact pressure on my laptop: 7 clients concurrently run "select * from table where primary_key = ANY ($1) for share", where $1 is an array of identifiers, so that each tuple in the table is locked by a different set of XIDs. During this benchmark I observe contention on MultiXactOffsetControlLock in pg_stat_activity:

Friday, 8 May 2020 15:08:37 (every 1s)

pid | wait_event | wait_event_type | state | query
-------+----------------------------+-----------------+--------+----------------------------------------------------
41344 | ClientRead | Client | idle | insert into t1 select generate_series(1,1000000,1)
41375 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
41377 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
41378 | | | active | select * from t1 where i = ANY ($1) for share
41379 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
41381 | | | active | select * from t1 where i = ANY ($1) for share
41383 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
41385 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
(8 rows)

Finally, the benchmark measures the time it takes to execute the select for update 42 times.

I've gone ahead and created 3 patches:
1. Configurable SLRU buffer sizes for MultiXactOffsets and MultiXactMembers
2. Reduce locking level to shared on read of MultiXactId members
3. Configurable cache size

I've found out that:
1. When the MultiXact working set does not fit into the buffers, benchmark times grow very high. Yet very big buffers slow the benchmark down too. For this benchmark the optimal SLRU size is 32 pages for offsets and 64 pages for members (the defaults are 8 and 16 respectively).
2. The lock optimisation increases performance by 5% at the default SLRU sizes. The benchmark does not explicitly read MultiXactId members, but when it replaces one multixact with another it has to read the previous member set. I understand that we can construct a benchmark to demonstrate the dominance of any algorithm, and 5% on a synthetic workload is not a very big number. But it simply makes sense to take a shared lock for reading.
3. Changing the cache size does not affect the benchmark at all. That is somewhat expected: the benchmark is designed to defeat the cache; otherwise OffsetControlLock would not be stressed.

For our workload, I think we will just increase the SLRU sizes. But the patchset may be useful for tuning and as a performance optimisation of MultiXact.

Also, MultiXacts do not seem to be a very good fit for the SLRU design. I think it would be better to use a B-tree as the container, or at least to make MultiXact members extendable in place (reserve some space when a multixact is created).
When we want to extend the number of locks for a tuple, currently we:
1. Iterate through all SLRU buffers for offsets to read the current offset (holding the offsets lock exclusively)
2. Iterate through all buffers for members to find the current members (holding the members lock exclusively)
3. Create a new members array with one more xid
4. Iterate through the local cache to see whether an identical member set already exists
5. Repeat step 1, this time to write
6. Repeat step 2, this time to write

Obviously this does not scale well; we cannot keep increasing SLRU sizes forever.

Thanks! I'd be happy to hear any feedback.

Best regards, Andrey Borodin.

[0] https://github.com/x4m/multixact_stress

Attachment Content-Type Size
v1-0001-Add-GUCs-to-tune-MultiXact-SLRUs.patch application/octet-stream 6.2 KB
v1-0002-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch application/octet-stream 7.6 KB
v1-0003-Make-MultiXact-local-cache-size-configurable.patch application/octet-stream 3.7 KB

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-13 18:08:37
Message-ID: [email protected]
Lists: pgsql-hackers

> On 11 May 2020, at 16:17, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> I've gone ahead and created 3 patches:
> 1. Configurable SLRU buffer sizes for MultiXactOffsets and MultiXactMembers
> 2. Reduce locking level to shared on read of MultiXactId members
> 3. Configurable cache size

I'm looking more at MultiXact and it seems to me that we have a race condition there.

When we create a new MultiXact we do:
1. Generate new MultiXactId under MultiXactGenLock
2. Record new mxid with members and offset to WAL
3. Write offset to SLRU under MultiXactOffsetControlLock
4. Write members to SLRU under MultiXactMemberControlLock

When we read a MultiXact we do:
1. Retrieve the offset by mxid from the SLRU under MultiXactOffsetControlLock
2. If the offset is 0, it has not been filled in yet by step 3 of the previous algorithm; we sleep and go to 1
3. Retrieve the members from the SLRU under MultiXactMemberControlLock
4. ...what do we do if there are just zeroes because step 4 has not been executed yet? Nothing; we return an empty members list.
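
(For context: the sleep in step 2 is the retry loop in GetMultiXactIdMembers(), which, heavily condensed, looks roughly like the following; a simplified sketch, not the verbatim source.)

    for (;;)
    {
        LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
        slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
        offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
        offset = offptr[entryno];
        LWLockRelease(MultiXactOffsetControlLock);

        if (offset != 0)
            break;              /* the creator has finished step 3 */

        /* Offset not written yet: back off briefly and retry. */
        CHECK_FOR_INTERRUPTS();
        pg_usleep(1000L);
    }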

What am I missing?

Best regards, Andrey Borodin.


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: x4mmm(at)yandex-team(dot)ru
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-14 01:25:26
Message-ID: [email protected]
Lists: pgsql-hackers

At Wed, 13 May 2020 23:08:37 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
>
>
> > On 11 May 2020, at 16:17, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> >
> > I've gone ahead and created 3 patches:
> > 1. Configurable SLRU buffer sizes for MultiXactOffsets and MultiXactMembers
> > 2. Reduce locking level to shared on read of MultiXactId members
> > 3. Configurable cache size
>
> I'm looking more at MultiXact and it seems to me that we have a race condition there.
>
> When we create a new MultiXact we do:
> 1. Generate new MultiXactId under MultiXactGenLock
> 2. Record new mxid with members and offset to WAL
> 3. Write offset to SLRU under MultiXactOffsetControlLock
> 4. Write members to SLRU under MultiXactMemberControlLock

But, don't we hold exclusive lock on the buffer through all the steps
above?

> When we read a MultiXact we do:
> 1. Retrieve the offset by mxid from the SLRU under MultiXactOffsetControlLock
> 2. If the offset is 0, it has not been filled in yet by step 3 of the previous algorithm; we sleep and go to 1
> 3. Retrieve the members from the SLRU under MultiXactMemberControlLock
> 4. ...what do we do if there are just zeroes because step 4 has not been executed yet? Nothing; we return an empty members list.

So transactions never see such incomplete mxids, I believe.

> What am I missing?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-14 05:19:42
Message-ID: [email protected]
Lists: pgsql-hackers

> On 14 May 2020, at 06:25, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Wed, 13 May 2020 23:08:37 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
>>
>>
>>> On 11 May 2020, at 16:17, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>>
>>> I've gone ahead and created 3 patches:
>>> 1. Configurable SLRU buffer sizes for MultiXactOffsets and MultiXactMembers
>>> 2. Reduce locking level to shared on read of MultiXactId members
>>> 3. Configurable cache size
>>
>> I'm looking more at MultiXact and it seems to me that we have a race condition there.
>>
>> When we create a new MultiXact we do:
>> 1. Generate new MultiXactId under MultiXactGenLock
>> 2. Record new mxid with members and offset to WAL
>> 3. Write offset to SLRU under MultiXactOffsetControlLock
>> 4. Write members to SLRU under MultiXactMemberControlLock
>
> But, don't we hold exclusive lock on the buffer through all the steps
> above?
Yes... unless the MultiXact is observed on a standby. This could lead to observing an inconsistent snapshot: one of the lockers committed a tuple delete, but the standby sees the tuple as alive.

>> When we read a MultiXact we do:
>> 1. Retrieve the offset by mxid from the SLRU under MultiXactOffsetControlLock
>> 2. If the offset is 0, it has not been filled in yet by step 3 of the previous algorithm; we sleep and go to 1
>> 3. Retrieve the members from the SLRU under MultiXactMemberControlLock
>> 4. ...what do we do if there are just zeroes because step 4 has not been executed yet? Nothing; we return an empty members list.
>
> So transactions never see such incomplete mxids, I believe.
I've observed the sleep in step 2. I believe it's possible to observe the special effects of step 4 too.
Maybe we could add a lock on the standby to get rid of this 1000us wait? Sometimes it hits standbys hard: if someone locks a whole table on the primary, all seq scans on the standbys follow it into MultiXactOffsetControlLock contention.

It looks like this:
#0 0x00007fcd56896ff7 in __GI___select (nfds=nfds(at)entry=0, readfds=readfds(at)entry=0x0, writefds=writefds(at)entry=0x0, exceptfds=exceptfds(at)entry=0x0, timeout=timeout(at)entry=0x7ffd83376fe0) at ../sysdeps/unix/sysv/linux/select.c:41
#1 0x000056186e0d54bd in pg_usleep (microsec=microsec(at)entry=1000) at ./build/../src/port/pgsleep.c:56
#2 0x000056186dd5edf2 in GetMultiXactIdMembers (from_pgupgrade=0 '\000', onlyLock=<optimized out>, members=0x7ffd83377080, multi=3106214809) at ./build/../src/backend/access/transam/multixact.c:1370
#3 GetMultiXactIdMembers () at ./build/../src/backend/access/transam/multixact.c:1202
#4 0x000056186dd2d2d9 in MultiXactIdGetUpdateXid (xmax=<optimized out>, t_infomask=<optimized out>) at ./build/../src/backend/access/heap/heapam.c:7039
#5 0x000056186dd35098 in HeapTupleGetUpdateXid (tuple=tuple(at)entry=0x7fcba3b63d58) at ./build/../src/backend/access/heap/heapam.c:7080
#6 0x000056186e0cd0f8 in HeapTupleSatisfiesMVCC (htup=<optimized out>, snapshot=0x56186f44a058, buffer=230684) at ./build/../src/backend/utils/time/tqual.c:1091
#7 0x000056186dd2d922 in heapgetpage (scan=scan(at)entry=0x56186f4c8e78, page=page(at)entry=3620) at ./build/../src/backend/access/heap/heapam.c:439
#8 0x000056186dd2ea7c in heapgettup_pagemode (key=0x0, nkeys=0, dir=ForwardScanDirection, scan=0x56186f4c8e78) at ./build/../src/backend/access/heap/heapam.c:1034
#9 heap_getnext (scan=scan(at)entry=0x56186f4c8e78, direction=direction(at)entry=ForwardScanDirection) at ./build/../src/backend/access/heap/heapam.c:1801
#10 0x000056186de84f51 in SeqNext (node=node(at)entry=0x56186f4a4f78) at ./build/../src/backend/executor/nodeSeqscan.c:81
#11 0x000056186de6a3f1 in ExecScanFetch (recheckMtd=0x56186de84ef0 <SeqRecheck>, accessMtd=0x56186de84f20 <SeqNext>, node=0x56186f4a4f78) at ./build/../src/backend/executor/execScan.c:97
#12 ExecScan (node=0x56186f4a4f78, accessMtd=0x56186de84f20 <SeqNext>, recheckMtd=0x56186de84ef0 <SeqRecheck>) at ./build/../src/backend/executor/execScan.c:164

Best regards, Andrey Borodin.


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: x4mmm(at)yandex-team(dot)ru
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-14 06:16:59
Message-ID: [email protected]
Lists: pgsql-hackers

At Thu, 14 May 2020 10:19:42 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
> >> I'm looking more at MultiXact and it seems to me that we have a race condition there.
> >>
> >> When we create a new MultiXact we do:
> >> 1. Generate new MultiXactId under MultiXactGenLock
> >> 2. Record new mxid with members and offset to WAL
> >> 3. Write offset to SLRU under MultiXactOffsetControlLock
> >> 4. Write members to SLRU under MultiXactMemberControlLock
> >
> > But, don't we hold exclusive lock on the buffer through all the steps
> > above?
> Yes... unless the MultiXact is observed on a standby. This could lead to observing an inconsistent snapshot: one of the lockers committed a tuple delete, but the standby sees the tuple as alive.

Ah, right. I was looking at it from GetNewMultiXactId. Actually,
XLOG_MULTIXACT_CREATE_ID is not protected from concurrent references
to the mxact id being created, and GetMultiXactIdMembers takes that
case into account.

> >> When we read a MultiXact we do:
> >> 1. Retrieve the offset by mxid from the SLRU under MultiXactOffsetControlLock
> >> 2. If the offset is 0, it has not been filled in yet by step 3 of the previous algorithm; we sleep and go to 1
> >> 3. Retrieve the members from the SLRU under MultiXactMemberControlLock
> >> 4. ...what do we do if there are just zeroes because step 4 has not been executed yet? Nothing; we return an empty members list.
> >
> > So transactions never see such incomplete mxids, I believe.
> I've observed the sleep in step 2. I believe it's possible to observe the special effects of step 4 too.
> Maybe we could add a lock on the standby to get rid of this 1000us wait? Sometimes it hits standbys hard: if someone locks a whole table on the primary, all seq scans on the standbys follow it into MultiXactOffsetControlLock contention.

GetMultiXactIdMembers believes that step 4 is successfully done if
step 2 returned a valid offset, but actually that is not obvious.

If we add a single giant lock just to isolate, say,
GetMultiXactIdMembers and RecordNewMultiXact, it reduces concurrency
unnecessarily. Perhaps we need a finer-grained locking key for the
standby that works similarly to the buffer lock on the primary, one
that doesn't cause conflicts between unrelated mxids.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-14 06:44:01
Message-ID: [email protected]
Lists: pgsql-hackers

> On 14 May 2020, at 11:16, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 14 May 2020 10:19:42 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
>>>> I'm looking more at MultiXact and it seems to me that we have a race condition there.
>>>>
>>>> When we create a new MultiXact we do:
>>>> 1. Generate new MultiXactId under MultiXactGenLock
>>>> 2. Record new mxid with members and offset to WAL
>>>> 3. Write offset to SLRU under MultiXactOffsetControlLock
>>>> 4. Write members to SLRU under MultiXactMemberControlLock
>>>
>>> But, don't we hold exclusive lock on the buffer through all the steps
>>> above?
>> Yes... unless the MultiXact is observed on a standby. This could lead to observing an inconsistent snapshot: one of the lockers committed a tuple delete, but the standby sees the tuple as alive.
>
> Ah, right. I was looking at it from GetNewMultiXactId. Actually,
> XLOG_MULTIXACT_CREATE_ID is not protected from concurrent references
> to the mxact id being created, and GetMultiXactIdMembers takes that
> case into account.
>
>>>> When we read a MultiXact we do:
>>>> 1. Retrieve the offset by mxid from the SLRU under MultiXactOffsetControlLock
>>>> 2. If the offset is 0, it has not been filled in yet by step 3 of the previous algorithm; we sleep and go to 1
>>>> 3. Retrieve the members from the SLRU under MultiXactMemberControlLock
>>>> 4. ...what do we do if there are just zeroes because step 4 has not been executed yet? Nothing; we return an empty members list.
>>>
>>> So transactions never see such incomplete mxids, I believe.
>> I've observed the sleep in step 2. I believe it's possible to observe the special effects of step 4 too.
>> Maybe we could add a lock on the standby to get rid of this 1000us wait? Sometimes it hits standbys hard: if someone locks a whole table on the primary, all seq scans on the standbys follow it into MultiXactOffsetControlLock contention.
>
> GetMultiXactIdMembers believes that step 4 is successfully done if
> step 2 returned a valid offset, but actually that is not obvious.
>
> If we add a single giant lock just to isolate, say,
> GetMultiXactIdMembers and RecordNewMultiXact, it reduces concurrency
> unnecessarily. Perhaps we need a finer-grained locking key for the
> standby that works similarly to the buffer lock on the primary, one
> that doesn't cause conflicts between unrelated mxids.
>
We could just replay members before offsets: if the offset is already there, the members are there too.
But I'd be happy if we could mitigate that 1000us wait too, with a hint about the last mxid's state in shared MultiXact state, for example.

Actually, if we read an empty mxid array instead of something that has just been replayed, it's not an inconsistency problem, because a transaction in this mxid could not have committed before we started, ISTM.
So instead of a fix we can probably just add a comment, if this reasoning is correct.

Best regards, Andrey Borodin.


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: x4mmm(at)yandex-team(dot)ru
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-15 00:03:33
Message-ID: [email protected]
Lists: pgsql-hackers

At Thu, 14 May 2020 11:44:01 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
> > GetMultiXactIdMembers believes that step 4 is successfully done if
> > step 2 returned a valid offset, but actually that is not obvious.
> >
> > If we add a single giant lock just to isolate, say,
> > GetMultiXactIdMembers and RecordNewMultiXact, it reduces concurrency
> > unnecessarily. Perhaps we need a finer-grained locking key for the
> > standby that works similarly to the buffer lock on the primary, one
> > that doesn't cause conflicts between unrelated mxids.
> >
> We could just replay members before offsets: if the offset is already there, the members are there too.
> But I'd be happy if we could mitigate that 1000us wait too, with a hint about the last mxid's state in shared MultiXact state, for example.

Generally in such cases, condition variables would work. In the
attached PoC, the reader side gets no penalty in the "likely" code
path. The writer side always calls ConditionVariableBroadcast but the
waiter list is empty in almost all cases. But I couldn't produce the
situation where the 1000us sleep is reached.
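
The shape of the PoC is roughly the following (a condensed sketch, not
the patch itself; the nextoff_cv field, the offset_is_filled_in()
helper and the wait-event name are placeholders):

    /* Reader side: replaces pg_usleep(1000L) when the offset is not yet set. */
    ConditionVariablePrepareToSleep(&MultiXactState->nextoff_cv);
    while (!offset_is_filled_in(multi))
        ConditionVariableSleep(&MultiXactState->nextoff_cv,
                               WAIT_EVENT_MULTIXACT_OFFSET_FILL);
    ConditionVariableCancelSleep();

    /* Writer side: wake any waiters once the offset is written. */
    ConditionVariableBroadcast(&MultiXactState->nextoff_cv);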

> Actually, if we read an empty mxid array instead of something that has just been replayed, it's not an inconsistency problem, because a transaction in this mxid could not have committed before we started, ISTM.
> So instead of a fix we can probably just add a comment, if this reasoning is correct.

Step 4 of the reader side reads the members of the target mxid. They
are already written if the offset of the *next* mxid is filled in.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
mxid_wait_instead_of_sleep.patch text/x-patch 3.0 KB

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-15 09:01:46
Message-ID: [email protected]
Lists: pgsql-hackers

> On 15 May 2020, at 05:03, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 14 May 2020 11:44:01 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
>>> GetMultiXactIdMembers believes that step 4 is successfully done if
>>> step 2 returned a valid offset, but actually that is not obvious.
>>>
>>> If we add a single giant lock just to isolate, say,
>>> GetMultiXactIdMembers and RecordNewMultiXact, it reduces concurrency
>>> unnecessarily. Perhaps we need a finer-grained locking key for the
>>> standby that works similarly to the buffer lock on the primary, one
>>> that doesn't cause conflicts between unrelated mxids.
>>>
>> We could just replay members before offsets: if the offset is already there, the members are there too.
>> But I'd be happy if we could mitigate that 1000us wait too, with a hint about the last mxid's state in shared MultiXact state, for example.
>
> Generally in such cases, condition variables would work. In the
> attached PoC, the reader side gets no penalty in the "likely" code
> path. The writer side always calls ConditionVariableBroadcast but the
> waiter list is empty in almost all cases. But I couldn't produce the
> situation where the 1000us sleep is reached.
Thanks! That really looks like a good solution without magic timeouts. Beautiful!
I think I can create a temporary extension that calls the MultiXact API and tests edge cases like this 1000us wait.
This extension will also be useful for me to assess the impact of bigger buffers, reduced read locking (as in my 2nd patch) and other tweaks.

>> Actually, if we read an empty mxid array instead of something that has just been replayed, it's not an inconsistency problem, because a transaction in this mxid could not have committed before we started, ISTM.
>> So instead of a fix we can probably just add a comment, if this reasoning is correct.
>
> Step 4 of the reader side reads the members of the target mxid. They
> are already written if the offset of the *next* mxid is filled in.
Most often, yes, but members are not guaranteed to be filled in order: whoever wins MultiXactMemberControlLock writes first.
But nobody can read the members of an mxid before it is returned, and its members will be written before the mxid is returned.

Best regards, Andrey Borodin.


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: x4mmm(at)yandex-team(dot)ru
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-20 04:54:04
Message-ID: [email protected]
Lists: pgsql-hackers

At Fri, 15 May 2020 14:01:46 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
>
>
> > On 15 May 2020, at 05:03, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Thu, 14 May 2020 11:44:01 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
> >>> GetMultiXactIdMembers believes that step 4 is successfully done if
> >>> step 2 returned a valid offset, but actually that is not obvious.
> >>>
> >>> If we add a single giant lock just to isolate, say,
> >>> GetMultiXactIdMembers and RecordNewMultiXact, it reduces concurrency
> >>> unnecessarily. Perhaps we need a finer-grained locking key for the
> >>> standby that works similarly to the buffer lock on the primary, one
> >>> that doesn't cause conflicts between unrelated mxids.
> >>>
> >> We could just replay members before offsets: if the offset is already there, the members are there too.
> >> But I'd be happy if we could mitigate that 1000us wait too, with a hint about the last mxid's state in shared MultiXact state, for example.
> >
> > Generally in such cases, condition variables would work. In the
> > attached PoC, the reader side gets no penalty in the "likely" code
> > path. The writer side always calls ConditionVariableBroadcast but the
> > waiter list is empty in almost all cases. But I couldn't produce the
> > situation where the 1000us sleep is reached.
> Thanks! That really looks like a good solution without magic timeouts. Beautiful!
> I think I can create a temporary extension that calls the MultiXact API and tests edge cases like this 1000us wait.
> This extension will also be useful for me to assess the impact of bigger buffers, reduced read locking (as in my 2nd patch) and other tweaks.

Happy to hear that. It would need to use a timeout just in case, though.

> >> Actually, if we read an empty mxid array instead of something that has just been replayed, it's not an inconsistency problem, because a transaction in this mxid could not have committed before we started, ISTM.
> >> So instead of a fix we can probably just add a comment, if this reasoning is correct.
> >
> > Step 4 of the reader side reads the members of the target mxid. They
> > are already written if the offset of the *next* mxid is filled in.
> Most often, yes, but members are not guaranteed to be filled in order: whoever wins MultiXactMemberControlLock writes first.
> But nobody can read the members of an mxid before it is returned, and its members will be written before the mxid is returned.

Yeah, right. Otherwise an assertion failure happens.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: x4mmm(at)yandex-team(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-07-02 12:02:45
Message-ID: [email protected]
Lists: pgsql-hackers

> On 15 May 2020, at 02:03, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:

> Generally in such cases, condition variables would work. In the
> attached PoC, the reader side gets no penalty in the "likely" code
> path. The writer side always calls ConditionVariableBroadcast but the
> waiter list is empty in almost all cases. But I couldn't produce the
> situation where the 1000us sleep is reached.

The submitted patch no longer applies, can you please submit an updated
version? I'm marking the patch Waiting on Author in the meantime.

cheers ./daniel


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-07-08 07:03:54
Message-ID: [email protected]
Lists: pgsql-hackers

> On 2 July 2020, at 17:02, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>
>> On 15 May 2020, at 02:03, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
>> Generally in such cases, condition variables would work. In the
>> attached PoC, the reader side gets no penalty in the "likely" code
>> path. The writer side always calls ConditionVariableBroadcast but the
>> waiter list is empty in almost all cases. But I couldn't produce the
>> situation where the 1000us sleep is reached.
>
> The submitted patch no longer applies, can you please submit an updated
> version? I'm marking the patch Waiting on Author in the meantime.
Thanks, Daniel! PFA V2

Best regards, Andrey Borodin.

Attachment Content-Type Size
v2-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch application/octet-stream 7.6 KB
v2-0002-Make-MultiXact-local-cache-size-configurable.patch application/octet-stream 3.7 KB
v2-0003-Add-conditional-variable-to-wait-for-next-MultXac.patch application/octet-stream 3.4 KB

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-08-02 21:30:21
Message-ID: [email protected]
Lists: pgsql-hackers

> On 8 Jul 2020, at 09:03, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
>
>
>> On 2 July 2020, at 17:02, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>>
>>> On 15 May 2020, at 02:03, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>>
>>> Generally in such cases, condition variables would work. In the
>>> attached PoC, the reader side gets no penalty in the "likely" code
>>> path. The writer side always calls ConditionVariableBroadcast but the
>>> waiter list is empty in almost all cases. But I couldn't produce the
>>> situation where the 1000us sleep is reached.
>>
>> The submitted patch no longer applies, can you please submit an updated
>> version? I'm marking the patch Waiting on Author in the meantime.
> Thanks, Daniel! PFA V2

This version too has stopped applying according to the CFbot. I've moved it to
the next commitfest as we're out of time in this one and it was only pointed
out now, but kept it Waiting on Author.

cheers ./daniel


From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-08-28 18:08:34
Message-ID: [email protected]
Lists: pgsql-hackers

On 08.07.2020 10:03, Andrey M. Borodin wrote:
>
>> On 2 July 2020, at 17:02, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>>
>>> On 15 May 2020, at 02:03, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>>> Generally in such cases, condition variables would work. In the
>>> attached PoC, the reader side gets no penalty in the "likely" code
>>> path. The writer side always calls ConditionVariableBroadcast but the
>>> waiter list is empty in almost all cases. But I couldn't produce the
>>> situation where the 1000us sleep is reached.
>> The submitted patch no longer applies, can you please submit an updated
>> version? I'm marking the patch Waiting on Author in the meantime.
> Thanks, Daniel! PFA V2
>
> Best regards, Andrey Borodin.

1) The first patch is sensible and harmless, so I think it is ready for
committer. I haven't tested the performance impact, though.

2) I like the initial proposal to make various SLRU buffers
configurable; however, I doubt whether it really solves the problem or
just moves it to another place.

The previous patch you sent was based on some version that contained
changes for other SLRU buffer numbers: 'multixact_offsets_slru_buffers'
and 'multixact_members_slru_buffers'. Have you just forgotten to attach
them? The patch message "[PATCH v2 2/4]" hints that you had 4 patches)
Meanwhile, I attach the rebased patch to calm down the CFbot. The
changes are trivial.

2.1) I think that both min and max values for this parameter are too
extreme. Have you tested them?

+               &multixact_local_cache_entries,
+               256, 2, INT_MAX / 2,

2.2) MAX_CACHE_ENTRIES is not used anymore, so it can be deleted.

3) No changes for third patch. I just renamed it for consistency.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
v3-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch text/x-patch 7.6 KB
v3-0002-Make-MultiXact-local-cache-size-configurable.patch text/x-patch 3.4 KB
v3-0003-Add-conditional-variable-to-wait-for-next-MultXac.patch text/x-patch 3.4 KB

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-09-28 14:41:41
Message-ID: [email protected]
Lists: pgsql-hackers

Hi, Anastasia!

> On 28 August 2020, at 23:08, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>
> 1) The first patch is sensible and harmless, so I think it is ready for committer. I haven't tested the performance impact, though.
>
> 2) I like the initial proposal to make various SLRU buffers configurable; however, I doubt whether it really solves the problem or just moves it to another place.
>
> The previous patch you sent was based on some version that contained changes for other SLRU buffer numbers: 'multixact_offsets_slru_buffers' and 'multixact_members_slru_buffers'. Have you just forgotten to attach them? The patch message "[PATCH v2 2/4]" hints that you had 4 patches)
> Meanwhile, I attach the rebased patch to calm down the CFbot. The changes are trivial.
>
> 2.1) I think that both min and max values for this parameter are too extreme. Have you tested them?
>
> + &multixact_local_cache_entries,
> + 256, 2, INT_MAX / 2,
>
> 2.2) MAX_CACHE_ENTRIES is not used anymore, so it can be deleted.
>
> 3) No changes for third patch. I just renamed it for consistency.

Thank you for your review.

Indeed, I had a 4th patch with tests, but those tests didn't work well: I still have not managed to stress the SLRUs enough to reproduce the problem from production...

You are absolutely correct on point 2: I only tested with sane values, and I observed extreme performance degradation with values around 64 megabytes. I do not know what maximum we should pick. 1GB? Or the highest possible functioning value?

I greatly appreciate your review; sorry for the long delay. Thanks!

Best regards, Andrey Borodin.


From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-07 14:23:28
Message-ID: [email protected]
Lists: pgsql-hackers

On 28.09.2020 17:41, Andrey M. Borodin wrote:
> Hi, Anastasia!
>
>> On 28 August 2020, at 23:08, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>>
>> 1) The first patch is sensible and harmless, so I think it is ready for committer. I haven't tested the performance impact, though.
>>
>> 2) I like the initial proposal to make various SLRU buffers configurable; however, I doubt whether it really solves the problem or just moves it to another place.
>>
>> The previous patch you sent was based on some version that contained changes for other SLRU buffer numbers: 'multixact_offsets_slru_buffers' and 'multixact_members_slru_buffers'. Have you just forgotten to attach them? The patch message "[PATCH v2 2/4]" hints that you had 4 patches)
>> Meanwhile, I attach the rebased patch to calm down the CFbot. The changes are trivial.
>>
>> 2.1) I think that both min and max values for this parameter are too extreme. Have you tested them?
>>
>> + &multixact_local_cache_entries,
>> + 256, 2, INT_MAX / 2,
>>
>> 2.2) MAX_CACHE_ENTRIES is not used anymore, so it can be deleted.
>>
>> 3) No changes for third patch. I just renamed it for consistency.
> Thank you for your review.
>
> Indeed, I had a 4th patch with tests, but those tests didn't work well: I still have not managed to stress the SLRUs enough to reproduce the problem from production...
>
> You are absolutely correct on point 2: I only tested with sane values, and I observed extreme performance degradation with values around 64 megabytes. I do not know what maximum we should pick. 1GB? Or the highest possible functioning value?

I would go with the values that we consider adequate for this setting.
As far as I can see, there is no strict rule about it in guc.c, and
many variables have large boundary values. Anyway, we need to explain
it at least in the documentation and code comments.

It seems that the default was conservative enough, so it could also
serve as the minimum value. As for the maximum, can you provide any
benchmark results? If there is a peak followed by a noticeable
performance degradation, we can use it to calculate the preferable
maximum value.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-26 01:05:26
Message-ID: CAPpHfduekT=BrM3rLqvo4ajzKYmDk+6aCoYJrNHhK0Je+v2z=Q@mail.gmail.com
Lists: pgsql-hackers

Hi!

On Mon, Sep 28, 2020 at 5:41 PM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > On 28 August 2020, at 23:08, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> >
> > 1) The first patch is sensible and harmless, so I think it is ready for committer. I haven't tested the performance impact, though.
> >
> > 2) I like the initial proposal to make various SLRU buffers configurable; however, I doubt whether it really solves the problem or just moves it to another place.
> >
> > The previous patch you sent was based on some version that contained changes for other SLRU buffer numbers: 'multixact_offsets_slru_buffers' and 'multixact_members_slru_buffers'. Have you just forgotten to attach them? The patch message "[PATCH v2 2/4]" hints that you had 4 patches)
> > Meanwhile, I attach the rebased patch to calm down the CFbot. The changes are trivial.
> >
> > 2.1) I think that both min and max values for this parameter are too extreme. Have you tested them?
> >
> > + &multixact_local_cache_entries,
> > + 256, 2, INT_MAX / 2,
> >
> > 2.2) MAX_CACHE_ENTRIES is not used anymore, so it can be deleted.
> >
> > 3) No changes for third patch. I just renamed it for consistency.
>
> Thank you for your review.
>
> Indeed, I had a 4th patch with tests, but those tests didn't work well: I still have not managed to stress the SLRUs enough to reproduce the problem from production...
>
> You are absolutely correct on point 2: I only tested with sane values, and I observed extreme performance degradation with values around 64 megabytes. I do not know what maximum we should pick. 1GB? Or the highest possible functioning value?
>
> I greatly appreciate your review; sorry for the long delay. Thanks!

I took a look at this patchset.

The 1st and 3rd patches look good to me. I made just minor improvements.
1) There is still a case where SimpleLruReadPage_ReadOnly() relocks the
SLRU lock, which is already taken in exclusive mode. I've avoided this
by passing the lock mode as a parameter to
SimpleLruReadPage_ReadOnly() (see the sketch after this list).
3) CHECK_FOR_INTERRUPTS() is not needed anymore, because it's called
inside ConditionVariableSleep() if needed. Also, no current wait
events use slashes, and I don't think we should introduce slashes
here. Even if we did, we would also have to rename the existing wait
events to be consistent with the new one. So, I've renamed the new wait
event to remove the slash.
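
For item 1, the changed signature is essentially the following (a
sketch of the idea only; the exact form in the attached patch may
differ):

    /*
     * 'lockmode' is in/out: on entry, the mode in which the caller already
     * holds the SLRU control lock; on exit, the mode actually held (it may
     * have been escalated to exclusive if the page had to be read in), so
     * callers can Assert() on it.
     */
    extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
                                          TransactionId xid,
                                          LWLockMode *lockmode);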

Regarding patch 2: I see the current documentation in the patch
doesn't explain to the user how to set the new parameter. I think we
should give users an idea of what workloads need high values of the
multixact_local_cache_entries parameter and what workloads don't.
Also, we should explain the negative aspects of high
multixact_local_cache_entries values. Ideally, we should get the
advantage without overloading users with new nontrivial parameters,
but I don't have a particular idea here.

I'd like to propose committing 1 and 3, but leaving 2 for further review.

------
Regards,
Alexander Korotkov

Attachment Content-Type Size
v4-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offsets.patch application/octet-stream 12.2 KB
v4-0002-Make-MultiXact-local-cache-size-configurable.patch application/octet-stream 3.7 KB
v4-0003-Add-conditional-variable-to-wait-for-next-MultXact-o.patch application/octet-stream 4.2 KB

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-26 15:45:29
Message-ID: [email protected]
Lists: pgsql-hackers

> On 26 October 2020, at 06:05, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> Hi!
>
> On Mon, Sep 28, 2020 at 5:41 PM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>> On 28 August 2020, at 23:08, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>>>
>>> 1) The first patch is sensible and harmless, so I think it is ready for committer. I haven't tested the performance impact, though.
>>>
>>> 2) I like the initial proposal to make various SLRU buffers configurable; however, I doubt whether it really solves the problem or just moves it to another place.
>>>
>>> The previous patch you sent was based on some version that contained changes for other SLRU buffer numbers: 'multixact_offsets_slru_buffers' and 'multixact_members_slru_buffers'. Have you just forgotten to attach them? The patch message "[PATCH v2 2/4]" hints that you had 4 patches)
>>> Meanwhile, I attach the rebased patch to calm down the CFbot. The changes are trivial.
>>>
>>> 2.1) I think that both min and max values for this parameter are too extreme. Have you tested them?
>>>
>>> + &multixact_local_cache_entries,
>>> + 256, 2, INT_MAX / 2,
>>>
>>> 2.2) MAX_CACHE_ENTRIES is not used anymore, so it can be deleted.
>>>
>>> 3) No changes for third patch. I just renamed it for consistency.
>>
>> Thank you for your review.
>>
>> Indeed, I had a 4th patch with tests, but those tests didn't work well: I still have not managed to stress the SLRUs enough to reproduce the problem from production...
>>
>> You are absolutely correct on point 2: I only tested with sane values, and I observed extreme performance degradation with values around 64 megabytes. I do not know what maximum we should pick. 1GB? Or the highest possible functioning value?
>>
>> I greatly appreciate your review; sorry for the long delay. Thanks!
>
> I took a look at this patchset.
>
> The 1st and 3rd patches look good to me. I made just minor improvements.
> 1) There is still a case where SimpleLruReadPage_ReadOnly() relocks the
> SLRU lock, which is already taken in exclusive mode. I've avoided this
> by passing the lock mode as a parameter to
> SimpleLruReadPage_ReadOnly() (see the sketch after this list).
> 3) CHECK_FOR_INTERRUPTS() is not needed anymore, because it's called
> inside ConditionVariableSleep() if needed. Also, no current wait
> events use slashes, and I don't think we should introduce slashes
> here. Even if we did, we would also have to rename the existing wait
> events to be consistent with the new one. So, I've renamed the new wait
> event to remove the slash.
>
> Regarding patch 2: I see the current documentation in the patch
> doesn't explain to the user how to set the new parameter. I think we
> should give users an idea of what workloads need high values of the
> multixact_local_cache_entries parameter and what workloads don't.
> Also, we should explain the negative aspects of high
> multixact_local_cache_entries values. Ideally, we should get the
> advantage without overloading users with new nontrivial parameters,
> but I don't have a particular idea here.
>
> I'd like to propose committing 1 and 3, but leaving 2 for further review.

Thanks for your review, Alexander!
+1 for avoiding double locking in SimpleLruReadPage_ReadOnly().
Other changes seem correct to me too.

I've tried to find the optimal value for the cache size, and it seems to me that it affects MultiXact scalability much less than the sizes of the offsets\members buffers. I concur that patch 2 of the patchset does not seem documented enough.

Best regards, Andrey Borodin.


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-27 17:02:20
Message-ID: CAPpHfduT0Kb+8fMt4t0EF5G_3hig+wYgeRadQnNq20F4VxP7LA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 26, 2020 at 6:45 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> Thanks for your review, Alexander!
> +1 for avoiding double locking in SimpleLruReadPage_ReadOnly().
> Other changes seem correct to me too.
>
>
> I've tried to find the optimal value for the cache size, and it seems to me that it affects MultiXact scalability much less than the sizes of the offsets\members buffers. I concur that patch 2 of the patchset does not seem documented enough.

Thank you. I've made a few more minor adjustments to the patchset.
I'm going to push 0001 and 0003 if there are no objections.

------
Regards,
Alexander Korotkov

Attachment Content-Type Size
v5-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch application/octet-stream 12.9 KB
v5-0002-Make-MultiXact-local-cache-size-configurable.patch application/octet-stream 3.7 KB
v5-0003-Add-conditional-variable-to-wait-for-next-MultXact.patch application/octet-stream 4.4 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-27 17:23:26
Message-ID: CAPpHfdsUtSGxRrKn65F_wrE66sNztNY4TpDGDHh2kvAZZDOMqA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 27, 2020 at 8:02 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> On Mon, Oct 26, 2020 at 6:45 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > Thanks for your review, Alexander!
> > +1 for avoiding double locking in SimpleLruReadPage_ReadOnly().
> > Other changes seem correct to me too.
> >
> >
> > I've tried to find the optimal value for the cache size, and it seems to me that it affects MultiXact scalability much less than the sizes of the offsets\members buffers. I concur that patch 2 of the patchset does not seem documented enough.
>
> Thank you. I've made a few more minor adjustments to the patchset.
> I'm going to push 0001 and 0003 if there are no objections.

I see that patchset v5 doesn't pass the tests due to a typo in an assert.
The fixed version is attached.

------
Regards,
Alexander Korotkov

Attachment Content-Type Size
v6-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch application/octet-stream 12.9 KB
v6-0002-Make-MultiXact-local-cache-size-configurable.patch application/octet-stream 3.7 KB
v6-0003-Add-conditional-variable-to-wait-for-next-MultXact.patch application/octet-stream 4.4 KB

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-28 01:36:51
Message-ID: 20201028013651.de5cj2xadgmba5nf@development
Lists: pgsql-hackers

On Tue, Oct 27, 2020 at 08:23:26PM +0300, Alexander Korotkov wrote:
>On Tue, Oct 27, 2020 at 8:02 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>> On Mon, Oct 26, 2020 at 6:45 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>> > Thanks for your review, Alexander!
>> > +1 for avoiding double locking in SimpleLruReadPage_ReadOnly().
>> > Other changes seem correct to me too.
>> >
>> >
>> > I've tried to find the optimal value for the cache size, and it seems to me that it affects MultiXact scalability much less than the sizes of the offsets\members buffers. I concur that patch 2 of the patchset does not seem documented enough.
>>
>> Thank you. I've made a few more minor adjustments to the patchset.
>> I'm going to push 0001 and 0003 if there are no objections.
>
>I see that patchset v5 doesn't pass the tests due to a typo in an assert.
>The fixed version is attached.
>

I did a quick review of this patch series. A couple of comments:

0001
----

This looks quite suspicious to me - SimpleLruReadPage_ReadOnly is
changed to return information about what lock was used, merely to allow
the callers to do an Assert() that the value is not LW_NONE.

IMO we could achieve exactly the same thing by passing a simple flag
that would say 'make sure we got a lock' or something like that. In
fact, aren't all callers doing the assert? That'd mean we can just do
the check always, without the flag. (I see GetMultiXactIdMembers does
two calls and only checks the second result, but I wonder if that's
intended or an omission.)

In any case, it'd make the lwlock.c changes unnecessary, I think.

0002
----

Specifies the number cached MultiXact by backend. Any SLRU lookup ...

should be 'number of cached ...'

0003
----

* Conditional variable for waiting till the filling of the next multixact
* will be finished. See GetMultiXactIdMembers() and RecordNewMultiXact()
* for details.

Perhaps 'till the next multixact is filled' or 'gets full' would be
better. Not sure.

This thread started with a discussion about making the SLRU sizes
configurable, but this patch version only adds a local cache. Does this
achieve the same goal, or would we still gain something by having GUCs
for the SLRUs?

If we're claiming this improves performance, it'd be good to have some
workload demonstrating that and measurements. I don't see anything like
that in this thread, so it's a bit hand-wavy. Can someone share details
of such workload (even synthetic one) and some basic measurements?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-28 07:34:58
Message-ID: [email protected]
Lists: pgsql-hackers

Tomas, thanks for looking into this!

> On 28 October 2020, at 06:36, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>
> This thread started with a discussion about making the SLRU sizes
> configurable, but this patch version only adds a local cache. Does this
> achieve the same goal, or would we still gain something by having GUCs
> for the SLRUs?
>
> If we're claiming this improves performance, it'd be good to have some
> workload demonstrating that and measurements. I don't see anything like
> that in this thread, so it's a bit hand-wavy. Can someone share details
> of such workload (even synthetic one) and some basic measurements?

All patches in this thread aim at the same goal: improving performance in the presence of MultiXact lock contention.
I could not build a synthetic reproduction of the problem, but I did some MultiXact stressing here [0]. It's a clumsy test program, because it is still not clear to me which workload parameters trigger MultiXact lock contention. In the generic case I was encountering other locks like *GenLock: XidGenLock, MultiXactGenLock etc. Yet our production system has encountered this problem approximately once a month throughout this year.

The test program locks different sets of tuples for share in the presence of concurrent full scans.
To produce a set of locks we choose one of 14 bits. If a row number has this bit set to 0, we lock that row.
I've been measuring the time to lock all rows 3 times for each of the 14 bits, observing the total time to take all the locks.
During the test I was observing locks in pg_stat_activity; if they did not contain enough MultiXact locks, I tuned the parameters further (number of concurrent clients, number of bits, select queries etc).
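
The core of the locking step, per client and per chosen bit, is roughly the following (a simplified libpq sketch of what the test program does; the real code lives in the repository [0], and the t1/i names match the example earlier in the thread):

    PGresult   *res;
    char        query[128];

    /* Lock every row whose number has the chosen bit unset. */
    snprintf(query, sizeof(query),
             "SELECT i FROM t1 WHERE (i & (1 << %d)) = 0 FOR SHARE",
             bit);
    res = PQexec(conn, "BEGIN");
    PQclear(res);
    res = PQexec(conn, query);      /* locks stay until COMMIT */
    PQclear(res);
    res = PQexec(conn, "COMMIT");
    PQclear(res);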

Why is it so complicated? It seems that other reproductions of the problem were encountering other locks.

Let's describe the patches in this thread from the POV of these tests.

*** Configurable SLRU buffers for MultiXact members and offsets.
From the tests it is clear that both high and low values for these buffers affect the test time. Here are timings for one test run with different offsets and members sizes [1].
Our production currently runs with (numbers are pages of buffers):
+#define NUM_MXACTOFFSET_BUFFERS 32
+#define NUM_MXACTMEMBER_BUFFERS 64
And, looking back at the incidents in summer and fall 2020, it seems to have mostly helped.

But it's hard to give tuning advice based on these test results. Values (32,64) produce a 10% better result than the current hardcoded values (8,16). In the generic case this is not what someone should tune first.

*** Configurable caches of MultiXacts.
The tests were specifically designed to defeat caches. So, according to the tests, the bigger the cache is, the more time it takes to complete the test [2].
Anyway, the cache is local to a backend, and its purpose is deduplication of written MultiXacts, not speeding up reads.

*** Taking advantage of SimpleLruReadPage_ReadOnly() in MultiXacts.
This simply aligns MultiXact with subtransactions and CLOG. Other SLRUs already take advantage of reading SLRU pages with a shared lock.
On synthetic tests without background selects this patch adds another ~4.7% of performance [3] versus [4]. This improvement seems consistent across different parameter values, yet it is within measurement deviation (see the difference between the warmup run [5] and the closing run [6]).
All in all, these attempts to measure the impact are hand-wavy too. But it makes sense to use a consistent approach among similar subsystems (MultiXacts, subtrans, CLOG etc).

*** Reduce sleep in GetMultiXactIdMembers() on standby.
The problem with pg_usleep(1000L) within GetMultiXactIdMembers() manifests on standbys during contention on MultiXactOffsetControlLock. It's even harder to reproduce.
Yet it seems obvious that reducing the sleep to a shorter time frame will reduce the number of sleeping backends.

For consistency I've returned the patch with the SLRU buffer GUCs to the patchset (the other patches are intact). But patches 1 and 3 are the ones I'm most interested in.

Thanks!

Best regards, Andrey Borodin.

[0] https://github.com/x4m/multixact_stress
[1] https://github.com/x4m/multixact_stress/blob/master/testresults.txt#L22-L39
[2] https://github.com/x4m/multixact_stress/blob/master/testresults.txt#L83-L99
[3] https://github.com/x4m/multixact_stress/blob/master/testresults.txt#L9
[4] https://github.com/x4m/multixact_stress/blob/master/testresults.txt#L29
[5] https://github.com/x4m/multixact_stress/blob/master/testresults.txt#L3
[6] https://github.com/x4m/multixact_stress/blob/master/testresults.txt#L19

Attachment Content-Type Size
v6-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch application/octet-stream 12.9 KB
v6-0004-Add-GUCs-to-tune-MultiXact-SLRUs.patch application/octet-stream 6.3 KB
v6-0003-Add-conditional-variable-to-wait-for-next-MultXac.patch application/octet-stream 4.4 KB
v6-0002-Make-MultiXact-local-cache-size-configurable.patch application/octet-stream 3.7 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-28 19:36:39
Message-ID: CAPpHfdsC7-ZkaKD9xYoVejdD5OeGc3jr7du3-84nkHuNGuKnKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Tomas!

Thank you for your review.

On Wed, Oct 28, 2020 at 4:36 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I did a quick review on this patch series. A couple comments:
>
>
> 0001
> ----
>
> This looks quite suspicious to me - SimpleLruReadPage_ReadOnly is
> changed to return information about what lock was used, merely to allow
> the callers to do an Assert() that the value is not LW_NONE.

Yes, but this is not merely to allow callers to do an Assert().
Sometimes in multixacts it can save us some relocks: we can skip
re-taking the lock in exclusive mode if it is already exclusive.
Adding an Assert() to every caller is probably overkill.

> IMO we could achieve exactly the same thing by passing a simple flag
> that would say 'make sure we got a lock' or something like that. In
> fact, aren't all callers doing the assert? That'd mean we can just do
> the check always, without the flag. (I see GetMultiXactIdMembers does
> two calls and only checks the second result, but I wonder if that's
> intended or omission.)

Having just the flag is exactly what the original version by Andrey
did. But if we have to read two multixact offsets pages or multiple
members pages in one GetMultiXactIdMembers() call, then it relocks
from exclusive mode to exclusive mode. I decided that once we set
out to optimize these locks, this situation is worth avoiding.

> In any case, it'd make the lwlock.c changes unnecessary, I think.

I agree that it would be better not to touch lwlock.c. But I didn't
find a way to avoid relocking from exclusive mode to exclusive mode
without touching lwlock.c or making the code cumbersome elsewhere.

> 0002
> ----
>
> Specifies the number cached MultiXact by backend. Any SLRU lookup ...
>
> should be 'number of cached ...'

Sounds reasonable.

> 0003
> ----
>
> * Conditional variable for waiting till the filling of the next multixact
> * will be finished. See GetMultiXactIdMembers() and RecordNewMultiXact()
> * for details.
>
> Perhaps 'till the next multixact is filled' or 'gets full' would be
> better. Not sure.

Sounds reasonable as well.

------
Regards,
Alexander Korotkov


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-28 23:21:01
Message-ID: 20201028232101.4awgmc2cu2cemu7p@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 28, 2020 at 10:36:39PM +0300, Alexander Korotkov wrote:
>Hi, Tomas!
>
>Thank you for your review.
>
>On Wed, Oct 28, 2020 at 4:36 AM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> I did a quick review on this patch series. A couple comments:
>>
>>
>> 0001
>> ----
>>
>> This looks quite suspicious to me - SimpleLruReadPage_ReadOnly is
>> changed to return information about what lock was used, merely to allow
>> the callers to do an Assert() that the value is not LW_NONE.
>
>Yes, but this is not merely to allow callers to do an Assert().
>Sometimes in multixacts it could save us some relocks. So, we can
>skip relocking lock to exclusive mode if it's in exclusive already.
>Adding Assert() to every caller is probably overkill.
>

Hmm, OK. That can only happen in GetMultiXactIdMembers, which is the
only place where we do retry, right? Do we actually know this makes any
measurable difference? It seems we're mostly imagining that it might
help, but we don't have any actual proof of that (e.g. a workload which
we might benchmark). Or am I missing something?

For me, the extra conditions make it way harder to reason about the
behavior of the code, and I can't convince myself it's worth it.

>> IMO we could achieve exactly the same thing by passing a simple flag
>> that would say 'make sure we got a lock' or something like that. In
>> fact, aren't all callers doing the assert? That'd mean we can just do
>> the check always, without the flag. (I see GetMultiXactIdMembers does
>> two calls and only checks the second result, but I wonder if that's
>> intended or omission.)
>
>Having just the flag is exactly what the original version by Andrey
>did. But if we have to read two multixact offsets pages or multiple
>members page in one GetMultiXactIdMembers()), then it does relocks
>from exclusive mode to exclusive mode. I decide that once we decide
>to optimize this locks, this situation is nice to evade.
>
>> In any case, it'd make the lwlock.c changes unnecessary, I think.
>
>I agree that it would be better to not touch lwlock.c. But I didn't
>find a way to evade relocking exclusive mode to exclusive mode without
>touching lwlock.c or making code cumbersome in other places.
>

Hmm. OK.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-28 23:32:43
Message-ID: 20201028233243.ygm6yqlynkqpzekr@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Wed, Oct 28, 2020 at 12:34:58PM +0500, Andrey Borodin wrote:
>Tomas, thanks for looking into this!
>
>> On 28 Oct 2020, at 06:36, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>>
>> This thread started with a discussion about making the SLRU sizes
>> configurable, but this patch version only adds a local cache. Does this
>> achieve the same goal, or would we still gain something by having GUCs
>> for the SLRUs?
>>
>> If we're claiming this improves performance, it'd be good to have some
>> workload demonstrating that and measurements. I don't see anything like
>> that in this thread, so it's a bit hand-wavy. Can someone share details
>> of such workload (even synthetic one) and some basic measurements?
>
>All patches in this thread aim at the same goal: improve performance in presence of MultiXact locks contention.
>I could not build synthetical reproduction of the problem, however I did some MultiXact stressing here [0]. It's a clumsy test program, because it still is not clear to me which parameters of workload trigger MultiXact locks contention. In generic case I was encountering other locks like *GenLock: XidGenLock, MultixactGenLock etc. Yet our production system encounters this problem approximately once in a month through this year.
>
>Test program locks for share different set of tuples in presence of concurrent full scans.
>To produce a set of locks we choose one of 14 bits. If a row number has this bit set to 0 we add lock it.
>I've been measuring time to lock all rows 3 time for each of 14 bits, observing total time to set all locks.
>During the test I was observing locks in pg_stat_activity, if they did not contain enough MultiXact locks I was tuning parameters further (number of concurrent clients, number of bits, select queries etc).
>
>Why is it so complicated? It seems that other reproductions of a problem were encountering other locks.
>

It's not my intention to be mean or anything like that, but to me this
means we don't really understand the problem we're trying to solve. Had
we understood it, we should be able to construct a workload reproducing
the issue ...

I understand what the individual patches are doing, and maybe those
changes are desirable in general. But without any benchmarks from a
plausible workload I find it hard to convince myself that:

(a) it actually will help with the issue you're observing on production

and

(b) it's actually worth the extra complexity (e.g. the lwlock changes)

I'm willing to invest some of my time into reviewing/testing this, but I
think we badly need better insight into the issue, so that we can build
a workload reproducing it. Perhaps collecting some perf profiles and a
sample of the queries might help, but I assume you already tried that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-29 07:08:21
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 29 Oct 2020, at 04:32, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> It's not my intention to be mean or anything like that, but to me this
> means we don't really understand the problem we're trying to solve. Had
> we understood it, we should be able to construct a workload reproducing
> the issue ...
>
> I understand what the individual patches are doing, and maybe those
> changes are desirable in general. But without any benchmarks from a
> plausible workload I find it hard to convince myself that:
>
> (a) it actually will help with the issue you're observing on production
>
> and
> (b) it's actually worth the extra complexity (e.g. the lwlock changes)
>
>
> I'm willing to invest some of my time into reviewing/testing this, but I
> think we badly need better insight into the issue, so that we can build
> a workload reproducing it. Perhaps collecting some perf profiles and a
> sample of the queries might help, but I assume you already tried that.

Thanks, Tomas! This totally makes sense.

Indeed, collecting queries has not helped yet. We have a load-test environment equivalent to production (but with 10x fewer shards) and a copy of the production workload queries. But the problem does not manifest there.
Why do I think the problem is in MultiXacts?
Here is a chart with the number of wait events on each host.

During the problem, MultiXactOffsetControlLock and SLRURead dominate all other lock types. After primary switchover to another node, SLRURead continued there for a bit, then disappeared.
Backtraces on standbys during the problem show that most backends are sleeping in pg_usleep(1000L) and are not included in the wait stats on these charts.

Currently I'm considering writing a test that directly calls MultiXactIdExpand(), MultiXactIdCreate(), and GetMultiXactIdMembers() from an extension. What do you think, would benchmarks in such tests be meaningful?

Thanks!

Best regards, Andrey Borodin.


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-29 13:49:33
Message-ID: 20201029134933.xd4mh2cofuf6tdfz@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 29, 2020 at 12:08:21PM +0500, Andrey Borodin wrote:
>
>
>> On 29 Oct 2020, at 04:32, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> It's not my intention to be mean or anything like that, but to me this
>> means we don't really understand the problem we're trying to solve. Had
>> we understood it, we should be able to construct a workload reproducing
>> the issue ...
>>
>> I understand what the individual patches are doing, and maybe those
>> changes are desirable in general. But without any benchmarks from a
>> plausible workload I find it hard to convince myself that:
>>
>> (a) it actually will help with the issue you're observing on production
>>
>> and
>> (b) it's actually worth the extra complexity (e.g. the lwlock changes)
>>
>>
>> I'm willing to invest some of my time into reviewing/testing this, but I
>> think we badly need better insight into the issue, so that we can build
>> a workload reproducing it. Perhaps collecting some perf profiles and a
>> sample of the queries might help, but I assume you already tried that.
>
>Thanks, Tomas! This totally makes sense.
>
>Indeed, collecting queries did not help yet. We have loadtest environment equivalent to production (but with 10x less shards), copy of production workload queries. But the problem does not manifest there.
>Why do I think the problem is in MultiXacts?
>Here is a chart with number of wait events on each host
>
>
>During the problem MultiXactOffsetControlLock and SLRURead dominate all other lock types. After primary switchover to another node SLRURead continued for a bit there, then disappeared.

OK, so most of this seems to be due to SLRURead and
MultiXactOffsetControlLock. Could it be that there were too many
multixact members, triggering autovacuum to prevent multixact
wraparound? That might generate a lot of IO on the SLRU. Are you
monitoring the size of the pg_multixact directory?
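
(For what it's worth, a generic SQL-level way to watch how close a
cluster is to the multixact wraparound threshold - a standard check,
not something from this thread - is:

SELECT datname, mxid_age(datminmxid) AS multixact_age
  FROM pg_database ORDER BY 2 DESC;

Values approaching autovacuum_multixact_freeze_max_age mean
anti-wraparound vacuums are imminent.)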

>Backtraces on standbys during the problem show that most of backends are sleeping in pg_sleep(1000L) and are not included into wait stats on these charts.
>
>Currently I'm considering writing test that directly calls MultiXactIdExpand(), MultiXactIdCreate(), and GetMultiXactIdMembers() from an extension. How do you think, would benchmarks in such tests be meaningful?
>

I don't know. I'd much rather have a SQL-level benchmark than an
extension doing this kind of stuff.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-11-02 12:45:33
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 29 Oct 2020, at 18:49, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Thu, Oct 29, 2020 at 12:08:21PM +0500, Andrey Borodin wrote:
>>
>>
>>> On 29 Oct 2020, at 04:32, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>
>>> It's not my intention to be mean or anything like that, but to me this
>>> means we don't really understand the problem we're trying to solve. Had
>>> we understood it, we should be able to construct a workload reproducing
>>> the issue ...
>>>
>>> I understand what the individual patches are doing, and maybe those
>>> changes are desirable in general. But without any benchmarks from a
>>> plausible workload I find it hard to convince myself that:
>>>
>>> (a) it actually will help with the issue you're observing on production
>>>
>>> and
>>> (b) it's actually worth the extra complexity (e.g. the lwlock changes)
>>>
>>>
>>> I'm willing to invest some of my time into reviewing/testing this, but I
>>> think we badly need better insight into the issue, so that we can build
>>> a workload reproducing it. Perhaps collecting some perf profiles and a
>>> sample of the queries might help, but I assume you already tried that.
>>
>> Thanks, Tomas! This totally makes sense.
>>
>> Indeed, collecting queries did not help yet. We have loadtest environment equivalent to production (but with 10x less shards), copy of production workload queries. But the problem does not manifest there.
>> Why do I think the problem is in MultiXacts?
>> Here is a chart with number of wait events on each host
>>
>>
>> During the problem MultiXactOffsetControlLock and SLRURead dominate all other lock types. After primary switchover to another node SLRURead continued for a bit there, then disappeared.
>
> OK, so most of this seems to be due to SLRURead and
> MultiXactOffsetControlLock. Could it be that there were too many
> multixact members, triggering autovacuum to prevent multixact
> wraparound? That might generate a lot of IO on the SLRU. Are you
> monitoring the size of the pg_multixact directory?

Yes, we had some problems with 'multixact "members" limit exceeded' a long time ago.
We tuned autovacuum_multixact_freeze_max_age = 200000000 and vacuum_multixact_freeze_table_age = 75000000 (half of the defaults) and have not encountered that problem since (~5 months).
But the MultiXactOffsetControlLock problem persists. The problem was partially solved by adding more shards. But when one of the shards encounters a problem, it's either MultiXacts or vacuum-induced relation truncation (unrelated to this thread).
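
For reference, the freeze settings above expressed as ALTER SYSTEM commands (just a restatement of the same values; note that autovacuum_multixact_freeze_max_age can only be set at server start, so a restart is needed for it to take effect):

ALTER SYSTEM SET autovacuum_multixact_freeze_max_age = 200000000;
ALTER SYSTEM SET vacuum_multixact_freeze_table_age = 75000000;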

Thanks!

Best regards, Andrey Borodin.


From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-11-10 00:13:22
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

After the issue reported in [1] got fixed, I've restarted the multi-xact
stress test, hoping to reproduce the issue. But so far no luck :-(

I've started slightly different tests on two machines - on one machine
I've done this:

a) init.sql

create table t (a int);
insert into t select i from generate_series(1,100000000) s(i);
alter table t add primary key (a);

b) select.sql

SELECT * FROM t
WHERE a = (1+mod(abs(hashint4(extract(epoch from now())::int)),
100000000)) FOR KEY SHARE;

c) pgbench -n -c 32 -j 8 -f select.sql -T $((24*3600)) test

The idea is to have a large table and many clients hitting a small random
subset of the rows. (Because the row is picked by hashing the current epoch
second, all clients target the same row within any given second, so their
concurrent FOR KEY SHARE locks should get combined into multixacts.) A
sample of wait events from a ~24h run looks like this:

e_type | e_name | sum
----------+----------------------+----------
LWLock | BufferContent | 13913863
| | 7194679
LWLock | WALWrite | 1710507
Activity | LogicalLauncherMain | 726599
Activity | AutoVacuumMain | 726127
Activity | WalWriterMain | 725183
Activity | CheckpointerMain | 604694
Client | ClientRead | 599900
IO | WALSync | 502904
Activity | BgWriterMain | 378110
Activity | BgWriterHibernate | 348464
IO | WALWrite | 129412
LWLock | ProcArray | 6633
LWLock | WALInsert | 5714
IO | SLRUWrite | 2580
IPC | ProcArrayGroupUpdate | 2216
LWLock | XactSLRU | 2196
Timeout | VacuumDelay | 1078
IPC | XactGroupUpdate | 737
LWLock | LockManager | 503
LWLock | WALBufMapping | 295
LWLock | MultiXactMemberSLRU | 267
IO | DataFileWrite | 68
LWLock | BufferIO | 59
IO | DataFileRead | 27
IO | DataFileFlush | 14
LWLock | MultiXactGen | 7
LWLock | BufferMapping | 1

So, nothing particularly interesting - there certainly are not many wait
events related to SLRU.

On the other machine I did this:

a) init.sql
create table t (a int primary key);
insert into t select i from generate_series(1,1000) s(i);

b) select.sql
select * from t for key share;

c) pgbench -n -c 32 -j 8 -f select.sql -T $((24*3600)) test

and the wait events (24h run too) look like this:

e_type | e_name | sum
-----------+-----------------------+----------
LWLock | BufferContent | 20804925
| | 2575369
Activity | LogicalLauncherMain | 745780
Activity | AutoVacuumMain | 745292
Activity | WalWriterMain | 740507
Activity | CheckpointerMain | 737691
Activity | BgWriterHibernate | 731123
LWLock | WALWrite | 570107
IO | WALSync | 452603
Client | ClientRead | 151438
BufferPin | BufferPin | 23466
LWLock | WALInsert | 21631
IO | WALWrite | 19050
LWLock | ProcArray | 15082
Activity | BgWriterMain | 14655
IPC | ProcArrayGroupUpdate | 7772
LWLock | WALBufMapping | 3555
IO | SLRUWrite | 1951
LWLock | MultiXactGen | 1661
LWLock | MultiXactMemberSLRU | 359
LWLock | MultiXactOffsetSLRU | 242
LWLock | XactSLRU | 141
IPC | XactGroupUpdate | 104
LWLock | LockManager | 28
IO | DataFileRead | 4
IO | ControlFileSyncUpdate | 1
Timeout | VacuumDelay | 1
IO | WALInitWrite | 1

Also nothing particularly interesting - few SLRU wait events.

So unfortunately this does not really reproduce the SLRU locking issues
you're observing - clearly, there has to be something else triggering
it. Perhaps this workload is too simplistic, or maybe we need to run
different queries. Or maybe the hardware needs to be somewhat different
(more CPUs? different storage?).

[1]
https://www.postgresql.org/message-id/20201104013205.icogbi773przyny5@development

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-11-10 06:16:49
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 10 Nov 2020, at 05:13, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> After the issue reported in [1] got fixed, I've restarted the multi-xact
> stress test, hoping to reproduce the issue. But so far no luck :-(

Tomas, many thanks for looking into this. I figured out that to make multixact sets bigger, transactions must hang for a while and lock a large set of tuples - but not a continuous range, to avoid contention on buffer_content.
I did not manage to implement this via pgbench; that's why I was trying to hack on a separate Go program. But, essentially, no luck either.
I was observing something resembling it, though:

Friday, 8 May 2020, 15:08:37 (every 1s)

pid | wait_event | wait_event_type | state | query
-------+----------------------------+-----------------+--------+----------------------------------------------------
41344 | ClientRead | Client | idle | insert into t1 select generate_series(1,1000000,1)
41375 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
41377 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
41378 | | | active | select * from t1 where i = ANY ($1) for share
41379 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
41381 | | | active | select * from t1 where i = ANY ($1) for share
41383 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
41385 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
(8 rows)

but this picture was not stable.

How do you collect wait events for aggregation? Just insert into some table with cron?

Thanks!

Best regards, Andrey Borodin.


From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-11-10 18:07:07
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/10/20 7:16 AM, Andrey Borodin wrote:
>
>
>> On 10 Nov 2020, at 05:13, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> After the issue reported in [1] got fixed, I've restarted the multi-xact
>> stress test, hoping to reproduce the issue. But so far no luck :-(
>
>
> Tomas, many thanks for looking into this. I figured out that to make multixact sets bigger transactions must hang for a while and lock large set of tuples. But not continuous range to avoid locking on buffer_content.
> I did not manage to implement this via pgbench, that's why I was trying to hack on separate go program. But, essentially, no luck either.
> I was observing something resemblant though
>
> Friday, 8 May 2020, 15:08:37 (every 1s)
>
> pid | wait_event | wait_event_type | state | query
> -------+----------------------------+-----------------+--------+----------------------------------------------------
> 41344 | ClientRead | Client | idle | insert into t1 select generate_series(1,1000000,1)
> 41375 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
> 41377 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
> 41378 | | | active | select * from t1 where i = ANY ($1) for share
> 41379 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
> 41381 | | | active | select * from t1 where i = ANY ($1) for share
> 41383 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
> 41385 | MultiXactOffsetControlLock | LWLock | active | select * from t1 where i = ANY ($1) for share
> (8 rows)
>
> but this picture was not stable.
>

Seems we haven't made much progress in reproducing the issue :-( I guess
we'll need to know more about the machine where this happens. Is there
anything special about the hardware/config? Are you monitoring size of
the pg_multixact directory?

> How do you collect wait events for aggregation? just insert into some table with cron?
>

No, I have a simple shell script (attached) sampling data from
pg_stat_activity regularly. Then I load it into a table and aggregate to
get a summary.
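
For illustration, a minimal SQL-only sketch of the same idea (not the
attached script; the table name is made up):

CREATE TABLE wait_samples (ts timestamptz, e_type text, e_name text);

-- sampled regularly, e.g. once per second:
INSERT INTO wait_samples
SELECT now(), wait_event_type, wait_event FROM pg_stat_activity;

-- aggregated into summaries like the ones shown earlier:
SELECT e_type, e_name, count(*) AS sum
  FROM wait_samples GROUP BY 1, 2 ORDER BY 3 DESC;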

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
collect-wait-events.sh application/x-shellscript 234 bytes

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-11-10 18:41:39
Message-ID: CA+hUKGJa=2TszMLy=eM5GyzzEYAyvN2Nsjh2nEVF8hBF99zX0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 11, 2020 at 7:07 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> Seems we haven't made much progress in reproducing the issue :-( I guess
> we'll need to know more about the machine where this happens. Is there
> anything special about the hardware/config? Are you monitoring size of
> the pg_multixact directory?

Which release was the original problem seen on?


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-11-13 11:49:36
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 10 Nov 2020, at 23:07, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 11/10/20 7:16 AM, Andrey Borodin wrote:
>>
>>
>> but this picture was not stable.
>>
>
> Seems we haven't made much progress in reproducing the issue :-( I guess
> we'll need to know more about the machine where this happens. Is there
> anything special about the hardware/config? Are you monitoring size of
> the pg_multixact directory?

It's Ubuntu 18.04.4 LTS, Intel Xeon E5-2660 v4, 56 CPU cores, with 256GB of RAM.
PostgreSQL 10.14, compiled by gcc 7.5.0, 64-bit

No, unfortunately we do not have metrics for SLRU sizes.
3.5TB mdadm RAID10 over 28 SSD drives, 82% full.

The first incident that triggered the investigation was on 2020-04-19; at that time the cluster was running PG 10.11. But I think it was happening before that.

I'd say nothing special...

>
>> How do you collect wait events for aggregation? just insert into some table with cron?
>>
>
> No, I have a simple shell script (attached) sampling data from
> pg_stat_activity regularly. Then I load it into a table and aggregate to
> get a summary.

Thanks!

Best regards, Andrey Borodin.


From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-08 16:05:34
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13/11/2020 at 12:49, Andrey Borodin wrote:
>
>> On 10 Nov 2020, at 23:07, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> On 11/10/20 7:16 AM, Andrey Borodin wrote:
>>>
>>> but this picture was not stable.
>>>
>> Seems we haven't made much progress in reproducing the issue :-( I guess
>> we'll need to know more about the machine where this happens. Is there
>> anything special about the hardware/config? Are you monitoring size of
>> the pg_multixact directory?
> It's Ubuntu 18.04.4 LTS, Intel Xeon E5-2660 v4, 56 CPU cores with 256Gb of RAM.
> PostgreSQL 10.14, compiled by gcc 7.5.0, 64-bit
>
> No, unfortunately we do not have signals for SLRU sizes.
> 3.5Tb mdadm raid10 over 28 SSD drives, 82% full.
>
> First incident triggering investigation was on 2020-04-19, at that time cluster was running on PG 10.11. But I think it was happening before.
>
> I'd say nothing special...
>
>>> How do you collect wait events for aggregation? just insert into some table with cron?
>>>
>> No, I have a simple shell script (attached) sampling data from
>> pg_stat_activity regularly. Then I load it into a table and aggregate to
>> get a summary.
> Thanks!
>
> Best regards, Andrey Borodin.

Hi,

Some time ago I encountered contention on
MultiXactOffsetControlLock with a performance benchmark. Here are the
wait event monitoring results, with polling every 10 seconds and a 30
minute run of the benchmark:

 event_type |           event            |   sum
------------+----------------------------+----------
 Client     | ClientRead                 | 44722952
 LWLock     | MultiXactOffsetControlLock | 30343060
 LWLock     | multixact_offset           | 16735250
 LWLock     | MultiXactMemberControlLock |  1601470
 LWLock     | buffer_content             |   991344
 LWLock     | multixact_member           |   805624
 Lock       | transactionid              |   204997
 Activity   | LogicalLauncherMain        |   198834
 Activity   | CheckpointerMain           |   198834
 Activity   | AutoVacuumMain             |   198469
 Activity   | BgWriterMain               |   184066
 Activity   | WalWriterMain              |   171571
 LWLock     | WALWriteLock               |    72428
 IO         | DataFileRead               |    35708
 Activity   | BgWriterHibernate          |    12741
 IO         | SLRURead                   |     9121
 Lock       | relation                   |     8858
 LWLock     | ProcArrayLock              |     7309
 LWLock     | lock_manager               |     6677
 LWLock     | pg_stat_statements         |     4194
 LWLock     | buffer_mapping             |     3222

After reading this thread I changed the values of the buffer sizes to 32
and 64 and obtained the following results:

 event_type |           event            |    sum
------------+----------------------------+-----------
 Client     | ClientRead                 | 268297572
 LWLock     | MultiXactMemberControlLock |  65162906
 LWLock     | multixact_member           |  33397714
 LWLock     | buffer_content             |   4737065
 Lock       | transactionid              |   2143750
 LWLock     | SubtransControlLock        |   1318230
 LWLock     | WALWriteLock               |   1038999
 Activity   | LogicalLauncherMain        |    940598
 Activity   | AutoVacuumMain             |    938566
 Activity   | CheckpointerMain           |    799052
 Activity   | WalWriterMain              |    749069
 LWLock     | subtrans                   |    710163
 Activity   | BgWriterHibernate          |    536763
 Lock       | object                     |    514225
 Activity   | BgWriterMain               |    394206
 LWLock     | lock_manager               |    295616
 IO         | DataFileRead               |    274236
 LWLock     | ProcArrayLock              |     77099
 Lock       | tuple                      |     59043
 IO         | CopyFileWrite              |     45611
 Lock       | relation                   |     42714

There was still contention on multixact, but less than in the first run.
I increased the buffers to 128 and 512 and obtained the best results
for this benchmark:

 event_type |           event            |    sum
------------+----------------------------+-----------
 Client     | ClientRead                 | 160463037
 LWLock     | MultiXactMemberControlLock |   5334188
 LWLock     | buffer_content             |   5228256
 LWLock     | buffer_mapping             |   2368505
 LWLock     | SubtransControlLock        |   2289977
 IPC        | ProcArrayGroupUpdate       |   1560875
 LWLock     | ProcArrayLock              |   1437750
 Lock       | transactionid              |    825561
 LWLock     | subtrans                   |    772701
 LWLock     | WALWriteLock               |    666138
 Activity   | LogicalLauncherMain        |    492585
 Activity   | CheckpointerMain           |    492458
 Activity   | AutoVacuumMain             |    491548
 LWLock     | lock_manager               |    426531
 Lock       | object                     |    403581
 Activity   | WalWriterMain              |    394668
 Activity   | BgWriterHibernate          |    293112
 Activity   | BgWriterMain               |    195312
 LWLock     | MultiXactGenLock           |    177820
 LWLock     | pg_stat_statements         |    173864
 IO         | DataFileRead               |    173009

I hope these metrics are of some interest in showing the utility of this
patch, but unfortunately I cannot be more precise and provide reports
for the entire patch. The problem is that this benchmark is run on an
application that uses PostgreSQL 11, and I cannot backport the full
patch - there have been too many changes since PG11. I have just
increased the sizes of NUM_MXACTOFFSET_BUFFERS and NUM_MXACTMEMBER_BUFFERS.
This allowed us to triple the number of simultaneous connections between
the first and the last test.

I know that this report is not really helpful, but at least I can give
more information on the benchmark that was used. This is the proprietary
zRef benchmark, which compares the same COBOL programs (transactional and
batch) executed both on mainframes and on x86 servers. Instead of a DB2
z/OS database we use PostgreSQL v11. This test makes extensive use of
cursors (each SELECT, even read-only, is executed through a cursor), and
the contention was observed with updates on tables with some foreign
keys. There is no explicit FOR SHARE in the queries, only some FOR
UPDATE clauses. I guess that the multixact contention is the result of
the key-share locks produced for the FKs.
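
That mechanism can be sketched as follows (placeholder table names, not
from the benchmark): the RI triggers lock the referenced row with
SELECT ... FOR KEY SHARE, and overlapping key-share locks from different
transactions are recorded as multixacts.

CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child  (id int REFERENCES parent(id));
INSERT INTO parent VALUES (1);

-- run concurrently in many sessions: each INSERT makes the RI trigger
-- take a key-share lock on parent row 1, creating multixacts
BEGIN;
INSERT INTO child VALUES (1);
COMMIT;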

So in our case, being able to tune the multixact buffers could help a lot
to improve performance.

--
Gilles Darold


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Gilles Darold <gilles(at)darold(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-08 17:52:52
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Gilles!

Many thanks for your message!

> On 8 Dec 2020, at 21:05, Gilles Darold <gilles(at)darold(dot)net> wrote:
>
> I know that this report is not really helpful

Quite the contrary - these benchmarks prove that a controllable reproduction exists. I've rebased the patches for PG11. Can you please benchmark them (without extending the SLRUs)?

Best regards, Andrey Borodin.

Attachment Content-Type Size
v1106-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-o.patch application/octet-stream 12.7 KB
v1106-0002-Make-MultiXact-local-cache-size-configurable.patch application/octet-stream 4.8 KB
v1106-0003-Add-conditional-variable-to-wait-for-next-Mult.patch application/octet-stream 4.4 KB
v1106-0004-Add-GUCs-to-tune-MultiXact-SLRUs.patch application/octet-stream 6.2 KB

From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-09 10:51:36
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Andrey,

Thanks for the backport. I have an issue with the first patch, "Use shared
lock in GetMultiXactIdMembers for offsets and members"
(v1106-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-o.patch): the
applications stop working when I apply it. Also, the PG regression tests
are failing in several places.

test insert_conflict              ... ok
test create_function_1            ... FAILED
test create_type                  ... FAILED
test create_table                 ... FAILED
test create_function_2            ... FAILED
test copy                         ... FAILED
test copyselect                   ... ok
test copydml                      ... ok
test create_misc                  ... FAILED
test create_operator              ... FAILED
test create_procedure             ... ok
test create_index                 ... FAILED
test index_including              ... ok
test create_view                  ... FAILED
test create_aggregate             ... ok
test create_function_3            ... ok
test create_cast                  ... ok
test constraints                  ... FAILED
test triggers                     ... FAILED
test inherit                      ...
^C

This is also where I left off in my last attempt to backport to PG11. I
will try to fix it again, but it could take time to get it working.

Best regards,

--
Gilles Darold

On 08/12/2020 at 18:52, Andrey Borodin wrote:
> Hi Gilles!
>
> Many thanks for your message!
>
>> On 8 Dec 2020, at 21:05, Gilles Darold <gilles(at)darold(dot)net> wrote:
>>
>> I know that this report is not really helpful
> Quite contrary - this benchmarks prove that controllable reproduction exists. I've rebased patches for PG11. Can you please benchmark them (without extending SLRU)?
>
> Best regards, Andrey Borodin.
>


From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-09 11:06:10
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/12/2020 at 11:51, Gilles Darold wrote:
> Also PG regression tests are failing too on several part.

Forget this - I had not run the regression tests in the right repository:

...

=======================
 All 189 tests passed.
=======================

I'm looking into why the application is failing.

--
Gilles Darold
http://www.darold.net/


From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-10 14:45:56
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/12/2020 at 18:52, Andrey Borodin wrote:
> Hi Gilles!
>
> Many thanks for your message!
>
>> On 8 Dec 2020, at 21:05, Gilles Darold <gilles(at)darold(dot)net> wrote:
>>
>> I know that this report is not really helpful
> Quite contrary - this benchmarks prove that controllable reproduction exists. I've rebased patches for PG11. Can you please benchmark them (without extending SLRU)?
>
> Best regards, Andrey Borodin.
>
Hi,

Running tests with the patches yesterday produced a lot of failures,
with errors on INSERT and UPDATE statements:

ERROR:  lock MultiXactOffsetControlLock is not held

After a patch review this morning I think I have found what's going
wrong. In patch
v6-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch I think
there is a missing reinitialization of the lockmode variable to LW_NONE
inside the retry loop, after the call to LWLockRelease() in
src/backend/access/transam/multixact.c:1392:GetMultiXactIdMembers().
I've attached a new version of the patch for master that includes the
fix; I'm using it now with PG11, and with it everything works very well.

I'm running more tests to see the performance impact of playing with
multixact_offsets_slru_buffers, multixact_members_slru_buffers and
multixact_local_cache_entries. I will report the results later today.

--
Gilles Darold
http://www.darold.net/

Attachment Content-Type Size
v7-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch text/x-patch 11.6 KB

From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-11 17:50:25
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/12/2020 at 15:45, Gilles Darold wrote:
> On 08/12/2020 at 18:52, Andrey Borodin wrote:
>> Hi Gilles!
>>
>> Many thanks for your message!
>>
>>> On 8 Dec 2020, at 21:05, Gilles Darold <gilles(at)darold(dot)net> wrote:
>>>
>>> I know that this report is not really helpful
>> Quite contrary - this benchmarks prove that controllable reproduction exists. I've rebased patches for PG11. Can you please benchmark them (without extending SLRU)?
>>
>> Best regards, Andrey Borodin.
>>
> Hi,
>
>
> Running tests yesterday with the patches has reported log of failures
> with error on INSERT and UPDATE statements:
>
>
> ERROR:  lock MultiXactOffsetControlLock is not held
>
>
> After a patch review this morning I think I have found what's going
> wrong. In patch
> v6-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch I
> think there is a missing reinitialisation of the lockmode variable to
> LW_NONE inside the retry loop after the call to LWLockRelease() in
> src/backend/access/transam/multixact.c:1392:GetMultiXactIdMembers().
> I've attached a new version of the patch for master that include the
> fix I'm using now with PG11 and with which everything works very well now.
>
>
> I'm running more tests to see the impact on the performances to play
> with multixact_offsets_slru_buffers, multixact_members_slru_buffers
> and multixact_local_cache_entries. I will reports the results later today.
>

Hi,

Sorry for the delay; I have done some further tests to try to reach the
limit without bottlenecks on multixact or shared buffers. The tests were
done on a Microsoft Azure machine with 2TB of RAM and 4 Intel Xeon
Platinum 8280M sockets (128 CPUs). PG configuration:

    max_connections = 4096
    shared_buffers = 64GB
    max_prepared_transactions = 2048
    work_mem = 256MB
    maintenance_work_mem = 2GB
    wal_level = minimal
    synchronous_commit = off
    commit_delay = 1000
    commit_siblings = 10
    checkpoint_timeout = 1h
    max_wal_size = 32GB
    checkpoint_completion_target = 0.9

I have tested several values for the different buffer variables,
starting from:

    multixact_offsets_slru_buffers = 64
    multixact_members_slru_buffers = 128
    multixact_local_cache_entries = 256

up to the values giving the best performance we achieved with this test,
avoiding MultiXactOffsetControlLock or MultiXactMemberControlLock
contention:

    multixact_offsets_slru_buffers = 128
    multixact_members_slru_buffers = 512
    multixact_local_cache_entries = 1024
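
With the backport these are plain GUCs, so the same combination can also
be set with ALTER SYSTEM (a sketch; the GUC names come from the patchset
in this thread, and since they size shared memory they presumably
require a restart to take effect):

    ALTER SYSTEM SET multixact_offsets_slru_buffers = 128;
    ALTER SYSTEM SET multixact_members_slru_buffers = 512;
    ALTER SYSTEM SET multixact_local_cache_entries = 1024;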

Also, shared_buffers was increased up to 256GB to avoid
buffer_mapping contention.

Our last best test reports the following wait events:

     event_type |           event            |    sum
    ------------+----------------------------+-----------
     Client     | ClientRead                 | 321690211
     LWLock     | buffer_content             |   2970016
     IPC        | ProcArrayGroupUpdate       |   2317388
     LWLock     | ProcArrayLock              |   1445828
     LWLock     | WALWriteLock               |   1187606
     LWLock     | SubtransControlLock        |    972889
     Lock       | transactionid              |    840560
     Lock       | relation                   |    587600
     Activity   | LogicalLauncherMain        |    529599
     Activity   | AutoVacuumMain             |    528097

At this stage I don't think we can get better performance by tuning
these buffers, at least with PG11.

Regarding the performance gain from the shared-lock patch for
GetMultiXactIdMembers: unfortunately I cannot see a difference with or
without this patch; it could be related to our particular benchmark. But
clearly the patch for the multixact buffers should be committed, as it is
really helpful to be able to tune PG when multixact bottlenecks are found.

Best regards,

--
Gilles Darold
LzLabs GmbH
https://www.lzlabs.com/


From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-13 09:17:51
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/12/2020 at 18:50, Gilles Darold wrote:
> On 10/12/2020 at 15:45, Gilles Darold wrote:
>> On 08/12/2020 at 18:52, Andrey Borodin wrote:
>>> Hi Gilles!
>>>
>>> Many thanks for your message!
>>>
>>>> On 8 Dec 2020, at 21:05, Gilles Darold <gilles(at)darold(dot)net> wrote:
>>>>
>>>> I know that this report is not really helpful
>>> Quite contrary - this benchmarks prove that controllable reproduction exists. I've rebased patches for PG11. Can you please benchmark them (without extending SLRU)?
>>>
>>> Best regards, Andrey Borodin.
>>>
>> Hi,
>>
>>
>> Running tests yesterday with the patches has reported log of failures
>> with error on INSERT and UPDATE statements:
>>
>>
>> ERROR:  lock MultiXactOffsetControlLock is not held
>>
>>
>> After a patch review this morning I think I have found what's going
>> wrong. In patch
>> v6-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch I
>> think there is a missing reinitialisation of the lockmode variable to
>> LW_NONE inside the retry loop after the call to LWLockRelease() in
>> src/backend/access/transam/multixact.c:1392:GetMultiXactIdMembers().
>> I've attached a new version of the patch for master that include the
>> fix I'm using now with PG11 and with which everything works very well
>> now.
>>
>>
>> I'm running more tests to see the impact on the performances to play
>> with multixact_offsets_slru_buffers, multixact_members_slru_buffers
>> and multixact_local_cache_entries. I will reports the results later
>> today.
>>
>
> Hi,
>
> Sorry for the delay, I have done some further tests to try to reach
> the limit without bottlenecks on multixact or shared buffers. The
> tests was done on a Microsoft Asure machine with 2TB of RAM and 4
> sockets Intel Xeon Platinum 8280M (128 cpu). PG configuration:
>
>     max_connections = 4096
>     shared_buffers = 64GB
>     max_prepared_transactions = 2048
>     work_mem = 256MB
>     maintenance_work_mem = 2GB
>     wal_level = minimal
>     synchronous_commit = off
>     commit_delay = 1000
>     commit_siblings = 10
>     checkpoint_timeout = 1h
>     max_wal_size = 32GB
>     checkpoint_completion_target = 0.9
>
> I have tested with several values for the different buffer's variables
> starting from:
>
>     multixact_offsets_slru_buffers = 64
>     multixact_members_slru_buffers = 128
>     multixact_local_cache_entries = 256
>
> to the values with the best performances we achieve with this test to
> avoid MultiXactOffsetControlLock or MultiXactMemberControlLock:
>
>     multixact_offsets_slru_buffers = 128
>     multixact_members_slru_buffers = 512
>     multixact_local_cache_entries = 1024
>
> Also shared_buffers have been increased up to 256GB to avoid
> buffer_mapping contention.
>
> Our last best test reports the following wait events:
>
>      event_type |           event            |    sum
>     ------------+----------------------------+-----------
>      Client     | ClientRead                 | 321690211
>      LWLock     | buffer_content             |   2970016
>      IPC        | ProcArrayGroupUpdate       |   2317388
>      LWLock     | ProcArrayLock              |   1445828
>      LWLock     | WALWriteLock               |   1187606
>      LWLock     | SubtransControlLock        |    972889
>      Lock       | transactionid              |    840560
>      Lock       | relation                   |    587600
>      Activity   | LogicalLauncherMain        |    529599
>      Activity   | AutoVacuumMain             |    528097
>
> At this stage I don't think we can get better performance by tuning
> these buffers, at least with PG11.
>
> Regarding the performance gain from the patch for the shared lock in
> GetMultiXactIdMembers, unfortunately I cannot see a difference with or
> without this patch; it could be related to our particular benchmark.
> But the patch on multixact buffers should clearly be committed, as it
> is really helpful to be able to tune PG when multixact bottlenecks
> are found.

I've done more review on these patches.

1) As reported in my previous message, patch 0001 looks useless, as it
doesn't provide a measurable performance gain.

2) In patch 0004 there are two typos; s/informaion/information/ will fix them.

3) The GUCs are missing from the postgresql.conf.sample file; see the
patch attached for a proposal, and the illustrative sketch below.
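For illustration, the sample-file entries could look something like this
(a sketch only: the GUC names are those of the patchset, but the default
values and comments shown here are my assumptions):

    #multixact_offsets_slru_buffers = 8    # memory for pg_multixact/offsets, in 8kB pages
                                           # (change requires restart)
    #multixact_members_slru_buffers = 16   # memory for pg_multixact/members, in 8kB pages
                                           # (change requires restart)
    #multixact_local_cache_entries = 256   # per-backend cache of recent multixacts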

Best regards,

--
Gilles Darold
LzLabs GmbH
https://www.lzlabs.com/

Attachment Content-Type Size
postgresql_conf_multixact_buffers_GUCs.patch text/x-patch 819 bytes

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Gilles Darold <gilles(at)darold(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-13 17:24:49
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> 13 дек. 2020 г., в 14:17, Gilles Darold <gilles(at)darold(dot)net> написал(а):
>
> I've done more review on these patches.

Thanks, Gilles! I'll incorporate all your fixes to patchset.
Can you also benchmark the condition variable sleep, i.e. the patch "Add conditional variable to wait for next MultXact offset in corner case"?
The problem manifests on a standby when the primary is heavily loaded, showing up as MultiXactOffsetControlLock contention.
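For context, the shape of that change is roughly the following (a sketch,
not the patch itself: the condition variable field, the readiness check
and the wait event name are hypothetical, while
ConditionVariablePrepareToSleep(), ConditionVariableSleep(),
ConditionVariableCancelSleep() and ConditionVariableBroadcast() are the
real primitives):

    /* Reader: sleep until the next offset is published, instead of
     * releasing the lock and retrying in a loop. */
    ConditionVariablePrepareToSleep(&MultiXactState->nextOffsetCV);
    while (!MultiXactOffsetIsAssigned(multi))      /* hypothetical check */
        ConditionVariableSleep(&MultiXactState->nextOffsetCV,
                               WAIT_EVENT_MXACT_OFFSET);   /* hypothetical */
    ConditionVariableCancelSleep();

    /* Writer: after assigning the next offset, wake any waiters. */
    ConditionVariableBroadcast(&MultiXactState->nextOffsetCV);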

Thanks!

Best regards, Andrey Borodin.


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Gilles Darold <gilles(at)darold(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-14 06:31:32
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> 13 дек. 2020 г., в 22:24, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> написал(а):
>
>
>
>> 13 дек. 2020 г., в 14:17, Gilles Darold <gilles(at)darold(dot)net> написал(а):
>>
>> I've done more review on these patches.
>
> Thanks, Gilles! I'll incorporate all your fixes to patchset.

PFA patches.
Also, I've noted that the patch "Add conditional variable to wait for next MultXact offset in corner case" removes a CHECK_FOR_INTERRUPTS() call; I'm not sure that's correct.

Thanks!

Best regards, Andrey Borodin.

Attachment Content-Type Size
v8-0004-Add-GUCs-to-tune-MultiXact-SLRUs.patch application/octet-stream 7.2 KB
v8-0003-Add-conditional-variable-to-wait-for-next-MultXac.patch application/octet-stream 4.4 KB
v8-0002-Make-MultiXact-local-cache-size-configurable.patch application/octet-stream 3.7 KB
v8-0001-Use-shared-lock-in-GetMultiXactIdMembers-for.patch application/octet-stream 13.2 KB

From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-23 16:31:17
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le 13/12/2020 à 18:24, Andrey Borodin a écrit :
>
>> 13 дек. 2020 г., в 14:17, Gilles Darold <gilles(at)darold(dot)net> написал(а):
>>
>> I've done more review on these patches.
> Thanks, Gilles! I'll incorporate all your fixes to patchset.
> Can you also benchmark conditional variable sleep? The patch "Add conditional variable to wait for next MultXact offset in corner case"?
> The problem manifests on Standby when Primary is heavily loaded with MultiXactOffsetControlLock.
>
> Thanks!
>
> Best regards, Andrey Borodin.

Hi Andrey,

Sorry for the response delay, we have run several other tests trying to
figure out the performance gain per patch, but unfortunately we get very
erratic results. With the same parameters and patches, the test doesn't
return the same results depending on the day or the hour of the day.
This is very frustrating and I suppose that it is related to the Azure
architecture. The only thing I am sure of is that we had the best
performance results with all patches applied and

multixact_offsets_slru_buffers = 256
multixact_members_slru_buffers = 512
multixact_local_cache_entries = 4096

but I cannot say whether all or only some of the patches improve
performance. My feeling is that the gains from patches 1 (shared lock)
and 3 (conditional variable) are small compared to the gain from simply
tuning the multixact buffers when multixact contention is observed,
though perhaps they delay the onset of that contention. It is all the
more frustrating that we had a test case to reproduce the contention,
but apparently not the architecture.

Can't do much more at this point.

Best regards,

--
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Gilles Darold <gilles(at)darold(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-02-15 17:17:40
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> 23 дек. 2020 г., в 21:31, Gilles Darold <gilles(at)darold(dot)net> написал(а):
>
> Sorry for the response delay, we have run several other tests trying to figure out the performance gain per patch, but unfortunately we get very erratic results. With the same parameters and patches, the test doesn't return the same results depending on the day or the hour of the day. This is very frustrating and I suppose that it is related to the Azure architecture. The only thing I am sure of is that we had the best performance results with all patches applied and
>
> multixact_offsets_slru_buffers = 256
> multixact_members_slru_buffers = 512
> multixact_local_cache_entries = 4096
>
>
> but I cannot say whether all or only some of the patches improve performance. My feeling is that the gains from patches 1 (shared lock) and 3 (conditional variable) are small compared to the gain from simply tuning the multixact buffers when multixact contention is observed, though perhaps they delay the onset of that contention. It is all the more frustrating that we had a test case to reproduce the contention, but apparently not the architecture.

Hi! Thanks for the input.
I think we have a consensus here that configuring SLRU sizes is beneficial for MultiXacts.
There is a proposal in a nearby thread [0] to change the default value of the commit_ts SLRU buffers.
In my experience, there can occasionally be problems with subtransactions that are cured by extending the subtrans SLRU.

Let's make all SLRUs configurable?
PFA patch with draft of these changes.

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/20210115220744.GA24457%40alvherre.pgsql

Attachment Content-Type Size
v9-0001-Make-all-SLRU-buffer-sizes-configurable.patch application/octet-stream 19.1 KB

From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-11 15:50:48
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le 15/02/2021 à 18:17, Andrey Borodin a écrit :
>
>> 23 дек. 2020 г., в 21:31, Gilles Darold <gilles(at)darold(dot)net> написал(а):
>>
>> Sorry for the response delay, we have run several other tests trying to figure out the performance gain per patch, but unfortunately we get very erratic results. With the same parameters and patches, the test doesn't return the same results depending on the day or the hour of the day. This is very frustrating and I suppose that it is related to the Azure architecture. The only thing I am sure of is that we had the best performance results with all patches applied and
>>
>> multixact_offsets_slru_buffers = 256
>> multixact_members_slru_buffers = 512
>> multixact_local_cache_entries = 4096
>>
>>
>> but I cannot say whether all or only some of the patches improve performance. My feeling is that the gains from patches 1 (shared lock) and 3 (conditional variable) are small compared to the gain from simply tuning the multixact buffers when multixact contention is observed, though perhaps they delay the onset of that contention. It is all the more frustrating that we had a test case to reproduce the contention, but apparently not the architecture.
> Hi! Thanks for the input.
> I think we have a consensus here that configuring SLRU sizes is beneficial for MultiXacts.
> There is a proposal in a nearby thread [0] to change the default value of the commit_ts SLRU buffers.
> In my experience, there can occasionally be problems with subtransactions that are cured by extending the subtrans SLRU.
>
> Let's make all SLRUs configurable?
> PFA patch with draft of these changes.
>
> Best regards, Andrey Borodin.
>
>
> [0] https://www.postgresql.org/message-id/flat/20210115220744.GA24457%40alvherre.pgsql
>

The patch doesn't apply anymore on master because of this error: patch failed:
src/backend/utils/init/globals.c:150

Another remark about this patch: it should be mentioned in the
documentation (doc/src/sgml/config.sgml) that the new configuration
variables need a server restart, for example by adding "This parameter
can only be set at server start." as is done for shared_buffers. The
patch on postgresql.conf mentions it.

And some typo to be fixed:

s/Tipically/Typically/

s/asincronous/asynchronous/

s/confugured/configured/

s/substrnsactions/subtransactions/

--
Gilles Darold
LzLabs GmbH


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Gilles Darold <gilles(at)darold(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-12 12:44:02
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> 11 марта 2021 г., в 20:50, Gilles Darold <gilles(at)darold(dot)net> написал(а):
>
>
> The patch doesn't apply anymore on master because of this error: patch failed: src/backend/utils/init/globals.c:150
>
>
>
> Another remark about this patch: it should be mentioned in the documentation (doc/src/sgml/config.sgml) that the new configuration variables need a server restart, for example by adding "This parameter can only be set at server start." as is done for shared_buffers. The patch on postgresql.conf mentions it.
>
> And some typo to be fixed:
>
>
>
> s/Tipically/Typically/
>
> s/asincronous/asynchronous/
>
> s/confugured/configured/
>
> s/substrnsactions/subtransactions/
>
>

Thanks, Gilles! Fixed.

Best regards, Andrey Borodin.

Attachment Content-Type Size
v10-0001-Make-all-SLRU-buffer-sizes-configurable.patch application/octet-stream 19.5 KB

From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Gilles Darold <gilles(at)darold(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-15 15:41:04
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le 12/03/2021 à 13:44, Andrey Borodin a écrit :
>
>> 11 марта 2021 г., в 20:50, Gilles Darold <gilles(at)darold(dot)net> написал(а):
>>
>>
>> The patch doesn't apply anymore on master because of this error: patch failed: src/backend/utils/init/globals.c:150
>>
>>
>>
>> Another remark about this patch: it should be mentioned in the documentation (doc/src/sgml/config.sgml) that the new configuration variables need a server restart, for example by adding "This parameter can only be set at server start." as is done for shared_buffers. The patch on postgresql.conf mentions it.
>>
>> And some typo to be fixed:
>>
>>
>>
>> s/Tipically/Typically/
>>
>> s/asincronous/asynchronous/
>>
>> s/confugured/configured/
>>
>> s/substrnsactions/subtransactions/
>>
>>
> Thanks, Gilles! Fixed.
>
> Best regards, Andrey Borodin.
>

Hi Andrey,

I found two problems in this patch, first in src/include/miscadmin.h
multixact_members_slru_buffers is declared twice:

 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_slru_buffers;
+extern PGDLLIMPORT int multixact_members_slru_buffers;
+extern PGDLLIMPORT int multixact_members_slru_buffers;  <---------
+extern PGDLLIMPORT int subtrans_slru_buffers;

In file src/backend/access/transam/multixact.c the second variable
should be multixact_members_slru_buffers, not
multixact_offsets_slru_buffers.

@@ -1848,13 +1848,13 @@ MultiXactShmemInit(void)
        MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;

        SimpleLruInit(MultiXactOffsetCtl,
-                                 "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,
+                                 "MultiXactOffset", multixact_offsets_slru_buffers, 0,
                                  MultiXactOffsetSLRULock, "pg_multixact/offsets",
                                  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
                                  SYNC_HANDLER_MULTIXACT_OFFSET);
        SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
        SimpleLruInit(MultiXactMemberCtl,
-                                 "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0,
+                                 "MultiXactMember", multixact_offsets_slru_buffers, 0,    <------------------
                                  MultiXactMemberSLRULock, "pg_multixact/members",
                                  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
                                  SYNC_HANDLER_MULTIXACT_MEMBER);

Please fix them so that I can finish the review.

--
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-24 21:31:56
Message-ID: CA+hUKGJ4TwipnMDmKdORu4zUo_rofsuu5FQsgV=UHJ3bY38ERw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Andrey,

On Sat, Mar 13, 2021 at 1:44 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> [v10]

+int multixact_offsets_slru_buffers = 8;
+int multixact_members_slru_buffers = 16;
+int subtrans_slru_buffers = 32;
+int notify_slru_buffers = 8;
+int serial_slru_buffers = 16;
+int clog_slru_buffers = 0;
+int commit_ts_slru_buffers = 0;

I don't think we should put "slru" (the name of the buffer replacement
algorithm, implementation detail) in the GUC names.

+ It defaults to 0, in this case CLOG size is taken as
<varname>shared_buffers</varname> / 512.

We already know that increasing the number of CLOG buffers above the
current number hurts as the linear search begins to dominate
(according to the commit message for 5364b357), and it doesn't seem
great to ship a new feature that melts your CPU when you turn it up.
Perhaps, to ship this, we need to introduce a buffer mapping table? I
have attached a "one coffee" attempt at that, on top of your v10 patch
(unmodified), for discussion. It survives basic testing but I don't
know how it performs.
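To make that concrete, a rough sketch of such a table built on dynahash
(SlruMappingTableEntry is the struct discussed later in this thread; the
function name and sizing here are my assumptions):

    typedef struct SlruMappingTableEntry
    {
        int     pageno;     /* hash key: SLRU page number */
        int     slotno;     /* buffer slot currently holding that page */
    } SlruMappingTableEntry;

    static HTAB *
    SlruMappingInit(const char *name, int nslots)
    {
        HASHCTL info;

        /* Shared-memory hash table mapping pageno -> slotno. */
        info.keysize = sizeof(int);
        info.entrysize = sizeof(SlruMappingTableEntry);
        return ShmemInitHash(name, nslots, nslots, &info,
                             HASH_ELEM | HASH_BLOBS);
    }

A lookup can then replace the linear scan over shared->page_number[]:

    SlruMappingTableEntry *entry =
        hash_search(mapping_table, &pageno, HASH_FIND, NULL);

    return entry ? entry->slotno : -1;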

Attachment Content-Type Size
v10-0001-Make-all-SLRU-buffer-sizes-configurable.patch text/x-patch 19.5 KB
v10-0002-Add-buffer-mapping-table-for-SLRUs.patch text/x-patch 7.0 KB

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-25 01:03:38
Message-ID: CA+hUKGK2bkx91=CWRuPqWdXAu2JSS0mDPBh8DNuSeKhy8eOLgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 25, 2021 at 10:31 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> We already know that increasing the number of CLOG buffers above the
> current number hurts as the linear search begins to dominate
> (according to the commit message for 5364b357), and it doesn't seem
> great to ship a new feature that melts your CPU when you turn it up.
> Perhaps, to ship this, we need to introduce a buffer mapping table? I
> have attached a "one coffee" attempt at that, on top of your v10 patch
> (unmodified), for discussion. It survives basic testing but I don't
> know how it performs.

Hrrr... Cfbot showed an assertion failure. Here's the two coffee
version with a couple of silly mistakes fixed.

Attachment Content-Type Size
v11-0001-Make-all-SLRU-buffer-sizes-configurable.patch text/x-patch 19.5 KB
v11-0002-Add-buffer-mapping-table-for-SLRUs.patch text/x-patch 7.0 KB

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-26 03:46:06
Message-ID: CA+hUKGL0g+Nmf=2XePaXtw82BkW9CRzoxxXB4gqZOk7tHxNJXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Andrey, all,

I propose some changes, and I'm attaching a new version:

I renamed the GUCs as clog_buffers etc (no "_slru_"). I fixed some
copy/paste mistakes where the different GUCs were mixed up. I made
some changes to the .conf.sample. I rewrote the documentation so that
it states the correct unit and defaults, and refers to the
subdirectories that are cached by these buffers instead of trying to
give a new definition of each of the SLRUs.

Do you like those changes?

Some things I thought about but didn't change:

I'm not entirely sure if we should use the internal and historical
names well known to hackers (CLOG), or the visible directory names (I
mean, we could use pg_xact_buffers instead of clog_buffers). I am not
sure why these GUCs need to be PGDLLIMPORT, but I see that NBuffers is
like that.

I wanted to do some very simple smoke testing of CLOG sizes on my
local development machine:

pgbench -i -s1000 postgres
pgbench -t4000000 -c8 -j8 -Mprepared postgres

I disabled autovacuum after running that just to be sure it wouldn't
interfere with my experiment:

alter table pgbench_accounts set (autovacuum_enabled = off);

Then I shut the cluster down and made a copy, so I could do some
repeated experiments from the same initial conditions each time. At
this point I had 30 files 0000-001E under pg_xact, holding 256kB = ~1
million transactions each. It'd take ~960 buffers to cache it all.
So how long does VACUUM FREEZE pgbench_accounts take?

I tested with just the 0001 patch, and also with the 0002 patch
(improved version, attached):

clog_buffers=128: 0001=2:28.499, 0002=2:17.891
clog_buffers=1024: 0001=1:38.485, 0002=1:29.701

I'm sure the speedup of the 0002 patch can be amplified by increasing
the number of transactions referenced in the table OR number of
clog_buffers, considering that the linear search produces
O(transactions * clog_buffers) work. That was 32M transactions and
8MB of CLOG, but I bet if you double both of those numbers once or
twice things start to get hot. I don't see why you shouldn't be able
to opt to cache literally all of CLOG if you want (something like 50MB
assuming default autovacuum_freeze_max_age, scale to taste, up to
512MB for the theoretical maximum useful value).
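(To spell out the arithmetic behind those figures: CLOG stores 2 bits
per transaction, i.e. 4 xids per byte, so the default
autovacuum_freeze_max_age of 200 million transactions works out to
200M / 4 = 50MB, and the ~2^31 xids that can exist at once work out to
the 512MB theoretical maximum.)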

I'm not saying the 0002 patch is bug-free yet though, it's a bit finickity.

Attachment Content-Type Size
v12-0001-Make-all-SLRU-buffer-sizes-configurable.patch text/x-patch 20.6 KB
v12-0002-Add-buffer-mapping-table-for-SLRUs.patch text/x-patch 8.5 KB

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-26 06:00:51
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Thomas, Gilles, all!

Thanks for reviewing this!

> 25 марта 2021 г., в 02:31, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> написал(а):
>
> I don't think we should put "slru" (the name of the buffer replacement
> algorithm, implementation detail) in the GUC names.
+1

> + It defaults to 0, in this case CLOG size is taken as
> <varname>shared_buffers</varname> / 512.
>
> We already know that increasing the number of CLOG buffers above the
> current number hurts as the linear search begins to dominate
Uh, my intent was to copy the original approach to CLOG SLRU sizing; I just missed that Min(,) thing in the shared_buffers logic.

> 26 марта 2021 г., в 08:46, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> написал(а):
>
> Hi Andrey, all,
>
> I propose some changes, and I'm attaching a new version:
>
> I renamed the GUCs as clog_buffers etc (no "_slru_"). I fixed some
> copy/paste mistakes where the different GUCs were mixed up. I made
> some changes to the .conf.sample. I rewrote the documentation so that
> it states the correct unit and defaults, and refers to the
> subdirectories that are cached by these buffers instead of trying to
> give a new definition of each of the SLRUs.
>
> Do you like those changes?
Yes!

> Some things I thought about but didn't change:
>
> I'm not entirely sure if we should use the internal and historical
> names well known to hackers (CLOG), or the visible directory names (I
> mean, we could use pg_xact_buffers instead of clog_buffers).
While it is a good idea to make notes about the directory name, I think the real naming criterion is helping the user find this GUC when they encounter wait events in pg_stat_activity. I think there are no CLOG mentions in the docs [0], only XactBuffer, XactSLRU, etc.

> I'm not saying the 0002 patch is bug-free yet though, it's a bit finickity.
I think the idea of speeding up the linear search is really good for scaling SLRUs. It's not even about improving normal performance of the cluster; it's important for preventing pathological degradation under certain circumstances. A bigger cache really saves SLAs :) I'll look into the patch more closely this weekend. Thank you!

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-26 15:52:22
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> 26 марта 2021 г., в 11:00, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> написал(а):
>
>> I'm not saying the 0002 patch is bug-free yet though, it's a bit finickity.
> I think the idea of speeding up the linear search is really good for scaling SLRUs. It's not even about improving normal performance of the cluster; it's important for preventing pathological degradation under certain circumstances. A bigger cache really saves SLAs :) I'll look into the patch more closely this weekend. Thank you!

Some thoughts on the HashTable patch:
1. Can we allocate a bigger hashtable to reduce the probability of collisions?
2. Can we use a specialised hashtable for this case? I'm afraid hash_search() takes a number of CPU cycles comparable to a simple loop from 0 to 128. We could inline everything and avoid the hashp->hash(keyPtr, hashp->keysize) call. I'm not insisting on a special hash though, just an idea.
3. pageno in SlruMappingTableEntry seems to be unused.

Thanks!

Best regards, Andrey Borodin.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-26 20:26:40
Message-ID: CA+hUKGKVqrxOp82zER1=XN=yPwV_-OCGAg=ez=1iz9rG+A7Smw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 27, 2021 at 4:52 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> Some thoughts on the HashTable patch:
> 1. Can we allocate a bigger hashtable to reduce the probability of collisions?

Yeah, good idea, might require some study.

> 2. Can we use a specialised hashtable for this case? I'm afraid hash_search() takes a number of CPU cycles comparable to a simple loop from 0 to 128. We could inline everything and avoid the hashp->hash(keyPtr, hashp->keysize) call. I'm not insisting on a special hash though, just an idea.

I tried really hard to not fall into this rabbit h.... [hack hack
hack], OK, here's a first attempt to use simplehash, Andres's
steampunk macro-based robinhood template that we're already using for
several other things, and murmurhash which is inlineable and
branch-free. I had to tweak it to support "in-place" creation and
fixed size (in other words, no allocators, for use in shared memory).
Then I was annoyed that I had to add a "status" member to our struct,
so I tried to fix that. Definitely needs more work to think about
failure modes when running out of memory, how much spare space you
need, etc.
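For those who haven't seen simplehash.h before, instantiating the
template looks roughly like this (a sketch rather than the attached
patch; the prefix and the hash/equality choices here are illustrative):

    #define SH_PREFIX        slru_mapping
    #define SH_ELEMENT_TYPE  SlruMappingTableEntry
    #define SH_KEY_TYPE      int
    #define SH_KEY           pageno
    #define SH_HASH_KEY(tb, key)  murmurhash32((uint32) (key))
    #define SH_EQUAL(tb, a, b)    ((a) == (b))
    #define SH_SCOPE         static inline
    #define SH_DECLARE
    #define SH_DEFINE
    #include "lib/simplehash.h"

That expands into fully inlineable slru_mapping_create(),
slru_mapping_insert(), slru_mapping_lookup() and slru_mapping_delete()
functions, with no function pointer calls on the lookup path.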

I have not experimented with this much beyond hacking until the tests
pass, but it *should* be more efficient...

> 3. pageno in SlruMappingTableEntry seems to be unused.

It's the key (dynahash uses the first N bytes of your struct as the
key, but in this new simplehash version it's more explicit).

Attachment Content-Type Size
v13-0001-Make-all-SLRU-buffer-sizes-configurable.patch application/x-patch 20.6 KB
v13-0002-Make-simplehash-easy-to-use-in-shmem.patch application/x-patch 5.9 KB
v13-0003-Support-intrusive-status-flag-in-simplehash.patch application/x-patch 5.7 KB
v13-0004-Add-buffer-mapping-table-for-SLRUs.patch application/x-patch 8.5 KB
v13-0005-fixup-use-simplehash-instead-of-dynahash.patch application/x-patch 4.6 KB

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-27 05:31:54
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> 27 марта 2021 г., в 01:26, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> написал(а):
>
> On Sat, Mar 27, 2021 at 4:52 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>> Some thoughts on the HashTable patch:
>> 1. Can we allocate a bigger hashtable to reduce the probability of collisions?
>
> Yeah, good idea, might require some study.
In the long run we always have this table filled with nslots entries. But the keys will usually be consecutive numbers (the current working set of CLOG\Multis\etc). So in a happy hashing scenario collisions will only appear on occasional backward jumps. I think just size = nslots * 2 will produce results that cannot be improved significantly.
And this reflects the original growth strategy, SH_GROW(tb, tb->size * 2).

>> 2. Can we use a specialised hashtable for this case? I'm afraid hash_search() takes a number of CPU cycles comparable to a simple loop from 0 to 128. We could inline everything and avoid the hashp->hash(keyPtr, hashp->keysize) call. I'm not insisting on a special hash though, just an idea.
>
> I tried really hard to not fall into this rabbit h.... [hack hack
> hack], OK, here's a first attempt to use simplehash,

> Andres's
> steampunk macro-based robinhood template
Sounds magnificent.

> that we're already using for
> several other things
I could not find many tests to be sure that we do not break something...

> , and murmurhash which is inlineable and
> branch-free.
I think pageno is a hash already. Why hash any further? Also, pages accessed together will have lower access time due to colocation.

> I had to tweak it to support "in-place" creation and
> fixed size (in other words, no allocators, for use in shared memory).
We really need a test to know what happens when this structure runs out of memory, as you mentioned below. What would be an appropriate place for simplehash tests?

> Then I was annoyed that I had to add a "status" member to our struct,
> so I tried to fix that.
Indeed, sizeof(SlruMappingTableEntry) == 9 seems strange. Will simplehash align it well?

Thanks!

Best regards, Andrey Borodin.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-28 21:15:31
Message-ID: CA+hUKGJubZLN16cTJjaeEpWdaTsKYuLaV67N+0Mps521zYo6ig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 27, 2021 at 6:31 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > 27 марта 2021 г., в 01:26, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> написал(а):
> > , and murmurhash which is inlineable and
> > branch-free.

> I think pageno is a hash already. Why hash any further? Also, pages accessed together will have lower access time due to colocation.

Yeah, if clog_buffers is large enough then it's already a "perfect
hash", but if it's not then you might get some weird "harmonic"
effects (not sure if that's the right word), basically higher or lower
collision rate depending on coincidences in the data. If you apply a
hash, the collisions should be evenly spread out so at least it'll be
somewhat consistent. Does that make sense?
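A two-line illustration of the difference (murmurhash32() is the real
inlineable hash from src/include/common/hashfn.h; the variable names are
mine):

    slotno = pageno % nslots;                  /* perfect while nslots covers the working set */
    slotno = murmurhash32(pageno) % nslots;    /* spreads out periodic access patterns */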

(At some point I figured out that the syscaches have lower collision
rates and perform better if you use oids directly instead of hashing
them... but then it's easy to create a pathological pattern of DDL
that turns your hash table into a linked list. Not sure what to think
about that.)

> > I had to tweak it to support "in-place" creation and
> > fixed size (in other words, no allocators, for use in shared memory).

> We really need a test to know what happens when this structure runs out of memory, as you mentioned below. What would be an appropriate place for simplehash tests?

Good questions. This has to be based on being guaranteed to have
enough space for all of the entries, so the question is really just
"how bad can performance get with different load factors". FWIW there
were some interesting cases with clustering when simplehash was first
used in the executor (see commits ab9f2c42 and parent) which required
some work on hashing quality to fix.

> > Then I was annoyed that I had to add a "status" member to our struct,
> > so I tried to fix that.

> Indeed, sizeof(SlruMappingTableEntry) == 9 seems strange. Will simplehash align it well?

With that "intrusive status" patch, the size is back to 8. But I
think I made a mistake: I made it steal some key space to indicate
presence, but I think the presence test should really get access to
the whole entry so that you can encode it in more ways. For example,
with slotno == -1.

Alright, considering the date, if we want to get this into PostgreSQL
14 it's time to make some decisions.

1. Do we want customisable SLRU sizes in PG14?

+1 from me, we have multiple reports of performance gains from
increasing various different SLRUs, and it's easy to find workloads
that go faster.

One thought: it'd be nice if the user could *see* the current size,
when using the default. SHOW clog_buffers -> 0 isn't very helpful if
you want to increase it, but don't know what it's currently set to.
Not sure off the top of my head how best to do that.

2. What names do we want the GUCs to have? Here's what we have:

Proposed GUC               Directory             System views
clog_buffers               pg_xact               Xact
multixact_offsets_buffers  pg_multixact/offsets  MultiXactOffset
multixact_members_buffers  pg_multixact/members  MultiXactMember
notify_buffers             pg_notify             Notify
serial_buffers             pg_serial             Serial
subtrans_buffers           pg_subtrans           Subtrans
commit_ts_buffers          pg_commit_ts          CommitTs

By system views, I mean pg_stat_slru, pg_shmem_allocations and
pg_stat_activity (lock names add "SLRU" on the end).

Observations:

It seems obvious that "clog_buffers" should be renamed to "xact_buffers".
It's not clear whether the multixact GUCs should have the extra "s"
like the directories, or not, like the system views.
I see that we have "Shared Buffer Lookup Table" in
pg_shmem_allocations, so where I generated names like "Subtrans
Mapping Table" I should change that to "Lookup" to match.

3. What recommendations should we make about how to set it?

I think the answer depends partially on the next questions! I think
we should probably at least say something short about the pg_stat_slru
view (cache miss rate) and pg_stat_activity view (waits on locks), and
how to tell if you might need to increase it. I think this probably
needs a new paragraph, separate from the docs for the individual GUC.

4. Do we want to ship the dynahash patch?

+0.9. The slight hesitation is that it's new code written very late
in the cycle, so it may still have bugs or unintended consequences,
and as you said, at small sizes the linear search must be faster than
the hash computation. Could you help test it, and try to break it?
Can we quantify the scaling effect for some interesting workloads, to
see at what size the dynahash beats the linear search, so that we can
make an informed decision? Of course, without a hash table, large
sizes will surely work badly, so it'd be tempting to restrict the size
you can set the GUC to.

If we do include the dynahash patch, then I think it would also be
reasonable to change the formula for the default, to make it higher on
large systems. The restriction to 128 buffers (= 1MB) doesn't make
much sense on a high frequency OLTP system with 128GB of shared
buffers or even 4GB. I think "unleashing better defaults" would
actually be bigger news than the GUC for typical users, because
they'll just see PG14 use a few extra MB and go faster without having
to learn about these obscure new settings.

5. Do we want to ship the simplehash patch?

-0.5. It's a bit too exciting for the last minute, so I'd be inclined
to wait until the next cycle to do some more research and testing. I
know it's a better idea in the long run.


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-29 08:26:02
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> 29 марта 2021 г., в 02:15, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> написал(а):
>
> On Sat, Mar 27, 2021 at 6:31 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>> 27 марта 2021 г., в 01:26, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> написал(а):
>>> , and murmurhash which is inlineable and
>>> branch-free.
>
>> I think pageno is a hash already. Why hash any further? Also, pages accessed together will have lower access time due to colocation.
>
> Yeah, if clog_buffers is large enough then it's already a "perfect
> hash", but if it's not then you might get some weird "harmonic"
> effects (not sure if that's the right word), basically higher or lower
> collision rate depending on coincidences in the data. If you apply a
> hash, the collisions should be evenly spread out so at least it'll be
> somewhat consistent. Does that make sense?
As far as I understand, "harmonic" effects only matter if the distribution is unknown. A hash protects against "periodic" data whose period matches the hash table size. I don't think we need to protect against this case; SLRU data is expected to be localised...
The cost of this protection is having to calculate a murmur hash on each SLRU lookup. Probably 10-100ns. Seems like not a big deal.

> (At some point I figured out that the syscaches have lower collision
> rates and perform better if you use oids directly instead of hashing
> them... but then it's easy to create a pathological pattern of DDL
> that turns your hash table into a linked list. Not sure what to think
> about that.)
>
>>> I had to tweak it to support "in-place" creation and
>>> fixed size (in other words, no allocators, for use in shared memory).
>
>> We really need a test to know what happens when this structure runs out of memory, as you mentioned below. What would be an appropriate place for simplehash tests?
>
> Good questions. This has to be based on being guaranteed to have
> enough space for all of the entries, so the question is really just
> "how bad can performance get with different load factors". FWIW there
> were some interesting cases with clustering when simplehash was first
> used in the executor (see commits ab9f2c42 and parent) which required
> some work on hashing quality to fix.
Interesting read. I didn't know much about simplehash, but it seems there are still many cases where it can be put to good use. I always wondered why Postgres uses only Larson's linear hash.

>
>>> Then I was annoyed that I had to add a "status" member to our struct,
>>> so I tried to fix that.
>
>> Indeed, sizeof(SlruMappingTableEntry) == 9 seems strange. Will simplehash align it well?
>
> With that "intrusive status" patch, the size is back to 8. But I
> think I made a mistake: I made it steal some key space to indicate
> presence, but I think the presence test should really get access to
> the whole entry so that you can encode it in more ways. For example,
> with slotno == -1.
>
> Alright, considering the date, if we want to get this into PostgreSQL
> 14 it's time to make some decisions.
>
> 1. Do we want customisable SLRU sizes in PG14?
>
> +1 from me, we have multiple reports of performance gains from
> increasing various different SLRUs, and it's easy to find workloads
> that go faster.
Yes, this is main point of this discussion. So +1 from me too.

>
> One thought: it'd be nice if the user could *see* the current size,
> when using the default. SHOW clog_buffers -> 0 isn't very helpful if
> you want to increase it, but don't know what it's currently set to.
> Not sure off the top of my head how best to do that.
Don't we expect the SHOW command to indicate exactly the same value as in the config file or a SET command? If this convention does not exist, then showing the effective value is probably a good idea.

> 2. What names do we want the GUCs to have? Here's what we have:
>
> Proposed GUC               Directory             System views
> clog_buffers               pg_xact               Xact
> multixact_offsets_buffers  pg_multixact/offsets  MultiXactOffset
> multixact_members_buffers  pg_multixact/members  MultiXactMember
> notify_buffers             pg_notify             Notify
> serial_buffers             pg_serial             Serial
> subtrans_buffers           pg_subtrans           Subtrans
> commit_ts_buffers          pg_commit_ts          CommitTs
>
> By system views, I mean pg_stat_slru, pg_shmem_allocations and
> pg_stat_activity (lock names add "SLRU" on the end).
>
> Observations:
>
> It seems obvious that "clog_buffers" should be renamed to "xact_buffers".
+1
> It's not clear whether the multixact GUCs should have the extra "s"
> like the directories, or not, like the system views.
I think we should break the tie by a native English speaker's ear or typing habits. I'm not a native speaker.

> I see that we have "Shared Buffer Lookup Table" in
> pg_shmem_allocations, so where I generated names like "Subtrans
> Mapping Table" I should change that to "Lookup" to match.
>
> 3. What recommendations should we make about how to set it?
>
> I think the answer depends partially on the next questions! I think
> we should probably at least say something short about the pg_stat_slru
> view (cache miss rate) and pg_stat_activity view (waits on locks), and
> how to tell if you might need to increase it. I think this probably
> needs a new paragraph, separate from the docs for the individual GUC.
I can only suggest an incident-driven approach:
1. Observe a ridiculous number of backends waiting on a particular SLRU.
2. Double the buffers for that SLRU.
3. Goto 1.
I don't think we should mention this approach in the docs.

> 4. Do we want to ship the dynahash patch?

This patch allows throwing an arbitrary amount of memory at the problem of SLRU IO waits, so the scale of the improvement is much higher. Do I want us to ship this patch? Definitely. Does this change much? I don't know.

>
> +0.9. The slight hesitation is that it's new code written very late
> in the cycle, so it may still have bugs or unintended consequences,
> and as you said, at small sizes the linear search must be faster than
> the hash computation. Could you help test it, and try to break it?
I'll test it and try to break it.

> Can we quantify the scaling effect for some interesting workloads, to
> see at what size the dynahash beats the linear search, so that we can
> make an informed decision?
I don't think we can statistically distinguish linear search from hash search by observing SLRU behaviour. But we can create some synthetic benchmarks.

> Of course, without a hash table, large
> sizes will surely work badly, so it'd be tempting to restrict the size
> you can set the GUC to.
>
> If we do include the dynahash patch, then I think it would also be
> reasonable to change the formula for the default, to make it higher on
> large systems. The restriction to 128 buffers (= 1MB) doesn't make
> much sense on a high frequency OLTP system with 128GB of shared
> buffers or even 4GB. I think "unleashing better defaults" would
> actually be bigger news than the GUC for typical users, because
> they'll just see PG14 use a few extra MB and go faster without having
> to learn about these obscure new settings.
I agree. I don't see why we would need to limit buffers to 128 in the presence of hash search.

> 5. Do we want to ship the simplehash patch?
>
> -0.5. It's a bit too exciting for the last minute, so I'd be inclined
> to wait until the next cycle to do some more research and testing. I
> know it's a better idea in the long run.
OK; obviously, it's the safer decision.

My TODO list:
1. Try to break patch set v13-[0001-0004]
2. Think how to measure performance of linear search versus hash search in SLRU buffer mapping.

Thanks!

Best regards, Andrey Borodin.


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-03-31 21:09:43
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> 29 марта 2021 г., в 11:26, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> написал(а):
>
> My TODO list:
> 1. Try to break patch set v13-[0001-0004]
> 2. Think how to measure performance of linear search versus hash search in SLRU buffer mapping.

Hi Thomas!
I'm still doing my homework. To this moment, all I have caught is that the "utils/dynahash.h" include is not necessary.

I've been thinking about hashtables, and measuring performance near the optimum of linear search does not seem like a good idea now.
It's impossible to prove that the difference is statistically significant on all platforms; even on one platform the measurements are just too noisy.

The shared buffers lookup table is indeed very similar to this SLRU lookup table, and it does not try to use more memory than needed. I could not find a pgbench-visible impact of growing the shared buffer lookup table. Obviously, that's because it's not a bottleneck on a regular workload, and it's hard to guess a representative pathological workload.

In fact, this thread started with a proposal to use a reader-writer lock for multis (instead of an exclusive lock), and that proposal ran into the same problem: it's very hard to create a stable reproduction of the pathological workload in which this lock is heavily contended. Many people have observed the problem, but there is still no public repro.

I bet it will be hard to prove that simplehash is any better than HTAB. But if it is really better, shared buffers could benefit from the same technique.

I think it's just fine to use HTAB at the normal size, as long as shared buffers do so. But there we allocate slightly more space: InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS). Don't we need to allocate nslots + 1? Though it seems that we always do SlruMappingRemove() before SlruMappingAdd(), so that may not be necessary.

Thanks!

Best regards, Andrey Borodin.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-04-01 03:40:16
Message-ID: CA+hUKGKEdeVKpOqBxzRF+Z4=j0T+CA7ERrXni5De71Mm6-dBWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 1, 2021 at 10:09 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > 29 марта 2021 г., в 11:26, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> написал(а):
> > My TODO list:
> > 1. Try to break patch set v13-[0001-0004]
> > 2. Think how to measure performance of linear search versus hash search in SLRU buffer mapping.
>
> Hi Thomas!
> I'm still doing my homework. To this moment, all I have caught is that the "utils/dynahash.h" include is not necessary.

Thanks. Here's a new patch with that fixed, and also:

1. New names ("... Mapping Table" -> "... Lookup Table" in
pg_shmem_allocations view, "clog_buffers" -> "xact_buffers") and a
couple of typos fixed here and there.

2. Remove the cap of 128 buffers for xact_buffers as agreed. We
still need a cap though, to avoid a couple of kinds of overflow inside
slru.c, both when computing the default value and accepting a
user-provided number. I introduced SLRU_MAX_ALLOWED_BUFFERS to keep
it <= 1GB, and tested this on a 32 bit build with extreme block sizes.
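(Something along these lines; the exact definition in the patch may
differ:

    /* Keep buffers * BLCKSZ from overflowing, and cap memory at 1GB. */
    #define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)

i.e. 131072 buffers at the default 8kB block size.)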

Likewise, I removed the cap of 16 buffers for commit_ts_buffers, but
only if you have track_commit_timestamp enabled. It seems silly to
waste 1MB per 1GB of shared_buffers on a feature you're not using. So
the default is capped at 16 in that case to preserve existing
behaviour, but otherwise can be set very large if you want.
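A sketch of the resulting sizing logic as described above (the shape
mirrors the xact_buffers snippet quoted later in this mail;
COMMIT_TS_MAX_ALLOWED_BUFFERS is my placeholder for the cap):

    /* Use configured value if provided. */
    if (commit_ts_buffers > 0)
        return Max(4, commit_ts_buffers);
    /* Don't waste memory on a feature that isn't turned on. */
    if (!track_commit_timestamp)
        return Min(16, Max(4, NBuffers / 1024));
    return Min(COMMIT_TS_MAX_ALLOWED_BUFFERS, Max(4, NBuffers / 1024));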

I think it's plausible that we'll want to make the multixact sizes
adaptive too, but that might have to be a job for later. Likewise,
I am sure that subtransaction-heavy workloads might be slower than
they need to be due to the current default, but I have not done the
research. With these new GUCs, people are free to experiment and
develop theories about what the defaults should be in v15.

3. In the special case of xact_buffers, there is a lower cap of
512MB, because there's no point trying to cache more xids than there
can be in existence, and that is computed by working backwards from
CLOG_XACTS_PER_PAGE etc. It's not possible to do the same sort of
thing for the other SLRUs without overflow problems, and it doesn't
seem worth trying to fix that right now (1GB of cached commit
timestamps ought to be enough for anyone™, while the theoretical
maximum is 10 bytes for 2b xids = 20GB).

To make this more explicit for people not following our discussion in
detail: with shared_buffers=0...512MB, the behaviour doesn't change.
But for shared_buffers=1GB you'll get twice as many xact_buffers as
today (2MB instead of being capped at 1MB), and it keeps scaling
linearly from there at 0.2%. In other words, all real world databases
will get a boost in this department.

4. Change the default for commit_ts_buffers back to shared_buffers /
1024 (with a minimum of 4), because I think you might have changed it
by a copy and paste error -- or did you intend to make the default
higher?

5. Improve descriptions for the GUCs, visible in pg_settings view, to
match the documentation for related wait events. So "for commit log
SLRU" -> "for the transaction status SLRU cache", and similar
corrections elsewhere. (I am tempted to try to find a better word
than "SLRU", which doesn't seem relevant to users, but for now
consistency is good.)

6. Added a callback so that SHOW xact_buffers and SHOW
commit_ts_buffers display the real size in effect (instead of "0" for
default).
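Such a callback is an ordinary GUC show hook; a minimal sketch (the hook
name is an assumption, while CLOGShmemBuffers() is the existing sizing
function):

    static const char *
    show_xact_buffers(void)
    {
        static char buf[16];

        /* Report the size actually in effect, not the raw GUC value. */
        snprintf(buf, sizeof(buf), "%d", (int) CLOGShmemBuffers());
        return buf;
    }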

I tried running with xact_buffers=1 and soon saw why you changed it to
interpret 1 the same as 0; with 1 you hit buffer starvation and get
stuck. I wish there were some way to say "the default for this GUC is
0, but if it's not zero then it's got to be at least 4". I didn't
study the theoretical basis for the previous minimum value of 4, so I
think we should keep it that way, so that if you say 3 you get 4. I
thought it was better to express that like so:

/* Use configured value if provided. */
if (xact_buffers > 0)
return Max(4, xact_buffers);
return Min(CLOG_MAX_ALLOWED_BUFFERS, Max(4, NBuffers / 512));

> I've been thinking about hashtables, and measuring performance near the optimum of linear search does not seem like a good idea now.
> It's impossible to prove that the difference is statistically significant on all platforms; even on one platform the measurements are just too noisy.
>
> The shared buffers lookup table is indeed very similar to this SLRU lookup table, and it does not try to use more memory than needed. I could not find a pgbench-visible impact of growing the shared buffer lookup table. Obviously, that's because it's not a bottleneck on a regular workload, and it's hard to guess a representative pathological workload.

Thanks for testing. I agree that it's a good idea to follow the main
buffer pool's approach for our first version of this. One small
difference is that the SLRU patch performs the hash computation while
it holds the lock. If we computed the hash first and used
hash_search_with_hash_value(), we could compute it before we obtain
the lock, like the main buffer pool.
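Roughly like this (get_hash_value() and hash_search_with_hash_value()
are dynahash's real API; the lock and table names are illustrative):

    /* Compute the hash before taking the control lock... */
    uint32  hashcode = get_hash_value(mapping_table, &pageno);

    LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

    /* ...so that only the bucket lookup happens while holding it. */
    entry = hash_search_with_hash_value(mapping_table, &pageno,
                                        hashcode, HASH_FIND, NULL);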

If we computed the hash value first, we could also ignore the rule in
the documentation for hash_search_with_hash_value() that says that you
must calculate it with get_hash_value(), and just call the hash
function ourselves, so that it's fully inlinable. The same
opportunity exists for the main buffer pool. That'd get you one of
the micro-optimisations that simplehash.h offers. Whether it's worth
bothering with, I don't know.

> In fact, this thread started with a proposal to use a reader-writer lock for multis (instead of an exclusive lock), and that proposal encountered the same problem. It's very hard to create a stable reproduction of a pathological workload when this lock is heavily contended. Many people observed the problem, but still there is no open repro.
>
> I bet it will be hard to prove that simplehash is any better than HTAB. But if it really is better, shared buffers could benefit from the same technique.

Agreed, there are a lot of interesting future projects in this area,
when you compare the main buffer pool, these special buffer pools, and
maybe also the "shared relfilenode" pool patch I have proposed for v15
(CF entry 2933). All have mapping tables and buffer replacement
algorithms (why should they be different?), one has partitions, some
have atomic-based header locks, they interlock with WAL differently
(on page LSN, FPIs and checksum support), ... etc etc.

> I think it's just fine to use HTAB with a normal size, as long as shared buffers do so. But there we allocate slightly more space: InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS). Don't we need to allocate nslots + 1? It seems that we always do SlruMappingRemove() before SlruMappingAdd(), so that is not necessary.

Yeah, we never try to add more elements than allowed, because we have
a big lock around the mapping. The main buffer mapping table has a
more concurrent design and might temporarily have one extra entry per
partition.
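
For reference, that headroom in the main pool is sized in
StrategyInitialize() like this:

    /* freelist.c: one spare mapping entry per partition, because a new
     * entry can be inserted before the old one is deleted */
    InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

The SLRU table holds its lock across the remove/add pair, so nslots
entries are enough.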

Attachment Content-Type Size
v14-0001-Add-a-buffer-mapping-table-for-SLRUs.patch text/x-patch 8.6 KB
v14-0002-Make-all-SLRU-buffer-sizes-configurable.patch text/x-patch 25.7 KB

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-04-03 19:57:23
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 1 Apr 2021, at 06:40, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> 2. Remove the cap of 128 buffers for xact_buffers as agreed. We
> still need a cap though, to avoid a couple of kinds of overflow inside
> slru.c, both when computing the default value and accepting a
> user-provided number. I introduced SLRU_MAX_ALLOWED_BUFFERS to keep
> it <= 1GB, and tested this on a 32 bit build with extreme block sizes.
BTW, we do not document maximum values right now.
I was toying around with big values. For example, with various big xact_buffers settings we can get something like
FATAL: not enough shared memory for data structure "Notify" (72768 bytes requested)
FATAL: not enough shared memory for data structure "Async Queue Control" (2492 bytes requested)
FATAL: not enough shared memory for data structure "Checkpointer Data" (393280 bytes requested)

But never anything about xact_buffers. I don't think it's important, though.

>
> Likewise, I removed the cap of 16 buffers for commit_ts_buffers, but
> only if you have track_commit_timestamp enabled.
Is there a reason to leave 16 pages if commit_ts is disabled? They might be useful for some artefacts of previously enabled commit_ts?

> 4. Change the default for commit_ts_buffers back to shared_buffers /
> 1024 (with a minimum of 4), because I think you might have changed it
> by a copy and paste error -- or did you intend to make the default
> higher?
I changed the default due to some experiments with https://www.postgresql.org/message-id/flat/20210115220744.GA24457%40alvherre.pgsql
In fact, the most important part of that thread was removing the cap, which the patchset now does.

Thanks!

Best regards, Andrey Borodin.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-04-07 05:59:19
Message-ID: CA+hUKGKTESKn9-BkUT8T3vwcgZQJqsV4PSQKpE+M1qUgfSTLMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 4, 2021 at 7:57 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> I was toying around with big values. For example, with various big xact_buffers settings we can get something like
> FATAL: not enough shared memory for data structure "Notify" (72768 bytes requested)
> FATAL: not enough shared memory for data structure "Async Queue Control" (2492 bytes requested)
> FATAL: not enough shared memory for data structure "Checkpointer Data" (393280 bytes requested)
>
> But never anything about xact_buffers. I don't think it's important, though.

I had added the hash table size in SimpleLruShmemSize(), but then
SimpleLruInit() passed that same value in when allocating the struct, so
the struct was oversized. Oops. Fixed.

> > Likewise, I removed the cap of 16 buffers for commit_ts_buffers, but
> > only if you have track_commit_timestamp enabled.

> Is there a reason to leave 16 pages if commit_ts is disabled? They might be useful for some artefacts of previously enabled commit_ts?

Alvaro, do you have an opinion on that?

The remaining thing that bothers me about this patch set is that there is
still a linear search in the replacement algorithm, and it runs with an
exclusive lock. That creates a serious problem for large caches that still
aren't large enough. I wonder if we can do something to improve that
situation in the time we have. I considered a bunch of ideas but could
only find one that fits with slru.c's simplistic locking while tracking
recency. What do you think about a hybrid of SLRU and random replacement,
that retains some characteristics of both? You could think of it as being
a bit like the tournament selection of the genetic algorithm, with a
tournament size of (say) 8 or 16. Any ideas on how to evaluate this and
choose the number? See attached.
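
In outline, the idea is something like this (an illustrative sketch,
not the attached patch; skipping of I/O-busy and dirty pages is
elided):

    #define TOURNAMENT_SIZE 8       /* or 16; the number to evaluate */

    static int
    SlruPickVictimSlot(SlruShared shared)   /* hypothetical function */
    {
        int         cur_count = shared->cur_lru_count;
        int         best_slot = random() % shared->num_slots;
        int         best_delta = cur_count - shared->page_lru_count[best_slot];

        /* Sample a few random slots; evict the least recently used one. */
        for (int i = 1; i < TOURNAMENT_SIZE; i++)
        {
            int         slotno = random() % shared->num_slots;
            int         delta = cur_count - shared->page_lru_count[slotno];

            if (delta > best_delta)
            {
                best_slot = slotno;
                best_delta = delta;
            }
        }
        return best_slot;
    }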

Attachment Content-Type Size
v15-0001-Add-a-buffer-mapping-table-for-SLRUs.patch text/x-patch 9.0 KB
v15-0002-Make-all-SLRU-buffer-sizes-configurable.patch text/x-patch 25.7 KB
v15-0003-Use-hybrid-random-SLRU-replacement-for-SLRUs.patch text/x-patch 3.3 KB

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-04-07 11:44:32
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 7 Apr 2021, at 08:59, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> The remaining thing that bothers me about this patch set is that there is still a linear search in the replacement algorithm, and it runs with an exclusive lock. That creates a serious problem for large caches that still aren't large enough. I wonder if we can do something to improve that situation in the time we have. I considered a bunch of ideas but could only find one that fits with slru.c's simplistic locking while tracking recency. What do you think about a hybrid of SLRU and random replacement, that retains some characteristics of both? You could think of it as being a bit like the tournament selection of the genetic algorithm, with a tournament size of (say) 8 or 16. Any ideas on how to evaluate this and choose the number? See attached.
> <v15-0001-Add-a-buffer-mapping-table-for-SLRUs.patch><v15-0002-Make-all-SLRU-buffer-sizes-configurable.patch><v15-0003-Use-hybrid-random-SLRU-replacement-for-SLRUs.patch>

Maybe instead of a fully associative cache with random replacement we could use a 1-associative cache?
I.e. each page can reside in only one specific buffer slot. If something else is there, evict it.
I think this would be as efficient as an RR cache. And it's soooo fast.

Best regards, Andrey Borodin.


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-04-07 12:13:26
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 7 Apr 2021, at 14:44, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> Maybe instead of a fully associative cache with random replacement we could use a 1-associative cache?
> I.e. each page can reside in only one specific buffer slot. If something else is there, evict it.
> I think this would be as efficient as an RR cache. And it's soooo fast.

I thought a bit more and realized that RR is protected against two competing pages in the working set, while a 1-associative cache is not. So, discard that idea.

Best regards, Andrey Borodin.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-04-08 00:30:41
Message-ID: CA+hUKGLCLDtgDj2Xsf0uBk5WXDCeHxBDDJPsyY7m65Fde-=pyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 8, 2021 at 12:13 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > On 7 Apr 2021, at 14:44, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > Maybe instead of a fully associative cache with random replacement we could use a 1-associative cache?
> > I.e. each page can reside in only one specific buffer slot. If something else is there, evict it.
> > I think this would be as efficient as an RR cache. And it's soooo fast.
>
> I thought a bit more and realized that RR is protected against two competing pages in the working set, while a 1-associative cache is not. So, discard that idea.

It's an interesting idea. I know that at least one proprietary fork
just puts the whole CLOG in memory for direct indexing, which is what
we'd have here if we said "oh, your xact_buffers setting is so large
I'm just going to use slotno = pageno & mask".
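
That is, roughly (assuming the slot count is a power of two):

    /* Direct-mapped lookup: each page has exactly one possible slot. */
    int         slotno = pageno & (shared->num_slots - 1);

    if (shared->page_number[slotno] == pageno &&
        shared->page_status[slotno] != SLRU_PAGE_EMPTY)
        return slotno;          /* hit */
    /* miss: read the page in here, evicting the previous tenant */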

Here's another approach that is a little less exciting than
"tournament RR" (or whatever that should be called; I couldn't find an
established name for it). This version is just our traditional linear
search, except that it stops at 128, and remembers where to start from
next time (like a sort of Fisher-Price GCLOCK hand). This feels more
committable to me. You can argue that all buffers above 128 are bonus
buffers that PostgreSQL 13 didn't have, so the fact that we can no
longer find the globally least recently used page when you set
xact_buffers > 128 doesn't seem too bad to me, as an incremental step
(but to be clear, of course we can do better than this with more work
in later releases).
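
The shape of it is roughly this (illustrative; see the attached patch
for the real handling of I/O-busy and dirty pages):

    #define MAX_REPLACEMENT_SEARCH 128

    int         max_search = Min(shared->num_slots, MAX_REPLACEMENT_SEARCH);
    int         cur_count = shared->cur_lru_count;
    int         slotno = shared->search_slotno;
    int         best_slot = slotno;
    int         best_delta = -1;

    /* Inspect at most max_search slots, resuming where we left off,
     * and take the least recently used of those inspected. */
    for (int i = 0; i < max_search; i++)
    {
        int         delta = cur_count - shared->page_lru_count[slotno];

        if (delta > best_delta)
        {
            best_slot = slotno;
            best_delta = delta;
        }
        if (++slotno >= shared->num_slots)
            slotno = 0;
    }
    shared->search_slotno = slotno;     /* the remembered hand */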

Attachment Content-Type Size
v16-0001-Add-a-buffer-mapping-table-for-SLRUs.patch text/x-patch 9.0 KB
v16-0002-Make-all-SLRU-buffer-sizes-configurable.patch text/x-patch 25.7 KB
v16-0003-Limit-SLRU-buffer-replacement-search.patch text/x-patch 3.0 KB

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-04-08 07:24:55
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 8 Apr 2021, at 03:30, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> Here's another approach that is a little less exciting than
> "tournament RR" (or whatever that should be called; I couldn't find an
> established name for it). This version is just our traditional linear
> search, except that it stops at 128, and remembers where to start from
> next time (like a sort of Fisher-Price GCLOCK hand). This feels more
> committable to me. You can argue that all buffers above 128 are bonus
> buffers that PostgreSQL 13 didn't have, so the fact that we can no
> longer find the globally least recently used page when you set
> xact_buffers > 128 doesn't seem too bad to me, as an incremental step
> (but to be clear, of course we can do better than this with more work
> in later releases).
I agree that this version of eviction seems much more effective and less intrusive than RR. And it's still LRU, which is important for a subsystem called SLRU.
shared->search_slotno is initialized implicitly with memset(), but this seems like common practice.
Also, the comment above "max_search = Min(shared->num_slots, MAX_REPLACEMENT_SEARCH);" does not reflect the changes.

Besides that, the patch looks good to me.

Thanks!

Best regards, Andrey Borodin.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-04-08 12:22:39
Message-ID: CA+hUKGKzPSXYowH=Y8NiuUD8gdyD225-WHaEzXT1E0Sa8RopVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 8, 2021 at 7:24 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> I agree that this version of eviction seems much more effective and less intrusive than RR. And it's still LRU, which is important for a subsystem called SLRU.
> shared->search_slotno is initialized implicitly with memset(), but this seems like common practice.
> Also, the comment above "max_search = Min(shared->num_slots, MAX_REPLACEMENT_SEARCH);" does not reflect the changes.
>
> Besides that, the patch looks good to me.

Thanks! I chickened out of committing a buffer replacement algorithm
patch written 11 hours before the feature freeze, but I also didn't
really want to commit the GUC patch without that. Ahh, if only we'd
latched onto the real problems here just a little sooner, but there is
always PostgreSQL 15, I heard it's going to be amazing. Moved to next
CF.


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-04-11 18:37:21
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 8 Apr 2021, at 15:22, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Thu, Apr 8, 2021 at 7:24 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>> I agree that this version of eviction seems much more effective and less intrusive than RR. And it's still LRU, which is important for a subsystem called SLRU.
>> shared->search_slotno is initialized implicitly with memset(), but this seems like common practice.
>> Also, the comment above "max_search = Min(shared->num_slots, MAX_REPLACEMENT_SEARCH);" does not reflect the changes.
>>
>> Besides that, the patch looks good to me.
>
> Thanks! I chickened out of committing a buffer replacement algorithm
> patch written 11 hours before the feature freeze, but I also didn't
> really want to commit the GUC patch without that. Ahh, if only we'd
> latched onto the real problems here just a little sooner, but there is
> always PostgreSQL 15, I heard it's going to be amazing. Moved to next
> CF.

I have one more idea inspired by CPU caches.
Let's make SLRU n-associative, where n ~ 8.
We can divide the buffers into "banks"; the number of banks must be a power of 2.
All banks are of equal size. We choose the bank size to approximately match the user's configured buffer size.
Each page can live only within one bank. We use the same search and eviction algorithms as before, but we only need to search\evict over 8 elements.
All SLRU data of a single bank will be colocated within at most 2 cache lines.

I did not come up with an idea for how to avoid the multiplication bank_number * bank_size when the user configures 31337 buffers (any number that is radically not a power of 2).

PFA patch implementing this idea.
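
The lookup then boils down to something like this (a sketch, not the
patch verbatim):

    /* Map the page to its bank; search only that bank's slots. */
    int         bankstart = (pageno & shared->bank_mask) * shared->bank_size;
    int         bankend = bankstart + shared->bank_size;

    for (int slotno = bankstart; slotno < bankend; slotno++)
    {
        if (shared->page_number[slotno] == pageno &&
            shared->page_status[slotno] != SLRU_PAGE_EMPTY)
            return slotno;      /* hit */
    }
    /* miss: run the usual LRU eviction, but only over this bank */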

Best regards, Andrey Borodin.

Attachment Content-Type Size
v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch application/octet-stream 4.2 KB
v17-0001-Make-all-SLRU-buffer-sizes-configurable.patch application/octet-stream 25.7 KB

From: Васильев Дмитрий <vadv(dot)mkn(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-06-14 17:40:16
Message-ID: CAB-SwXb8OqH013fNFxDgM3TBizRtwvRE8S8BwsOg5zmnjy9DRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 14 Jun 2021 at 15:07, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:

> PFA patch implementing this idea.
>

I benchmarked the v17 patches.
Testing was done on a 96-core machine, with PGDATA placed entirely in
tmpfs.
PostgreSQL was built with CFLAGS -O2.

for-update PgBench script:
\set aid random_zipfian(1, 100, 2)
begin;
select :aid from pgbench_accounts where aid = :aid for update;
update pgbench_accounts set abalance = abalance + 1 where aid = :aid;
update pgbench_accounts set abalance = abalance * 2 where aid = :aid;
update pgbench_accounts set abalance = abalance - 2 where aid = :aid;
end;

Before each test, sample data was loaded with "pgbench -i -s 100"; testing
was performed 3 times, for 1 hour per test.
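
A run might therefore have looked like this (a hypothetical invocation;
the script file name and client/thread counts are assumptions, only
"-i -s 100" and the 1-hour duration are given above):

pgbench -i -s 100 testdb
pgbench -f for-update.sql -c 96 -j 96 -T 3600 testdb
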
The benchmark results are presented while varying
multixact_offsets_buffers and multixact_members_buffers (1:2 respectively):
settings                          tps
multixact_members_buffers_64Kb    693.2
multixact_members_buffers_128Kb   691.4
multixact_members_buffers_192Kb   696.3
multixact_members_buffers_256Kb   694.4
multixact_members_buffers_320Kb   692.3
multixact_members_buffers_448Kb   693.7
multixact_members_buffers_512Kb   693.3
vanilla                           676.1

Best regards, Dmitry Vasiliev.


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2021-12-26 10:09:59
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> On 8 Apr 2021, at 15:22, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>>
> I have one more idea inspired by CPU caches.
> Let's make SLRU n-associative, where n ~ 8.
> We can divide the buffers into "banks"; the number of banks must be a power of 2.
> All banks are of equal size. We choose the bank size to approximately match the user's configured buffer size.
> Each page can live only within one bank. We use the same search and eviction algorithms as before, but we only need to search\evict over 8 elements.
> All SLRU data of a single bank will be colocated within at most 2 cache lines.
>
> I did not come up with an idea for how to avoid the multiplication bank_number * bank_size when the user configures 31337 buffers (any number that is radically not a power of 2).

We can avoid this multiplication by using gapped memory under the SLRU page statuses, but from my POV the complexity is not worth the possible performance gain.

PFA a rebase of the patchset. Also I've added a patch to combine page_number, page_status, and page_dirty together to touch fewer cachelines.

Best regards, Andrey Borodin.

Attachment Content-Type Size
v-18-0001-Make-all-SLRU-buffer-sizes-configurable.patch text/x-diff 25.7 KB
v-18-0003-Pack-SLRU-page_number-page_status-and-page_dirt.patch text/x-diff 23.6 KB
v-18-0002-Divide-SLRU-buffers-into-8-associative-banks.patch text/x-diff 4.7 KB

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-01-14 09:28:38
Message-ID: 20220114092838.gbr6ea7zxbaespsr@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,
On Sun, Dec 26, 2021 at 03:09:59PM +0500, Andrey Borodin wrote:
>
>
> PFA a rebase of the patchset. Also I've added a patch to combine page_number, page_status, and page_dirty together to touch fewer cachelines.

The cfbot reports some errors on the latest version of the patch:

https://cirrus-ci.com/task/6121317215764480
[04:56:38.432] su postgres -c "make -s -j${BUILD_JOBS} world-bin"
[04:56:48.270] In file included from async.c:134:
[04:56:48.270] ../../../src/include/access/slru.h:47:28: error: expected identifier or ‘(’ before ‘:’ token
[04:56:48.270] 47 | typedef enum SlruPageStatus:int16_t
[04:56:48.270] | ^
[04:56:48.270] ../../../src/include/access/slru.h:53:3: warning: data definition has no type or storage class
[04:56:48.270] 53 | } SlruPageStatus;
[04:56:48.270] | ^~~~~~~~~~~~~~
[04:56:48.270] ../../../src/include/access/slru.h:53:3: warning: type defaults to ‘int’ in declaration of ‘SlruPageStatus’ [-Wimplicit-int]
[04:56:48.270] ../../../src/include/access/slru.h:58:2: error: expected specifier-qualifier-list before ‘SlruPageStatus’
[04:56:48.270] 58 | SlruPageStatus page_status;
[04:56:48.270] | ^~~~~~~~~~~~~~
[04:56:48.270] async.c: In function ‘asyncQueueAddEntries’:
[04:56:48.270] async.c:1448:41: error: ‘SlruPageEntry’ has no member named ‘page_dirty’
[04:56:48.270] 1448 | NotifyCtl->shared->page_entries[slotno].page_dirty = true;
[04:56:48.270] | ^
[04:56:48.271] make[3]: *** [<builtin>: async.o] Error 1
[04:56:48.271] make[3]: *** Waiting for unfinished jobs....
[04:56:48.297] make[2]: *** [common.mk:39: commands-recursive] Error 2
[04:56:48.297] make[2]: *** Waiting for unfinished jobs....
[04:56:54.554] In file included from clog.c:36:
[04:56:54.554] ../../../../src/include/access/slru.h:47:28: error: expected identifier or ‘(’ before ‘:’ token
[04:56:54.554] 47 | typedef enum SlruPageStatus:int16_t
[04:56:54.554] | ^
[04:56:54.554] ../../../../src/include/access/slru.h:53:3: warning: data definition has no type or storage class
[04:56:54.554] 53 | } SlruPageStatus;
[04:56:54.554] | ^~~~~~~~~~~~~~
[04:56:54.554] ../../../../src/include/access/slru.h:53:3: warning: type defaults to ‘int’ in declaration of ‘SlruPageStatus’ [-Wimplicit-int]
[04:56:54.554] ../../../../src/include/access/slru.h:58:2: error: expected specifier-qualifier-list before ‘SlruPageStatus’
[04:56:54.554] 58 | SlruPageStatus page_status;
[04:56:54.554] | ^~~~~~~~~~~~~~
[04:56:54.554] clog.c: In function ‘TransactionIdSetPageStatusInternal’:
[04:56:54.554] clog.c:396:39: error: ‘SlruPageEntry’ has no member named ‘page_dirty’
[04:56:54.554] 396 | XactCtl->shared->page_entries[slotno].page_dirty = true;
[04:56:54.554] | ^
[04:56:54.554] In file included from ../../../../src/include/postgres.h:46,
[04:56:54.554] from clog.c:33:
[04:56:54.554] clog.c: In function ‘BootStrapCLOG’:
[04:56:54.554] clog.c:716:47: error: ‘SlruPageEntry’ has no member named ‘page_dirty’
[04:56:54.554] 716 | Assert(!XactCtl->shared->page_entries[slotno].page_dirty);
[04:56:54.554] | ^
[04:56:54.554] ../../../../src/include/c.h:848:9: note: in definition of macro ‘Assert’
[04:56:54.554] 848 | if (!(condition)) \
[04:56:54.554] | ^~~~~~~~~
[04:56:54.554] clog.c: In function ‘TrimCLOG’:
[04:56:54.554] clog.c:801:40: error: ‘SlruPageEntry’ has no member named ‘page_dirty’
[04:56:54.554] 801 | XactCtl->shared->page_entries[slotno].page_dirty = true;
[04:56:54.554] | ^
[04:56:54.554] In file included from ../../../../src/include/postgres.h:46,
[04:56:54.554] from clog.c:33:
[04:56:54.554] clog.c: In function ‘clog_redo’:
[04:56:54.554] clog.c:997:48: error: ‘SlruPageEntry’ has no member named ‘page_dirty’
[04:56:54.554] 997 | Assert(!XactCtl->shared->page_entries[slotno].page_dirty);
[04:56:54.554] | ^
[04:56:54.554] ../../../../src/include/c.h:848:9: note: in definition of macro ‘Assert’
[04:56:54.554] 848 | if (!(condition)) \
[04:56:54.554] | ^~~~~~~~~
[04:56:54.555] make[4]: *** [<builtin>: clog.o] Error 1
[04:56:54.555] make[3]: *** [../../../src/backend/common.mk:39: transam-recursive] Error 2
[04:56:54.555] make[3]: *** Waiting for unfinished jobs....
[04:56:56.405] make[2]: *** [common.mk:39: access-recursive] Error 2
[04:56:56.405] make[1]: *** [Makefile:42: all-backend-recurse] Error 2
[04:56:56.405] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2
[04:56:56.407]
[04:56:56.407] Exit status: 2

Could you send a new version? In the meantime I will switch the patch status
to Waiting on Author.


From: Shawn Debnath <sdn(at)amazon(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-01-14 22:20:05
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 14, 2022 at 05:28:38PM +0800, Julien Rouhaud wrote:
> > PFA a rebase of the patchset. Also I've added a patch to combine
> > page_number, page_status, and page_dirty together to touch fewer
> > cachelines.
>
> The cfbot reports some errors on the latest version of the patch:
>
> https://cirrus-ci.com/task/6121317215764480
> [...]
> Could you send a new version? In the meantime I will switch the patch
> status to Waiting on Author.
>

I was planning on running a set of stress tests on these patches. Could
we confirm which ones we plan to include in the commitfest?

--
Shawn Debnath
Amazon Web Services (AWS)


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Shawn Debnath <sdn(at)amazon(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-01-15 07:16:59
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 15 Jan 2022, at 03:20, Shawn Debnath <sdn(at)amazon(dot)com> wrote:
>
> On Fri, Jan 14, 2022 at 05:28:38PM +0800, Julien Rouhaud wrote:
>>> PFA a rebase of the patchset. Also I've added a patch to combine
>>> page_number, page_status, and page_dirty together to touch fewer
>>> cachelines.
>>
>> The cfbot reports some errors on the latest version of the patch:
>>
>> https://cirrus-ci.com/task/6121317215764480
>> [...]
>> Could you send a new version? In the meantime I will switch the patch
>> status to Waiting on Author.
>>
>
> I was planning on running a set of stress tests on these patches. Could
> we confirm which ones we plan to include in the commitfest?

Many thanks for your interest. Here's the latest version.

Best regards, Andrey Borodin.

Attachment Content-Type Size
v19-0003-Pack-SLRU-page_number-page_status-and-page_dirty.patch application/octet-stream 23.6 KB
v19-0002-Divide-SLRU-buffers-into-8-associative-banks.patch application/octet-stream 4.7 KB
v19-0001-Make-all-SLRU-buffer-sizes-configurable.patch application/octet-stream 25.7 KB

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Shawn Debnath <sdn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXact/SLRU buffers configuration
Date: 2022-01-15 15:46:01
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jan 15, 2022 at 12:16:59PM +0500, Andrey Borodin wrote:
> > On 15 Jan 2022, at 03:20, Shawn Debnath <sdn(at)amazon(dot)com> wrote:
> > On Fri, Jan 14, 2022 at 05:28:38PM +0800, Julien Rouhaud wrote:
> >>> PFA a rebase of the patchset. Also I've added a patch to combine
> >>> page_number, page_status, and page_dirty together to touch fewer
> >>> cachelines.
> >>
> >> The cfbot reports some errors on the latest version of the patch:
> >>
> >> https://cirrus-ci.com/task/6121317215764480
> >> [...]
> >> Could you send a new version? In the meantime I will switch the patch
> >> status to Waiting on Author.
> >
> > I was planning on running a set of stress tests on these patches. Could
> > we confirm which ones we plan to include in the commitfest?
>
> Many thanks for your interest. Here's the latest version.

This is failing to compile under Linux and Windows due to bitfield syntax.
http://cfbot.cputube.org/andrey-borodin.html

And compile warnings:

slru.c: In function ‘SlruAdjustNSlots’:
slru.c:161:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
161 | int nbanks = 1;
| ^~~
slru.c: In function ‘SimpleLruReadPage_ReadOnly’:
slru.c:530:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
530 | int bankstart = (pageno & shared->bank_mask) * shared->bank_size;
| ^~~

Note that you can test the CI result using any github account without waiting
for the cfbot. See ./src/tools/ci/README.

--
Justin


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Shawn Debnath <sdn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXact/SLRU buffers configuration
Date: 2022-01-16 05:36:08
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 15 Jan 2022, at 20:46, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>>>
>>> I was planning on running a set of stress tests on these patches. Could
>>> we confirm which ones we plan to include in the commitfest?
>>
>> Many thanks for your interest. Here's the latest version.
>
> This is failing to compile under linux and windows due to bitfield syntax.
> http://cfbot.cputube.org/andrey-borodin.html

Uh, sorry, I formatted the patch from the wrong branch.

Just tested Cirrus. It's wonderful, thanks! Really faster than doing stuff on my machines...

Best regards, Andrey Borodin.

Attachment Content-Type Size
v20-0001-Make-all-SLRU-buffer-sizes-configurable.patch application/octet-stream 25.7 KB
v20-0002-Divide-SLRU-buffers-into-8-associative-banks.patch application/octet-stream 4.7 KB
v20-0003-Pack-SLRU-page_number-page_status-and-page_dirty.patch application/octet-stream 24.4 KB

From: Shawn Debnath <sdn(at)amazon(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-01-20 15:44:54
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jan 15, 2022 at 12:16:59PM +0500, Andrey Borodin wrote:

> > I was planning on running a set of stress tests on these patches. Could
> > we confirm which ones we plan to include in the commitfest?
>
> Many thanks for your interest. Here's the latest version.

Here are the results of the multixact perf test I ran on the patch that
splits the linear SLRU caches into banks. With my test setup, the
binaries with the patch applied performed marginally slower across the
test matrix than the unpatched binaries:

+-------------------------------+---------------------+-----------------------+------------+
| workload | patched average tps | unpatched average tps | difference |
+-------------------------------+---------------------+-----------------------+------------+
| create only | 10250.54396 | 10349.67487 | -1.0% |
| create and select | 9677.711286 | 9991.065037 | -3.2% |
| large cache create only | 10310.96646 | 10337.16455 | -0.3% |
| large cache create and select | 9654.24077 | 9924.270242 | -2.8% |
+-------------------------------+---------------------+-----------------------+------------+

The test was configured in the following manner:
- AWS EC2 c5d.24xlarge instances, located in the same AZ, were used as
the database host and the test driver. These systems have 96 vcpus and
184 GB memory. NVMe drives were configured as RAID5.
- GUCs were changed from defaults to be the following:
max_connections = 5000
shared_buffers = 96GB
max_wal_size = 2GB
min_wal_size = 192MB
- pgbench runs were done with -c 1000 -j 1000 and a scale of 10,000
- Two multixact workloads were tested, first [0] was a create only
script which selected 100 pgbench_account rows for share. Second
workload [1] added a select statement to visit rows touched in the
past which had multixacts generated for them. pgbench test script [2]
wraps the call to the functions inside an explicit transaction.
- Large cache tests are multixact offsets cache size hard coded to 128
and members cache size hard coded to 256.
- Row selection is based on time based approach that lets all client
connections coordinate which rows to work with based on the
millisecond they start executing. To allow for more multixacts to be
generated and reduce contention, the workload uses offsets ahead of
the start id based on a random number.
- The one bummer about these runs was that they only ran for 600
seconds for insert only and 400 seconds for insert and select. I
consistently ran into the checkpointer getting oom-killed on this
instance after that timeframe. Will dig into this separately. But the
TPS was consistent.
- Each test was repeated at least 3 times and the average of those runs
were used.
- I am using the master branch and changes were applied on commit
f47ed79cc8a0cfa154dc7f01faaf59822552363f

I think patch 1 is a must-have. Regarding patch 2, I would propose we
avoid introducing more complexity into SimpleLRU cache and instead focus
on making the SLRU to buffer cache effort [3] a reality. I would also
add that we have a few customers in our fleet who have been successfully
running the large cache configuration on the regular SLRU without any
issues. With cache sizes this small, the linear searches are still quite
efficient.

If my test workload can be made better, please let me know. Happy to
re-run tests as needed.

[0] https://gist.github.com/sdebnath/e015561811adf721dd40dd6638969c69
[1] https://gist.github.com/sdebnath/2f3802e1fe288594b6661a7a59a7ca07
[2] https://gist.github.com/sdebnath/6bbfd5f87945a7d819e30a9a1701bc97
[3] https://www.postgresql.org/message-id/CA%2BhUKGKAYze99B-jk9NoMp-2BDqAgiRC4oJv%2BbFxghNgdieq8Q%40mail.gmail.com

--
Shawn Debnath
Amazon Web Services (AWS)


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Shawn Debnath <sdn(at)amazon(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-01-20 16:21:24
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 20 Jan 2022, at 20:44, Shawn Debnath <sdn(at)amazon(dot)com> wrote:
>
> If my test workload can be made better, please let me know. Happy to
> re-run tests as needed.

Shawn, thanks for the benchmarking!

Can you please also test the 2nd patch against large multixact SLRUs?
The 2nd patch is not intended to make things better at default buffer sizes. It is meant to preserve performance in the case of really huge SLRU buffers.

Thanks!

Best regards, Andrey Borodin.


From: Shawn Debnath <sdn(at)amazon(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-01-21 00:19:49
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 20, 2022 at 09:21:24PM +0500, Andrey Borodin wrote:
> > On 20 Jan 2022, at 20:44, Shawn Debnath <sdn(at)amazon(dot)com> wrote:
> Can you please also test the 2nd patch against large multixact SLRUs?
> The 2nd patch is not intended to make things better at default buffer sizes. It is meant to preserve performance in the case of really huge SLRU buffers.

The test was performed with 128/256 for the multixact offsets/members caches, as
stated in my previous email. Sure, I can test it with higher values - but
what real-world value would make sense? We have been using this
configuration successfully for a few of our customers that ran into
MultiXact contention.

--
Shawn Debnath
Amazon Web Services (AWS)


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Shawn Debnath <sdn(at)amazon(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-01-21 07:02:36
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 21 Jan 2022, at 05:19, Shawn Debnath <sdn(at)amazon(dot)com> wrote:
>
> On Thu, Jan 20, 2022 at 09:21:24PM +0500, Andrey Borodin wrote:
>>> On 20 Jan 2022, at 20:44, Shawn Debnath <sdn(at)amazon(dot)com> wrote:
>> Can you please also test the 2nd patch against large multixact SLRUs?
>> The 2nd patch is not intended to make things better at default buffer sizes. It is meant to preserve performance in the case of really huge SLRU buffers.
>
> The test was performed with 128/256 for the multixact offsets/members caches, as
> stated in my previous email. Sure, I can test it with higher values - but
> what real-world value would make sense? We have been using this
> configuration successfully for a few of our customers that ran into
> MultiXact contention.

Sorry, it seems I misinterpreted the results yesterday.
I had one concern about the 1st patch step: it makes the CLOG buffer size dependent on shared_buffers. But in your tests you seem to have already exercised xact_buffers = 24576 without noticeable degradation. Is that correct? I doubt a little that a linear search among 24K elements on each CLOG access incurs no performance impact, but your tests seem to prove it.

IMV splitting the SLRU buffers into banks would make sense for values greater than 1<<10. But you are right that 256 seems to be enough to cope with most multixact problems so far. I just thought about stressing the SLRU buffers with multixacts to be sure that the CLOG buffers would not suffer degradation. But yes, it's too indirect a test.

Maybe, just to be sure, let's repeat the tests with autovacuum turned off to stress xact_buffers?

Thanks!

Best regards, Andrey Borodin.


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-02-19 05:34:36
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 8 Apr 2021, at 17:22, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> Thanks! I chickened out of committing a buffer replacement algorithm
> patch written 11 hours before the feature freeze, but I also didn't
> really want to commit the GUC patch without that. Ahh, if only we'd
> latched onto the real problems here just a little sooner, but there is
> always PostgreSQL 15, I heard it's going to be amazing. Moved to next
> CF.

Hi Thomas!

Feature freeze is approaching again. I see that you are working on moving SLRUs to buffer pools, but it is not clear in which PG version that will land. And there is 100% consensus that the first patch is useful and helps to prevent big issues. Maybe let's commit the 1st step without lifting the default xact_buffers limit? Or the 1st patch as-is, with any simple technique that prevents linear search in SLRU buffers.

Best regards, Andrey Borodin.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-02-19 21:38:53
Message-ID: CA+hUKGLZWwv6Vwq0E8XxK+YZHEbt7Z=KBQagfXhyEquz52WXDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 19, 2022 at 6:34 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> Feature freeze is approaching again. I see that you are working on moving SLRUs to buffer pools, but it is not clear in which PG version that will land. And there is 100% consensus that the first patch is useful and helps to prevent big issues. Maybe let's commit the 1st step without lifting the default xact_buffers limit? Or the 1st patch as-is, with any simple technique that prevents linear search in SLRU buffers.

Hi Andrey,

Yeah, the SLRU/buffer pool thing would be complete enough to propose
for 16 at the earliest. I posted the early prototype to see what sort
of reaction the concept would get before doing more work; I know
others have investigated this topic too... maybe it can encourage more
patches, experimental results, ideas to be shared... but this is not
relevant for 15.

Back to this patch: assuming we can settle on a good-enough-for-now
replacement algorithm, do we want to add this set of 7 GUCs? Does
anyone else want to weigh in on that? Concretely, this patch adds:

multixact_offsets_buffers
multixact_members_buffers
subtrans_buffers
notify_buffers
serial_buffers
xact_buffers
commit_ts_buffers
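
For illustration, a postgresql.conf snippet using them might look like
this (sizes are in buffer pages; the values here are arbitrary examples,
and 0 keeps a setting at its built-in default):

    multixact_offsets_buffers = 128
    multixact_members_buffers = 256
    subtrans_buffers = 1024
    notify_buffers = 32
    serial_buffers = 32
    xact_buffers = 0
    commit_ts_buffers = 0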

I guess the people at
https://ottertune.com/blog/postgresql-knobs-list/ would be happy if we
did. Hopefully we'd drop the settings in a future release once we
figure out the main buffer pool thing (or some other scheme to
automate sizing).


From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-02-19 21:42:23
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022-02-20 10:38:53 +1300, Thomas Munro wrote:
> Back to this patch: assuming we can settle on a good-enough-for-now
> replacement algorithm, do we want to add this set of 7 GUCs? Does
> anyone else want to weigh in on that?

I'm -0.2 on it, given that we have a better path forward.


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-02-20 07:35:06
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 20 Feb 2022, at 02:42, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2022-02-20 10:38:53 +1300, Thomas Munro wrote:
>> Back to this patch: assuming we can settle on a good-enough-for-now
>> replacement algorithm, do we want to add this set of 7 GUCs? Does
>> anyone else want to weigh in on that?
>
> I'm -0.2 on it, given that we have a better path forward.
That's a really good path forward, but it has been under discussion for at least 3.5 years[0], and it is guaranteed not to be there until 2023. Gilles, Shawn, and Dmitry expressed opinions along the lines that the patch "is a must-have", referring to real pathological performance degradation inflicted by SLRU cache starvation. And I can remember a dozen other incidents that would not have happened if the patch had been applied; e.g. this post refers to the patch as a cure [1].

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
[1] https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/#what-can-we-do-about-getting-rid-of-nessie


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-03-18 09:02:13
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 20 Feb 2022, at 02:38, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> Back to this patch: assuming we can settle on a good-enough-for-now
> replacement algorithm, do we want to add this set of 7 GUCs? Does
> anyone else want to weigh in on that?

Hi Thomas!

It seems we don't have any other input besides reviews and Andres's -0.2.
Is there a chance to proceed?

Best regards, Andrey Borodin.


From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-07-21 13:00:20
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Good day, all.

I benchmarked the patch on a 2-socket Xeon 5220 CPU @ 2.20GHz.
I used the "benchmark" we use to reproduce SLRU problems on our
customers' setups.
In contrast to Shawn's tests, I concentrated on the bad case: a lot
of contention.

slru-funcs.sql - function definitions
- the functions create a lot of subtransactions to stress subtrans
- and select random rows for share to stress multixacts

slru-call.sql - function call for benchmark

slru-ballast.sql - randomly selects 1000 consecutive rows
"for update skip locked" to stress multixacts

patch1 - make SLRU buffers configurable
patch2 - make "8-associative banks"

Benchmarks were done with pgbench,
initialized with scale 1 to induce contention:
pgbench -i -s 1 testdb

Benchmark 1:
- low number of connections (50), 60% slru-call, 40% slru-ballast
pgbench -f slru-call.sql@60 -f slru-ballast.sql@40 -c 50 -j 75 -P 1 -T 30 testdb

version | subtrans | multixact | tps
        | buffers  | offs/memb | func+ballast
--------+----------+-----------+--------------
master  | 32       | 8/16      | 184+119
patch1  | 32       | 8/16      | 184+119
patch1  | 1024     | 8/16      | 121+77
patch1  | 1024     | 512/1024  | 118+75
patch2  | 32       | 8/16      | 190+122
patch2  | 1024     | 8/16      | 190+125
patch2  | 1024     | 512/1024  | 190+127

As you can see, this test case degrades with a blind increase of
SLRU buffers. But using a "hash table" in the form of "associative
buckets" keeps performance stable.

Benchmark 2:
- high connection number (600), 98% slru-call, 2% slru-ballast
pgbench -f slru-call.sql@98 -f slru-ballast.sql@2 -c 600 -j 75 -P 1 -T 30 testdb

I don't paste the "ballast" tps here since at 2% they are too small,
and they're very noisy.

version | subtrans | multixact | tps
| buffers | offs/memb | func
--------+----------+-----------+------
master | 32 | 8/16 | 13
patch1 | 32 | 8/16 | 13
patch1 | 1024 | 8/16 | 31
patch1 | 1024 | 512/1024 | 53
patch2 | 32 | 8/16 | 12
patch2 | 1024 | 8/16 | 34
patch2 | 1024 | 512/1024 | 67

In this case a simple buffer increase does help, but "buckets"
increase the performance gain further.

I didn't paste here the results of the third part of the patch
("Pack SLRU...") because I didn't see any major performance gain from
it, and it makes up a large part of the patch diff.

Rebased versions of the first two patch parts are attached.

regards,

Yura Sokolov

Attachment Content-Type Size
slru-ballast.sql application/sql 123 bytes
slru-call.sql application/sql 25 bytes
slru-func.sql application/sql 1.2 KB
v21-0002-Divide-SLRU-buffers-into-8-associative-banks.patch text/x-patch 4.9 KB
v21-0001-Make-all-SLRU-buffer-sizes-configurable.patch text/x-patch 25.7 KB

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-07-23 08:39:50
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 21 Jul 2022, at 18:00, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>
> In this case a simple buffer increase does help, but "buckets"
> increase the performance gain further.
Yura, thank you for your benchmarks!
We already knew that the patch can save the day on pathological workloads; now we have proof of it.
Also, there's evidence that users can blindly increase the size of SLRUs if they want (with the second patch). So there's no need for elaborate explanations of how to tune the buffer sizes.

Thomas, do you still have any doubts? Or is it certain that SLRU will be replaced by a better subsystem in 16?

Best regards, Andrey Borodin.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-07-23 08:47:48
Message-ID: CA+hUKG+BAH+pEBgv0AbttWd_GZukrYFy+wfk9gY4+CSS8enyiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jul 23, 2022 at 8:41 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> Thomas, do you still have any doubts? Or is it certain that SLRU will be replaced by a better subsystem in 16?

Hi Andrey,

Sorry for my lack of replies on this and the other SLRU thread -- I'm
thinking and experimenting. More soon.


From: i(dot)lazarev(at)postgrespro(dot)ru
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-08-16 19:36:27
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Andrey Borodin wrote 2022-07-23 11:39:
>
> Yura, thank you for your benchmarks!
> We already knew that the patch can save the day on pathological
> workloads; now we have proof of it.
> Also, there's evidence that users can blindly increase the size of
> SLRUs if they want (with the second patch). So there's no need for
> elaborate explanations of how to tune the buffer sizes.

Hi @Andrey.Borodin, with some considerations and performance checks from
@Yura.Sokolov we simplified your approach as follows:

1. Preamble. We feel free to increase any SLRU, since with your SLRU
buckets solution there's no performance degradation at large buffer
counts.
2. `slru_buffers_size_scale` is the only config param introduced, shared
by all SLRUs. It scales each SLRU's upper cap by a power of 2.
3. Every SLRU's buffer count is capped by both `NBuffers (shared_buffers)`
and `slru_buffers_size_scale`.
4. The initial constants `NUM_*_BUFFERS << slru_buffers_size_scale`
are applied to every SLRU (see the sketch below).
5. All SLRU buffer counts are always powers of 2, and their hash bucket
size is always 8.
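
In pseudo-C, the sizing rule above is roughly the following (a sketch
with invented names, not the patch itself):

/*
 * Sketch: scale the built-in default by the GUC, cap by shared_buffers
 * (NBuffers), and round down to a power of 2 so banks (buckets) of 8
 * divide the buffer pool evenly.
 */
static int
slru_buffer_count(int num_default_buffers, int slru_buffers_size_scale,
                  int nbuffers)     /* nbuffers = NBuffers */
{
    int     n = num_default_buffers << slru_buffers_size_scale;

    if (n > nbuffers)
        n = nbuffers;

    /* clear low bits until n is a power of 2 */
    while (n & (n - 1))
        n &= n - 1;

    return n;
}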

The attached patch is for your consideration. It combines both
`v21-0001-Make-all-SLRU-buffer-sizes-configurable.patch`
and `v21-0002-Divide-SLRU-buffers-into-8-associative-banks.patch` into a
much simpler approach.

Thank you, Yours,
- Ivan

Attachment Content-Type Size
v22-0006-bucketed-SLRUs-simplified_patch.patch text/x-diff 18.0 KB

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: i(dot)lazarev(at)postgrespro(dot)ru
Cc: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-08-18 03:35:15
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 17 Aug 2022, at 00:36, i(dot)lazarev(at)postgrespro(dot)ru wrote:
>
>> Andrey Borodin wrote 2022-07-23 11:39:
>> Yura, thank you for your benchmarks!
>> We already knew that the patch can save the day on pathological
>> workloads; now we have proof of it.
>> Also, there's evidence that users can blindly increase the size of
>> SLRUs if they want (with the second patch). So there's no need for
>> elaborate explanations of how to tune the buffer sizes.
>
> Hi @Andrey.Borodin, with some considerations and performance checks from @Yura.Sokolov we simplified your approach as follows:
>
> 1. Preamble. We feel free to increase any SLRU, since with your SLRU buckets solution there's no performance degradation at large buffer counts.
> 2. `slru_buffers_size_scale` is the only config param introduced, shared by all SLRUs. It scales each SLRU's upper cap by a power of 2.
> 3. Every SLRU's buffer count is capped by both `NBuffers (shared_buffers)` and `slru_buffers_size_scale`.
> 4. The initial constants `NUM_*_BUFFERS << slru_buffers_size_scale` are applied to every SLRU.
> 5. All SLRU buffer counts are always powers of 2, and their hash bucket size is always 8.
>
> The attached patch is for your consideration. It combines both `v21-0001-Make-all-SLRU-buffer-sizes-configurable.patch` and `v21-0002-Divide-SLRU-buffers-into-8-associative-banks.patch` into a much simpler approach.

I like the idea of one knob instead of one per SLRU. Maybe we could even deduce a sane value from NBuffers? That would effectively lead to 0 knobs :)

Your patch has the prefix "v22-0006"; does it mean there are 5 previous steps in the patchset?

Thank you!

Best regards, Andrey Borodin.


From: i(dot)lazarev(at)postgrespro(dot)ru
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-08-19 15:48:41
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrey Borodin wrote 2022-08-18 06:35:
>
> I like the idea of one knob instead of one per SLRU. Maybe we
> could even deduce a sane value from NBuffers? That would effectively
> lead to 0 knobs :)
>
> Your patch has the prefix "v22-0006"; does it mean there are 5
> previous steps in the patchset?
>
> Thank you!
>
>
> Best regards, Andrey Borodin.

Not sure it's possible to deduce it from NBuffers only.
slru_buffers_scale_shift looks like a relief valve for systems with
ultra-scaled NBuffers.

Regarding v22-0006, I had just tried to choose an index unique to this
thread; it's now fixed to 0001 indexing.

Attachment Content-Type Size
v23-0001-bucketed-SLRUs-simplified.patch text/x-diff 18.0 KB

From: Andrey Borodin <amborodin86(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2022-12-20 18:39:29
Message-ID: CAAhFRxhox2Z9CpUMu3Quf-AoM+sbxWV16ED9+nKfg_fibaY_rw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jul 23, 2022 at 1:48 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Sat, Jul 23, 2022 at 8:41 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > Thomas, do you still have any doubts? Or is it certain that SLRU will be replaced by a better subsystem in 16?
>
> Hi Andrey,
>
> Sorry for my lack of replies on this and the other SLRU thread -- I'm
> thinking and experimenting. More soon.
>

Hi Thomas,

PostgreSQL 16 feature freeze is approaching again. Let's choose
one of the possible solutions, even if the chosen one is
temporary.
Thank you!

Best regards, Andrey Borodin.


From: vignesh C <vignesh21(at)gmail(dot)com>
To: i(dot)lazarev(at)postgrespro(dot)ru
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2023-01-03 13:01:59
Message-ID: CALDaNm1U99XGKzn7SEB4jpwgL3itbSUvMUmyyDv2azYL8qg7Eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 19 Aug 2022 at 21:18, <i(dot)lazarev(at)postgrespro(dot)ru> wrote:
>
> Andrey Borodin wrote 2022-08-18 06:35:
> >
> > I like the idea of one knob instead of one per SLRU. Maybe we
> > could even deduce a sane value from NBuffers? That would effectively
> > lead to 0 knobs :)
> >
> > Your patch has the prefix "v22-0006"; does it mean there are 5
> > previous steps in the patchset?
> >
> > Thank you!
> >
> >
> > Best regards, Andrey Borodin.
>
> Not sure it's possible to deduce it from NBuffers only.
> slru_buffers_scale_shift looks like a relief valve for systems with
> ultra-scaled NBuffers.
>
> Regarding v22-0006, I had just tried to choose an index unique to this
> thread; it's now fixed to 0001 indexing.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:

=== Applying patches on top of PostgreSQL commit ID
325bc54eed4ea0836a0bb715bb18342f0c1c668a ===
=== applying patch ./v23-0001-bucketed-SLRUs-simplified.patch
patching file src/include/miscadmin.h
...
patching file src/backend/utils/misc/guc.c
Hunk #1 FAILED at 32.
Hunk #2 FAILED at 2375.
2 out of 2 hunks FAILED -- saving rejects to file
src/backend/utils/misc/guc.c.rej

[1] - http://cfbot.cputube.org/patch_41_2627.log

Regards,
Vignesh


From: Andrey Borodin <amborodin86(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: i(dot)lazarev(at)postgrespro(dot)ru, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2023-01-09 04:19:14
Message-ID: CAAhFRxheL9XKBiyPYtTj9p1pupkorXFdSM4OjNGB79xn6o=x4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 3, 2023 at 5:02 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> does not apply on top of HEAD as in [1], please post a rebased patch:
>
Thanks! Here's the rebase.

Best regards, Andrey Borodin.

Attachment Content-Type Size
v24-0001-Divide-SLRU-buffers-into-8-associative-banks.patch application/octet-stream 17.1 KB

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Andrey Borodin <amborodin86(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2023-01-11 05:58:06
Message-ID: CAFiTN-sHefPcYwxybSwLUGQsdHMmrGBxphRWALiaEKNKRyGqxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 9, 2023 at 9:49 AM Andrey Borodin <amborodin86(at)gmail(dot)com> wrote:
>
> On Tue, Jan 3, 2023 at 5:02 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > does not apply on top of HEAD as in [1], please post a rebased patch:
> >
> Thanks! Here's the rebase.

I was looking into this patch; it seems like three different
optimizations are squeezed into a single patch:
1) dividing the buffer space into banks to reduce the seq search cost,
2) a GUC parameter for the buffer size scale, 3) some of the buffer
size values are modified compared to what they are on HEAD. I think
these are 3 patches which should be independently committable.

While looking into the first idea of dividing the buffer space into
banks, I see that it will speed up finding buffers, but OTOH, while
searching for the victim buffer, it can actually hurt performance if
the SLRU pages which are frequently accessed are not evenly
distributed across the banks. So imagine cases where we have some
banks with a lot of empty slots and other banks from which we
frequently have to evict pages in order to bring new pages
in.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


From: Andrey Borodin <amborodin86(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2023-01-11 17:13:43
Message-ID: CAAhFRxiO3EBvT-T5t13rH8qh4sx438nm9AM++enJ1tS+4ubzwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Dilip! Thank you for the review!

On Tue, Jan 10, 2023 at 9:58 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Jan 9, 2023 at 9:49 AM Andrey Borodin <amborodin86(at)gmail(dot)com> wrote:
> >
> > On Tue, Jan 3, 2023 at 5:02 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > does not apply on top of HEAD as in [1], please post a rebased patch:
> > >
> > Thanks! Here's the rebase.
>
> I was looking into this patch; it seems like three different
> optimizations are squeezed into a single patch:
> 1) dividing the buffer space into banks to reduce the seq search cost,
> 2) a GUC parameter for the buffer size scale, 3) some of the buffer
> size values are modified compared to what they are on HEAD. I think
> these are 3 patches which should be independently committable.
There's no point in dividing SLRU buffers into parts unless the buffer
size is configurable.
And it's only possible to enlarge the default buffer sizes if the SLRU
buffers are divided into banks.
So the features can be viewed as independent commits, but they make no
sense separately.

But it's probably a good idea to split the patch back anyway, for
easier review.

>
> While looking into the first idea of dividing the buffer space into
> banks, I see that it will speed up finding buffers, but OTOH, while
> searching for the victim buffer, it can actually hurt performance if
> the SLRU pages which are frequently accessed are not evenly
> distributed across the banks. So imagine cases where we have some
> banks with a lot of empty slots and other banks from which we
> frequently have to evict pages in order to bring new pages
> in.
>

Yes. Despite the extremely low probability of such a case, this
pattern, where a user accesses pages assigned to only one bank, may
happen.
That case is equivalent to having just one bank, which means small
buffers, just as we have now.

Thank you!

Best regards, Andrey Borodin.


From: vignesh C <vignesh21(at)gmail(dot)com>
To: Andrey Borodin <amborodin86(at)gmail(dot)com>
Cc: i(dot)lazarev(at)postgrespro(dot)ru, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-01-20 03:31:20
Message-ID: CALDaNm0qK5EE3UKbDMiywGfjTpiZq74TeUo9b6Pq8DbqjS2ing@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 9 Jan 2023 at 09:49, Andrey Borodin <amborodin86(at)gmail(dot)com> wrote:
>
> On Tue, Jan 3, 2023 at 5:02 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > does not apply on top of HEAD as in [1], please post a rebased patch:
> >
> Thanks! Here's the rebase.

I see that there has been no activity in this thread for more
than 1 year now; I'm planning to close this in the current commitfest
unless someone plans to take it forward.

Regards,
Vignesh


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-01-27 04:58:18
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 20 Jan 2024, at 08:31, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 9 Jan 2023 at 09:49, Andrey Borodin <amborodin86(at)gmail(dot)com> wrote:
>>
>> On Tue, Jan 3, 2023 at 5:02 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>>> does not apply on top of HEAD as in [1], please post a rebased patch:
>>>
>> Thanks! Here's the rebase.
>
> I see that there has been no activity in this thread for more
> than 1 year now; I'm planning to close this in the current commitfest
> unless someone plans to take it forward.

Hi Vignesh,

Thanks for the ping! The most important parts of this patch set are discussed in [0]. If that patchset is committed, I'll withdraw the entry for this thread from the commitfest.
There's a version of the MultiXact-specific optimizations [1], but I hope they will not be necessary with the effective caches developed in [0]. It seems to me that the most important part of those optimizations is removing sleeps under the SLRU lock on standby [2] by Kyotaro Horiguchi. But given that the cache optimizations took 4 years to get close to commit, I'm not sure we will get this optimization any time soon...

Thanks!

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/CAFiTN-vzDvNz%3DExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A%40mail.gmail.com
[1] https://www.postgresql.org/message-id/flat/2ECE132B-C042-4489-930E-DBC5D0DAB84A%40yandex-team.ru#5f7e7022647be9eeecfc2ae75d765500
[2] https://www.postgresql.org/message-id/flat/20200515.090333.24867479329066911.horikyota.ntt%40gmail.com#855f8bb7205890579a363d2344b4484d


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-01-28 12:49:31
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2024-Jan-27, Andrey Borodin wrote:

> Thanks for the ping! The most important parts of this patch set are discussed in [0]. If that patchset is committed, I'll withdraw the entry for this thread from the commitfest.
> There's a version of the MultiXact-specific optimizations [1], but I hope they will not be necessary with the effective caches developed in [0]. It seems to me that the most important part of those optimizations is removing sleeps under the SLRU lock on standby [2] by Kyotaro Horiguchi. But given that the cache optimizations took 4 years to get close to commit, I'm not sure we will get this optimization any time soon...

I'd appreciate it if you or Horiguchi-san can update his patch to remove
use of usleep in favor of a CV in multixact, and keep this CF entry to
cover it.

Perhaps a test to make the code reach the usleep(1000) can be written
using injection points (49cd2b93d7db)?

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-01-28 18:17:16
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 28 Jan 2024, at 17:49, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> I'd appreciate it if you or Horiguchi-san can update his patch to remove
> use of usleep in favor of a CV in multixact, and keep this CF entry to
> cover it.

Sure! Sounds great!

> Perhaps a test to make the code reach the usleep(1000) can be written
> using injection points (49cd2b93d7db)?

I've tried to prototype something like that. But the interesting point between GetNewMultiXactId() and RecordNewMultiXact() is in a critical section, and we cannot have injection points in critical sections...
Also, to implement such a test we need a "wait" type of injection point; see step 2 in the attachment. With this type of injection point I can stop a backend in the middle of entering information about a new MultiXact.
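
For reference, the CV-sleep idea itself is just the standard
condition-variable protocol replacing the pg_usleep(1000L) loop,
roughly like this (a sketch only; the CV name, wait event, and
predicate are invented, not necessarily what the attached patch does):

/*
 * Sketch: wait until the offset of the next multixact has been
 * recorded, sleeping on a CV instead of pg_usleep(1000L).  The writer
 * side would call ConditionVariableBroadcast() after filling in the
 * offset.
 */
ConditionVariablePrepareToSleep(&MultiXactState->NextOffsetCV);
while (!next_offset_is_set(multi))      /* invented predicate */
    ConditionVariableSleep(&MultiXactState->NextOffsetCV,
                           WAIT_EVENT_MULTIXACT_CREATION);
ConditionVariableCancelSleep();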

Best regards, Andrey Borodin.

Attachment Content-Type Size
0001-Add-conditional-variable-to-wait-for-next-MultXact-o.patch application/octet-stream 3.3 KB
0002-Add-wait-type-for-injection-points.patch application/octet-stream 3.7 KB
0003-Try-to-test-multixact-CV-sleep.patch application/octet-stream 5.3 KB

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-02-03 17:32:45
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 28 Jan 2024, at 23:17, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
>
>> Perhaps a test to make the code reach the usleep(1000) can be written
>> using injection points (49cd2b93d7db)?
>
> I've tried to prototype something like that. But the interesting point between GetNewMultiXactId() and RecordNewMultiXact() is in a critical section, and we cannot have injection points in critical sections...
> Also, to implement such a test we need a "wait" type of injection point; see step 2 in the attachment. With this type of injection point I can stop a backend in the middle of entering information about a new MultiXact.

Here's the test draft. This test reliably reproduces the sleep on the CV while waiting for the next multixact to be filled into the "members" SLRU.
Cost of having this test:
1. We need a new injection point type, "wait" (in addition to "error" and "notice"). It cannot be avoided, because we need to sync at least 3 processes to observe the condition we want.
2. We need a new way to declare an injection point that can happen inside a critical section. I've called it a "prepared injection point".

The complexity of having this test is higher than the complexity of the CV-sleep patch itself. Do we want it? If so I can produce a cleaner version; currently all multixact tests are in the injection_points test module.

Best regards, Andrey Borodin.

Attachment Content-Type Size
v2-0003-Test-multixact-CV-sleep.patch application/octet-stream 9.1 KB
v2-0002-Add-wait-type-for-injection-points.patch application/octet-stream 4.9 KB
v2-0001-Add-conditional-variable-to-wait-for-next-MultXac.patch application/octet-stream 3.3 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: x4mmm(at)yandex-team(dot)ru
Cc: alvherre(at)alvh(dot)no-ip(dot)org, vignesh21(at)gmail(dot)com, amborodin86(at)gmail(dot)com, i(dot)lazarev(at)postgrespro(dot)ru, y(dot)sokolov(at)postgrespro(dot)ru, andres(at)anarazel(dot)de, thomas(dot)munro(at)gmail(dot)com, gilles(at)darold(dot)net, tomas(dot)vondra(at)enterprisedb(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, aekorotkov(at)gmail(dot)com, a(dot)lubennikova(at)postgrespro(dot)ru, daniel(at)yesql(dot)se, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-02-29 01:59:44
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Sat, 3 Feb 2024 22:32:45 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
> Here's the test draft. This test reliably reproduces the sleep on the CV while waiting for the next multixact to be filled into the "members" SLRU.

By the way, I raised a question about using multiple CVs
simultaneously [1]. That is, I suspect that the current CV
implementation doesn't allow us to use multiple condition variables at
the same time, because all CVs use the same PGPROC member cvWaitLink
to accommodate different waiter sets. If this assumption is correct,
we should resolve the issue before spreading more uses of CVs.

[1] https://www.postgresql.org/message-id/20240227.150709.1766217736683815840.horikyota.ntt%40gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, gilles(at)darold(dot)net, tomas(dot)vondra(at)enterprisedb(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, aekorotkov(at)gmail(dot)com, a(dot)lubennikova(at)postgrespro(dot)ru, daniel(at)yesql(dot)se, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-04-06 11:24:12
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 29 Feb 2024, at 06:59, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Sat, 3 Feb 2024 22:32:45 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
>> Here's the test draft. This test reliably reproduces the sleep on the CV while waiting for the next multixact to be filled into the "members" SLRU.
>
> By the way, I raised a question about using multiple CVs
> simultaneously [1]. That is, I suspect that the current CV
> implementation doesn't allow us to use multiple condition variables at
> the same time, because all CVs use the same PGPROC member cvWaitLink
> to accommodate different waiter sets. If this assumption is correct,
> we should resolve the issue before spreading more uses of CVs.

Alvaro, Kyotaro, what's our plan for this?
It seems too late to deal with this pg_usleep(1000L) for PG17.
I propose the following course of action:
1. Close this long-standing CF item
2. Start a new thread with the CV-sleep patch aimed at PG18
3. Create a new entry in the July CF

What do you think?

Best regards, Andrey Borodin.


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-04-07 12:52:16
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 6 Apr 2024, at 14:24, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> What do you think?

OK, I'll follow this plan.
Since most parts of this thread were committed, I'll mark the CF item as "committed".
Thanks to everyone involved!
See you in a followup thread about sleeping on a CV.

Best regards, Andrey Borodin.


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-04-07 18:41:41
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2024-Feb-03, Andrey M. Borodin wrote:

> Here's the test draft. This test reliably reproduces the sleep on the CV while waiting for the next multixact to be filled into the "members" SLRU.
> Cost of having this test:
> 1. We need a new injection point type, "wait" (in addition to "error" and "notice"). It cannot be avoided, because we need to sync at least 3 processes to observe the condition we want.
> 2. We need a new way to declare an injection point that can happen inside a critical section. I've called it a "prepared injection point".
>
> The complexity of having this test is higher than the complexity of the CV-sleep patch itself. Do we want it? If so I can produce a cleaner version; currently all multixact tests are in the injection_points test module.

Well, it would be nice to have *some* test, but as you say it is way
more complex than the thing being tested, and it zooms in on the
functioning of multixact creation in insane quantum-physics ways ...
to the point that you can no longer trust that multixact works the same
way with the test as without it. So what I did is manually run other
tests (pgbench) to verify that the corner case in multixact creation is
being handled correctly, and pushed the patch after a few corrections,
in particular so that it would follow the CV sleeping protocol a little
more closely, as the CV documentation suggests, which is not at all
what the patch did. I also fixed a few comments that had been neglected
and changed the name and description of the CV in the docs.

Now, maybe we can still add the test later, but it needs a rebase.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"When there is no humility, people degrade themselves" (A. Christie)


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-04-07 19:13:12
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 7 Apr 2024, at 21:41, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> Well, it would be nice to have *some* test, but as you say it is way
> more complex than the thing being tested, and it zooms in on the
> functioning of multixact creation in insane quantum-physics ways ...
> to the point that you can no longer trust that multixact works the same
> way with the test as without it. So what I did is manually run other
> tests (pgbench) to verify that the corner case in multixact creation is
> being handled correctly, and pushed the patch after a few corrections,
> in particular so that it would follow the CV sleeping protocol a little
> more closely, as the CV documentation suggests, which is not at all
> what the patch did. I also fixed a few comments that had been neglected
> and changed the name and description of the CV in the docs.

That's excellent! Thank you!

> Now, maybe we can still add the test later, but it needs a rebase.

Sure. 'wait' injection points are already there, so I'll produce a patch with "prepared" injection points and re-implement the test on top of that. I'll put it in the July CF.

Best regards, Andrey Borodin.


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-07-05 18:18:32
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 5 Jul 2024, at 14:16, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Mon, Jun 10, 2024 at 03:10:33PM +0900, Michael Paquier wrote:
>> OK, cool. I'll try to get that into the tree once v18 opens up.
>
> And I've spent more time on this one, and applied it to v18 after some
> slight tweaks. Please feel free to re-post your tests with
> multixacts, Andrey.

Thanks Michael!

> On 7 Apr 2024, at 23:41, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> Now, maybe we can still add the test later, but it needs a rebase.

Alvaro, please find the test attached.
I've addressed some of Michael's comments from a nearby thread: removed the extra load, made the injection point names lowercase, and fixed some grammar issues.

Best regards, Andrey Borodin.

Attachment Content-Type Size
vJul5-0001-Add-multixact-CV-sleep-test.patch application/octet-stream 9.5 KB

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-08-19 07:35:53
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 5 Jul 2024, at 23:18, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> Alvaro, please find the test attached.
> I've addressed some of Michael's comments from a nearby thread: removed the extra load, made the injection point names lowercase, and fixed some grammar issues.

I've made several runs on GitHub to test stability [0, 1, 2, 3]. CI seems to be stable.

Thanks!

Best regards, Andrey Borodin.

[0] https://github.com/x4m/postgres_g/commit/c9c362679f244
[1] https://github.com/x4m/postgres_g/commit/9d7e43cc1
[2] https://github.com/x4m/postgres_g/commit/18cf186617
[3] https://github.com/x4m/postgres_g/commit/4fbce73997


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-08-20 18:37:34
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2024-Aug-19, Andrey M. Borodin wrote:

> > On 5 Jul 2024, at 23:18, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> >
> > Alvaro, please find the test attached.
> > I've addressed some of Michael's comments from a nearby thread:
> > removed the extra load, made the injection point names lowercase,
> > and fixed some grammar issues.
>
> I've made several runs on GitHub to test stability [0, 1, 2, 3]. CI seems to be stable.

OK, I've made some minor adjustments and pushed. CI seemed OK to me;
let's see what the BF has to say.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-08-20 23:30:45
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 20, 2024 at 02:37:34PM -0400, Alvaro Herrera wrote:
> OK, I've made some minor adjustments and pushed. CI seemed OK for me,
> let's see what does the BF have to say.

I see that you've gone the way of the SQL function doing a load().
Would it be worth switching the test to rely on the two macros for
loading and caching instead? I've mentioned that previously but never
got around to presenting a patch for the sake of this test.

This requires some more tweaks in the module: a GUC to disable the
stats when the module is loaded, and two shmem callbacks; then the
test is able to work correctly. Please see attached.

Thoughts?
--
Michael

Attachment Content-Type Size
0001-Refactor-injection-point-test-with-loading.patch text/x-diff 8.8 KB

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-08-21 00:13:12
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2024-Aug-21, Michael Paquier wrote:

> I see that you've gone the way of the SQL function doing a load().
> Would it be worth switching the test to rely on the two macros for
> loading and caching instead? I've mentioned that previously but never
> got around to presenting a patch for the sake of this test.

Hmm, I have no opinion on which way is best. You probably have a better
sense of what's better for the injection points interface, so I'm happy
to defer to you on this.

> + /* reset in case this is a restart within the postmaster */
> + inj_state = NULL;

I'm not sure that this assignment actually accomplishes anything ...

I don't understand what the inj_stats_enabled stuff has to do with
this patch. I suspect it's a git operation error, i.e., you seem to have
squashed two different things together.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees." (E. Dijkstra)


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-08-21 03:46:31
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 20, 2024 at 08:13:12PM -0400, Alvaro Herrera wrote:
> I don't understand what the inj_stats_enabled stuff has to do with
> this patch. I suspect it's a git operation error, i.e., you seem to have
> squashed two different things together.

Sorry, I should have split that for clarity (one patch for the GUC,
one to change the test to use CACHED/LOAD). It is not an error
though: if we don't have a controlled way to disable the stats of the
module, then the test would fail when calling the cached callback
because we'd try to allocate some memory for the dshash entry in
pgstats.

The second effect of initializing the shmem state of the module with
shared_preload_libraries is that condition variables are set up for the
sake of the test, removing the dependency on the SQL load() call.
Both approaches are OK, but I'd prefer introducing one use case for
these two macros in the tree, so that they can be used as a reference
in the future when developing new tests.
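
For context, such a module GUC would be registered in _PG_init() along
these lines (a sketch; the actual patch may differ in details):

/* Sketch: a PGC_POSTMASTER boolean GUC for the module. */
static bool inj_stats_enabled = false;

void
_PG_init(void)
{
    DefineCustomBoolVariable("injection_points.stats",
                             "Enables statistics for injection points.",
                             NULL,
                             &inj_stats_enabled,
                             false,         /* disabled by default */
                             PGC_POSTMASTER,
                             0,
                             NULL, NULL, NULL);
    MarkGUCPrefixReserved("injection_points");
}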
--
Michael


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-08-21 06:26:05
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 21, 2024 at 12:46:31PM +0900, Michael Paquier wrote:
> Sorry, I should have split that for clarity (one patch for the GUC,
> one to change the test to use CACHED/LOAD). It is not an error
> though: if we don't have a controlled way to disable the stats of the
> module, then the test would fail when calling the cached callback
> because we'd try to allocate some memory for the dshash entry in
> pgstats.
>
> The second effect of initializing the shmem state of the module with
> shared_preload_libraries is that condition variables are set up for the
> sake of the test, removing the dependency on the SQL load() call.
> Both approaches are OK, but I'd prefer introducing one use case for
> these two macros in the tree, so that they can be used as a reference
> in the future when developing new tests.

In short, here is a better patch set, with 0001 and 0002 introducing
the pieces that the test needs to be able to use the LOAD() and
CACHED() macros in 0003:
- 0001: Add shmem callbacks to initialize the shmem state of
injection_points with shared_preload_libraries.
- 0002: Add a GUC to control whether the stats of the module are enabled.
By default, they are disabled, as they are only needed in the TAP test
of injection_points for the stats.
- 0003: Update the SLRU test to use INJECTION_POINT_LOAD and
INJECTION_POINT_CACHED with injection_points loaded via
shared_preload_libraries, removing the call to
injection_points_load() in the perl test (the pattern is sketched below).
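
On the core code side, the LOAD/CACHED pattern looks roughly like this
(a sketch; the injection point name here is invented):

    /*
     * Outside the critical section: load the callback into the
     * backend-local cache, doing any allocation up front.
     */
    INJECTION_POINT_LOAD("multixact-create-from-members");

    START_CRIT_SECTION();
    /*
     * Inside the critical section: run the cached callback; no
     * allocation can happen here.
     */
    INJECTION_POINT_CACHED("multixact-create-from-members");
    END_CRIT_SECTION();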

What do you think?
--
Michael

Attachment Content-Type Size
0001-injection_points-Add-initialization-of-shmem-state-a.patch text/x-diff 3.6 KB
0002-injection_point-Add-injection_points.stats.patch text/x-diff 5.7 KB
0003-Rework-new-SLRU-test-with-injection-points.patch text/x-diff 2.3 KB

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-08-21 17:55:06
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2024-Aug-21, Michael Paquier wrote:

> From fd8ab7b6845a2c56aa2c8d9c60f404f6b3407338 Mon Sep 17 00:00:00 2001
> From: Michael Paquier <michael(at)paquier(dot)xyz>
> Date: Wed, 21 Aug 2024 15:16:06 +0900
> Subject: [PATCH 2/3] injection_point: Add injection_points.stats

> This GUC controls if statistics should be used or not in the module.
> Custom statistics require the module to be loaded with
> shared_preload_libraries, hence this GUC is made PGC_POSTMASTER. By
> default, stats are disabled.
>
> This will be used by an upcoming change in a test where stats should not
> be used, as the test has a dependency on a critical section.

I find it strange that the information that stats cannot be used with
injection points that have a dependency on critical sections (?) is only
in the commit message and not in the code.
Also, maybe it'd make sense for stats to be globally enabled, and have
only the tests that require it disable them? (It's probably easy
enough to have a value "injection_points.stats=auto" which means: if the
module is loaded in shared_preload_libraries then set stats on,
otherwise turn them off.) TBH I don't understand why the issue that
stats require shared_preload_libraries only comes up now ... Maybe
another approach is to say that if an injection point is loaded via
_LOAD() rather than the normal way, then stats are disabled for that one
rather than globally? Or give the _LOAD() macro a boolean argument to
indicate whether to collect stats for that injection point or not.
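
To make the "auto" idea concrete, a hypothetical shape for such a
three-valued GUC (none of these names exist today; "auto" is resolved
by checking how the module was loaded):

    typedef enum
    {
        INJ_STATS_OFF,
        INJ_STATS_ON,
        INJ_STATS_AUTO
    } InjStatsMode;

    static const struct config_enum_entry inj_stats_options[] = {
        {"off", INJ_STATS_OFF, false},
        {"on", INJ_STATS_ON, false},
        {"auto", INJ_STATS_AUTO, false},
        {NULL, 0, false}
    };

    static int inj_stats_mode = INJ_STATS_AUTO;

    /* in _PG_init(): register the GUC, then resolve "auto" based on
     * whether we are being loaded via shared_preload_libraries */
    DefineCustomEnumVariable("injection_points.stats",
                             "Controls statistics for injection points.",
                             NULL,
                             &inj_stats_mode,
                             INJ_STATS_AUTO,
                             inj_stats_options,
                             PGC_POSTMASTER,
                             0,
                             NULL, NULL, NULL);
    if (inj_stats_mode == INJ_STATS_AUTO)
        inj_stats_mode = process_shared_preload_libraries_in_progress
            ? INJ_STATS_ON : INJ_STATS_OFF;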

Lastly, it's not clear to me what it means that the test has a
"dependency" on a critical section. Do you mean to say that the
injection point runs inside a critical section?

> From e5329d080b9d8436af8f65aac118745cf1f81ca2 Mon Sep 17 00:00:00 2001
> From: Michael Paquier <michael(at)paquier(dot)xyz>
> Date: Wed, 21 Aug 2024 15:09:06 +0900
> Subject: [PATCH 3/3] Rework new SLRU test with injection points
>
> Rather than the SQL injection_points_load, this commit makes the test
> rely on the two macros to load and run an injection point from the
> cache, acting as an example of how to use them.

No issues with this, feel free to go ahead.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, i(dot)lazarev(at)postgrespro(dot)ru, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-08-22 00:29:30
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 21, 2024 at 01:55:06PM -0400, Alvaro Herrera wrote:
> I find it strange that the information that stats cannot be used with
> injection points that have a dependency on critical sections (?) is only
> in the commit message and not in the code.

A comment close to where inj_stats_enabled is declared in
injection_points.c could be adapted for that, say:
"This GUC is useful to control if statistics should be enabled or not
during a test with injection points, for example when a test relies
on a callback run in a critical section where no allocation should
happen."

> Also, maybe it'd make sense for stats to be globally enabled, and have
> only the tests that require it disable them? (It's probably easy
> enough to have a value "injection_points.stats=auto" which means: if the
> module is loaded in shared_preload_libraries then set stats on,
> otherwise turn them off.)

I'm not sure that we need to get down to that until somebody has a
case where they want to rely on stats of injection points for their
stuff. At this stage, I only want the stats to be enabled to provide
automated checks for the custom pgstats APIs, so disabling it by
default and enabling it only in the stats test of the module
injection_points sounds like enough to me for now. The module
could always be tweaked to do that in the future, if there's a case.

> TBH I don't understand why the issue that
> stats require shared_preload_libraries only comes up now ...

Simply because there was no need to before. It is the first test that
relies on a critical section, and we need allocations if we want to
use a wait condition.

> Maybe another approach is to say that if an injection point is loaded via
> _LOAD() rather than the normal way, then stats are disabled for that one
> rather than globally?

One trick would be to force the GUC to be false for the duration of
the callback based on a check of CritSectionCount; a second one would
be to just skip the stats if we are under a critical section. A third
option, which I actually find interesting, would be to call
MemoryContextAllowInCriticalSection() in some strategic code paths of
the test module injection_points, because we're OK living with this
restriction in the module.
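
As a sketch, the second option could be as small as this (the routine
name is assumed for illustration, not taken from the module):

    static void
    injection_stats_report(const char *name)
    {
        /*
         * Stats updates may allocate, e.g. when creating a new pgstat
         * entry, and allocations are forbidden inside a critical
         * section, so skip the stats entirely there.
         */
        if (!inj_stats_enabled || CritSectionCount > 0)
            return;

        /* ... normal pgstat reporting for "name" would go here ... */
    }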

> Or give the _LOAD() macro a boolean argument to
> indicate whether to collect stats for that injection point or not.

Sticking some knowledge about the stats in the backend part of
injection points does not sound like a good idea to me.

> Lastly, it's not clear to me what does it mean that the test has a
> "dependency" on a critical section. Do you mean to say that the
> injection point runs inside a critical section?

Yes.

> No issues with this, feel free to go ahead.

Cool, thanks.
--
Michael


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-08-22 14:36:38
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2024-Aug-22, Michael Paquier wrote:

> On Wed, Aug 21, 2024 at 01:55:06PM -0400, Alvaro Herrera wrote:

> > Also, maybe it'd make sense for stats to be globally enabled, and have
> > only the tests that require it disable them? (It's probably easy
> > enough to have a value "injection_points.stats=auto" which means: if the
> > module is loaded in shared_preload_libraries then set stats on,
> > otherwise turn them off.)
>
> I'm not sure that we need to get down to that until somebody has a
> case where they want to rely on stats of injection points for their
> stuff. At this stage, I only want the stats to be enabled to provide
> automated checks for the custom pgstats APIs, so disabling it by
> default and enabling it only in the stats test of the module
> injection_points sounds like enough to me for now.

Oh! I thought the stats were useful by themselves. That not being the
case, I agree with simplifying; and the other ways to enhance this point
might not be necessary for now.

> > Or give the _LOAD() macro a boolean argument to
> > indicate whether to collect stats for that injection point or not.
>
> Sticking some knowledge about the stats in the backend part of
> injection points does not sound like a good idea to me.

You could flip this around: have the bool be for "this injection point
is going to be invoked inside a critical section". Then core code just
needs to tell the injection_points module what the core code is doing,
and it's injection_points that decides what to do with that information.
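
Purely as an illustration of that flipped contract (every name below is
made up, nothing like it exists in the tree):

    /* Core-side macro: the caller states a fact, not a policy. */
    #define INJECTION_POINT_LOAD_CRIT(name) InjectionPointLoadCrit(name)

    /* Module side: injection_points maps the fact to a policy. */
    static void
    injection_points_register(InjPointState *point)
    {
        if (point->declared_in_crit_section)
            point->stats_enabled = false;   /* no stats in critical sections */
    }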

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Crear es tan difícil como ser libre" (Elsa Triolet)


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-08-23 00:29:13
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 22, 2024 at 10:36:38AM -0400, Alvaro Herrera wrote:
> On 2024-Aug-22, Michael Paquier wrote:
>> I'm not sure that we need to get down to that until somebody has a
>> case where they want to rely on stats of injection points for their
>> stuff. At this stage, I only want the stats to be enabled to provide
>> automated checks for the custom pgstats APIs, so disabling it by
>> default and enabling it only in the stats test of the module
>> injection_points sounds like enough to me for now.
>
> Oh! I thought the stats were useful by themselves.

Yep, currently they're not, but I don't want to rule out that they
ever will be, either. Perhaps there would be a case where somebody would
like to run a callback N times and trigger a condition? That's
something where the stats could be useful, but I don't have a specific
case for that now. I'm just imagining possibilities.

> That not being the case, I agree with simplifying; and the other
> ways to enhance this point might not be necessary for now.

Okay.

>> Sticking some knowledge about the stats in the backend part of
>> injection points does not sound like a good idea to me.
>
> You could flip this around: have the bool be for "this injection point
> is going to be invoked inside a critical section". Then core code just
> needs to tell the injection points module what core code does, and it's
> injection_points that decides what to do with that information.

Hmm. We could do that, but I'm not really on board with anything that
adds more footprint into the backend code. For this one, it is even
information specific to the code path where the injection point is
added.
--
Michael


From: Thom Brown <thom(at)linux(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-10-29 16:38:28
Message-ID: CAA-aLv4h=58ggum1-ChLEH3STPXrC4K_cbqYxQ2i2aYMfdwLzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 23 Aug 2024 at 01:29, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Thu, Aug 22, 2024 at 10:36:38AM -0400, Alvaro Herrera wrote:
> > On 2024-Aug-22, Michael Paquier wrote:
> >> I'm not sure that we need to get down to that until somebody has a
> >> case where they want to rely on stats of injection points for their
> >> stuff. At this stage, I only want the stats to be enabled to provide
> >> automated checks for the custom pgstats APIs, so disabling it by
> >> default and enabling it only in the stats test of the module
> >> injection_points sounds like enough to me for now.
> >
> > Oh! I thought the stats were useful by themselves.
>
> Yep, currently they're not, but I don't want to rule out that they
> ever will be, either. Perhaps there would be a case where somebody would
> like to run a callback N times and trigger a condition? That's
> something where the stats could be useful, but I don't have a specific
> case for that now. I'm just imagining possibilities.
>

I believe I am seeing the problem being discussed occurring on a production
system running 15.6, causing ever-increasing replay lag on the standby,
until I cancel the offending process on the standby and force it to process
its interrupts.

Here's the backtrace before I do that:

#0 0x00007f4503b81876 in select () from /lib64/libc.so.6
#1 0x0000558b0956891a in pg_usleep (microsec=microsec(at)entry=1000) at
pgsleep.c:56
#2 0x0000558b0917e01a in GetMultiXactIdMembers (from_pgupgrade=false,
onlyLock=<optimized out>, members=0x7ffcd2a9f1e0, multi=109187502) at
multixact.c:1392
#3 GetMultiXactIdMembers (multi=109187502,
members=members(at)entry=0x7ffcd2a9f1e0,
from_pgupgrade=from_pgupgrade(at)entry=false, onlyLock=onlyLock(at)entry=false)
at multixact.c:1224
#4 0x0000558b0913de15 in MultiXactIdGetUpdateXid (xmax=<optimized out>,
t_infomask=<optimized out>) at heapam.c:6924
#5 0x0000558b09146028 in HeapTupleGetUpdateXid
(tuple=tuple(at)entry=0x7f440d428308)
at heapam.c:6965
#6 0x0000558b0914c02f in HeapTupleSatisfiesMVCC (htup=0x558b0b7cbf20,
htup=0x558b0b7cbf20, buffer=8053429, snapshot=0x558b0b63a2d8) at
heapam_visibility.c:1089
#7 HeapTupleSatisfiesVisibility (tup=tup(at)entry=0x7ffcd2a9f2b0,
snapshot=snapshot(at)entry=0x558b0b63a2d8, buffer=buffer(at)entry=8053429) at
heapam_visibility.c:1771
#8 0x0000558b0913e819 in heapgetpage (sscan=sscan(at)entry=0x558b0b7ccfa0,
page=page(at)entry=115) at heapam.c:468
#9 0x0000558b0913eb7e in heapgettup_pagemode (scan=scan(at)entry=0x558b0b7ccfa0,
dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:1120
#10 0x0000558b0913fb5e in heap_getnextslot (sscan=0x558b0b7ccfa0,
direction=<optimized out>, slot=0x558b0b7cc000) at heapam.c:1352
#11 0x0000558b092c0e7a in table_scan_getnextslot (slot=0x558b0b7cc000,
direction=ForwardScanDirection, sscan=<optimized out>) at
../../../src/include/access/tableam.h:1046
#12 SeqNext (node=0x558b0b7cbe10) at nodeSeqscan.c:80
#13 0x0000558b0929b9bf in ExecScan (node=0x558b0b7cbe10,
accessMtd=0x558b092c0df0 <SeqNext>, recheckMtd=0x558b092c0dc0 <SeqRecheck>)
at execScan.c:198
#14 0x0000558b09292cb2 in ExecProcNode (node=0x558b0b7cbe10) at
../../../src/include/executor/executor.h:262
#15 ExecutePlan (execute_once=<optimized out>, dest=0x558b0bca1350,
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
operation=CMD_SELECT, use_parallel_mode=<optimized out>,
planstate=0x558b0b7cbe10, estate=0x558b0b7cbbe8)
at execMain.c:1636
#16 standard_ExecutorRun (queryDesc=0x558b0b8c9798, direction=<optimized
out>, count=0, execute_once=<optimized out>) at execMain.c:363
#17 0x00007f44f64d43c5 in pgss_ExecutorRun (queryDesc=0x558b0b8c9798,
direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at
pg_stat_statements.c:1010
#18 0x0000558b093fda0f in PortalRunSelect (portal=portal(at)entry=0x558b0b6ba458,
forward=forward(at)entry=true, count=0, count(at)entry=9223372036854775807,
dest=dest(at)entry=0x558b0bca1350) at pquery.c:924
#19 0x0000558b093fedb8 in PortalRun (portal=portal(at)entry=0x558b0b6ba458,
count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=true,
run_once=run_once(at)entry=true, dest=dest(at)entry=0x558b0bca1350,
altdest=altdest(at)entry=0x558b0bca1350, qc=0x7ffcd2a9f7a0) at pquery.c:768
#20 0x0000558b093fb243 in exec_simple_query (
query_string=0x558b0b6170c8 "<redacted>")
at postgres.c:1250
#21 0x0000558b093fd412 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4598
#22 0x0000558b0937e170 in BackendRun (port=<optimized out>, port=<optimized
out>) at postmaster.c:4514
#23 BackendStartup (port=<optimized out>) at postmaster.c:4242
#24 ServerLoop () at postmaster.c:1809
#25 0x0000558b0937f147 in PostmasterMain (argc=argc(at)entry=5,
argv=argv(at)entry=0x558b0b5d0a30)
at postmaster.c:1481
#26 0x0000558b09100a2c in main (argc=5, argv=0x558b0b5d0a30) at main.c:202

This occurred twice, meaning two processes needed terminating.

Thom


From: Thom Brown <thom(at)linux(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-10-29 17:45:02
Message-ID: CAA-aLv7AcTh+O9rnoPVbck=Fw8atMOYrUttk5G9ctFhXOsqQbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 29 Oct 2024 at 16:38, Thom Brown <thom(at)linux(dot)com> wrote:
>
> On Fri, 23 Aug 2024 at 01:29, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>
>> On Thu, Aug 22, 2024 at 10:36:38AM -0400, Alvaro Herrera wrote:
>> > On 2024-Aug-22, Michael Paquier wrote:
>> >> I'm not sure that we need to get down to that until somebody has a
>> >> case where they want to rely on stats of injection points for their
>> >> stuff. At this stage, I only want the stats to be enabled to provide
>> >> automated checks for the custom pgstats APIs, so disabling it by
>> >> default and enabling it only in the stats test of the module
>> >> injection_points sounds like enough to me for now.
>> >
>> > Oh! I thought the stats were useful by themselves.
>>
>> Yep, currently they're not, but I don't want to rule out that they
>> ever will be, either. Perhaps there would be a case where somebody would
>> like to run a callback N times and trigger a condition? That's
>> something where the stats could be useful, but I don't have a specific
>> case for that now. I'm just imagining possibilities.
>
>
> I believe I am seeing the problem being discussed occurring on a production system running 15.6, causing ever-increasing replay lag on the standby, until I cancel the offending process on the standby and force it to process its interrupts.
>
> Here's the backtrace before I do that:
>
> #0 0x00007f4503b81876 in select () from /lib64/libc.so.6
> #1 0x0000558b0956891a in pg_usleep (microsec=microsec(at)entry=1000) at pgsleep.c:56
> #2 0x0000558b0917e01a in GetMultiXactIdMembers (from_pgupgrade=false, onlyLock=<optimized out>, members=0x7ffcd2a9f1e0, multi=109187502) at multixact.c:1392
> #3 GetMultiXactIdMembers (multi=109187502, members=members(at)entry=0x7ffcd2a9f1e0, from_pgupgrade=from_pgupgrade(at)entry=false, onlyLock=onlyLock(at)entry=false) at multixact.c:1224
> #4 0x0000558b0913de15 in MultiXactIdGetUpdateXid (xmax=<optimized out>, t_infomask=<optimized out>) at heapam.c:6924
> #5 0x0000558b09146028 in HeapTupleGetUpdateXid (tuple=tuple(at)entry=0x7f440d428308) at heapam.c:6965
> #6 0x0000558b0914c02f in HeapTupleSatisfiesMVCC (htup=0x558b0b7cbf20, htup=0x558b0b7cbf20, buffer=8053429, snapshot=0x558b0b63a2d8) at heapam_visibility.c:1089
> #7 HeapTupleSatisfiesVisibility (tup=tup(at)entry=0x7ffcd2a9f2b0, snapshot=snapshot(at)entry=0x558b0b63a2d8, buffer=buffer(at)entry=8053429) at heapam_visibility.c:1771
> #8 0x0000558b0913e819 in heapgetpage (sscan=sscan(at)entry=0x558b0b7ccfa0, page=page(at)entry=115) at heapam.c:468
> #9 0x0000558b0913eb7e in heapgettup_pagemode (scan=scan(at)entry=0x558b0b7ccfa0, dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:1120
> #10 0x0000558b0913fb5e in heap_getnextslot (sscan=0x558b0b7ccfa0, direction=<optimized out>, slot=0x558b0b7cc000) at heapam.c:1352
> #11 0x0000558b092c0e7a in table_scan_getnextslot (slot=0x558b0b7cc000, direction=ForwardScanDirection, sscan=<optimized out>) at ../../../src/include/access/tableam.h:1046
> #12 SeqNext (node=0x558b0b7cbe10) at nodeSeqscan.c:80
> #13 0x0000558b0929b9bf in ExecScan (node=0x558b0b7cbe10, accessMtd=0x558b092c0df0 <SeqNext>, recheckMtd=0x558b092c0dc0 <SeqRecheck>) at execScan.c:198
> #14 0x0000558b09292cb2 in ExecProcNode (node=0x558b0b7cbe10) at ../../../src/include/executor/executor.h:262
> #15 ExecutePlan (execute_once=<optimized out>, dest=0x558b0bca1350, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x558b0b7cbe10, estate=0x558b0b7cbbe8)
> at execMain.c:1636
> #16 standard_ExecutorRun (queryDesc=0x558b0b8c9798, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
> #17 0x00007f44f64d43c5 in pgss_ExecutorRun (queryDesc=0x558b0b8c9798, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:1010
> #18 0x0000558b093fda0f in PortalRunSelect (portal=portal(at)entry=0x558b0b6ba458, forward=forward(at)entry=true, count=0, count(at)entry=9223372036854775807, dest=dest(at)entry=0x558b0bca1350) at pquery.c:924
> #19 0x0000558b093fedb8 in PortalRun (portal=portal(at)entry=0x558b0b6ba458, count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=true, run_once=run_once(at)entry=true, dest=dest(at)entry=0x558b0bca1350,
> altdest=altdest(at)entry=0x558b0bca1350, qc=0x7ffcd2a9f7a0) at pquery.c:768
> #20 0x0000558b093fb243 in exec_simple_query (
> query_string=0x558b0b6170c8 "<redacted>")
> at postgres.c:1250
> #21 0x0000558b093fd412 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4598
> #22 0x0000558b0937e170 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4514
> #23 BackendStartup (port=<optimized out>) at postmaster.c:4242
> #24 ServerLoop () at postmaster.c:1809
> #25 0x0000558b0937f147 in PostmasterMain (argc=argc(at)entry=5, argv=argv(at)entry=0x558b0b5d0a30) at postmaster.c:1481
> #26 0x0000558b09100a2c in main (argc=5, argv=0x558b0b5d0a30) at main.c:202
>
> This occurred twice, meaning two processes needed terminating.

Taking a look at what's happening under the hood, it seems to be
getting stuck here:

    if (nextMXOffset == 0)
    {
        /* Corner case 2: next multixact is still being filled in */
        LWLockRelease(MultiXactOffsetSLRULock);
        CHECK_FOR_INTERRUPTS();
        pg_usleep(1000L);
        goto retry;
    }

It clearly checks for interrupts, but when I saw this issue happen, it
wasn't interruptible.

I looked around for similar reports, and I found a fork of Postgres
reporting the same issue, but with more info than I thought to gather at
the time. They determined that the root cause involved a deadlock
where the replay process is stuck due to an out-of-order multixact
playback sequence. The standby process waits indefinitely for a page
pin to release before proceeding, but the required multixact ID hasn't
yet been replayed due to log ordering. What they did to "fix" the
issue in their version was to add an error when the reading process
cannot retrieve the next multixact ID, allowing it to exit instead of
becoming indefinitely stuck. This took the form of only allowing it to
loop 100 times before erroring out.
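
Translated back into the loop quoted upthread, their change amounts to
something like this (only a sketch of the approach as they describe it,
not a proposal; the limit of 100 is theirs, the error wording is mine,
and it assumes an "int retries = 0;" declared before the retry label):

    if (nextMXOffset == 0)
    {
        /* Corner case 2: next multixact is still being filled in */
        if (++retries > 100)
            ereport(ERROR,
                    (errcode(ERRCODE_INTERNAL_ERROR),
                     errmsg("multixact %u has no next offset after %d attempts",
                            multi, retries)));
        LWLockRelease(MultiXactOffsetSLRULock);
        CHECK_FOR_INTERRUPTS();
        pg_usleep(1000L);
        goto retry;
    }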

Thom


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-10-31 10:46:37
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 29 Oct 2024, at 21:45, Thom Brown <thom(at)linux(dot)com> wrote:
>
> It clearly checks for interrupts, but when I saw this issue happen, it
> wasn't interruptible.

Perhaps those were different multixacts?
When I was observing this pathology on a standby, it was a stream of different reads encountering different multis.

Either way, the startup process can cancel the locking process on its own. Or is it the case that cancel was not enough, and you actually needed termination, not cancel?

Thanks!

Best regards, Andrey Borodin.


From: Thom Brown <thom(at)linux(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-10-31 13:29:56
Message-ID: CAA-aLv5PzDqYV0zyP=zYfvMrN9DiLo6vsDe04PUJoYXRGVa=rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 31 Oct 2024 at 10:47, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
>
>
> > On 29 Oct 2024, at 21:45, Thom Brown <thom(at)linux(dot)com> wrote:
> >
> > It clearly checks for interrupts, but when I saw this issue happen, it
> > wasn't interruptible.
>
> Perhaps those were different multixacts?
> When I was observing this pathology on a standby, it was a stream of different reads encountering different multis.
>
> Either way, the startup process can cancel the locking process on its own. Or is it the case that cancel was not enough, and you actually needed termination, not cancel?

Termination didn't work on either of the processes.

Thom


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-10-31 17:32:42
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 31 Oct 2024, at 17:29, Thom Brown <thom(at)linux(dot)com> wrote:
>
> On Thu, 31 Oct 2024 at 10:47, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>
>>
>>
>>> On 29 Oct 2024, at 21:45, Thom Brown <thom(at)linux(dot)com> wrote:
>>>
>>> It clearly checks for interrupts, but when I saw this issue happen, it
>>> wasn't interruptible.
>>
>> Perhaps those were different multixacts?
>> When I was observing this pathology on a standby, it was a stream of different reads encountering different multis.
>>
>> Either way, the startup process can cancel the locking process on its own. Or is it the case that cancel was not enough, and you actually needed termination, not cancel?
>
> Termination didn't work on either of the processes.

How did you force the process to actually terminate?
Did you observe repeated reads of the same multixact?
Was the offending process holding any locks while waiting?

Best regards, Andrey Borodin.


From: Thom Brown <thom(at)linux(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-10-31 19:25:11
Message-ID: CAA-aLv7-fs32r7v8znJ=uPPtQNqZ9_sH2REn=cu1pyEQ=tuEhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 31 Oct 2024, 17:33 Andrey M. Borodin, <x4mmm(at)yandex-team(dot)ru> wrote:

>
>
> > On 31 Oct 2024, at 17:29, Thom Brown <thom(at)linux(dot)com> wrote:
> >
> > On Thu, 31 Oct 2024 at 10:47, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru>
> > wrote:
> >>
> >>
> >>
> >>> On 29 Oct 2024, at 21:45, Thom Brown <thom(at)linux(dot)com> wrote:
> >>>
> >>> It clearly checks for interrupts, but when I saw this issue happen, it
> >>> wasn't interruptible.
> >>
> >> Perhaps those were different multixacts?
> >> When I was observing this pathology on a standby, it was a stream of
> >> different reads encountering different multis.
> >>
> >> Either way, the startup process can cancel the locking process on its
> >> own. Or is it the case that cancel was not enough, and you actually
> >> needed termination, not cancel?
> >
> > Termination didn't work on either of the processes.
>
> How did you force the process to actually terminate?
> Did you observe repeated read of the same multixact?
> Was offending process holding any locks while waiting?
>

Unfortunately I didn't gather much information when it was occurring, and
prioritised getting rid of the process blocking replay. I just attached gdb
to it, got a backtrace, and then ran "print ProcessInterrupts()".

Thom


From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-11-05 14:11:07
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 1 Nov 2024, at 00:25, Thom Brown <thom(at)linux(dot)com> wrote:
>
> Unfortunately I didn't gather much information when it was occurring, and prioritised getting rid of the process blocking replay. I just attached gdb to it, got a backtrace, and then ran "print ProcessInterrupts()".
>

Currently I do not see how this wait can result in a deadlock.
But I did observe a standby in a pathological sequential scan encountering recent multixacts again and again (a new one each time).
I hope this situation will be alleviated by recent changes - the wait is no longer a fixed millisecond sleep, but hopefully a smaller amount of time.
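
Roughly, the new scheme is the following (a simplified sketch from
memory; the field name, wait event name, and the helper used for the
re-read are assumptions, not the committed code):

    /* reader side, instead of pg_usleep(1000L) + goto retry */
    ConditionVariablePrepareToSleep(&MultiXactState->nextoff_cv);
    while (read_next_offset(multi) == 0)    /* assumed re-read helper */
        ConditionVariableSleep(&MultiXactState->nextoff_cv,
                               WAIT_EVENT_MULTIXACT_CREATION);
    ConditionVariableCancelSleep();

    /* creator side, once the next multixact's offset has been set */
    ConditionVariableBroadcast(&MultiXactState->nextoff_cv);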

In v17 we also added an injection point test which reproduces reading a very recent multixact. If there is a deadlock, I hope the buildfarm will show it to us.

Best regards, Andrey Borodin.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-11-05 15:58:37
Message-ID: CA+TgmoarL2nTionJxAgJcuOWm3S9d0hBMHmQ78-6HoO1JgphHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 29, 2024 at 1:45 PM Thom Brown <thom(at)linux(dot)com> wrote:
> Taking a look at what's happening under the hood, it seems to be
> getting stuck here:
>
> if (nextMXOffset == 0)
> {
>     /* Corner case 2: next multixact is still being filled in */
>     LWLockRelease(MultiXactOffsetSLRULock);
>     CHECK_FOR_INTERRUPTS();
>     pg_usleep(1000L);
>     goto retry;
> }
>
> It clearly checks for interrupts, but when I saw this issue happen, it
> wasn't interruptible.

I don't understand the underlying issue here; however, if a process
holds an LWLock, it's not interruptible, even if it checks for
interrupts. So it could be that at this point in the code we're
holding some other LWLock, such as a buffer content lock, and that's
why this code fails to achieve its objective.
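
To spell that out: ProcessInterrupts() is a no-op while interrupts are
held off, and acquiring any LWLock holds them off until release
(paraphrased from memory, abridged from the actual sources):

    /* src/backend/tcop/postgres.c, abridged */
    void
    ProcessInterrupts(void)
    {
        /* OK to accept any interrupts now? */
        if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
            return;
        /* ... cancel/terminate handling only runs past this point ... */
    }

    /* src/backend/storage/lmgr/lwlock.c, abridged: every acquisition
     * does this, so CHECK_FOR_INTERRUPTS() while a buffer content lock
     * (or any other LWLock) is held cannot process a cancel */
    HOLD_INTERRUPTS();          /* increments InterruptHoldoffCount */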

--
Robert Haas
EDB: http://www.enterprisedb.com