| Lists: | pgsql-hackers |
|---|
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Cc: | dipiets(at)amazon(dot)com |
| Subject: | use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2024-10-22 19:54:57 |
| Message-ID: | ZxgDEb_VpWyNZKB_@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
My colleague Salvatore Dipietro (CC'd) sent me a couple of profiles that
showed an enormous amount of s_lock() time going to the
__sync_lock_test_and_set() call in the AArch64 implementation of tas().
Upon closer inspection, I noticed that we don't implement a custom
TAS_SPIN() for this architecture, so I quickly hacked together the attached
patch and ran a couple of benchmarks that stressed the spinlock code. I
found no discussion about TAS_SPIN() on ARM in the archives, but I did
notice that the initial AArch64 support was added [0] before x86_64 started
using a non-locking test [1].
These benchmarks are for a c8g.24xlarge running a select-only pgbench with
256 clients and pg_stat_statements.track_planning enabled.
without the patch:
90.04% postgres [.] s_lock
1.07% pg_stat_statements.so [.] pgss_store
0.59% postgres [.] LWLockRelease
0.56% postgres [.] perform_spin_delay
0.31% [kernel] [k] arch_local_irq_enable
| while (TAS_SPIN(lock))
| {
| perform_spin_delay(&delayStatus);
0.12 |2c: -> bl perform_spin_delay
| tas():
| return __sync_lock_test_and_set(lock, 1);
0.01 |30: swpa w20, w1, [x19]
| s_lock():
99.87 | add x0, sp, #0x28
| while (TAS_SPIN(lock))
0.00 | ^ cbnz w1, 2c
tps = 74135.100891 (without initial connection time)
with the patch:
30.46% postgres [.] s_lock
5.88% postgres [.] perform_spin_delay
4.61% [kernel] [k] arch_local_irq_enable
3.31% [kernel] [k] next_uptodate_page
2.50% postgres [.] hash_search_with_hash_value
| while (TAS_SPIN(lock))
| {
| perform_spin_delay(&delayStatus);
0.63 |2c:+-->add x0, sp, #0x28
0.07 | |-> bl perform_spin_delay
| |while (TAS_SPIN(lock))
0.25 |34:| ldr w0, [x19]
65.19 | +--cbnz w0, 2c
| tas():
| return __sync_lock_test_and_set(lock, 1);
0.00 | swpa w20, w0, [x19]
| s_lock():
33.82 | ^ cbnz w0, 2c
tps = 549462.785554 (without initial connection time)
[0] https://postgr.es/c/5c7603c
[1] https://postgr.es/c/b03d196
--
nathan
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Use-a-non-locking-initial-test-in-TAS_SPIN-on-AAr.patch | text/plain | 870 bytes |
| From: | Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2024-10-23 03:01:05 |
| Message-ID: | CAPsk3_Bx5PhS9A+NH7SO2K-bwrZS7q4i-WTy1Bf98j0CAEqa=g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
Hi~
Upon closer inspection, I noticed that we don't implement a custom
> TAS_SPIN() for this architecture, so I quickly hacked together the attached
> patch and ran a couple of benchmarks that stressed the spinlock code. I
> found no discussion about TAS_SPIN() on ARM in the archives, but I did
> notice that the initial AArch64 support was added [0] before x86_64 started
> using a non-locking test [1].
>
It reminds me of a discussion about improving spinlock performance on ARM
in 2020 [0], though the discussion is about CAS and TAS, not TAS_SPIN()
itself.
> tps = 74135.100891 (without initial connection time)
> tps = 549462.785554 (without initial connection time)
The result looks great, but the discussion in [0] shows that the result may
vary among different ARM chips. Could you provide the chip model of this
test? So that we can do a cross validation of this patch. Not sure if
compiler
version is necessary too. I'm willing to test it on Alibaba Cloud Yitian 710
if I have time.
- Regards
Jingtang
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2024-10-23 14:46:56 |
| Message-ID: | ZxkMYErAZKcnpwuY@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Wed, Oct 23, 2024 at 11:01:05AM +0800, Jingtang Zhang wrote:
> The result looks great, but the discussion in [0] shows that the result may
> vary among different ARM chips. Could you provide the chip model of this
> test? So that we can do a cross validation of this patch.
This is on a c8g.24xlarge, which is using Neoverse-V2 and Armv9.0-a [0].
> I'm willing to test it on Alibaba Cloud Yitian 710 if I have time.
That would be great. I have a couple of Apple M-series machines I can
test, too.
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2024-10-23 15:18:55 |
| Message-ID: | ZxkT3_2zQiXoh3eW@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Wed, Oct 23, 2024 at 09:46:56AM -0500, Nathan Bossart wrote:
> I have a couple of Apple M-series machines I can test, too.
After some preliminary tests on an M3, I'm not seeing any gains outside the
noise range. That's not too surprising because it's likely more difficult
to create a lot of spinlock contention on these smaller machines. But, at
the very least, I'm not seeing a regression.
--
nathan
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Cc: | dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-08 18:12:19 |
| Message-ID: | Z37AA6-SsJoLaifa@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Tue, Oct 22, 2024 at 02:54:57PM -0500, Nathan Bossart wrote:
> My colleague Salvatore Dipietro (CC'd) sent me a couple of profiles that
> showed an enormous amount of s_lock() time going to the
> __sync_lock_test_and_set() call in the AArch64 implementation of tas().
> Upon closer inspection, I noticed that we don't implement a custom
> TAS_SPIN() for this architecture, so I quickly hacked together the attached
> patch and ran a couple of benchmarks that stressed the spinlock code. I
> found no discussion about TAS_SPIN() on ARM in the archives, but I did
> notice that the initial AArch64 support was added [0] before x86_64 started
> using a non-locking test [1].
>
> These benchmarks are for a c8g.24xlarge running a select-only pgbench with
> 256 clients and pg_stat_statements.track_planning enabled.
>
> without the patch:
>
> [...]
>
> tps = 74135.100891 (without initial connection time)
>
> with the patch:
>
> [...]
>
> tps = 549462.785554 (without initial connection time)
Are there any objections to proceeding with this change? So far, it's been
tested on a c8g.24xlarge and an Apple M3 (which seems to be too small to
show any effect). If anyone has access to a larger ARM machine, additional
testing would be greatly appreciated. I think it would be unfortunate if
this slipped to v19.
--
nathan
| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-08 20:07:24 |
| Message-ID: | tdex7aj3zrvdu22dfgtljbrybrjezbjpr374otnadi6qxywkmc@rzxtb2fbubpy |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
Hi,
On 2025-01-08 12:12:19 -0600, Nathan Bossart wrote:
> On Tue, Oct 22, 2024 at 02:54:57PM -0500, Nathan Bossart wrote:
> > My colleague Salvatore Dipietro (CC'd) sent me a couple of profiles that
> > showed an enormous amount of s_lock() time going to the
> > __sync_lock_test_and_set() call in the AArch64 implementation of tas().
> > Upon closer inspection, I noticed that we don't implement a custom
> > TAS_SPIN() for this architecture, so I quickly hacked together the attached
> > patch and ran a couple of benchmarks that stressed the spinlock code. I
> > found no discussion about TAS_SPIN() on ARM in the archives, but I did
> > notice that the initial AArch64 support was added [0] before x86_64 started
> > using a non-locking test [1].
> >
> > These benchmarks are for a c8g.24xlarge running a select-only pgbench with
> > 256 clients and pg_stat_statements.track_planning enabled.
> >
> > without the patch:
> >
> > [...]
> >
> > tps = 74135.100891 (without initial connection time)
> >
> > with the patch:
> >
> > [...]
> >
> > tps = 549462.785554 (without initial connection time)
>
> Are there any objections to proceeding with this change? So far, it's been
> tested on a c8g.24xlarge and an Apple M3 (which seems to be too small to
> show any effect). If anyone has access to a larger ARM machine, additional
> testing would be greatly appreciated. I think it would be unfortunate if
> this slipped to v19.
Seems reasonable. It's not surprising that spinning with an atomic operation
doesn't scale very well once a you have more than a handful cores.
Of course the whole way locking works in pg_stat_statements is a scalability
disaster, but that's a bigger project to fix.
Out of curiosity, have you measured whether this has a positive effect without
pg_stat_statements? I think it'll e.g. also affect lwlocks, as they also use
perform_spin_delay().
Greetings,
Andres
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-08 20:23:45 |
| Message-ID: | 1688329.1736367825@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> Are there any objections to proceeding with this change? So far, it's been
> tested on a c8g.24xlarge and an Apple M3 (which seems to be too small to
> show any effect). If anyone has access to a larger ARM machine, additional
> testing would be greatly appreciated. I think it would be unfortunate if
> this slipped to v19.
I just acquired an M4 Pro, which may also be too small to show any
effect, but perhaps running the test there would at least give us
more confidence that there's not a bad effect. Which test case(s)
would you recommend trying?
regards, tom lane
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-08 20:30:56 |
| Message-ID: | Z37ggNnCZp-_gbNc@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Wed, Jan 08, 2025 at 03:23:45PM -0500, Tom Lane wrote:
> I just acquired an M4 Pro, which may also be too small to show any
> effect, but perhaps running the test there would at least give us
> more confidence that there's not a bad effect. Which test case(s)
> would you recommend trying?
Thanks! A select-only pgbench with many clients (I used 256 upthread) and
pg_stat_statements.track_planning enabled seems to be a pretty easy way to
stress spinlocks. The same test without track_planning enabled might also
be interesting. I'm looking for a way to stress LWLocks, too, and I will
share here when I either find an existing test or write something of my
own.
--
nathan
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-08 22:01:19 |
| Message-ID: | Z371r_AVOU0WCrBr@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Wed, Jan 08, 2025 at 03:07:24PM -0500, Andres Freund wrote:
> Out of curiosity, have you measured whether this has a positive effect without
> pg_stat_statements? I think it'll e.g. also affect lwlocks, as they also use
> perform_spin_delay().
AFAICT TAS_SPIN() is only used for s_lock(), which doesn't appear to be
used by LWLocks. But I did retry my test from upthread without
pg_stat_statements and was surprised to find a reproducible 4-6%
regression. I'm not seeing any obvious differences in perf, but I do see
that the thread for adding TAS_SPIN() for PPC mentions a regression at
lower contention levels [0]. Perhaps the non-locked test is failing often
enough to hurt performance in this case... Whatever it is, it'll be mighty
frustrating to miss out on a >7x gain because of a 4% regression.
[0] https://postgr.es/me/26496.1325625436%40sss.pgh.pa.us
--
nathan
| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-08 22:25:24 |
| Message-ID: | tb6d2bg5a3so2325opkaow3ptkvpgpxzz3nnkehb3x7d77g3sx@xjid7uptuoni |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
Hi,
On 2025-01-08 16:01:19 -0600, Nathan Bossart wrote:
> On Wed, Jan 08, 2025 at 03:07:24PM -0500, Andres Freund wrote:
> > Out of curiosity, have you measured whether this has a positive effect without
> > pg_stat_statements? I think it'll e.g. also affect lwlocks, as they also use
> > perform_spin_delay().
>
> AFAICT TAS_SPIN() is only used for s_lock(), which doesn't appear to be
> used by LWLocks.
Brainfart on my part, sorry. I was thinking of SPIN_DELAY() for a moment...
> But I did retry my test from upthread without pg_stat_statements and was
> surprised to find a reproducible 4-6% regression.
Uh, huh. I assume this was readonly pgbench with 256 clients just as you had
tested upthread? I don't think there's any hot spinlock meaningfully involved
in that workload? A r/w workload is a different story, but upthread you
mentioned select-only.
Do you see any spinlock in profiles?
> I'm not seeing any obvious differences in perf, but I do see that the thread
> for adding TAS_SPIN() for PPC mentions a regression at lower contention
> levels [0]. Perhaps the non-locked test is failing often enough to hurt
> performance in this case... Whatever it is, it'll be mighty frustrating to
> miss out on a
> >7x gain because of a 4% regression.
I don't think the explanation can be that simple - even with TAS_SPIN defined,
we do try to acquire the lock once without using TAS_SPIN:
#if !defined(S_LOCK)
#define S_LOCK(lock) \
(TAS(lock) ? s_lock((lock), __FILE__, __LINE__, __func__) : 0)
#endif /* S_LOCK */
Only s_lock() then uses TAS_SPIN(lock).
I wonder if you're hitting an extreme case of binary-layout related effects?
I've never seen them at this magnitude though. I'd suggest using either lld
or mold as linker and comparing the numbers for a few
-Wl,--shuffle-sections=$seed seed values.
Greetings,
Andres Freund
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-08 22:38:02 |
| Message-ID: | Z37-Svaqt7XCBcXP@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Wed, Jan 08, 2025 at 05:25:24PM -0500, Andres Freund wrote:
> On 2025-01-08 16:01:19 -0600, Nathan Bossart wrote:
>> But I did retry my test from upthread without pg_stat_statements and was
>> surprised to find a reproducible 4-6% regression.
>
> Uh, huh. I assume this was readonly pgbench with 256 clients just as you had
> tested upthread? I don't think there's any hot spinlock meaningfully involved
> in that workload? A r/w workload is a different story, but upthread you
> mentioned select-only.
>
> Do you see any spinlock in profiles?
Yes, this was using 256 clients. Looking closer, I don't see anything
spinlock related anywhere near the top of perf.
>> I'm not seeing any obvious differences in perf, but I do see that the thread
>> for adding TAS_SPIN() for PPC mentions a regression at lower contention
>> levels [0]. Perhaps the non-locked test is failing often enough to hurt
>> performance in this case... Whatever it is, it'll be mighty frustrating to
>> miss out on a
>> >7x gain because of a 4% regression.
>
> I don't think the explanation can be that simple - even with TAS_SPIN defined,
> we do try to acquire the lock once without using TAS_SPIN:
>
> #if !defined(S_LOCK)
> #define S_LOCK(lock) \
> (TAS(lock) ? s_lock((lock), __FILE__, __LINE__, __func__) : 0)
> #endif /* S_LOCK */
>
> Only s_lock() then uses TAS_SPIN(lock).
Ah, right. FWIW I tried setting a cap on the number of times we do a
non-locked test, and the results still showed the regression, which seems
to match your intuition here.
> I wonder if you're hitting an extreme case of binary-layout related effects?
> I've never seen them at this magnitude though. I'd suggest using either lld
> or mold as linker and comparing the numbers for a few
> -Wl,--shuffle-sections=$seed seed values.
Will do.
--
nathan
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-08 23:07:44 |
| Message-ID: | 1759000.1736377664@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> AFAICT TAS_SPIN() is only used for s_lock(), which doesn't appear to be
> used by LWLocks. But I did retry my test from upthread without
> pg_stat_statements and was surprised to find a reproducible 4-6%
> regression.
On what hardware?
I just spent an hour beating on my M4 Pro (the 14-core variant)
and could not detect any outside-the-noise effect of this patch,
with or without pg_stat_statements loaded. There does seem to be
a small fraction-of-a-percent-ish benefit. But the run-to-run
variation with 60-second "pgbench -S" tests is a couple of percent,
so I can't say that that's real.
I do feel pretty sure that the patch doesn't hurt on this
class of hardware.
regards, tom lane
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-09 18:15:59 |
| Message-ID: | Z4ASX5Zc0GA-1J-Z@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Wed, Jan 08, 2025 at 06:07:44PM -0500, Tom Lane wrote:
> Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
>> AFAICT TAS_SPIN() is only used for s_lock(), which doesn't appear to be
>> used by LWLocks. But I did retry my test from upthread without
>> pg_stat_statements and was surprised to find a reproducible 4-6%
>> regression.
>
> On what hardware?
This was on a c8g.24xlarge (Neoverse-V2, Armv9.0-a) [0].
> I just spent an hour beating on my M4 Pro (the 14-core variant)
> and could not detect any outside-the-noise effect of this patch,
> with or without pg_stat_statements loaded. There does seem to be
> a small fraction-of-a-percent-ish benefit. But the run-to-run
> variation with 60-second "pgbench -S" tests is a couple of percent,
> so I can't say that that's real.
>
> I do feel pretty sure that the patch doesn't hurt on this
> class of hardware.
Great. This matches what I saw on an M3.
--
nathan
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-10 16:51:40 |
| Message-ID: | Z4FQHBrAiFXhE4ZM@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Wed, Jan 08, 2025 at 04:38:02PM -0600, Nathan Bossart wrote:
> On Wed, Jan 08, 2025 at 05:25:24PM -0500, Andres Freund wrote:
>> I wonder if you're hitting an extreme case of binary-layout related effects?
>> I've never seen them at this magnitude though. I'd suggest using either lld
>> or mold as linker and comparing the numbers for a few
>> -Wl,--shuffle-sections=$seed seed values.
>
> Will do.
Actually, I think I may have just had back luck and/or not warmed things up
enough. I just re-ran the test a few dozen times, carefully ensuring the
data was in the cache and periodically alternating between the binary with
the patch applied and the one without it. The results converged to within
1-2% of each other, with the patched version even winning about half the
time. The averages across all the runs showed a ~0.4% regression, which I
suspect is well within the noise range.
--
nathan
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, dipiets(at)amazon(dot)com |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-10 19:19:55 |
| Message-ID: | Z4Fy23e7iCmZi1Dk@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Fri, Jan 10, 2025 at 10:51:40AM -0600, Nathan Bossart wrote:
> Actually, I think I may have just had back luck and/or not warmed things up
> enough. I just re-ran the test a few dozen times, carefully ensuring the
> data was in the cache and periodically alternating between the binary with
> the patch applied and the one without it. The results converged to within
> 1-2% of each other, with the patched version even winning about half the
> time. The averages across all the runs showed a ~0.4% regression, which I
> suspect is well within the noise range.
I went ahead and committed this patch. Please let me know if there are any
remaining concerns.
--
nathan
| From: | Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-15 11:50:38 |
| Message-ID: | 74D40BAB-C23D-497E-8153-67B543663182@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
Hi, Nathan.
I just realized that I almost forgot about this thread :)
> The result looks great, but the discussion in [0] shows that the result may
> vary among different ARM chips. Could you provide the chip model of this
> test? So that we can do a cross validation of this patch. Not sure if compiler
> version is necessary too. I'm willing to test it on Alibaba Cloud Yitian 710
> if I have time.
I did some benchmark on Yitian 710.
On c8y.16xlarge (64 cores):
Without the patch:
80.31% postgres [.] __aarch64_swp4_acq
1.77% postgres [.] __aarch64_ldadd4_acq_rel
1.13% postgres [.] hash_search_with_hash_value
0.87% pg_stat_statements.so [.] __aarch64_swp4_acq
0.72% postgres [.] perform_spin_delay
0.44% postgres [.] _bt_compare
tps = 295272.628421 (including connections establishing)
tps = 295335.660323 (excluding connections establishing)
Patched:
9.94% postgres [.] s_lock
6.07% postgres [.] __aarch64_swp4_acq
5.73% postgres [.] hash_search_with_hash_value
2.81% postgres [.] perform_spin_delay
2.29% postgres [.] _bt_compare
2.15% postgres [.] PinBuffer
tps = 864519.764125 (including connections establishing)
tps = 864638.244443 (excluding connections establishing)
Seems that great performance could be gained if s_lock contention is severe.
This may be more likely to happen on bigger machines.
On c8y.2xlarge (8 cores), I failed to make s_lock contended severely, and
as a result this patch didn’t bring any difference outside the noise.
Regards,
Jingtang
| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: use a non-locking initial test in TAS_SPIN on AArch64 |
| Date: | 2025-01-15 18:50:00 |
| Message-ID: | Z4gDWMbR6uqV1mhR@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Wed, Jan 15, 2025 at 07:50:38PM +0800, Jingtang Zhang wrote:
> Seems that great performance could be gained if s_lock contention is severe.
> This may be more likely to happen on bigger machines.
>
> On c8y.2xlarge (8 cores), I failed to make s_lock contended severely, and
> as a result this patch didn´t bring any difference outside the noise.
Thanks for sharing.
--
nathan