pg_stat_statements: improve loading and saving routines for the dump file

Lists: pgsql-hackers
From: m(dot)litsarev(at)postgrespro(dot)ru
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: pg_stat_statements: improve loading and saving routines for the dump file
Date: 2025-01-20 13:49:45
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

Currently in pg_stat_statements save/load routines the whole pgssEntry
entity data are written/read with its last field
slock_t mutex;
which is actually not used then.
This small patch fixes this issue. Hope, it will be useful.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

Attachment Content-Type Size
v1-0001-Exclude-pgssEntry-mutex-for-rw-dump-file.patch text/x-diff 2.3 KB

From: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_stat_statements: improve loading and saving routines for the dump file
Date: 2025-01-20 16:14:07
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello, Mikhail.

1) I'd add to comment a reason, why mutex should be last.

// Mutex should be last field, as this field isn't copied to dump file

+ /* protects the counters only. Should be the very last field, as this field isn't copied to dump file
+ slock_t mutex;
} pgssEntry;

2) You didn't take into account the upgrade. Saved in Postgres with this
byte and try to load in version without this byte.

On 1/20/25 16:49, m(dot)litsarev(at)postgrespro(dot)ru wrote:
> Hi!
>
> Currently in pg_stat_statements save/load routines the whole pgssEntry
> entity data are written/read with its last field
> slock_t        mutex;
> which is actually not used then.
> This small patch fixes this issue. Hope, it will be useful.
>
> Respectfully,
>
> Mikhail Litsarev,
> Postgres Professional: https://postgrespro.com

--
Best wishes,
Ivan Kush
Tantor Labs LLC


From: m(dot)litsarev(at)postgrespro(dot)ru
To: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_stat_statements: improve loading and saving routines for the dump file
Date: 2025-01-21 10:51:22
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> // Mutex should be last field, as this field isn't copied to dump file
Updated.

> 2) You didn't take into account the upgrade. Saved in Postgres with
> this byte and try to load in version without this byte.
The PGSS_DUMP_FILE format is defined by PGSS_FILE_HEADER magic number
(the first four bytes in the dump file). Once they does not coinside
(when the file is being read) the statistics are not loaded from the
dump file and set to zero and the irrelevant dump file is skipped. This
is the actual behaviour implemented in the code.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

Attachment Content-Type Size
v2-0001-Exclude-pgssEntry-mutex-for-rw-dump-file.patch text/x-diff 2.4 KB

From: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
To: m(dot)litsarev(at)postgrespro(dot)ru
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_stat_statements: improve loading and saving routines for the dump file
Date: 2025-01-21 15:46:41
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

What does this patch give on aglobal scale? Does it save much memory or
increase performance? How many times?

On 1/21/25 13:51, m(dot)litsarev(at)postgrespro(dot)ru wrote:
>> // Mutex should be last field, as this field isn't copied to dump file
> Updated.
>
>> 2) You didn't take into account the upgrade. Saved in Postgres with
>> this byte and try to load in version without this byte.
> The PGSS_DUMP_FILE format is defined by PGSS_FILE_HEADER magic number
> (the first four bytes in the dump file). Once they does not coinside
> (when the file is being read) the statistics are not loaded from the
> dump file and set to zero and the irrelevant dump file is skipped.
> This is the actual behaviour implemented in the code.
>
> Respectfully,
>
> Mikhail Litsarev,
> Postgres Professional: https://postgrespro.com

--
Best wishes,
Ivan Kush
Tantor Labs LLC


From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>
Cc: m(dot)litsarev(at)postgrespro(dot)ru, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_stat_statements: improve loading and saving routines for the dump file
Date: 2025-01-21 21:22:08
Message-ID: CAA5RZ0u_Jt7LgTXDcbjyjeosSni2MEjhm=vr368GkB_Qk4aqzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This will only reduce the size of the
$PGDATA/pg_stat/pg_stat_statements.txt file. Even with
100k entries, the most I have seen pg_stat_statements.max
set to, that will be less than 1 MB of disk saving. The default
config of 5k entries will be much less.

Regards,

Sami


From: m(dot)litsarev(at)postgrespro(dot)ru
To: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, samimseih(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_stat_statements: improve loading and saving routines for the dump file
Date: 2025-01-22 13:54:40
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> What does this patch give on aglobal scale? Does it save much memory or
> increase performance? How many times?

This patch makes the code semantically more correct and we don't lose
anything. It is obviously not about performance or memory optimisation.

> This will only reduce the size of the
> $PGDATA/pg_stat/pg_stat_statements.txt file. Even with
> 100k entries, the most I have seen pg_stat_statements.max
> set to, that will be less than 1 MB of disk saving. The default
> config of 5k entries will be much less.

Perfectly agree. I would just add that statistics are dropped into the
pg_stat_statements.txt file at server stop which is not a very frequent
event.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: m(dot)litsarev(at)postgrespro(dot)ru
Cc: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, samimseih(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_stat_statements: improve loading and saving routines for the dump file
Date: 2025-04-03 22:32:12
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

m(dot)litsarev(at)postgrespro(dot)ru writes:
>> What does this patch give on aglobal scale? Does it save much memory or
>> increase performance? How many times?

> This patch makes the code semantically more correct and we don't lose
> anything. It is obviously not about performance or memory optimisation.

Indeed not: on my machine I see sizeof(pgssEntry) = 432. It's full of
int64 fields, so the alignment requirement is 8 bytes, meaning that
the mutex field accounts for 8 bytes even though it's likely just 1
or 4 bytes wide. Still, that means what you suggest is only going
to save 8/432 = 1.8% of the on-disk size of the struct. Given that
we also store the SQL query text for each entry, the net savings
fraction is even smaller.

I don't really agree that this is adding any semantic correctness
either. If we were reading directly into a live hashtable entry,
overwriting the mutex field could indeed be bad. But the fread()
call is reading into a local variable "temp" and we only copy
selected fields out of that. So it's just some bytes in a
transient stack entry.

On the whole therefore, I'm inclined to reject this on several
grounds:

* The savings is not worth the costs of bumping the stats file format
magic number (and thereby invalidating everyone's stored statistics).

* I'd rather keep the flexibility to put the mutex wherever we want in
the struct. Right now that isn't worth much either, but depending on
what fields get added in future, we might have the opportunity to move
the mutex to someplace where it fits into padding space and hence adds
nothing to sizeof(pgssEntry).

* Adding the requirement on where the mutex is makes the code more
fragile, since somebody might ignore the comment and place things
incorrectly. I'd like to think that such an error would be spotted
immediately, but we've committed sillier mistakes. The consequences
would likely be that some fields don't get stored/reloaded correctly,
which might escape notice for awhile.

regards, tom lane


From: m(dot)litsarev(at)postgrespro(dot)ru
To: Sami Imseih <samimseih(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ivan Kush <ivan(dot)kush(at)tantorlabs(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_stat_statements: improve loading and saving routines for the dump file
Date: 2025-04-04 08:25:58
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

Thank you for detailed explanations.

Respectfully,
Mikhail Litsarev,
Postgres Professional: https://postgrespro.com