Lists: | pgsql-hackers |
---|
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | ANALYZE ONLY |
Date: | 2024-08-20 05:52:12 |
Message-ID: | CADofcAWATx_haD=QkSxHbnTsAe6+e0Aw8Eh4H8cXyogGvn_kOg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hello Postgres Hackers
An application that I am developing uses Postgresql, and includes a fairly large
number of partitioned tables which are used to store time series data.
The tables are partitioned by time, and typically there is only one partition
at a time - the current partition - that is actually being updated.
Older partitions
are available for query and eventually dropped.
As per the documentation, partitioned tables are not analyzed by the autovacuum
workers, although their partitions are. Statistics are needed on the partitioned
table level for at least some query planning activities.
The problem is that giving an ANALYZE command targeting a partitioned table
causes it to update statistics for the partitioned table AND all the individual
partitions. There is currently no option to prevent it from including the
partitions.
This is wasteful for our application: for one thing the autovacuum
has already analyzed the individual partitions; for another most of
the partitions
will have had no changes, so they don't need to be analyzed repeatedly.
I took some measurements when running ANALYZE on one of our tables. It
took approx
4 minutes to analyze the partitioned table, then 29 minutes to analyze the
partitions. We have hundreds of these tables, so the cost is very significant.
For my use case at least it would be fantastic if we could add an ONLY option
to ANALYZE, which would cause it to analyze the named table only and not
descend into the partitions.
I took a look at the source and it looks doable, but before launching into it
I thought I would ask a few questions here.
1. Would such a feature be welcomed? Are there any traps I might not
have thought of?
2. The existing ANALYZE command has the following structure:
ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
It would be easiest to add ONLY as another option, but that
doesn't look quite
right to me - surely the ONLY should be attached to the table name?
An alternative would be:
ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ]
Any feedback or advice would be great.
Regards
Mike.
From: | Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-20 07:42:43 |
Message-ID: | CAGECzQTq_OPhXvA+VK3zzQ-ZYa97Bg8D+JomjXTuiTs9P6Q4Tw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, 20 Aug 2024 at 07:52, Michael Harris <harmic(at)gmail(dot)com> wrote:
> 1. Would such a feature be welcomed? Are there any traps I might not
> have thought of?
I think this sounds like a reasonable feature.
> 2. The existing ANALYZE command has the following structure:
>
> ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
>
> It would be easiest to add ONLY as another option, but that
> doesn't look quite
> right to me - surely the ONLY should be attached to the table name?
> An alternative would be:
>
> ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ]
>
> Any feedback or advice would be great.
Personally I'd prefer a new option to be added. But I agree ONLY isn't
a good name then. Maybe something like SKIP_PARTITIONS.
From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Michael Harris <harmic(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-20 11:25:32 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 20.8.24 10:42, Jelte Fennema-Nio wrote:
> On Tue, 20 Aug 2024 at 07:52, Michael Harris<harmic(at)gmail(dot)com> wrote:
>> 1. Would such a feature be welcomed? Are there any traps I might not
>> have thought of?
> I think this sounds like a reasonable feature.
>
>
>> 2. The existing ANALYZE command has the following structure:
>>
>> ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
>>
>> It would be easiest to add ONLY as another option, but that
>> doesn't look quite
>> right to me - surely the ONLY should be attached to the table name?
>> An alternative would be:
>>
>> ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ]
>>
>> Any feedback or advice would be great.
> Personally I'd prefer a new option to be added. But I agree ONLY isn't
> a good name then. Maybe something like SKIP_PARTITIONS.
>
Hi everyone,
Your proposal is indeed interesting, but I have a question: can't your
issue be resolved by properly configuring |autovacuum| instead of
developing a new feature for |ANALYZE|?
From my perspective, |ANALYZE| is intended to forcibly gather
statistics from all partitioned tables. If the goal is to ensure that
statistics are updated at the right moment, adjusting the
|autovacuum_analyze_threshold| and |autovacuum_analyze_scale_factor|
parameters might be the solution.
--
Regards,
Ilia Evdokimov,
Tantor Labs LCC.
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
Cc: | Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Michael Harris <harmic(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-20 11:49:41 |
Message-ID: | CAApHDvruUROZVVubW=Qzp8CzB4HEqu_jOuvLd3-nmJHXme+T1A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, 20 Aug 2024 at 23:25, Ilia Evdokimov
<ilya(dot)evdokimov(at)tantorlabs(dot)com> wrote:
> Your proposal is indeed interesting, but I have a question: can't your issue be resolved by properly configuring autovacuum instead of developing a new feature for ANALYZE?
Basically, no. There's a "tip" in [1] which provides information on
the limitation, namely:
"The autovacuum daemon does not issue ANALYZE commands for partitioned
tables. Inheritance parents will only be analyzed if the parent itself
is changed - changes to child tables do not trigger autoanalyze on the
parent table. If your queries require statistics on parent tables for
proper planning, it is necessary to periodically run a manual ANALYZE
on those tables to keep the statistics up to date."
There is also some discussion about removing the limitation in [2].
While I agree that it would be nice to have autovacuum handle this,
it's not clear how exactly it would work. Additionally, if we had
that, it would still be useful if the ANALYZE command could be
instructed to just gather statistics for the partitioned table only.
David
[1] https://www.postgresql.org/docs/devel/routine-vacuuming.html#VACUUM-FOR-STATISTICS
[2] https://www.postgresql.org/message-id/flat/CAKkQ508_PwVgwJyBY%3D0Lmkz90j8CmWNPUxgHvCUwGhMrouz6UA%40mail.gmail.com
From: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jelte Fennema <postgres(at)jeltef(dot)nl>, David Rowley <dgrowleyml(at)gmail(dot)com> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-20 16:26:48 |
Message-ID: | CAGPVpCQpVqkFq1ja6EDU6ZLyJxXSRk2yGbdV0-6GGs87Y_3qtA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Michael,
Thanks for starting this thread. I've also spent a bit time on this after
reading your first thread on this issue [1]
Michael Harris <harmic(at)gmail(dot)com>, 20 Ağu 2024 Sal, 08:52 tarihinde şunu
yazdı:
> The problem is that giving an ANALYZE command targeting a partitioned table
> causes it to update statistics for the partitioned table AND all the
> individual
> partitions. There is currently no option to prevent it from including the
> partitions.
>
> This is wasteful for our application: for one thing the autovacuum
> has already analyzed the individual partitions; for another most of
> the partitions
> will have had no changes, so they don't need to be analyzed repeatedly.
>
I agree that it's a waste to analyze partitions when they're already
analyzed by autovacuum. It would be nice to have a way to run analyze only
on a partitioned table without its partitions.
> I took some measurements when running ANALYZE on one of our tables. It
> took approx
> 4 minutes to analyze the partitioned table, then 29 minutes to analyze the
> partitions. We have hundreds of these tables, so the cost is very
> significant.
>
I quickly tweaked the code a bit to exclude partitions when a partitioned
table is being analyzed. I can confirm that there is a significant gain
even on a simple case like a partitioned table with 10 partitions and 1M
rows in each partition.
1. Would such a feature be welcomed? Are there any traps I might not
> have thought of?
>
> 2. The existing ANALYZE command has the following structure:
>
> ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
>
> It would be easiest to add ONLY as another option, but that
> doesn't look quite
> right to me - surely the ONLY should be attached to the table name
> An alternative would be:
>
> ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ]
>
I feel closer to adding this as an option instead of a new keyword in
ANALYZE grammar. To me, it would be easier to have this option and then
give the names of partitioned tables as opposed to typing ONLY before each
partition table.
But we should think of another name as ONLY is used differently (attached
to the table name as you mentioned) in other contexts.
I've been also thinking about how this new option should affect inheritance
tables. Should it have just no impact on them or only analyze the parent
table without taking child tables into account? There are two records for
an inheritance parent table in pg_statistic, one row for only the parent
table and a second row including children. We might only analyze the parent
table if this new "ONLY" option is specified. I'm not sure if that would be
something users would need or not, but I think this option should behave
similarly for both partitioned tables and inheritance tables.
If we decide to go with only partition tables and not care about
inheritance, then naming this option to SKIP_PARTITIONS as Jelte suggested
sounds fine. But that name wouldn't work if this option will affect
inheritance tables.
Thanks,
--
Melih Mutlu
Microsoft
From: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-20 16:29:21 |
Message-ID: | CAGPVpCT6iTV=nR-FTeoN=odBDpS+4oYiktaz-KJ+-bUkYWut8w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, 20 Ağu 2024 Sal, 19:26 tarihinde şunu
yazdı:
> Hi Michael,
>
> Thanks for starting this thread. I've also spent a bit time on this after
> reading your first thread on this issue [1]
>
Forgot to add the reference [1]
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-20 18:40:41 |
Message-ID: | CA+Tgmob=UV9SVqhD0Jj3qRcVNEymi=qQ0ywP_XQKC4c_K1CqBw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Aug 20, 2024 at 1:52 AM Michael Harris <harmic(at)gmail(dot)com> wrote:
> 2. The existing ANALYZE command has the following structure:
>
> ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
>
> It would be easiest to add ONLY as another option, but that
> doesn't look quite
> right to me - surely the ONLY should be attached to the table name?
> An alternative would be:
>
> ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ]
>
> Any feedback or advice would be great.
I like trying to use ONLY somehow.
--
Robert Haas
EDB: http://www.enterprisedb.com
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-20 22:53:18 |
Message-ID: | CAApHDvrTerg5RJehpp24J8MK3qtYB3SuvoXM7d7O_XQZG3_png@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, 21 Aug 2024 at 06:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I like trying to use ONLY somehow.
Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table;
or as a table modifier like gram.y's extended_relation_expr?
Making it a command option means that the option would apply to all
tables listed, whereas if it was more like an extended_relation_expr,
the option would be applied per table listed in the command.
1. ANALYZE ONLY ptab, ptab2; -- gather stats on ptab but not on its
partitions but get stats on ptab2 and stats on its partitions too.
2. ANALYZE ONLY ptab, ONLY ptab2; -- gather stats on ptab and ptab2
without doing that on any of their partitions.
Whereas: "ANALYZE (ONLY) ptab, ptab2;" would always give you the
behaviour of #2.
If we did it as a per-table option, then we'd need to consider what
should happen if someone did: "VACUUM ONLY parttab;". Probably
silently doing nothing wouldn't be good. Maybe a warning, akin to
what's done in:
postgres=# analyze (skip_locked) a;
WARNING: skipping analyze of "a" --- lock not available
David
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Harris <harmic(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-20 22:56:53 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> On Wed, 21 Aug 2024 at 06:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I like trying to use ONLY somehow.
> Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table;
> or as a table modifier like gram.y's extended_relation_expr?
> Making it a command option means that the option would apply to all
> tables listed, whereas if it was more like an extended_relation_expr,
> the option would be applied per table listed in the command.
> 1. ANALYZE ONLY ptab, ptab2; -- gather stats on ptab but not on its
> partitions but get stats on ptab2 and stats on its partitions too.
> 2. ANALYZE ONLY ptab, ONLY ptab2; -- gather stats on ptab and ptab2
> without doing that on any of their partitions.
FWIW, I think that's the right approach, for consistency with the
way that ONLY works in DML.
regards, tom lane
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-21 02:37:05 |
Message-ID: | CA+TgmobJeWN8rPEPz9t33zWDd5AztxFTvheDFAjUxAdT4Y1ODg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Aug 20, 2024 at 6:53 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Wed, 21 Aug 2024 at 06:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I like trying to use ONLY somehow.
>
> Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table;
> or as a table modifier like gram.y's extended_relation_expr?
The table modifier idea seems nice to me.
> If we did it as a per-table option, then we'd need to consider what
> should happen if someone did: "VACUUM ONLY parttab;". Probably
> silently doing nothing wouldn't be good. Maybe a warning, akin to
> what's done in:
>
> postgres=# analyze (skip_locked) a;
> WARNING: skipping analyze of "a" --- lock not available
Perhaps. I'm not sure about this part.
--
Robert Haas
EDB: http://www.enterprisedb.com
From: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Harris <harmic(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-21 10:04:16 |
Message-ID: | CAGPVpCSGwWQkU5DWoSFzZ3LPYqbWQZzi7cXWHsyYgvnsqWfYhA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
David Rowley <dgrowleyml(at)gmail(dot)com>, 21 Ağu 2024 Çar, 01:53 tarihinde şunu
yazdı:
> On Wed, 21 Aug 2024 at 06:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I like trying to use ONLY somehow.
>
> Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table;
> or as a table modifier like gram.y's extended_relation_expr?
>
> Making it a command option means that the option would apply to all
> tables listed, whereas if it was more like an extended_relation_expr,
> the option would be applied per table listed in the command.
>
> 1. ANALYZE ONLY ptab, ptab2; -- gather stats on ptab but not on its
> partitions but get stats on ptab2 and stats on its partitions too.
> 2. ANALYZE ONLY ptab, ONLY ptab2; -- gather stats on ptab and ptab2
> without doing that on any of their partitions.
>
I believe we should go this route if we want this to be called "ONLY" so
that it would be consistent with other places too.
Whereas: "ANALYZE (ONLY) ptab, ptab2;" would always give you the
> behaviour of #2.
>
Havin it as an option would be easier to use when we have several partition
tables. But I agree that if we call it "ONLY ", it may be confusing and the
other approach would be appropriate.
> If we did it as a per-table option, then we'd need to consider what
> should happen if someone did: "VACUUM ONLY parttab;". Probably
> silently doing nothing wouldn't be good. Maybe a warning, akin to
> what's done in:
>
> postgres=# analyze (skip_locked) a;
> WARNING: skipping analyze of "a" --- lock not available
>
+1 to raising a warning message in that case instead of staying silent. We
might also not allow ONLY if ANALYZE is not present in VACUUM query and
raise an error. But that would require changes in grams.y and could
complicate things. So it may not be necessary and we may be fine with just
a warning.
Regards,
--
Melih Mutlu
Microsoft
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-21 23:32:41 |
Message-ID: | CADofcAUv_-zTraO6jLPjdxyswYFUS0_hLXhsWt1t5i=aVErA4w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Thank you all for the replies & discussion.
It sounds like more are in favour of using the ONLY syntax attached to
the tables is the best way to go, with the main advantages being:
- consistency with other commands
- flexibility in allowing to specify whether to include partitions
for individual tables when supplying a list of tables
I will start working on an implementation along those lines. It looks
like we can simply replace qualified_name with relation_expr in the
production for vacuum_relation within gram.y.
One other thing I noticed when reading the code. The function
expand_vacuum_rel in vacuum.c seems to be responsible for adding the
partitions. If I am reading it correctly, it only adds child tables in
the case of a partitioned table, not in the case of an inheritance
parent:
include_parts = (classForm->relkind == RELKIND_PARTITIONED_TABLE);
..
if (include_parts)
{
.. add partitions ..
This is a little different to some other contexts where the ONLY
keyword is used, in that ONLY would be the default and only available
mode of operation for an inheritance parent.
Regards,
Mike
On Wed, 21 Aug 2024 at 20:04, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
>
> David Rowley <dgrowleyml(at)gmail(dot)com>, 21 Ağu 2024 Çar, 01:53 tarihinde şunu yazdı:
>>
>> On Wed, 21 Aug 2024 at 06:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> > I like trying to use ONLY somehow.
>>
>> Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table;
>> or as a table modifier like gram.y's extended_relation_expr?
>>
>> Making it a command option means that the option would apply to all
>> tables listed, whereas if it was more like an extended_relation_expr,
>> the option would be applied per table listed in the command.
>>
>> 1. ANALYZE ONLY ptab, ptab2; -- gather stats on ptab but not on its
>> partitions but get stats on ptab2 and stats on its partitions too.
>> 2. ANALYZE ONLY ptab, ONLY ptab2; -- gather stats on ptab and ptab2
>> without doing that on any of their partitions.
>
>
> I believe we should go this route if we want this to be called "ONLY" so that it would be consistent with other places too.
>
>> Whereas: "ANALYZE (ONLY) ptab, ptab2;" would always give you the
>> behaviour of #2.
>
>
> Havin it as an option would be easier to use when we have several partition tables. But I agree that if we call it "ONLY ", it may be confusing and the other approach would be appropriate.
>
>>
>> If we did it as a per-table option, then we'd need to consider what
>> should happen if someone did: "VACUUM ONLY parttab;". Probably
>> silently doing nothing wouldn't be good. Maybe a warning, akin to
>> what's done in:
>>
>> postgres=# analyze (skip_locked) a;
>> WARNING: skipping analyze of "a" --- lock not available
>
>
> +1 to raising a warning message in that case instead of staying silent. We might also not allow ONLY if ANALYZE is not present in VACUUM query and raise an error. But that would require changes in grams.y and could complicate things. So it may not be necessary and we may be fine with just a warning.
>
> Regards,
> --
> Melih Mutlu
> Microsoft
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-22 00:53:19 |
Message-ID: | CAApHDvqnDMk6e=XHNspo5-NT+Pr2bAzdi+HtBu8gx9S3F5NQUw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, 22 Aug 2024 at 11:32, Michael Harris <harmic(at)gmail(dot)com> wrote:
> One other thing I noticed when reading the code. The function
> expand_vacuum_rel in vacuum.c seems to be responsible for adding the
> partitions. If I am reading it correctly, it only adds child tables in
> the case of a partitioned table, not in the case of an inheritance
> parent:
>
> include_parts = (classForm->relkind == RELKIND_PARTITIONED_TABLE);
> ..
> if (include_parts)
> {
> .. add partitions ..
>
> This is a little different to some other contexts where the ONLY
> keyword is used, in that ONLY would be the default and only available
> mode of operation for an inheritance parent.
That's inconvenient and quite long-established behaviour. I had a look
as far back as 9.2 and we only analyze parents there too. I'm keen on
the ONLY syntax, but it would be strange if ONLY did the same thing as
not using ONLY for inheritance parents.
I feel like we might need to either bite the bullet and make ONLY work
consistently with both, or think of another way to have ANALYZE not
recursively gather stats for each partition on partitioned tables.
Could we possibly get away with changing inheritance parent behaviour?
David
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-22 01:12:16 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> I feel like we might need to either bite the bullet and make ONLY work
> consistently with both, or think of another way to have ANALYZE not
> recursively gather stats for each partition on partitioned tables.
> Could we possibly get away with changing inheritance parent behaviour?
Yeah, my thought was to change the behavior so it's consistent in
that case too. This doesn't seem to me like a change that'd
really cause anybody serious problems: ANALYZE without ONLY would
do more than before, but it's work you probably want done anyway.
regards, tom lane
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-22 01:37:52 |
Message-ID: | CADofcAW=JDNdPy=dbQK9b4AsZKbqdSDTwg2=fkriq+KbnDZ-DA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah, my thought was to change the behavior so it's consistent in
> that case too. This doesn't seem to me like a change that'd
> really cause anybody serious problems: ANALYZE without ONLY would
> do more than before, but it's work you probably want done anyway.
Would we want to apply that change to VACUUM too? That might be a
bit drastic, especially if you had a multi-level inheritance structure featuring
large tables.
It feels a bit like VACUUM and ANALYZE have opposite natural defaults here.
For VACUUM it does not make much sense to vacuum only at the partitioned
table level and not include the partitions, since it would do nothing
- that might
be why the existing code always adds the partitions.
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-22 02:09:12 |
Message-ID: | CAApHDvp4ry6L4dOEU-87R7KAVsPmxySxCJBtDVXN6hJPg1ic=w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, 22 Aug 2024 at 13:38, Michael Harris <harmic(at)gmail(dot)com> wrote:
> Would we want to apply that change to VACUUM too? That might be a
> bit drastic, especially if you had a multi-level inheritance structure featuring
> large tables.
I think they'd need to work the same way as for "VACUUM (ANALYZE)", it
would be strange to analyze some tables that you didn't vacuum. It's
just a much bigger pill to swallow in terms of the additional effort.
> It feels a bit like VACUUM and ANALYZE have opposite natural defaults here.
> For VACUUM it does not make much sense to vacuum only at the partitioned
> table level and not include the partitions, since it would do nothing
> - that might
> be why the existing code always adds the partitions.
Yeah, I suspect that's exactly why it was coded that way.
David
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-22 08:52:30 |
Message-ID: | CADofcAWWfNTyqfzCvP8_AEDCk_h35DPMmrAXAMYLK5H1s0EbeA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi All,
Here is a first draft of a patch to implement the ONLY option for
VACUUM and ANALYZE.
I'm a little nervous about the implications of changing the behaviour of VACUUM
for inheritance structures; I can imagine people having regularly
executed scripts
that currently vacuum all the tables in their inheritance structure;
after this change
they might get more vacuuming than they bargained for.
It's my first attempt to submit a patch here so apologies if I've
missed any part
of the process.
Cheers
Mike
On Thu, 22 Aug 2024 at 12:09, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Thu, 22 Aug 2024 at 13:38, Michael Harris <harmic(at)gmail(dot)com> wrote:
> > Would we want to apply that change to VACUUM too? That might be a
> > bit drastic, especially if you had a multi-level inheritance structure featuring
> > large tables.
>
> I think they'd need to work the same way as for "VACUUM (ANALYZE)", it
> would be strange to analyze some tables that you didn't vacuum. It's
> just a much bigger pill to swallow in terms of the additional effort.
>
> > It feels a bit like VACUUM and ANALYZE have opposite natural defaults here.
> > For VACUUM it does not make much sense to vacuum only at the partitioned
> > table level and not include the partitions, since it would do nothing
> > - that might
> > be why the existing code always adds the partitions.
>
> Yeah, I suspect that's exactly why it was coded that way.
>
> David
Attachment | Content-Type | Size |
---|---|---|
0001-Initial-implementation-of-the-ONLY-feature.patch | application/octet-stream | 11.8 KB |
From: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-22 11:26:11 |
Message-ID: | CAGPVpCT2_fCP-GgObXNOkMXEvKU2L6Gud2ABpGfRQ8--9ed=WA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Michael,
Thanks for the patch.
I quickly tried running some ANALYZE ONLY queries, it seems like it works
fine.
-ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] )
> ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]
> +ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] )
> ] [ [ ONLY ] <replaceable class="parameter">table_and_columns</replaceable>
> [, ...] ]
It seems like extended_relation_expr allows "tablename *" syntax too. That
should be added in docs as well. (Same for VACUUM doc)
<para>
> For partitioned tables, <command>ANALYZE</command> gathers statistics
> by
> sampling rows from all partitions; in addition, it will recurse into
> each
> partition and update its statistics. Each leaf partition is analyzed
> only
> once, even with multi-level partitioning. No statistics are collected
> for
> only the parent table (without data from its partitions), because with
> partitioning it's guaranteed to be empty.
> </para>
We may also want to update the above note in ANALYZE doc.
+-- ANALYZE ONLY / VACUUM ONLY on partitioned table
> +CREATE TABLE only_parted (a int, b char) PARTITION BY LIST (a);
> +CREATE TABLE only_parted1 PARTITION OF vacparted FOR VALUES IN (1);
+INSERT INTO only_parted1 VALUES (1, 'a');
>
Tests don't seem right to me.
I believe the above should be " PARTITION OF only_parted " instead of
vacparted.
It may better to insert into partitioned table (only_parted) instead of the
partition (only_parted1);
Also it may be a good idea to test VACUUM ONLY for inheritance tables the
same way you test ANALYZE ONLY.
Lastly, the patch includes an unrelated file (compile_flags.txt) and has
whitespace errors when I apply it.
Regards,
--
Melih Mutlu
Microsoft
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-23 10:01:33 |
Message-ID: | CADofcAUoDTFT0YfcEb=bfZyvMPUJ2cs+cTNdD9VMMjYLixkZEQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Thanks for the feedback Melih,
On Thu, 22 Aug 2024 at 21:26, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
> It seems like extended_relation_expr allows "tablename *" syntax too. That should be added in docs as well. (Same for VACUUM doc)
I had included it in the parameter description but had missed it from
the synopsis. I've fixed that now.
> We may also want to update the above note in ANALYZE doc.
Well spotted. I have updated that now.
> Tests don't seem right to me.
> I believe the above should be " PARTITION OF only_parted " instead of vacparted.
> It may better to insert into partitioned table (only_parted) instead of the partition (only_parted1);
>
> Also it may be a good idea to test VACUUM ONLY for inheritance tables the same way you test ANALYZE ONLY.
Well spotted again. I have fixed the incorrect table names and added
tests as suggested.
> Lastly, the patch includes an unrelated file (compile_flags.txt) and has whitespace errors when I apply it.
Oops! Fixed,
V2 of the patch is attached.
Cheers
Mike
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Initial-implementation-of-the-ONLY-feature.patch | application/octet-stream | 14.5 KB |
From: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-29 11:30:27 |
Message-ID: | CAGPVpCTDU-j7c81trGtqyjQTzZrNTAr60DxwdOUL6G_gg41_5A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Michael,
Michael Harris <harmic(at)gmail(dot)com>, 23 Ağu 2024 Cum, 13:01 tarihinde şunu
yazdı:
> V2 of the patch is attached.
>
Thanks for updating the patch. I have a few more minor feedbacks.
-ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] )
> ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]
> +ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] )
> ] [ [ ONLY ] <replaceable class="parameter">table_and_columns</replaceable>
> [, ...] ]
I believe moving "[ ONLY ]" to the definitions of table_and_columns below,
as you did with "[ * ]", might be better to be consistent with other places
(see [1])
+ if ((options & VACOPT_VACUUM) && is_partitioned_table && ! i
> nclude_children)
There are also some issues with coding conventions in some places (e.g. the
space between "!" and "include_children" abode). I think running pgindent
would resolve such issue in most places.
[1] https://www.postgresql.org/docs/16/sql-createpublication.html
Regards,
--
Melih Mutlu
Microsoft
From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-30 06:45:55 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Michael,
On 2024-08-23 19:01, Michael Harris wrote:
> V2 of the patch is attached.
Thanks for the proposal and the patch.
You said this patch was a first draft version, so these may be too minor
comments, but I will share them:
-- https://www.postgresql.org/docs/devel/progress-reporting.html
> Note that when ANALYZE is run on a partitioned table, all of its
> partitions are also recursively analyzed.
Should we also note this is the default, i.e. not using ONLY option
behavior here?
-- https://www.postgresql.org/docs/devel/ddl-partitioning.html
> If you are using manual VACUUM or ANALYZE commands, don't forget that
> you need to run them on each child table individually. A command like:
>
> ANALYZE measurement;
> will only process the root table.
This part also should be modified, shouldn't it?
When running ANALYZE VERBOSE ONLY on a partition table, the INFO message
is like this:
=# ANALYZE VERBOSE ONLY only_parted;
INFO: analyzing "public.only_parted" inheritance tree
I may be wrong, but 'inheritance tree' makes me feel it includes child
tables.
Removing 'inheritance tree' and just describing the table name as below
might be better:
INFO: analyzing "public.only_parted"
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-30 08:19:12 |
Message-ID: | CACJufxHY9nOecqeJr8WsrvtHG+8BTOOXqpM7E4OuCCEcrBC7qg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Aug 29, 2024 at 7:31 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
>
> Hi Michael,
>
> Michael Harris <harmic(at)gmail(dot)com>, 23 Ağu 2024 Cum, 13:01 tarihinde şunu yazdı:
>>
>> V2 of the patch is attached.
>
>
> Thanks for updating the patch. I have a few more minor feedbacks.
>
>> -ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]
>> +ANALYZE [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ [ ONLY ] <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]
>
>
> I believe moving "[ ONLY ]" to the definitions of table_and_columns below, as you did with "[ * ]", might be better to be consistent with other places (see [1])
>
>> + if ((options & VACOPT_VACUUM) && is_partitioned_table && ! include_children)
>
>
I think you are right.
ANALYZE [ ( option [, ...] ) ] [ [ ONLY ] table_and_columns [, ...] ]
seems not explain commands like:
ANALYZE ONLY only_parted(a), ONLY only_parted(b);
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-08-30 08:43:56 |
Message-ID: | CAApHDvrpy8hYB4spyL0ZRNh1hF9+h29tfUK7tr90W8awn9rfpQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Michael,
On Fri, 23 Aug 2024 at 22:01, Michael Harris <harmic(at)gmail(dot)com> wrote:
> V2 of the patch is attached.
(I understand this is your first PostgreSQL patch)
I just wanted to make sure you knew about the commitfest cycle we use
for the development of PostgreSQL. If you've not got one already, can
you register a community account. That'll allow you to include this
patch in the September commitfest that starts on the 1st. See
https://commitfest.postgresql.org/49/
I understand there's now some sort of cool-off period for new
community accounts, so if you have trouble adding the patch before the
CF starts, let me know and I should be able to do it for you. The
commitfest normally closes for new patches around 12:00 UTC on the 1st
of the month.
There's a bit more information in
https://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_submission
David
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-01 01:31:48 |
Message-ID: | CADofcAWmRMdP81OuP+Gm-BKmSPKu1Bxx6w-pY7GE8qT_f_mX=w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hello Atsushi & Melih
Thank you both for your further feedback.
On Thu, 29 Aug 2024 at 21:31, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
> I believe moving "[ ONLY ]" to the definitions of table_and_columns below, as you did with "[ * ]", might be better to be consistent with other places (see [1])
Agreed. I have updated this.
>> + if ((options & VACOPT_VACUUM) && is_partitioned_table && ! include_children)
>
> There are also some issues with coding conventions in some places (e.g. the space between "!" and "include_children" abode). I think running pgindent would resolve such issue in most places.
I fixed that extra space, and then ran pgindent. It did not report any
more problems.
On Fri, 30 Aug 2024 at 16:45, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
> -- https://www.postgresql.org/docs/devel/progress-reporting.html
> > Note that when ANALYZE is run on a partitioned table, all of its
> > partitions are also recursively analyzed.
>
> Should we also note this is the default, i.e. not using ONLY option
> behavior here?
> -- https://www.postgresql.org/docs/devel/ddl-partitioning.html
> > If you are using manual VACUUM or ANALYZE commands, don't forget that
> > you need to run them on each child table individually. A command like:
> >
> > ANALYZE measurement;
> > will only process the root table.
>
> This part also should be modified, shouldn't it?
Agreed. I have updated both of these.
> When running ANALYZE VERBOSE ONLY on a partition table, the INFO message
> is like this:
>
> =# ANALYZE VERBOSE ONLY only_parted;
> INFO: analyzing "public.only_parted" inheritance tree
>
> I may be wrong, but 'inheritance tree' makes me feel it includes child
> tables.
> Removing 'inheritance tree' and just describing the table name as below
> might be better:
>
> INFO: analyzing "public.only_parted"
I'm not sure about that one. If I understand the source correctly,
that particular progress
message tells the user that the system is gathering stats from the inheritance
tree in order to update the stats of the given table, not that it is
actually updating
the stats of the descendant tables.
When analyzing an inheritance structure with the ONLY you see
something like this:
=> ANALYZE VERBOSE ONLY only_inh_parent;
INFO: analyzing "public.only_inh_parent"
INFO: "only_inh_parent": scanned 0 of 0 pages, containing 0 live rows
and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.only_inh_parent" inheritance tree
INFO: "only_inh_child": scanned 1 of 1 pages, containing 3 live rows
and 0 dead rows; 3 rows in sample, 3 estimated total rows
ANALYZE
The reason you don't see the first one for partitioned tables is that
it corresponds
to sampling the contents of the parent table itself, which in the case
of a partitioned
table is guaranteed to be empty, so it is skipped.
I agree the text could be confusing, and in fact is probably confusing
even today
without the ONLY keyword, but I'm not sure what would be the best
wording to cover both the partitioned and inherited cases.
v3 of the patch is attached, and I will submit it to the commitfest.
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Implementation-of-the-ONLY-feature.patch | application/octet-stream | 15.5 KB |
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-01 01:41:45 |
Message-ID: | CADofcAX32oX5WNW5tS7Ne2ibXykKy1rxBrLZ=neTejduW=ky0w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Thanks David
I had not read that wiki page well enough, so many thanks for alerting
me that I had to create a commitfest entry. I already had a community
account so I was able to login and create it here:
https://commitfest.postgresql.org/49/5226/
I was not sure what to put for some of the fields - for 'reviewer' should
I have put the people who have provided feedback on this thread?
Is there anything else I need to do?
Thanks again
Cheers
Mike
On Fri, 30 Aug 2024 at 18:44, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> Hi Michael,
>
> On Fri, 23 Aug 2024 at 22:01, Michael Harris <harmic(at)gmail(dot)com> wrote:
> > V2 of the patch is attached.
>
> (I understand this is your first PostgreSQL patch)
>
> I just wanted to make sure you knew about the commitfest cycle we use
> for the development of PostgreSQL. If you've not got one already, can
> you register a community account. That'll allow you to include this
> patch in the September commitfest that starts on the 1st. See
> https://commitfest.postgresql.org/49/
>
> I understand there's now some sort of cool-off period for new
> community accounts, so if you have trouble adding the patch before the
> CF starts, let me know and I should be able to do it for you. The
> commitfest normally closes for new patches around 12:00 UTC on the 1st
> of the month.
>
> There's a bit more information in
> https://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_submission
>
> David
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-01 09:31:18 |
Message-ID: | CAApHDvqQMkED6MAcY+BCkpuZc3uurDDhdZtb-Y7X=zmhVLbtxA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, 1 Sept 2024 at 13:41, Michael Harris <harmic(at)gmail(dot)com> wrote:
> https://commitfest.postgresql.org/49/5226/
>
> I was not sure what to put for some of the fields - for 'reviewer' should
> I have put the people who have provided feedback on this thread?
I think it's fairly common that only reviewers either add themselves
or don't bother. I don't think it's common that patch authors add
reviewers. Some reviewers like their reviews to be more informal and
putting themselves as a reviewer can put other people off reviewing as
they think it's already handled by someone else. That may result in
the patch being neglected by reviewers.
> Is there anything else I need to do?
I see you set the target version as 17. That should be blank or "18".
PG17 is about to go into release candidate, so it is too late for
that. It's been fairly consistent that we open for new patches in
early July and close before the 2nd week of April the following year.
We're currently 2 months into PG18 dev. We don't back-patch new
features.
Nothing else aside from continuing to address reviews, as you are already.
David
From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-02 03:29:37 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2024-09-01 10:31, Michael Harris wrote:
> Hello Atsushi & Melih
>
> Thank you both for your further feedback.
>
> On Thu, 29 Aug 2024 at 21:31, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
> wrote:
>> I believe moving "[ ONLY ]" to the definitions of table_and_columns
>> below, as you did with "[ * ]", might be better to be consistent with
>> other places (see [1])
>
> Agreed. I have updated this.
>
>>> + if ((options & VACOPT_VACUUM) && is_partitioned_table && !
>>> include_children)
>>
>> There are also some issues with coding conventions in some places
>> (e.g. the space between "!" and "include_children" abode). I think
>> running pgindent would resolve such issue in most places.
>
> I fixed that extra space, and then ran pgindent. It did not report any
> more problems.
>
> On Fri, 30 Aug 2024 at 16:45, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
> wrote:
>> -- https://www.postgresql.org/docs/devel/progress-reporting.html
>> > Note that when ANALYZE is run on a partitioned table, all of its
>> > partitions are also recursively analyzed.
>>
>> Should we also note this is the default, i.e. not using ONLY option
>> behavior here?
>
>> -- https://www.postgresql.org/docs/devel/ddl-partitioning.html
>> > If you are using manual VACUUM or ANALYZE commands, don't forget that
>> > you need to run them on each child table individually. A command like:
>> >
>> > ANALYZE measurement;
>> > will only process the root table.
>>
>> This part also should be modified, shouldn't it?
>
> Agreed. I have updated both of these.
Thanks!
>> When running ANALYZE VERBOSE ONLY on a partition table, the INFO
>> message
>> is like this:
>>
>> =# ANALYZE VERBOSE ONLY only_parted;
>> INFO: analyzing "public.only_parted" inheritance tree
>>
>> I may be wrong, but 'inheritance tree' makes me feel it includes child
>> tables.
>> Removing 'inheritance tree' and just describing the table name as
>> below
>> might be better:
>>
>> INFO: analyzing "public.only_parted"
>
> I'm not sure about that one. If I understand the source correctly,
> that particular progress
> message tells the user that the system is gathering stats from the
> inheritance
> tree in order to update the stats of the given table, not that it is
> actually updating
> the stats of the descendant tables.
That makes sense.
> When analyzing an inheritance structure with the ONLY you see
> something like this:
>
> => ANALYZE VERBOSE ONLY only_inh_parent;
> INFO: analyzing "public.only_inh_parent"
> INFO: "only_inh_parent": scanned 0 of 0 pages, containing 0 live rows
> and 0 dead rows; 0 rows in sample, 0 estimated total rows
> INFO: analyzing "public.only_inh_parent" inheritance tree
> INFO: "only_inh_child": scanned 1 of 1 pages, containing 3 live rows
> and 0 dead rows; 3 rows in sample, 3 estimated total rows
> ANALYZE
>
> The reason you don't see the first one for partitioned tables is that
> it corresponds
> to sampling the contents of the parent table itself, which in the case
> of a partitioned
> table is guaranteed to be empty, so it is skipped.
>
> I agree the text could be confusing, and in fact is probably confusing
> even today
> without the ONLY keyword,
Yeah, it seems this isn't dependent on your proposal.
(BTW I'm also wondering if the expression “inheritance" is appropriate
when the target is a partitioned table, but this should be discussed in
another thread)
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-09 01:27:09 |
Message-ID: | CAApHDvrYF_6xjfq+SOq61sf4CmZUD+JPVgsoefA+CK5uVUpW2Q@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, 1 Sept 2024 at 13:32, Michael Harris <harmic(at)gmail(dot)com> wrote:
> v3 of the patch is attached, and I will submit it to the commitfest.
I think this patch is pretty good.
I only have a few things I'd point out:
1. The change to ddl.sgml, technically you're removing the caveat, but
I think the choice you've made to mention the updated behaviour is
likely a good idea still. So I agree with this change, but just wanted
to mention it as someone else might think otherwise.
2. I think there's some mixing of tense in the changes to analyze.sgml:
"If <literal>ONLY</literal> was not specified, it will also recurse
into each partition and update its statistics."
You've written "was" (past tense), but then the existing text uses
"will" (future tense). I guess if the point in time is after parse and
before work has been done, then that's correct, but I think using "is"
instead of "was" is better.
3. In vacuum.sgml;
"If <literal>ONLY</literal> is not specified, the table and all its
descendant tables or partitions (if any) are vacuumed"
Maybe "are also vacuumed" instead of "are vacuumed" is more clear?
It's certainly true for inheritance parents, but I guess you could
argue that's wrong for partitioned tables.
4. A very minor detail, but I think in vacuum.c the WARNING you've
added should use RelationGetRelationName(). We seem to be very
inconsistent with using that macro and I see it's not used just above
for the lock warning, which I imagine you copied.
Aside from those, that just leaves me with the behavioural change. I
noted Tom was ok with the change in behaviour for ANALYZE (mentioned
in [1]). Tom, wondering if you feel the same for VACUUM too? If we're
doing this, I think we'd need to be quite clear about it on the
release notes.
David
[1] https://postgr.es/m/[email protected]
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-09 07:56:54 |
Message-ID: | CADofcAVYsvj=RYdBcpqreC-4yc474n-3noyuqzzmq9VagTWS2A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Thanks for the feedback David.
On Mon, 9 Sept 2024 at 11:27, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> You've written "was" (past tense), but then the existing text uses
> "will" (future tense). I guess if the point in time is after parse and
> before work has been done, then that's correct, but I think using "is"
> instead of "was" is better.
> Maybe "are also vacuumed" instead of "are vacuumed" is more clear?
Agreed. I have updated my patch with both of these suggestions.
> 4. A very minor detail, but I think in vacuum.c the WARNING you've
> added should use RelationGetRelationName(). We seem to be very
> inconsistent with using that macro and I see it's not used just above
> for the lock warning, which I imagine you copied.
As far as I can tell RelationGetRelationName is for extracting the name
from a Relation struct, but in this case we have a RangeVar so it doesn't appear
to be applicable. I could not find an equivalent access macro for RangeVar.
Thanks again.
Cheers
Mike
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Implementation-of-the-ONLY-feature.patch | application/octet-stream | 15.5 KB |
From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-10 12:03:40 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2024-09-09 16:56, Michael Harris wrote:
Thanks for updating the patch.
Here is a minor comment.
> @@ -944,20 +948,32 @@ expand_vacuum_rel(VacuumRelation *vrel,
> MemoryContext vac_context,
> MemoryContextSwitchTo(oldcontext);
> }
..
> + * Unless the user has specified ONLY, make relation list
> entries for
> + * its partitions and/or descendant tables.
Regarding the addition of partition descendant tables, should we also
update the below comment on expand_vacuum_rel? Currently it refers only
partitions:
| * Given a VacuumRelation, fill in the table OID if it wasn't
specified,
| * and optionally add VacuumRelations for partitions of the table.
Other than this and the following, it looks good to me.
On Mon, Sep 9, 2024 at 10:27 AM David Rowley <dgrowleyml(at)gmail(dot)com>
wrote:
> Aside from those, that just leaves me with the behavioural change. I
> noted Tom was ok with the change in behaviour for ANALYZE (mentioned
> in [1]). Tom, wondering if you feel the same for VACUUM too? If we're
> doing this, I think we'd need to be quite clear about it on the
> release notes.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-11 07:47:09 |
Message-ID: | CADofcAXaB7Ouo2i9jpK0HesHQFBa=KhCx0uKTwEc2MAjcwVJUQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Thanks for the feedback.
On Tue, 10 Sept 2024 at 22:03, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
> Regarding the addition of partition descendant tables, should we also
> update the below comment on expand_vacuum_rel? Currently it refers only
> partitions:
>
> | * Given a VacuumRelation, fill in the table OID if it wasn't
> specified,
> | * and optionally add VacuumRelations for partitions of the table.
>
Well spotted! I have updated the comment to read:
* Given a VacuumRelation, fill in the table OID if it wasn't specified,
* and optionally add VacuumRelations for partitions or descendant tables
* of the table.
Cheers
Mike.
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Implementation-of-the-ONLY-feature.patch | application/octet-stream | 16.0 KB |
From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-13 01:50:39 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2024-09-11 16:47, Michael Harris wrote:
> Thanks for the feedback.
>
> On Tue, 10 Sept 2024 at 22:03, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
> wrote:
>> Regarding the addition of partition descendant tables, should we also
>> update the below comment on expand_vacuum_rel? Currently it refers
>> only
>> partitions:
>>
>> | * Given a VacuumRelation, fill in the table OID if it wasn't
>> specified,
>> | * and optionally add VacuumRelations for partitions of the table.
>>
>
> Well spotted! I have updated the comment to read:
>
> * Given a VacuumRelation, fill in the table OID if it wasn't
> specified,
> * and optionally add VacuumRelations for partitions or descendant
> tables
> * of the table.
Thanks for the update!
I've switched the status on the commitfest to 'ready for committer'.
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-16 02:29:47 |
Message-ID: | CACJufxEFOhe5FQqoYhatB=kRtc9aA5HzCVXnu_-tnv+31Jh2Hw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
hi.
in https://www.postgresql.org/docs/current/ddl-inherit.html
<<<
Commands that do database maintenance and tuning (e.g., REINDEX,
VACUUM) typically only work on individual, physical tables and do not
support recursing over inheritance hierarchies. The respective
behavior of each individual command is documented in its reference
page (SQL Commands).
<<<
does the above para need some tweaks?
in section, your patch output is
<<<<<<<<<
and table_and_columns is:
[ ONLY ] table_name [ * ] [ ( column_name [, ...] ) ]
<<<<<<<<<
ANALYZE ONLY only_parted*;
will fail.
Maybe we need a sentence saying ONLY and * are mutually exclusive?
>>>
If the table being analyzed has inheritance children, ANALYZE gathers
two sets of statistics: one on the rows of the parent table only, and
a second including rows of both the parent table and all of its
children. This second set of statistics is needed when planning
queries that process the inheritance tree as a whole. The child tables
themselves are not individually analyzed in this case.
>>>
is this still true for table inheritance.
for example:
drop table if exists only_inh_parent,only_inh_child;
CREATE TABLE only_inh_parent (a int , b INT) with (autovacuum_enabled = false);
CREATE TABLE only_inh_child () INHERITS (only_inh_parent) with
(autovacuum_enabled = false);
INSERT INTO only_inh_child(a,b) select g % 80, (g + 1) % 200 from
generate_series(1,1000) g;
select pg_table_size('only_inh_parent');
ANALYZE ONLY only_inh_parent;
here, will "ANALYZE ONLY only_inh_parent;" will have minimal "effects".
since the physical table only_inh_parent has zero storage.
but
select stadistinct,starelid::regclass,staattnum, stainherit
from pg_statistic
where starelid = ANY ('{only_inh_parent, only_inh_child}'::regclass[]);
output is
stadistinct | starelid | staattnum | stainherit
-------------+-----------------+-----------+------------
80 | only_inh_parent | 1 | t
-0.2 | only_inh_parent | 2 | t
I am not sure if this output and the manual description about table
inheritance is consistent.
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com> |
Cc: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-20 01:20:24 |
Message-ID: | CADofcAW43AD=qqtj1cLkTyRpPM6JB5ZALUK7CA1KZZqpcouoYw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Thanks for the feedback, and sorry it has taken a few days to respond.
On Mon, 16 Sept 2024 at 12:29, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> in https://www.postgresql.org/docs/current/ddl-inherit.html
> <<<
> Commands that do database maintenance and tuning (e.g., REINDEX,
> VACUUM) typically only work on individual, physical tables and do not
> support recursing over inheritance hierarchies. The respective
> behavior of each individual command is documented in its reference
> page (SQL Commands).
> <<<
> does the above para need some tweaks?
Yes, good catch.
> ANALYZE ONLY only_parted*;
> will fail.
> Maybe we need a sentence saying ONLY and * are mutually exclusive?
I used the same text that appears on other pages that are describing the
operation of ONLY / *, eg. the page for TRUNCATE
(https://www.postgresql.org/docs/current/sql-truncate.html)
I think it would be good to keep them all consistent if possible.
> >>>
> If the table being analyzed has inheritance children, ANALYZE gathers
> two sets of statistics: one on the rows of the parent table only, and
> a second including rows of both the parent table and all of its
> children. This second set of statistics is needed when planning
> queries that process the inheritance tree as a whole. The child tables
> themselves are not individually analyzed in this case.
> >>>
> is this still true for table inheritance.
The way I interpret this is that when you analyze an inheritance parent,
it will sample the rows of the parent and all its children. It will
prepare two sets
of stats, one based on the samples of any rows in the parent itself,
the other based on samples of rows in both parent and child tables.
This is distinct from the activity of updating the stats on the child
tables themselves.
> I am not sure if this output and the manual description about table
> inheritance is consistent.
With the test you performed, the result was a set of statistics for the
parent table which included samples from the child tables
(stainherit=t) but no entries for the parent table only
(stainherit=f) as there is no data in the parent table itself.
There are no statistics for only_inh_child.
If you add some records to the parent table and re-analyze, you
do get statistics with stainherit=f:
postgres=# insert into only_inh_parent(a,b) select g % 10, 100 from
generate_series(1,1000) g;
INSERT 0 1000
postgres=# select pg_table_size('only_inh_parent');
pg_table_size
---------------
65536
(1 row)
postgres=# ANALYZE ONLY only_inh_parent;
ANALYZE
postgres=# select stadistinct,starelid::regclass,staattnum, stainherit
from pg_statistic
where starelid = ANY ('{only_inh_parent, only_inh_child}'::regclass[]);
stadistinct | starelid | staattnum | stainherit
-------------+-----------------+-----------+------------
80 | only_inh_parent | 1 | t
200 | only_inh_parent | 2 | t
10 | only_inh_parent | 1 | f
1 | only_inh_parent | 2 | f
(4 rows)
and if you perform ANALYZE without ONLY:
postgres=# ANALYZE only_inh_parent;
ANALYZE
postgres=# select stadistinct,starelid::regclass,staattnum, stainherit
from pg_statistic
where starelid = ANY ('{only_inh_parent, only_inh_child}'::regclass[]);
stadistinct | starelid | staattnum | stainherit
-------------+-----------------+-----------+------------
80 | only_inh_parent | 1 | t
10 | only_inh_parent | 1 | f
1 | only_inh_parent | 2 | f
200 | only_inh_parent | 2 | t
80 | only_inh_child | 1 | f
-0.2 | only_inh_child | 2 | f
(6 rows)
Now we also have statistics for only_inh_child.
At least for me, that is consistent with the para you
quoted, except perhaps for this sentence:
>> The child tables themselves are not individually analyzed in this case.
This should probably read:
>> If the ONLY keyword is used, the child tables themselves are not individually analyzed.
I have attached a new version of the patch with this feedback incorporated.
Thanks again!
Cheers
Mike.
Attachment | Content-Type | Size |
---|---|---|
v6-0001-Implementation-of-the-ONLY-feature.patch | application/octet-stream | 18.4 KB |
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Michael Harris <harmic(at)gmail(dot)com> |
Cc: | jian he <jian(dot)universality(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-22 13:09:09 |
Message-ID: | CAApHDvpd-0aQkWibMew+dOVFwXGd=LSsiS9ZxmMDnrX+RLfU+Q@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, 20 Sept 2024 at 13:20, Michael Harris <harmic(at)gmail(dot)com> wrote:
> I have attached a new version of the patch with this feedback incorporated.
I looked over the v6 patch and I don't have any complaints. However, I
did make some minor adjustments:
* A few places said things like "and possibly partitions and/or child
tables thereof". It's not possible for a partition to have inheritance
children, so "and/" is wrong there, only "or" is possible.
* I spent a bit of time trying to massage the new text into the
documents. I just felt that sometimes the new text was a little bit
clumsy, for example:
<command>ANALYZE</command> gathers two sets of statistics: one on the rows
of the parent table only, and a second including rows of both the parent
table and all of its children. This second set of statistics is
needed when
- planning queries that process the inheritance tree as a whole. The child
- tables themselves are not individually analyzed in this case.
- The autovacuum daemon, however, will only consider inserts or
- updates on the parent table itself when deciding whether to trigger an
- automatic analyze for that table. If that table is rarely inserted into
- or updated, the inheritance statistics will not be up to date unless you
- run <command>ANALYZE</command> manually.
+ planning queries that process the inheritance tree as a whole. If the
+ <literal>ONLY</literal> keyword is used, child tables themselves are not
+ individually analyzed. The autovacuum daemon, however, will only consider
+ inserts or updates on the parent table itself when deciding whether to
+ trigger an automatic analyze for that table. If that table is rarely
+ inserted into or updated, the inheritance statistics will not be up to
+ date unless you run <command>ANALYZE</command> manually.
I kinda blame the existing text which reads "The child tables
themselves are not individually analyzed in this case." as that seems
to have led you to detail the new behaviour in the same place, but I
think that we really need to finish talking about autovacuum maybe not
analyzing the inheritance parent unless it receives enough changes
itself before we talk about what ONLY does.
* Very minor adjustments to the tests to upper case all the keywords
"is not null as" was all lowercase. I wrapped some longer lines too.
Also, I made some comment adjustments and I dropped the partitioned
table directly after we're done with it instead of waiting until after
the inheritance parent tests.
* A bunch of uses of "descendant tables" when you meant "inheritance children"
v7-0001 is the same as your v6 patch, but I adjusted the commit
message, which I'm happy to take feedback on. Nobody I've spoken to
seems to be concerned about VACUUM on inheritance parents now being
recursive by default. I included "release notes" and "incompatibility"
in the commit message in the hope that Bruce will stumble upon it and
write about this when he does the release notes for v18.
v7-0002 is all my changes.
I'd like to push this soon, so if anyone has any last-minute feedback,
please let me know in the next few days.
David
Attachment | Content-Type | Size |
---|---|---|
v7-0001-Add-ONLY-support-for-VACUUM-and-ANALYZE.patch | application/octet-stream | 19.6 KB |
v7-0002-fixup-Implementation-of-the-ONLY-feature-for-ANAL.patch | application/octet-stream | 20.2 KB |
From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | jian he <jian(dot)universality(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-23 01:41:52 |
Message-ID: | CADofcAUdxr_S1JtQm6e6iMaeV_1-758-bX3PMLH9pbtbpFq3Bg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, 22 Sept 2024 at 23:09, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I'd like to push this soon, so if anyone has any last-minute feedback,
> please let me know in the next few days.
Many thanks for your updates and assistance.
Looks good. Agreed, I was probably too conservative in some of the
doc updates.
Thanks to all reviewers for helping to get it to this stage!
Cheers
Mike
From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-23 03:28:00 |
Message-ID: | CACJufxGQ5DyvC85ESRti5Coau=vbxKNV-PTcDByZr8GwVS0bBQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, Sep 22, 2024 at 9:09 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> v7-0002 is all my changes.
>
> I'd like to push this soon, so if anyone has any last-minute feedback,
> please let me know in the next few days.
>
drop table if exists only_inh_parent,only_inh_child;
CREATE TABLE only_inh_parent (a int , b INT) with (autovacuum_enabled = false);
CREATE TABLE only_inh_child () INHERITS (only_inh_parent) with
(autovacuum_enabled = false);
INSERT INTO only_inh_child(a,b) select g % 80, (g + 1) % 200 from
generate_series(1,1000) g;
select pg_table_size('only_inh_parent');
ANALYZE ONLY only_inh_parent;
select stadistinct,starelid::regclass,staattnum, stainherit
from pg_statistic
where starelid = ANY ('{only_inh_parent, only_inh_child}'::regclass[]);
stadistinct | starelid | staattnum | stainherit
-------------+-----------------+-----------+------------
80 | only_inh_parent | 1 | t
-0.2 | only_inh_parent | 2 | t
---------------
catalog-pg-statistic.html
stainherit bool
If true, the stats include values from child tables, not just the
values in the specified relation
Normally there is one entry, with stainherit = false, for each table
column that has been analyzed. If the table has inheritance children
or partitions, a second entry with stainherit = true is also created.
This row represents the column's statistics over the inheritance tree,
i.e., statistics for the data you'd see with SELECT column FROM
table*, whereas the stainherit = false row represents the results of
SELECT column FROM ONLY table.
I do understand what Michael Harris said in [1]
---------------
Given the above context, I am still confused with this sentence in
sql-analyze.html.
"If ONLY is specified before the table name, only that table is analyzed."
like in the above sql example, only_inh_parent's child is also being analyzed.
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-23 04:46:21 |
Message-ID: | CAApHDvrKY5KGwND7bJbUyTDLmpi64O8Ye1CUSHCJEvaUy0LH4Q@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, 23 Sept 2024 at 15:29, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> Given the above context, I am still confused with this sentence in
> sql-analyze.html.
> "If ONLY is specified before the table name, only that table is analyzed."
>
> like in the above sql example, only_inh_parent's child is also being analyzed.
I guess it depends what you define "analyzed" to mean. In this
context, it means gathering statistics specifically for a certain
table.
Would it be more clear if "only that table is analyzed." was changed
to "then statistics are only gathered specifically for that table."?
David
From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-23 07:39:15 |
Message-ID: | CACJufxFeHWD_NKAchTErSu_ZxORv8YdsoeVeOdiMQ+rkdz-BTQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Sep 23, 2024 at 12:46 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Mon, 23 Sept 2024 at 15:29, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> > Given the above context, I am still confused with this sentence in
> > sql-analyze.html.
> > "If ONLY is specified before the table name, only that table is analyzed."
> >
> > like in the above sql example, only_inh_parent's child is also being analyzed.
>
> I guess it depends what you define "analyzed" to mean. In this
> context, it means gathering statistics specifically for a certain
> table.
>
> Would it be more clear if "only that table is analyzed." was changed
> to "then statistics are only gathered specifically for that table."?
>
looking at expand_vacuum_rel, analyze_rel.
if we
---------
if (onerel->rd_rel->relhassubclass)
do_analyze_rel(onerel, params, va_cols, acquirefunc, relpages,
true, in_outer_xact, elevel);
change to
if (onerel->rd_rel->relhassubclass && ((!relation ||
relation->inh) || onerel->rd_rel->relkind ==
RELKIND_PARTITIONED_TABLE))
do_analyze_rel(onerel, params, va_cols, acquirefunc, relpages,
true, in_outer_xact, elevel);
then the inheritance table will behave the way the doc says.
for example:
drop table if exists only_inh_parent,only_inh_child;
CREATE TABLE only_inh_parent (a int , b INT) with (autovacuum_enabled = false);
CREATE TABLE only_inh_child () INHERITS (only_inh_parent) with
(autovacuum_enabled = false);
INSERT INTO only_inh_child(a,b) select g % 80, (g + 1) % 200 from
generate_series(1,1000) g;
ANALYZE ONLY only_inh_parent;
select stadistinct,starelid::regclass,staattnum, stainherit
from pg_statistic
where starelid = ANY ('{only_inh_parent, only_inh_child}'::regclass[]);
will return zero rows, since the physical table only_inh_parent has no storage.
sql-analyze.html
For partitioned tables, ANALYZE gathers statistics by sampling rows
from all partitions. Each leaf partition is also recursively analyzed
and the statistics updated. This recursive part may be disabled by
using the ONLY keyword, otherwise, leaf partitions are analyzed only
once, even with multi-level partitioning. No statistics are collected
for only the parent table (without data from its partitions), because
with partitioning it's guaranteed to be empty.
allow me to ask anenglish language question.
here "otherwise" means specify ONLY or not?
As far as i understand.
if you specify ONLY, postgres will only do "For partitioned tables,
ANALYZE gathers statistics by sampling rows from all partitions"
if you not specify ONLY, postgres will do "For partitioned tables,
ANALYZE gathers statistics by sampling rows from all partitions *AND*
also recursively analyze each leaf partition"
Is my understanding correct?
I think there is a whitespace error in "ANALYZE ONLY vacparted(a,b); "
in vacuum.out.
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-23 10:04:09 |
Message-ID: | CAApHDvq2vikCtDEkSa6DVYUT6ZaPSXyhmG4EGefpjSFk4Zj-ow@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, 23 Sept 2024 at 19:39, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> sql-analyze.html
> For partitioned tables, ANALYZE gathers statistics by sampling rows
> from all partitions. Each leaf partition is also recursively analyzed
> and the statistics updated. This recursive part may be disabled by
> using the ONLY keyword, otherwise, leaf partitions are analyzed only
> once, even with multi-level partitioning. No statistics are collected
> for only the parent table (without data from its partitions), because
> with partitioning it's guaranteed to be empty.
>
> allow me to ask anenglish language question.
> here "otherwise" means specify ONLY or not?
> As far as i understand.
> if you specify ONLY, postgres will only do "For partitioned tables,
> ANALYZE gathers statistics by sampling rows from all partitions"
> if you not specify ONLY, postgres will do "For partitioned tables,
> ANALYZE gathers statistics by sampling rows from all partitions *AND*
> also recursively analyze each leaf partition"
>
> Is my understanding correct?
The "Otherwise" case applies when "ONLY" isn't used.
If this is confusing, I think there's a bunch of detail that I tried
to keep that's just not that useful. The part about analyzing
partitions just once and the part about not collecting non-inheritance
stats for the partitioned table seems like extra detail that's either
obvious or just not that important.
Can you have a look at the attached and let me know if it's easier to
understand now?
David
Attachment | Content-Type | Size |
---|---|---|
v8-0001-Add-ONLY-support-for-VACUUM-and-ANALYZE.patch | application/octet-stream | 19.6 KB |
v8-0002-fixup-Implementation-of-the-ONLY-feature-for-ANAL.patch | application/octet-stream | 20.0 KB |
From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-23 11:23:18 |
Message-ID: | CACJufxH_hEeNmkKcV9gxhnVPf-h_PDxEKZJAHxCanxrJ8wmr8g@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Sep 23, 2024 at 6:04 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
>
> If this is confusing, I think there's a bunch of detail that I tried
> to keep that's just not that useful. The part about analyzing
> partitions just once and the part about not collecting non-inheritance
> stats for the partitioned table seems like extra detail that's either
> obvious or just not that important.
>
> Can you have a look at the attached and let me know if it's easier to
> understand now?
>
Now the regress test passed.
<para>
For partitioned tables, <command>ANALYZE</command> gathers statistics by
sampling rows from all partitions. By default,
<command>ANALYZE</command> will also recursively collect and update the
statistics for each partition. The <literal>ONLY</literal> keyword may be
used to disable this.
</para>
is very clear to me!
<para>
If the table being analyzed has inheritance children,
<command>ANALYZE</command> gathers two sets of statistics: one on the rows
of the parent table only, and a second including rows of both the parent
table and all of its children. This second set of statistics is needed when
planning queries that process the inheritance tree as a whole. The
autovacuum daemon, however, will only consider inserts or updates on the
parent table itself when deciding whether to trigger an automatic analyze
for that table. If that table is rarely inserted into or updated, the
inheritance statistics will not be up to date unless you run
<command>ANALYZE</command> manually. By default,
<command>ANALYZE</command> will also recursively collect and update the
statistics for each inheritance child table. The <literal>ONLY</literal>
keyword may be used to disable this.
</para>
looks fine. but maybe we can add the following information
"if The <literal>ONLY</literal> is specified, the second set of
statistics won't include each children individual statistics"
I think that's the main difference between specifying ONLY or not?
catalog-pg-statistic.html second paragraph seems very clear to me.
Maybe we can link it somehow
Other than that, it looks good to me.
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-23 11:53:42 |
Message-ID: | CAApHDvpsq0oMDVQHmbDO4yMsZeVGEiPr2uo_DwVcoKk3Tvv35Q@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, 23 Sept 2024 at 23:23, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> looks fine. but maybe we can add the following information
> "if The <literal>ONLY</literal> is specified, the second set of
> statistics won't include each children individual statistics"
> I think that's the main difference between specifying ONLY or not?
Ok, I think you're not understanding this yet and I'm not sure what I
can make more clear in the documents.
Let me explain... For inheritance parent tables, ANALYZE ONLY will
gather inheritance and non-inheritance statistics for ONLY the parent.
Here's an example of that:
drop table if exists parent,child;
create table parent(a int);
create table child () inherits (parent);
insert into parent values(1);
insert into child values(1);
analyze ONLY parent;
select starelid::regclass,stainherit,stadistinct from pg_statistic
where starelid in ('parent'::regclass,'child'::regclass);
starelid | stainherit | stadistinct
----------+------------+-------------
parent | f | -1 <- this is the distinct estimate
for SELECT * FROM ONLY parent;
parent | t | -0.5 <- this is the distinct estimate
for SELECT * FROM parent;
(2 rows)
For the stainherit==false stats, only 1 row is sampled here as that's
the only row directly located in the "parent" table.
For the stainherit==true stats, 2 rows are sampled, both of them have
"a" == 1. The stadistinct reflects that fact.
Note there have been no statistics recorded for "child". However,
analyze did sample rows in that table as part of gathering sample rows
for "parent" for the stainherit==true row.
Now let's try again without ONLY.
analyze parent;
select starelid::regclass,stainherit,stadistinct from pg_statistic
where starelid in ('parent'::regclass,'child'::regclass);
starelid | stainherit | stadistinct
----------+------------+-------------
parent | f | -1
parent | t | -0.5
child | f | -1
(3 rows)
All of the above rows were re-calculated with the "analyze parent"
command, the first two rows have the same values as nothing changed in
the table, however, there are now statistics stored for the "child"
table.
> catalog-pg-statistic.html second paragraph seems very clear to me.
> Maybe we can link it somehow
I don't know what "link it" means in this context.
David
From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-24 01:00:15 |
Message-ID: | CACJufxGPeivYnJjGpQf5YP-ANEBTsas5-jx4eknXkFbj7g4c_w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Sep 23, 2024 at 7:53 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Mon, 23 Sept 2024 at 23:23, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> > looks fine. but maybe we can add the following information
> > "if The <literal>ONLY</literal> is specified, the second set of
> > statistics won't include each children individual statistics"
> > I think that's the main difference between specifying ONLY or not?
>
> Ok, I think you're not understanding this yet and I'm not sure what I
> can make more clear in the documents.
>
> Let me explain... For inheritance parent tables, ANALYZE ONLY will
> gather inheritance and non-inheritance statistics for ONLY the parent.
>
> Here's an example of that:
>
> drop table if exists parent,child;
> create table parent(a int);
> create table child () inherits (parent);
> insert into parent values(1);
> insert into child values(1);
>
> analyze ONLY parent;
> select starelid::regclass,stainherit,stadistinct from pg_statistic
> where starelid in ('parent'::regclass,'child'::regclass);
> starelid | stainherit | stadistinct
> ----------+------------+-------------
> parent | f | -1 <- this is the distinct estimate
> for SELECT * FROM ONLY parent;
> parent | t | -0.5 <- this is the distinct estimate
> for SELECT * FROM parent;
> (2 rows)
>
> For the stainherit==false stats, only 1 row is sampled here as that's
> the only row directly located in the "parent" table.
> For the stainherit==true stats, 2 rows are sampled, both of them have
> "a" == 1. The stadistinct reflects that fact.
>
> Note there have been no statistics recorded for "child". However,
> analyze did sample rows in that table as part of gathering sample rows
> for "parent" for the stainherit==true row.
>
> Now let's try again without ONLY.
>
> analyze parent;
>
> select starelid::regclass,stainherit,stadistinct from pg_statistic
> where starelid in ('parent'::regclass,'child'::regclass);
> starelid | stainherit | stadistinct
> ----------+------------+-------------
> parent | f | -1
> parent | t | -0.5
> child | f | -1
> (3 rows)
>
> All of the above rows were re-calculated with the "analyze parent"
> command, the first two rows have the same values as nothing changed in
> the table, however, there are now statistics stored for the "child"
> table.
thanks for your explanation!
now I don't have any questions about this patch.
> > catalog-pg-statistic.html second paragraph seems very clear to me.
> > Maybe we can link it somehow
>
> I don't know what "link it" means in this context.
>
i mean, change to:
By default,
<command>ANALYZE</command> will also recursively collect and update the
statistics for each inheritance child table. The <literal>ONLY</literal>
keyword may be used to disable this.
You may also refer to catalog <link
linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
description about <literal>stainherit</literal>.
but <link linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
already mentioned once.
maybe not a good idea.
From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com |
Subject: | Re: ANALYZE ONLY |
Date: | 2024-09-24 06:05:42 |
Message-ID: | CAApHDvq_wTEMYPk_h-Vo_LU=3hRfQvEJJthea-fmm1AQSJV-TA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, 23 Sept 2024 at 22:04, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> Can you have a look at the attached and let me know if it's easier to
> understand now?
Pushed.
David