Quick Links

Re: why there is not VACUUM FULL CONCURRENTLY?

Lists:	pgsql-hackers

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-01-30 08:01:57
Message-ID:	CAFj8pRDK89FtY_yyGw7-MW-zTaHOCY4m6qfLRittdoPocz+dMQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

I have one question, what is a block of implementation of some variant of
VACUUM FULL like REINDEX CONCURRENTLY? Why similar mechanism of REINDEX
CONCURRENTLY cannot be used for VACUUM FULL?

Regards

Pavel

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-01-30 08:14:07
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 30, 2024 at 09:01:57AM +0100, Pavel Stehule wrote:
> I have one question, what is a block of implementation of some variant of
> VACUUM FULL like REINDEX CONCURRENTLY? Why similar mechanism of REINDEX
> CONCURRENTLY cannot be used for VACUUM FULL?

You may be interested in these threads:
https://www.postgresql.org/message-id/CAB7nPqTGmNUFi%2BW6F1iwmf7J-o6sY%2Bxxo6Yb%3DmkUVYT-CG-B5A%40mail.gmail.com
https://www.postgresql.org/message-id/CAB7nPqTys6JUQDxUczbJb0BNW0kPrW8WdZuk11KaxQq6o98PJg%40mail.gmail.com

VACUUM FULL is CLUSTER under the hoods. One may question whether it
is still a relevant discussion these days if we assume that autovacuum
is able to keep up, because it always keeps up with the house cleanup,
right? ;)

More seriously, we have a lot more options these days with VACUUM like
PARALLEL, so CONCURRENTLY may still have some uses, but the new toys
available may have changed things. So, would it be worth the
complexities around heap manipulations that lower locks would require?
--
Michael

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-01-30 08:31:12
Message-ID:	CAFj8pRDNDOrg90hLMmbo_hiWpgBm+73gmWMRUHRkTKwrGnvYJQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

út 30. 1. 2024 v 9:14 odesílatel Michael Paquier <michael(at)paquier(dot)xyz>
napsal:

> On Tue, Jan 30, 2024 at 09:01:57AM +0100, Pavel Stehule wrote:
> > I have one question, what is a block of implementation of some variant of
> > VACUUM FULL like REINDEX CONCURRENTLY? Why similar mechanism of REINDEX
> > CONCURRENTLY cannot be used for VACUUM FULL?
>
> You may be interested in these threads:
>
> https://www.postgresql.org/message-id/CAB7nPqTGmNUFi%2BW6F1iwmf7J-o6sY%2Bxxo6Yb%3DmkUVYT-CG-B5A%40mail.gmail.com
>
> https://www.postgresql.org/message-id/CAB7nPqTys6JUQDxUczbJb0BNW0kPrW8WdZuk11KaxQq6o98PJg%40mail.gmail.com
>
> VACUUM FULL is CLUSTER under the hoods. One may question whether it
> is still a relevant discussion these days if we assume that autovacuum
> is able to keep up, because it always keeps up with the house cleanup,
> right? ;)
>
> More seriously, we have a lot more options these days with VACUUM like
> PARALLEL, so CONCURRENTLY may still have some uses, but the new toys
> available may have changed things. So, would it be worth the
> complexities around heap manipulations that lower locks would require?
>

One of my customer today is reducing one table from 140GB to 20GB. Now he
is able to run archiving. He should play with pg_repack, and it is working
well today, but I ask myself, what pg_repack does not be hard to do
internally because it should be done for REINDEX CONCURRENTLY. This is not
a common task, and not will be, but on the other hand, it can be nice to
have feature, and maybe not too hard to implement today. But I didn't try it

I'll read the threads

Pavel

--
> Michael
>

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-01-30 10:31:17
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2024-Jan-30, Pavel Stehule wrote:

> One of my customer today is reducing one table from 140GB to 20GB. Now he
> is able to run archiving. He should play with pg_repack, and it is working
> well today, but I ask myself, what pg_repack does not be hard to do
> internally because it should be done for REINDEX CONCURRENTLY. This is not
> a common task, and not will be, but on the other hand, it can be nice to
> have feature, and maybe not too hard to implement today. But I didn't try it

FWIW a newer, more modern and more trustworthy alternative to pg_repack
is pg_squeeze, which I discovered almost by random chance, and soon
discovered I liked it much more.

So thinking about your question, I think it might be possible to
integrate a tool that works like pg_squeeze, such that it runs when
VACUUM is invoked -- either under some new option, or just replace the
code under FULL, not sure. If the Cybertec people allows it, we could
just grab the pg_squeeze code and add it to the things that VACUUM can
run.

Now, pg_squeeze has some additional features, such as periodic
"squeezing" of tables. In a first attempt, for simplicity, I would
leave that stuff out and just allow it to run from the user invoking it,
and then have the command to do a single run. (The scheduling features
could be added later, or somehow integrated into autovacuum, or maybe
something else.)

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"We're here to devour each other alive" (Hobbes)

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-01-30 10:47:49
Message-ID:	CAFj8pRCRhhHQOKwEubyyBUeA_hyhOyV-QxpVw5-wnce+S8qyXA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

út 30. 1. 2024 v 11:31 odesílatel Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
napsal:

> On 2024-Jan-30, Pavel Stehule wrote:
>
> > One of my customer today is reducing one table from 140GB to 20GB. Now
> he
> > is able to run archiving. He should play with pg_repack, and it is
> working
> > well today, but I ask myself, what pg_repack does not be hard to do
> > internally because it should be done for REINDEX CONCURRENTLY. This is
> not
> > a common task, and not will be, but on the other hand, it can be nice to
> > have feature, and maybe not too hard to implement today. But I didn't
> try it
>
> FWIW a newer, more modern and more trustworthy alternative to pg_repack
> is pg_squeeze, which I discovered almost by random chance, and soon
> discovered I liked it much more.
>
> So thinking about your question, I think it might be possible to
> integrate a tool that works like pg_squeeze, such that it runs when
> VACUUM is invoked -- either under some new option, or just replace the
> code under FULL, not sure. If the Cybertec people allows it, we could
> just grab the pg_squeeze code and add it to the things that VACUUM can
> run.
>
> Now, pg_squeeze has some additional features, such as periodic
> "squeezing" of tables. In a first attempt, for simplicity, I would
> leave that stuff out and just allow it to run from the user invoking it,
> and then have the command to do a single run. (The scheduling features
> could be added later, or somehow integrated into autovacuum, or maybe
> something else.)
>

some basic variant (without autovacuum support) can be good enough. We have
no autovacuum support for REINDEX CONCURRENTLY and I don't see a necessity
for it (sure, it can be limited by my perspective) . The necessity of
reducing table size is not too common (a lot of use cases are better
covered by using partitioning), but sometimes it is, and then buildin
simple available solution can be helpful.

> --
> Álvaro Herrera PostgreSQL Developer —
> https://www.EnterpriseDB.com/
> "We're here to devour each other alive" (Hobbes)
>

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-01-30 11:37:12
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2024-Jan-30, Pavel Stehule wrote:

> some basic variant (without autovacuum support) can be good enough. We have
> no autovacuum support for REINDEX CONCURRENTLY and I don't see a necessity
> for it (sure, it can be limited by my perspective) . The necessity of
> reducing table size is not too common (a lot of use cases are better
> covered by using partitioning), but sometimes it is, and then buildin
> simple available solution can be helpful.

That's my thinking as well.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-01-30 23:06:17
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 30, 2024 at 12:37:12PM +0100, Alvaro Herrera wrote:
> On 2024-Jan-30, Pavel Stehule wrote:
>
> > some basic variant (without autovacuum support) can be good enough. We have
> > no autovacuum support for REINDEX CONCURRENTLY and I don't see a necessity
> > for it (sure, it can be limited by my perspective) . The necessity of
> > reducing table size is not too common (a lot of use cases are better
> > covered by using partitioning), but sometimes it is, and then buildin
> > simple available solution can be helpful.
>
> That's my thinking as well.

Or, yes, I'd agree about that. This can make for a much better user
experience. I'm just not sure how that stuff would be shaped and how
much ground it would need to cover.
--
Michael

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-01-31 09:55:13
Message-ID:	5186.1706694913@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2024-Jan-30, Pavel Stehule wrote:
>
> > One of my customer today is reducing one table from 140GB to 20GB. Now he
> > is able to run archiving. He should play with pg_repack, and it is working
> > well today, but I ask myself, what pg_repack does not be hard to do
> > internally because it should be done for REINDEX CONCURRENTLY. This is not
> > a common task, and not will be, but on the other hand, it can be nice to
> > have feature, and maybe not too hard to implement today. But I didn't try it
>
> FWIW a newer, more modern and more trustworthy alternative to pg_repack
> is pg_squeeze, which I discovered almost by random chance, and soon
> discovered I liked it much more.
>
> So thinking about your question, I think it might be possible to
> integrate a tool that works like pg_squeeze, such that it runs when
> VACUUM is invoked -- either under some new option, or just replace the
> code under FULL, not sure. If the Cybertec people allows it, we could
> just grab the pg_squeeze code and add it to the things that VACUUM can
> run.

There are no objections from Cybertec. Nevertheless, I don't expect much code
to be just copy & pasted. If I started to implement the extension today, I'd
do some things in a different way. (Some things might actually be simpler in
the core, i.e. a few small changes in PG core are easier than the related
workarounds in the extension.)

The core idea is that: 1) a "historic snapshot" is used to get the current
contents of the table, 2) logical decoding is used to capture the changes done
while the data is being copied to new storage, 3) the exclusive lock on the
table is only taken for very short time, to swap the storage (relfilenode) of
the table.

I think it should be coded in a way that allows use by VACUUM FULL, CLUSTER,
and possibly some subcommands of ALTER TABLE. For example, some users of
pg_squeeze requested an enhancement that allows the user to change column data
type w/o service disruption (typically when it appears that integer type is
going to overflow and change bigint is needed).

Online (re)partitioning could be another use case, although I admit that
commands that change the system catalog are a bit harder to implement than
VACUUM FULL / CLUSTER.

One thing that pg_squeeze does not handle is visibility: it uses heap_insert()
to insert the tuples into the new storage, so the problems described in [1]
can appear. The in-core implementation should rather do something like tuple
rewriting (rewriteheap.c).

Is your plan to work on it soon or should I try to write a draft patch? (I
assume this is for PG >= 18.)

[1] https://www.postgresql.org/docs/current/mvcc-caveats.html

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-01-31 10:04:24
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

This is great to hear.

On 2024-Jan-31, Antonin Houska wrote:

> Is your plan to work on it soon or should I try to write a draft patch? (I
> assume this is for PG >= 18.)

I don't have plans for it, so if you have resources, please go for it.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-02-16 17:43:29
Message-ID:	6347.1708105409@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> This is great to hear.
>
> On 2024-Jan-31, Antonin Houska wrote:
>
> > Is your plan to work on it soon or should I try to write a draft patch? (I
> > assume this is for PG >= 18.)
>
> I don't have plans for it, so if you have resources, please go for it.

ok, I'm thinking how can the feature be integrated into the core.

BTW, I'm failing to understand why cluster_rel() has no argument of the
BufferAccessStrategy type. According to buffer/README, the criterion for using
specific strategy is that page "is unlikely to be needed again
soon". Specifically for cluster_rel(), the page will *definitely* not be used
again (unless the VACCUM FULL/CLUSTER command fails): BufferTag contains the
relatin file number and the old relation file is eventually dropped.

Am I missing anything?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-03-07 17:56:46
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2024-Feb-16, Antonin Houska wrote:

> BTW, I'm failing to understand why cluster_rel() has no argument of the
> BufferAccessStrategy type. According to buffer/README, the criterion for using
> specific strategy is that page "is unlikely to be needed again
> soon". Specifically for cluster_rel(), the page will *definitely* not be used
> again (unless the VACCUM FULL/CLUSTER command fails): BufferTag contains the
> relatin file number and the old relation file is eventually dropped.
>
> Am I missing anything?

No, that's just an oversight. Access strategies are newer than that
cluster code.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Most hackers will be perfectly comfortable conceptualizing users as entropy
sources, so let's move on." (Nathaniel Smith)
https://mail.gnu.org/archive/html/monotone-devel/2007-01/msg00080.html

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-07-09 15:55:58
Message-ID:	82651.1720540558@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> > Is your plan to work on it soon or should I try to write a draft patch? (I
> > assume this is for PG >= 18.)
>
> I don't have plans for it, so if you have resources, please go for it.

The first version is attached. The actual feature is in 0003. 0004 is probably
not necessary now, but I haven't realized until I coded it.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v01-0001-Adjust-signature-of-cluster_rel-and-its-subroutines.patch	text/x-diff	14.9 KB
v01-0002-Move-progress-related-fields-from-PgBackendStatus-to.patch	text/x-diff	6.4 KB
unknown_filename	text/plain	195.4 KB
v01-0004-Call-logical_rewrite_heap_tuple-when-applying-concur.patch	text/x-diff	24.9 KB

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-07-09 16:03:29
Message-ID:	83298.1720541009@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> > > Is your plan to work on it soon or should I try to write a draft patch? (I
> > > assume this is for PG >= 18.)
> >
> > I don't have plans for it, so if you have resources, please go for it.
>
> The first version is attached. The actual feature is in 0003. 0004 is probably
> not necessary now, but I haven't realized until I coded it.

The mailing list archive indicates something is wrong with the 0003
attachment. Sending it all again, as *.tar.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
vacuum_full_concurrently_v01.tar	application/x-tar	245.0 KB

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-07-09 17:30:48
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2024-Jul-09, Antonin Houska wrote:

Thank you, this is great. I'll be studying this during the next
commitfest.

BTW I can apply 0003 from this email perfectly fine, but you're right
that the archives don't show the file name. I suspect the
"Content-Disposition: inline" PLUS the Content-Type text/plain are what
cause the problem -- for instance, [1] doesn't have a problem and they
do have inline content disposition, but the content-type is not
text/plain. In any case, I encourage you not to send patches as
tarballs :-)

[1] https://postgr.es/m/32781.1714378236@antos

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-07-10 06:56:27
Message-ID:	2668.1720594587@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2024-Jul-09, Antonin Houska wrote:
>
> > Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > > > Is your plan to work on it soon or should I try to write a draft patch? (I
> > > > assume this is for PG >= 18.)
> > >
> > > I don't have plans for it, so if you have resources, please go for it.
> >
> > The first version is attached. The actual feature is in 0003. 0004 is probably
> > not necessary now, but I haven't realized until I coded it.
>
> Thank you, this is great. I'll be studying this during the next
> commitfest.

Thanks. I'll register it in the CF application.

> BTW I can apply 0003 from this email perfectly fine, but you're right
> that the archives don't show the file name. I suspect the
> "Content-Disposition: inline" PLUS the Content-Type text/plain are what
> cause the problem -- for instance, [1] doesn't have a problem and they
> do have inline content disposition, but the content-type is not
> text/plain. In any case, I encourage you not to send patches as
> tarballs :-)
>
> [1] https://postgr.es/m/32781.1714378236@antos

You're right, "Content-Disposition" is the problem. I forgot that "attachment"
is better for patches and my email client (emacs+nmh) defaults to
"inline". I'll pay attention next time.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Kirill Reshke <reshkekirill(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-07-21 15:13:11
Message-ID:	CALdSSPig+9SVK34EN33v6-iFh17FFNLxW0cpHToX=miNLDqodg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi!
I'm interested in the vacuum concurrently feature being inside the
core, so will try to review patch set and give valuable feedback. For
now, just a few little thoughts..

> The first version is attached. The actual feature is in 0003. 0004 is probably
> not necessary now, but I haven't realized until I coded it.

The logical replication vacuum approach is a really smart idea, I like
it. As far as I understand, pg_squeeze works well in real production
databases, which
gives us hope that the vacuum concurrent feature in core will be good
too... What is the size of the biggest relation successfully vacuumed
via pg_squeeze?
Looks like in case of big relartion or high insertion load,
replication may lag and never catch up...

However, in general, the 3rd patch is really big, very hard to
comprehend. Please consider splitting this into smaller (and
reviewable) pieces.
Also, we obviously need more tests on this. Both tap-test and
regression tests I suppose.

One more thing is about pg_squeeze background workers. They act in an
autovacuum-like fashion, aren't they? Maybe we can support this kind
of relation processing in core too?

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc:	Antonin Houska <ah(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-07-21 15:23:24
Message-ID:	CAFj8pRD2zGwNX_LKy_uMGiqRfRPrk9mb0mqEVFsyhaePyL8B0g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

ne 21. 7. 2024 v 17:13 odesílatel Kirill Reshke <reshkekirill(at)gmail(dot)com>
napsal:

> Hi!
> I'm interested in the vacuum concurrently feature being inside the
> core, so will try to review patch set and give valuable feedback. For
> now, just a few little thoughts..
>
>
>
> One more thing is about pg_squeeze background workers. They act in an
> autovacuum-like fashion, aren't they? Maybe we can support this kind
> of relation processing in core too?
>

I don't think it is necessary when this feature will be an internal
feature.

I agree so this feature is very important, I proposed it (and I very happy
so Tonda implemented it), but I am not sure, if usage of this should be
automatized, and if it should be, then

a) probably autovacuum should do,
b) we can move a discussion after vacuum full concurrently will be merged
to upstream, please. Isn't very practical to have too many open targets.

Regards

Pavel

From:	Kirill Reshke <reshkekirill(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-07-22 08:23:03
Message-ID:	CALdSSPj=Mcsp-xq6_CqGmgi49LZN6wZ_kL0ZNVZiV=cdFgkDRA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Also, we obviously need more tests on this. Both tap-test and
> regression tests I suppose.

The one simple test to this patch can be done this way:

1) create test relation (call it vac_conc_r1 for example) and fill it
with dead tuples (insert + update or insert + delete)
2) create injection point preventing concurrent vacuum from compiling.
3) run concurrent vacuum (VACUUM FULL CONCURRENTLY) in separate thread
or in some other async way.
4) Insert new data in relation to vac_conc_r1.
5) Release injection point, assert that vacuum completed successfully.
6) check that all data is present in vac_conc_r1 (data from step 1 and
from step 4).

This way we can catch some basic buggs, if some paths of VACUUM
CONCURRENTLY will be touched in the future.
The problem with this test is: i don't know how to do anything async
in current TAP tests (needed in step 3). Also, maybe test with async
interaction
may be too flappy (producing false negative flaps) to support.
Sequential test for this feature would be much better, but I can't
think of one.

Also, should we create a cf entry for this thread already?

From:	Michael Banck <mbanck(at)gmx(dot)net>
To:	Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc:	Antonin Houska <ah(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-07-22 09:53:58
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 22, 2024 at 01:23:03PM +0500, Kirill Reshke wrote:
> Also, should we create a cf entry for this thread already?

I was wondering about this as well, but there is one for the upcoming
commitfest already:

https://commitfest.postgresql.org/49/5117/

Michael

From:	Kirill Reshke <reshkekirill(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-07-25 10:02:28
Message-ID:	CALdSSPg=9Y9sZMymieZoxOT0saVOOhBHv=UfXCjRAZmzDCYzzw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi!

On Tue, 30 Jan 2024 at 15:31, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> FWIW a newer, more modern and more trustworthy alternative to pg_repack
> is pg_squeeze, which I discovered almost by random chance, and soon
> discovered I liked it much more.

Can you please clarify this a bit more? What is the exact reason for
pg_squeeze being more trustworthy than pg_repack?
Is there something about the logical replication approach that makes
it more bulletproof than the trigger-based repack approach?

Also, I was thinking about pg_repack vs pg_squeeze being used for the
VACUUM FULL CONCURRENTLY feature, and I'm a bit suspicious about the
latter.
If I understand correctly, we essentially parse the whole WAL to
obtain info about one particular relation changes. That may be a big
overhead, whereas the trigger approach does
not suffer from this. So, there is the chance that VACUUM FULL
CONCURRENTLY will never keep up with vacuumed relation changes. Am I
right?

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-07-31 15:25:45
Message-ID:	16859.1722439545@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:

> Also, I was thinking about pg_repack vs pg_squeeze being used for the
> VACUUM FULL CONCURRENTLY feature, and I'm a bit suspicious about the
> latter.

> If I understand correctly, we essentially parse the whole WAL to
> obtain info about one particular relation changes. That may be a big
> overhead,

pg_squeeze is an extension but the logical decoding is performed by the core,
so there is no way to ensure that data changes of the "other tables" are not
decoded. However, it might be possible if we integrate the functionality into
the core. I'll consider doing so in the next version of [1].

> whereas the trigger approach does not suffer from this. So, there is the
> chance that VACUUM FULL CONCURRENTLY will never keep up with vacuumed
> relation changes. Am I right?

Perhaps it can happen, but note that trigger processing is also not free and
that in this case the cost is paid by the applications. So while VACUUM FULL
CONCURRENTLY (based on logical decoding) might fail to catch-up, the trigger
based solution may slow down the applications that execute DML commands while
the table is being rewritten.

[1] https://commitfest.postgresql.org/49/5117/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-08-02 06:09:26
Message-ID:	1788.1722578966@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:

> What is the size of the biggest relation successfully vacuumed
> via pg_squeeze?
> Looks like in case of big relartion or high insertion load,
> replication may lag and never catch up...

Users reports problems rather than successes, so I don't know. 400 GB was
reported in [1] but it's possible that the table size for this test was
determined based on available disk space.

I think that the amount of data changes performed during the "squeezing"
matters more than the table size. In [2] one user reported "thounsands of
UPSERTs per second", but the amount of data also depends on row size, which he
didn't mention.

pg_squeeze gives up if it fails to catch up a few times. The first version of
my patch does not check this, I'll add the corresponding code in the next
version.

> However, in general, the 3rd patch is really big, very hard to
> comprehend. Please consider splitting this into smaller (and
> reviewable) pieces.

I'll try to move some preparation steps into separate diffs, but not sure if
that will make the main diff much smaller. I prefer self-contained patches, as
also explained in [3].

> Also, we obviously need more tests on this. Both tap-test and
> regression tests I suppose.

Sure. The next version will use the injection points to test if "concurrent
data changes" are processed correctly.

> One more thing is about pg_squeeze background workers. They act in an
> autovacuum-like fashion, aren't they? Maybe we can support this kind
> of relation processing in core too?

Maybe later. Even just adding the CONCURRENTLY option to CLUSTER and VACUUM
FULL requires quite some effort.

[1] https://github.com/cybertec-postgresql/pg_squeeze/issues/51

[2]
https://github.com/cybertec-postgresql/pg_squeeze/issues/21#issuecomment-514495369

[3] http://peter.eisentraut.org/blog/2024/05/14/when-to-split-patches-for-postgresql

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Kirill Reshke <reshkekirill(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-08-02 07:28:54
Message-ID:	CALdSSPiqTX7UGtGoHO6+btNQ-zXd=ETwxC1J9kGBOtJGLkaFWA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2 Aug 2024 at 11:09, Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
> > However, in general, the 3rd patch is really big, very hard to
> > comprehend. Please consider splitting this into smaller (and
> > reviewable) pieces.
>
> I'll try to move some preparation steps into separate diffs, but not sure if
> that will make the main diff much smaller. I prefer self-contained patches, as
> also explained in [3].

Thanks for sharing [3], it is a useful link.

There is actually one more case when ACCESS EXCLUSIVE is held: during
table rewrite (AT set TAM, AT set Tablespace and AT alter column type
are some examples).
This can be done CONCURRENTLY too, using the same logical replication
approach, or do I miss something?
I'm not saying we must do it immediately, this should be a separate
thread, but we can do some preparation work here.

I can see that a bunch of functions which are currently placed in
cluster.c can be moved to something like
logical_rewrite_heap.c. ConcurrentChange struct and
apply_concurrent_insert function is one example of such.

So, if this is the case, 0003 patch can be splitted in two:
The first one is general utility code for logical table rewrite
The second one with actual VACUUM CONCURRENTLY feature.

What do you think?

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-08-27 12:00:50
Message-ID:	76278.1724760050@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Attached is version 2, the feature itself is now in 0004.

Unlike version 1, it contains some regression tests (0006) and a new GUC to
control how long the AccessExclusiveLock may be held (0007).

Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:

> On Fri, 2 Aug 2024 at 11:09, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >
> > Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
> > > However, in general, the 3rd patch is really big, very hard to
> > > comprehend. Please consider splitting this into smaller (and
> > > reviewable) pieces.
> >
> > I'll try to move some preparation steps into separate diffs, but not sure if
> > that will make the main diff much smaller. I prefer self-contained patches, as
> > also explained in [3].
>
> Thanks for sharing [3], it is a useful link.
>
> There is actually one more case when ACCESS EXCLUSIVE is held: during
> table rewrite (AT set TAM, AT set Tablespace and AT alter column type
> are some examples).
> This can be done CONCURRENTLY too, using the same logical replication
> approach, or do I miss something?

Yes, the logical replication can potentially be used in other cases.

> I'm not saying we must do it immediately, this should be a separate
> thread, but we can do some preparation work here.
>
> I can see that a bunch of functions which are currently placed in
> cluster.c can be moved to something like
> logical_rewrite_heap.c. ConcurrentChange struct and
> apply_concurrent_insert function is one example of such.
>
> So, if this is the case, 0003 patch can be splitted in two:
> The first one is general utility code for logical table rewrite
> The second one with actual VACUUM CONCURRENTLY feature.

> What do you think?

I can imagine moving the function process_concurrent_changes() and subroutines
to a different file (e.g. rewriteheap.c), but moving it into a separate diff
that does not contain any call of the function makes little sense to me. Such
a diff would not add any useful functionality and could not be considered
refactoring either.

So far I at least moved some code to separate diffs: 0003 and 0005. I'll move
more if I find sensible opportunity in the future.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v02-0001-Adjust-signature-of-cluster_rel-and-its-subroutines.patch	text/x-diff	14.9 KB
v02-0002-Move-progress-related-fields-from-PgBackendStatus-to.patch	text/x-diff	6.4 KB
v02-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v02-0004-Add-CONCURRENTLY-option-to-both-VACUUM-FULL-and-CLUS.patch	text/plain	170.4 KB
v02-0005-Preserve-visibility-information-of-the-concurrent-da.patch	text/x-diff	39.0 KB
v02-0006-Add-regression-tests.patch	text/x-diff	10.4 KB
v02-0007-Introduce-cluster_max_xlock_time-configuration-varia.patch	text/x-diff	20.5 KB
v02-0008-Call-logical_rewrite_heap_tuple-when-applying-concur.patch	text/x-diff	26.2 KB

From:	Junwang Zhao <zhjwpku(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Kirill Reshke <reshkekirill(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-08-30 10:18:12
Message-ID:	CAEG8a3+Dq5Kr9=banso5Z5_osHScpHv9-GZTrb8SOug_geY1Cw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Tue, Aug 27, 2024 at 8:01 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Attached is version 2, the feature itself is now in 0004.
>
> Unlike version 1, it contains some regression tests (0006) and a new GUC to
> control how long the AccessExclusiveLock may be held (0007).
>
> Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
>
> > On Fri, 2 Aug 2024 at 11:09, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > >
> > > Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
> > > > However, in general, the 3rd patch is really big, very hard to
> > > > comprehend. Please consider splitting this into smaller (and
> > > > reviewable) pieces.
> > >
> > > I'll try to move some preparation steps into separate diffs, but not sure if
> > > that will make the main diff much smaller. I prefer self-contained patches, as
> > > also explained in [3].
> >
> > Thanks for sharing [3], it is a useful link.
> >
> > There is actually one more case when ACCESS EXCLUSIVE is held: during
> > table rewrite (AT set TAM, AT set Tablespace and AT alter column type
> > are some examples).
> > This can be done CONCURRENTLY too, using the same logical replication
> > approach, or do I miss something?
>
> Yes, the logical replication can potentially be used in other cases.
>
> > I'm not saying we must do it immediately, this should be a separate
> > thread, but we can do some preparation work here.
> >
> > I can see that a bunch of functions which are currently placed in
> > cluster.c can be moved to something like
> > logical_rewrite_heap.c. ConcurrentChange struct and
> > apply_concurrent_insert function is one example of such.
> >
> > So, if this is the case, 0003 patch can be splitted in two:
> > The first one is general utility code for logical table rewrite
> > The second one with actual VACUUM CONCURRENTLY feature.
>
> > What do you think?
>
> I can imagine moving the function process_concurrent_changes() and subroutines
> to a different file (e.g. rewriteheap.c), but moving it into a separate diff
> that does not contain any call of the function makes little sense to me. Such
> a diff would not add any useful functionality and could not be considered
> refactoring either.
>
> So far I at least moved some code to separate diffs: 0003 and 0005. I'll move
> more if I find sensible opportunity in the future.
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>

Thanks for working on this, I think this is a very useful feature.

The patch doesn't compile in the debug build with errors:

../postgres/src/backend/commands/cluster.c: In function ‘get_catalog_state’:
../postgres/src/backend/commands/cluster.c:2771:33: error: declaration
of ‘td_src’ shadows a previous local [-Werror=shadow=compatible-local]
2771 | TupleDesc td_src, td_dst;
| ^~~~~~
../postgres/src/backend/commands/cluster.c:2741:25: note: shadowed
declaration is here
2741 | TupleDesc td_src = RelationGetDescr(rel);

you forgot the meson build for pgoutput_cluster

diff --git a/src/backend/meson.build b/src/backend/meson.build
index 78c5726814..0f9141a4ac 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
subdir('jit/llvm')
subdir('replication/libpqwalreceiver')
subdir('replication/pgoutput')
+subdir('replication/pgoutput_cluster')

I noticed that you use lmode/lock_mode/lockmode, there are lmode and lockmode
in the codebase, but I remember someone proposed all changes to lockmode, how
about sticking to lockmode in your patch?

0004:

+ sure that the old files do not change during the processing because the
+ chnages would get lost due to the swap.
typo

+ files. The data changes that took place during the creation of the new
+ table and index files are captured using logical decoding
+ (<xref linkend="logicaldecoding"/>) and applied before
+ the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+ is typically held only for the time needed to swap the files, which
+ should be pretty short.

I remember pg_squeeze also did some logical decoding after getting the exclusive
lock, if that is still true, I guess the doc above is not precise.

+ Note that <command>CLUSTER</command> with the
+ the <literal>CONCURRENTLY</literal> option does not try to order the
+ rows inserted into the table after the clustering started.

Do you mean after the *logical decoding* started here? If CLUSTER CONCURRENTLY
does not order rows at all, why bother implementing it?

+ errhint("CLUSTER CONCURRENTLY is only allowed for permanent relations")));

errhint messages should end with a dot. Why hardcoded to "CLUSTER CONCURRENTLY"
instead of parameter *stmt*.

+ ResourceOwner oldowner = CurrentResourceOwner;
+
+ /*
+ * In the CONCURRENT case, do the planning in a subtrensaction so that
typo

I did not see VacuumStmt changes in gram.y, how do we suppose to
use the vacuum full concurrently? I tried the following but no success.

[local] postgres(at)demo:5432-36097=# vacuum (concurrently) aircrafts_data;
ERROR: CONCURRENTLY can only be specified with VACUUM FULL

[local] postgres(at)demo:5432-36097=# vacuum full (concurrently) full
aircrafts_data;
ERROR: syntax error at or near "("
LINE 1: vacuum full (concurrently) full aircrafts_data;

--
Regards
Junwang Zhao

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Junwang Zhao <zhjwpku(at)gmail(dot)com>
Cc:	Kirill Reshke <reshkekirill(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-09-04 11:41:35
Message-ID:	9421.1725450095@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:

> Thanks for working on this, I think this is a very useful feature.
>
> The patch doesn't compile in the debug build with errors:
>
> ../postgres/src/backend/commands/cluster.c: In function ‘get_catalog_state’:
> ../postgres/src/backend/commands/cluster.c:2771:33: error: declaration
> of ‘td_src’ shadows a previous local [-Werror=shadow=compatible-local]
> 2771 | TupleDesc td_src, td_dst;
> | ^~~~~~
> ../postgres/src/backend/commands/cluster.c:2741:25: note: shadowed
> declaration is here
> 2741 | TupleDesc td_src = RelationGetDescr(rel);

ok, gcc14 complains here, the compiler I used before did not. Fixed.

> you forgot the meson build for pgoutput_cluster
>
> diff --git a/src/backend/meson.build b/src/backend/meson.build
> index 78c5726814..0f9141a4ac 100644
> --- a/src/backend/meson.build
> +++ b/src/backend/meson.build
> @@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
> subdir('jit/llvm')
> subdir('replication/libpqwalreceiver')
> subdir('replication/pgoutput')
> +subdir('replication/pgoutput_cluster')

Fixed, thanks. That might be the reason for the cfbot to fail when using
meson.

> I noticed that you use lmode/lock_mode/lockmode, there are lmode and lockmode
> in the codebase, but I remember someone proposed all changes to lockmode, how
> about sticking to lockmode in your patch?

Fixed.

> 0004:
>
> + sure that the old files do not change during the processing because the
> + chnages would get lost due to the swap.
> typo

Fixed.

>
> + files. The data changes that took place during the creation of the new
> + table and index files are captured using logical decoding
> + (<xref linkend="logicaldecoding"/>) and applied before
> + the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
> + is typically held only for the time needed to swap the files, which
> + should be pretty short.
>
> I remember pg_squeeze also did some logical decoding after getting the exclusive
> lock, if that is still true, I guess the doc above is not precise.

The decoding takes place before requesting the lock, as well as after
that. I've adjusted the paragraph, see 0007.

> + Note that <command>CLUSTER</command> with the
> + the <literal>CONCURRENTLY</literal> option does not try to order the
> + rows inserted into the table after the clustering started.
>
> Do you mean after the *logical decoding* started here? If CLUSTER CONCURRENTLY
> does not order rows at all, why bother implementing it?

The rows inserted before CLUSTER (CONCURRENTLY) started do get ordered, the
rows inserted after that do not. (Actually what matters is when the snapshot
for the initial load is created, but that happens in very early stage of the
processing. Not sure if user is interested in such implementation details.)

> + errhint("CLUSTER CONCURRENTLY is only allowed for permanent relations")));
>
> errhint messages should end with a dot. Why hardcoded to "CLUSTER CONCURRENTLY"
> instead of parameter *stmt*.

Fixed.

> + ResourceOwner oldowner = CurrentResourceOwner;
> +
> + /*
> + * In the CONCURRENT case, do the planning in a subtrensaction so that
> typo

Fixed.

> I did not see VacuumStmt changes in gram.y, how do we suppose to
> use the vacuum full concurrently? I tried the following but no success.

With the "parethesized syntax", new options can be added w/o changing
gram.y. (While the "unparenthesized syntax" is deprecated.)

> [local] postgres(at)demo:5432-36097=# vacuum (concurrently) aircrafts_data;
> ERROR: CONCURRENTLY can only be specified with VACUUM FULL

The "lazy" VACUUM works concurrently as such.

> [local] postgres(at)demo:5432-36097=# vacuum full (concurrently) full
> aircrafts_data;
> ERROR: syntax error at or near "("
> LINE 1: vacuum full (concurrently) full aircrafts_data;

This is not specific to the CONCURRENTLY option. For example:

postgres=3D# vacuum full (analyze) full aircrafts_data;
ERROR: syntax error at or near "("
LINE 1: vacuum full (analyze) full aircrafts_data;

(You seem to combine the parenthesized syntax with the unparenthesized.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v03-0001-Adjust-signature-of-cluster_rel-and-its-subroutines.patch	text/x-diff	14.9 KB
v03-0002-Move-progress-related-fields-from-PgBackendStatus-to.patch	text/x-diff	6.4 KB
v03-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v03-0004-Add-CONCURRENTLY-option-to-both-VACUUM-FULL-and-CLUS.patch	text/plain	171.4 KB
v03-0005-Preserve-visibility-information-of-the-concurrent-da.patch	text/x-diff	39.0 KB
v03-0006-Add-regression-tests.patch	text/x-diff	10.4 KB
v03-0007-Introduce-cluster_max_xlock_time-configuration-varia.patch	text/x-diff	20.4 KB
v03-0008-Call-logical_rewrite_heap_tuple-when-applying-concur.patch	text/x-diff	26.2 KB

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Junwang Zhao <zhjwpku(at)gmail(dot)com>
Cc:	Kirill Reshke <reshkekirill(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-09-06 08:08:35
Message-ID:	77690.1725610115@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

While trying to figure out why the regression tests fail sometimes on the
cfbot (not sure yet about the reason), I fixed some confusions in
begin_concurrent_cluster() and end_concurrent_cluster(). Those functions
should be a bit easier to understand now.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v04-0001-Adjust-signature-of-cluster_rel-and-its-subroutines.patch	text/x-diff	14.9 KB
v04-0002-Move-progress-related-fields-from-PgBackendStatus-to.patch	text/x-diff	6.4 KB
v04-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v04-0004-Add-CONCURRENTLY-option-to-both-VACUUM-FULL-and-CLUS.patch	text/plain	171.1 KB
v04-0005-Preserve-visibility-information-of-the-concurrent-da.patch	text/x-diff	39.0 KB
v04-0005-Preserve-visibility-information-of-the-concurrent-da.patch	text/x-diff	39.0 KB
v04-0006-Add-regression-tests.patch	text/x-diff	10.4 KB
v04-0007-Introduce-cluster_max_xlock_time-configuration-varia.patch	text/x-diff	20.4 KB
v04-0008-Call-logical_rewrite_heap_tuple-when-applying-concur.patch	text/x-diff	26.2 KB

From:	Junwang Zhao <zhjwpku(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Kirill Reshke <reshkekirill(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-09-14 14:54:06
Message-ID:	CAEG8a3LV=towSaMSZOHMBhhjSYkqA0L-KpV2537uNJzabeoF-w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 4, 2024 at 7:41 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
>
> > Thanks for working on this, I think this is a very useful feature.
> >
> > The patch doesn't compile in the debug build with errors:
> >
> > ../postgres/src/backend/commands/cluster.c: In function ‘get_catalog_state’:
> > ../postgres/src/backend/commands/cluster.c:2771:33: error: declaration
> > of ‘td_src’ shadows a previous local [-Werror=shadow=compatible-local]
> > 2771 | TupleDesc td_src, td_dst;
> > | ^~~~~~
> > ../postgres/src/backend/commands/cluster.c:2741:25: note: shadowed
> > declaration is here
> > 2741 | TupleDesc td_src = RelationGetDescr(rel);
>
> ok, gcc14 complains here, the compiler I used before did not. Fixed.
>
> > you forgot the meson build for pgoutput_cluster
> >
> > diff --git a/src/backend/meson.build b/src/backend/meson.build
> > index 78c5726814..0f9141a4ac 100644
> > --- a/src/backend/meson.build
> > +++ b/src/backend/meson.build
> > @@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
> > subdir('jit/llvm')
> > subdir('replication/libpqwalreceiver')
> > subdir('replication/pgoutput')
> > +subdir('replication/pgoutput_cluster')
>
> Fixed, thanks. That might be the reason for the cfbot to fail when using
> meson.
>
> > I noticed that you use lmode/lock_mode/lockmode, there are lmode and lockmode
> > in the codebase, but I remember someone proposed all changes to lockmode, how
> > about sticking to lockmode in your patch?
>
> Fixed.
>
> > 0004:
> >
> > + sure that the old files do not change during the processing because the
> > + chnages would get lost due to the swap.
> > typo
>
> Fixed.
>
> >
> > + files. The data changes that took place during the creation of the new
> > + table and index files are captured using logical decoding
> > + (<xref linkend="logicaldecoding"/>) and applied before
> > + the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
> > + is typically held only for the time needed to swap the files, which
> > + should be pretty short.
> >
> > I remember pg_squeeze also did some logical decoding after getting the exclusive
> > lock, if that is still true, I guess the doc above is not precise.
>
> The decoding takes place before requesting the lock, as well as after
> that. I've adjusted the paragraph, see 0007.
>
> > + Note that <command>CLUSTER</command> with the
> > + the <literal>CONCURRENTLY</literal> option does not try to order the
> > + rows inserted into the table after the clustering started.
> >
> > Do you mean after the *logical decoding* started here? If CLUSTER CONCURRENTLY
> > does not order rows at all, why bother implementing it?
>
> The rows inserted before CLUSTER (CONCURRENTLY) started do get ordered, the
> rows inserted after that do not. (Actually what matters is when the snapshot
> for the initial load is created, but that happens in very early stage of the
> processing. Not sure if user is interested in such implementation details.)
>
>
> > + errhint("CLUSTER CONCURRENTLY is only allowed for permanent relations")));
> >
> > errhint messages should end with a dot. Why hardcoded to "CLUSTER CONCURRENTLY"
> > instead of parameter *stmt*.
>
> Fixed.
>
> > + ResourceOwner oldowner = CurrentResourceOwner;
> > +
> > + /*
> > + * In the CONCURRENT case, do the planning in a subtrensaction so that
> > typo
>
> Fixed.
>
> > I did not see VacuumStmt changes in gram.y, how do we suppose to
> > use the vacuum full concurrently? I tried the following but no success.
>
> With the "parethesized syntax", new options can be added w/o changing
> gram.y. (While the "unparenthesized syntax" is deprecated.)
>
> > [local] postgres(at)demo:5432-36097=# vacuum (concurrently) aircrafts_data;
> > ERROR: CONCURRENTLY can only be specified with VACUUM FULL
>
> The "lazy" VACUUM works concurrently as such.
>
> > [local] postgres(at)demo:5432-36097=# vacuum full (concurrently) full
> > aircrafts_data;
> > ERROR: syntax error at or near "("
> > LINE 1: vacuum full (concurrently) full aircrafts_data;
>
> This is not specific to the CONCURRENTLY option. For example:
>
> postgres=3D# vacuum full (analyze) full aircrafts_data;
> ERROR: syntax error at or near "("
> LINE 1: vacuum full (analyze) full aircrafts_data;
>
> (You seem to combine the parenthesized syntax with the unparenthesized.)

Yeah, my mistake, *vacuum (full, concurrently)* works.

+ if (TransactionIdIsNormal(HeapTupleHeaderGetRawXmax(tuple->t_data)) &&
+ HeapTupleMVCCNotDeleted(tuple, snapshot, buffer))
+ {
+ /* TODO More work needed here?*/
+ tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
+ HeapTupleHeaderSetXmax(tuple->t_data, 0);
+ }

I don't quite understand the above code, IIUC xmax and xmax invalid
are set directly on the buffer page. What if the command failed? Will
this break the visibility rules?

btw, v4-0006 failed to apply.

>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>

--
Regards
Junwang Zhao

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Junwang Zhao <zhjwpku(at)gmail(dot)com>
Cc:	Kirill Reshke <reshkekirill(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-10-09 12:56:17
Message-ID:	62605.1728478577@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:

> + if (TransactionIdIsNormal(HeapTupleHeaderGetRawXmax(tuple->t_data)) &&
> + HeapTupleMVCCNotDeleted(tuple, snapshot, buffer))
> + {
> + /* TODO More work needed here?*/
> + tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
> + HeapTupleHeaderSetXmax(tuple->t_data, 0);
> + }
>
> I don't quite understand the above code, IIUC xmax and xmax invalid
> are set directly on the buffer page. What if the command failed? Will
> this break the visibility rules?

Oh, that's too bad. Of course, the tuple must be copied. Fixed in the next
version. Thanks!

(0009- added to get some debugging information from the cfbot. It fails
sometimes and I'm not able to reproduce the errors.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v05-0001-Adjust-signature-of-cluster_rel-and-its-subroutines.patch	text/x-diff	14.9 KB
v05-0002-Move-progress-related-fields-from-PgBackendStatus-to.patch	text/x-diff	6.4 KB
v05-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v05-0004-Add-CONCURRENTLY-option-to-both-VACUUM-FULL-and-CLUS.patch	text/plain	170.9 KB
v05-0005-Preserve-visibility-information-of-the-concurrent-da.patch	text/x-diff	39.0 KB
v05-0006-Add-regression-tests.patch	text/x-diff	10.4 KB
v05-0007-Introduce-cluster_max_xlock_time-configuration-varia.patch	text/x-diff	20.4 KB
v05-0008-Call-logical_rewrite_heap_tuple-when-applying-concur.patch	text/x-diff	26.6 KB
v05-0009-elog.patch	text/x-diff	2.2 KB

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-12-06 13:46:01
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2024-Oct-09, Antonin Houska wrote:

> diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
> index 9110938fab..f1008f5013 100644
> --- a/doc/src/sgml/ref/vacuum.sgml
> +++ b/doc/src/sgml/ref/vacuum.sgml

> @@ -61,8 +62,12 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
> <para>
> Without a <replaceable class="parameter">table_and_columns</replaceable>
> list, <command>VACUUM</command> processes every table and materialized view
> - in the current database that the current user has permission to vacuum.
> - With a list, <command>VACUUM</command> processes only those table(s).
> + in the current database that the current user has permission to vacuum. If
> + the <literal>CONCURRENTLY</literal> is specified (see below), tables which
> + have not been clustered yet are silently skipped. With a
> + list, <command>VACUUM</command> processes only those table(s). If
> + the <literal>CONCURRENTLY</literal> is specified, the list may only contain
> + tables which have already been clustered.
> </para>

The idea that VACUUM CONCURRENTLY can only process tables that have been
clustered sounds very strange to me. I don't think such a restriction
would really fly. However, I think this may just be a documentation
mistake; can you please clarify? I am tempted to suggest that VACUUM
CONCURRENTLY should receive a table list; without a list, it should
raise an error. This is not supposed to be a routine maintenance
command that you can run on all your tables, after all. Heck, maybe
don't even accept a partitioned table -- the user can process one
partition at a time, if they need that.

I don't believe in the need for the LOCK_CLUSTER_CONCURRENT define; IMO
the code should just use ShareUpdateExclusiveLock where needed.

In 0001, the new API of make_new_heap() is somewhat bizarre regarding
the output lockmode_new_p parameter. I didn't find any place in the
patch series where we use that to return a different lock level that the
caller gave; the only case were we do something that looks funny is when
a toast table is involved. But I don't think I fully understand what is
going on in that case. I'm likely missing something here, but isn't it
simpler to just state that make_new_heap will obtain a lock on the new
heap, and that the immediately following table_open needn't acquire a
lock (or, in the case of RefreshMatViewByOid, no LockRelationOid is
necessary)?

Anyway, I propose some cosmetic cleanups for 0001 in attachment,
including changing make_new_heap to assume a non-null value of
lockmode_new_p. I didn't go as far as making it no longer a pointer,
but if it can be done, then I suggest we should do that. I didn't try
to apply the next patches in the series after this one.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

Attachment	Content-Type	Size
0001-Minor-code-review.patch.nocfbot	text/plain	10.8 KB

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2024-12-11 18:29:57
Message-ID:	47194.1733941797@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2024-Oct-09, Antonin Houska wrote:
>
> > @@ -61,8 +62,12 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
> > <para>
> > Without a <replaceable class="parameter">table_and_columns</replaceable>
> > list, <command>VACUUM</command> processes every table and materialized view
> > - in the current database that the current user has permission to vacuum.
> > - With a list, <command>VACUUM</command> processes only those table(s).
> > + in the current database that the current user has permission to vacuum. If
> > + the <literal>CONCURRENTLY</literal> is specified (see below), tables which
> > + have not been clustered yet are silently skipped. With a
> > + list, <command>VACUUM</command> processes only those table(s). If
> > + the <literal>CONCURRENTLY</literal> is specified, the list may only contain
> > + tables which have already been clustered.
> > </para>
>
> The idea that VACUUM CONCURRENTLY can only process tables that have been
> clustered sounds very strange to me. I don't think such a restriction
> would really fly. However, I think this may just be a documentation
> mistake; can you please clarify?

Right, it was a documentation problem. I think the fact that VACUUM FULL is
internally (almost) an alias for CLUSTER is what distracted me.

> I am tempted to suggest that VACUUM CONCURRENTLY should receive a table
> list; without a list, it should raise an error. This is not supposed to be
> a routine maintenance command that you can run on all your tables, after
> all.

ok, implemented

> Heck, maybe don't even accept a partitioned table -- the user can
> process one partition at a time, if they need that.

I also thought of this but forgot. Done now.

> I don't believe in the need for the LOCK_CLUSTER_CONCURRENT define; IMO
> the code should just use ShareUpdateExclusiveLock where needed.

ok, removed

> In 0001, the new API of make_new_heap() is somewhat bizarre regarding
> the output lockmode_new_p parameter.

Oh, it was too messy. I think I was thinking of too many things at once (such
as locking the old heap, the new heap and the new heap's TOAST). Also, one
thing that might have contributed to the confusion is that make_new_heap() has
the 'lockmode' argument, which receives various values from various
callers. However, both the new heap and its TOAST relation are eventually
created by heap_create_with_catalog(), and this function always leaves the new
relation locked in AccessExclusiveMode. Maybe this needs some refactoring.

Therefore I reverted the changes arount make_new_heap() and simply pass NoLock
for lockmode in cluster.c

> Anyway, I propose some cosmetic cleanups for 0001 in attachment,
> including changing make_new_heap to assume a non-null value of
> lockmode_new_p. I didn't go as far as making it no longer a pointer,
> but if it can be done, then I suggest we should do that. I didn't try
> to apply the next patches in the series after this one.

Thanks, applied (except the changes related to make_new_heap(), which is not
touched in the next version of the patch.)

Next version is attached. It also tries to fix CI problems reported on
machines with -DRELCACHE_FORCE_RELEASE.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v06-0001-Adjust-signature-of-cluster_rel-and-its-subroutines.patch	text/x-diff	13.3 KB
v06-0002-Move-progress-related-fields-from-PgBackendStatus-to.patch	text/x-diff	6.4 KB
v06-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v06-0004-Add-CONCURRENTLY-option-to-both-VACUUM-FULL-and-CLUS.patch	text/plain	170.9 KB
v06-0005-Preserve-visibility-information-of-the-concurrent-da.patch	text/x-diff	39.7 KB
v06-0006-Add-regression-tests.patch	text/x-diff	10.4 KB
v06-0007-Introduce-cluster_max_xlock_time-configuration-varia.patch	text/x-diff	20.3 KB
v06-0008-Call-logical_rewrite_heap_tuple-when-applying-concur.patch	text/x-diff	26.2 KB

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-09 13:35:42
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2024-Dec-11, Antonin Houska wrote:

> Oh, it was too messy. I think I was thinking of too many things at once (such
> as locking the old heap, the new heap and the new heap's TOAST). Also, one
> thing that might have contributed to the confusion is that make_new_heap() has
> the 'lockmode' argument, which receives various values from various
> callers. However, both the new heap and its TOAST relation are eventually
> created by heap_create_with_catalog(), and this function always leaves the new
> relation locked in AccessExclusiveMode. Maybe this needs some refactoring.
>
> Therefore I reverted the changes arount make_new_heap() and simply pass NoLock
> for lockmode in cluster.c

Cool, thanks, I have pushed this. I made some additional minor changes,
nothing earth-shattering.

Meanwhile the patch 0004 has some seemingly trivial conflicts. If you
want to rebase, I'd appreciate that. In the meantime I'll give a look
at the next two other API changes.

I'm not happy with the idea of having this new command be VACUUM (FULL
CONCURRENTLY). It's a bit of an absurd name if you ask me. Heck, even
VACUUM (FULL) seems a bit absurd nowadays.

Maybe we should have a new toplevel command. Some ideas that have been
thrown around:

- RETABLE (it's like REINDEX, but for tables)
- ALTER TABLE <tab> SQUEEZE
- SQUEEZE <table>
- VACUUM (SQUEEZE)
- VACUUM (COMPACT)
- MAINTAIN <tab> COMPACT
- MAINTAIN <tab> SQUEEZE

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Antonin Houska <ah(at)cybertec(dot)at>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-09 17:08:39
Message-ID:	CAFj8pRDRnAH9gXBhK2DzXUpgZuNm-BZ3oP6PocHRuBqo1Se2Hg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

čt 9. 1. 2025 v 14:35 odesílatel Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
napsal:

> On 2024-Dec-11, Antonin Houska wrote:
>
> > Oh, it was too messy. I think I was thinking of too many things at once
> (such
> > as locking the old heap, the new heap and the new heap's TOAST). Also,
> one
> > thing that might have contributed to the confusion is that
> make_new_heap() has
> > the 'lockmode' argument, which receives various values from various
> > callers. However, both the new heap and its TOAST relation are eventually
> > created by heap_create_with_catalog(), and this function always leaves
> the new
> > relation locked in AccessExclusiveMode. Maybe this needs some
> refactoring.
> >
> > Therefore I reverted the changes arount make_new_heap() and simply pass
> NoLock
> > for lockmode in cluster.c
>
> Cool, thanks, I have pushed this. I made some additional minor changes,
> nothing earth-shattering.
>
> Meanwhile the patch 0004 has some seemingly trivial conflicts. If you
> want to rebase, I'd appreciate that. In the meantime I'll give a look
> at the next two other API changes.
>
> I'm not happy with the idea of having this new command be VACUUM (FULL
> CONCURRENTLY). It's a bit of an absurd name if you ask me. Heck, even
> VACUUM (FULL) seems a bit absurd nowadays.
>

Although it can sound absurd - it makes perfect sense for me - both "FULL"
and "CONCURRENTLY" are years used terms.

Maybe we can introduce a synonym like COMPACT for FULL.

I don't see a strong benefit for introducing a new command (with almost all
identical functionality) just because the words sound strange. But I
understand this.
There is inconsistency between behaviour of VACUUM COMMAND - lazy VACUUM is
by default CONCURRENTLY, but FULL VACUUM NOT, so I understand
to request to introduce a new command. But is it really necessary?

Introducing synonyms for (FULL) COMPACT or SQUEEZE can be enough, and can
work well. It is better than the introduction synonym for VACUUM command.

Regards

Pavel

> Maybe we should have a new toplevel command. Some ideas that have been
> thrown around:
>
> - RETABLE (it's like REINDEX, but for tables)
> - ALTER TABLE <tab> SQUEEZE
> - SQUEEZE <table>
> - VACUUM (SQUEEZE)
> - VACUUM (COMPACT)
> - MAINTAIN <tab> COMPACT
> - MAINTAIN <tab> SQUEEZE
>
> --
> Álvaro Herrera 48°01'N 7°57'E —
> https://www.EnterpriseDB.com/
>

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-09 17:26:02
Message-ID:	10818.1736443562@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2024-Dec-11, Antonin Houska wrote:
>
> > Oh, it was too messy. I think I was thinking of too many things at once (such
> > as locking the old heap, the new heap and the new heap's TOAST). Also, one
> > thing that might have contributed to the confusion is that make_new_heap() has
> > the 'lockmode' argument, which receives various values from various
> > callers. However, both the new heap and its TOAST relation are eventually
> > created by heap_create_with_catalog(), and this function always leaves the new
> > relation locked in AccessExclusiveMode. Maybe this needs some refactoring.
> >
> > Therefore I reverted the changes arount make_new_heap() and simply pass NoLock
> > for lockmode in cluster.c
>
> Cool, thanks, I have pushed this. I made some additional minor changes,
> nothing earth-shattering.

It seems you accidentally fixed another problem :-) I was referring to the
'lockmode' argument of make_new_heap(). I can try to write a patch for that
but ...

> Meanwhile the patch 0004 has some seemingly trivial conflicts. If you
> want to rebase, I'd appreciate that. In the meantime I'll give a look
> at the next two other API changes.

... I can apply v06 even though I do have the commit ebd8fc7e47 in my working
tree. (And the CF bot does not complain (yet?).) Have you removed the
'lockmode' argument also from make_new_heap() and forgot to push it? This
change would probably cause a conflict with v06.

> I'm not happy with the idea of having this new command be VACUUM (FULL
> CONCURRENTLY). It's a bit of an absurd name if you ask me. Heck, even
> VACUUM (FULL) seems a bit absurd nowadays.
>
> Maybe we should have a new toplevel command. Some ideas that have been
> thrown around:
>
> - RETABLE (it's like REINDEX, but for tables)
> - ALTER TABLE <tab> SQUEEZE
> - SQUEEZE <table>
> - VACUUM (SQUEEZE)
> - VACUUM (COMPACT)
> - MAINTAIN <tab> COMPACT
> - MAINTAIN <tab> SQUEEZE

I recall that DB2 has REORG command, which also can do clustering [1]

Regardless the name of the new command, should that also handle the
non-concurrent cases? In that case we'd probably need to mark CLUSTER and
VACUUM (FULL) as deprecated.

[1] https://www.ibm.com/docs/en/db2/12.1?topic=commands-reorg-table

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-09 19:52:43
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2025-Jan-09, Antonin Houska wrote:

> It seems you accidentally fixed another problem :-) I was referring to the
> 'lockmode' argument of make_new_heap(). I can try to write a patch for that
> but ...
>
> > Meanwhile the patch 0004 has some seemingly trivial conflicts. If you
> > want to rebase, I'd appreciate that. In the meantime I'll give a look
> > at the next two other API changes.
>
> ... I can apply v06 even though I do have the commit ebd8fc7e47 in my working
> tree. (And the CF bot does not complain (yet?).) Have you removed the
> 'lockmode' argument also from make_new_heap() and forgot to push it? This
> change would probably cause a conflict with v06.

Hmm, I'm not sure what you mean. My changes are just comment updates,
plus I added asserts that relations are locked to match what the
comments say (this made me touch matview.c, including removal of a
pointless lock acquisition there); also in cluster_rel I moved one
initialization to a different place. A diff between your patch and what
I committed is below. But neither of those changes cause the conflict
to apply 0002,3,4 from v06 to master; rather the conflict comes from
older commits to cluster.c. I'm really surprised that you see no
conflicts. Maybe I'm doing something wrong. (I tried "git am", "git
apply -3" and "patch -p1" and they all result in a few rejects.)

> > Maybe we should have a new toplevel command. Some ideas that have been
> > thrown around:
> >
> > - RETABLE (it's like REINDEX, but for tables)
> > - ALTER TABLE <tab> SQUEEZE
> > - SQUEEZE <table>
> > - VACUUM (SQUEEZE)
> > - VACUUM (COMPACT)
> > - MAINTAIN <tab> COMPACT
> > - MAINTAIN <tab> SQUEEZE
>
> I recall that DB2 has REORG command, which also can do clustering [1]

That's an idea, but ... man, is that ugly!

> Regardless the name of the new command, should that also handle the
> non-concurrent cases? In that case we'd probably need to mark CLUSTER and
> VACUUM (FULL) as deprecated.

Hmm, I don't feel a need to change CLUSTER (since it does one pretty
well defined thing), but I would be okay adding a command that does both
things (concurrent and non concurrent, chosen as per options specified),
and redefine VACUUM FULL as "an obsolete alias for New Shiny Command".
I'm not strongly opposed to also aliasing CLUSTER, note, I just think
it's not really necessary.

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 4c0d6b80d2a..99193f5c886 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -316,7 +316,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
int save_nestlevel;
bool verbose = ((params->options & CLUOPT_VERBOSE) != 0);
bool recheck = ((params->options & CLUOPT_RECHECK) != 0);
- Relation index = NULL;
+ Relation index;

Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));

@@ -434,10 +434,13 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
/* Check heap and index are valid to cluster on */
if (OidIsValid(indexOid))
{
+ /* verify the index is good and lock it */
check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
- /* Open the index (It should already be locked.) */
+ /* also open it */
index = index_open(indexOid, NoLock);
}
+ else
+ index = NULL;

/*
* Quietly ignore the request if this is a materialized view which has not
@@ -615,12 +618,12 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
/*
* rebuild_relation: rebuild an existing relation in index or physical order
*
- * OldHeap: table to rebuild --- must be opened and exclusive-locked!
- * index: index to cluster by, or NULL to rewrite in physical order. Must be
- * opened and locked.
+ * OldHeap: table to rebuild.
+ * index: index to cluster by, or NULL to rewrite in physical order.
*
- * On exit, the heap (and also the index, if one was passed) are closed, but
- * still locked with AccessExclusiveLock.
+ * On entry, heap and index (if one is given) must be open, and
+ * AccessExclusiveLock held on them.
+ * On exit, they are closed, but locks on them are not released.
*/
static void
rebuild_relation(Relation OldHeap, Relation index, bool verbose)
@@ -636,6 +639,9 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
TransactionId frozenXid;
MultiXactId cutoffMulti;

+ Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
+ (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+
if (index)
/* Mark the correct index as clustered */
mark_index_clustered(OldHeap, RelationGetRelid(index), true);
@@ -647,18 +653,15 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
/*
* Create the transient table that will receive the re-ordered data.
*
- * NoLock for the old heap because we already have it locked and want to
- * keep unlocking straightforward. The new heap (and its TOAST if one
- * exists) will be locked in AccessExclusiveMode on return. Since others
- * can't see it yet, we do not care.
+ * OldHeap is already locked, so no need to lock it again. make_new_heap
+ * obtains AccessExclusiveLock on the new heap and its toast table.
*/
OIDNewHeap = make_new_heap(tableOid, tableSpace,
accessMethod,
relpersistence,
NoLock);
+ Assert(CheckRelationOidLockedByMe(OIDNewHeap, AccessExclusiveLock, false));
NewHeap = table_open(OIDNewHeap, NoLock);
- /* NewHeap already locked by make_new_heap */
- Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));

/* Copy the heap data into the new table in the desired order */
copy_table_data(NewHeap, OldHeap, index, verbose,
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 4b3d4822872..c12817091ed 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -319,7 +319,7 @@ RefreshMatViewByOid(Oid matviewOid, bool is_create, bool skipData,
OIDNewHeap = make_new_heap(matviewOid, tableSpace,
matviewRel->rd_rel->relam,
relpersistence, ExclusiveLock);
- LockRelationOid(OIDNewHeap, AccessExclusiveLock);
+ Assert(CheckRelationOidLockedByMe(OIDNewHeap, AccessExclusiveLock, false));

/* Generate the data, if wanted. */
if (!skipData)

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-10 09:31:47
Message-ID:	2532.1736501507@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> Hi
>
> čt 9. 1. 2025 v 14:35 odesílatel Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> napsal:
>
> On 2024-Dec-11, Antonin Houska wrote:
>
> > Oh, it was too messy. I think I was thinking of too many things at once (such
> > as locking the old heap, the new heap and the new heap's TOAST). Also, one
> > thing that might have contributed to the confusion is that make_new_heap() has
> > the 'lockmode' argument, which receives various values from various
> > callers. However, both the new heap and its TOAST relation are eventually
> > created by heap_create_with_catalog(), and this function always leaves the new
> > relation locked in AccessExclusiveMode. Maybe this needs some refactoring.
> >
> > Therefore I reverted the changes arount make_new_heap() and simply pass NoLock
> > for lockmode in cluster.c
>
> Cool, thanks, I have pushed this. I made some additional minor changes,
> nothing earth-shattering.
>
> Meanwhile the patch 0004 has some seemingly trivial conflicts. If you
> want to rebase, I'd appreciate that. In the meantime I'll give a look
> at the next two other API changes.
>
> I'm not happy with the idea of having this new command be VACUUM (FULL
> CONCURRENTLY). It's a bit of an absurd name if you ask me. Heck, even
> VACUUM (FULL) seems a bit absurd nowadays.
>
> Although it can sound absurd - it makes perfect sense for me - both "FULL" and "CONCURRENTLY" are years used terms.
>
> Maybe we can introduce a synonym like COMPACT for FULL.

Yes, at first glance, FULL might indicate to users that it processes the whole
table, however VACUUM does that regardless this option. COMPACT would be more
accurate because it would tell that, besides removing dead tuples, unused
space is removed properly.

However I'm not sure if the FULL option should have been added to VACUUM at
all. Note that, internally, it uses completely different approach to the
problem of garbage collection. As a consequence, there are several options
which are not compatible with the FULL option: PARALLEL,
DISABLE_PAGE_SKIPPING, BUFFER_USAGE_LIMIT, and maybe some more.

Thus I understand Alvaro's objections against VACUUM (FULL, CONCURRENTLY).

> I don't see a strong benefit for introducing a new command (with almost all
> identical functionality) just because the words sound strange.

If we turn the FULL option into an alias for the new command, and remove that
after "some time", then there is no identical functionality anymore.

The new functionality overlaps with CLUSTER, except that it works
CONCURRENTLY. However, invoking the new functionality via CLUSTER
(CONCURRENTLY) is not a complete solution because it's also usable w/o
ordering. That's why a new command makes sense to me.

After all, the new code aims primarily at bloat removal rather than at
ordering. Note that it only orders the existing rows, but does not even try to
order the rows inserted into the table while the data is being copied to the
new file.

Therefore I can imagine adding a new command that acts like VACUUM (FULL,
CONCURRENTLY), but does not try to be CLUSTER (CONCURRENTL).

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-10 11:10:57
Message-ID:	CAB-JLwZn7Qai1KZks9cuBNoJa9iZjKgG_xjvFyO-VQMFdH=7xg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Em sex., 10 de jan. de 2025 às 06:31, Antonin Houska <ah(at)cybertec(dot)at>
escreveu:

> Thus I understand Alvaro's objections against VACUUM (FULL, CONCURRENTLY).
>

> Therefore I can imagine adding a new command that acts like VACUUM (FULL,
> CONCURRENTLY), but does not try to be CLUSTER (CONCURRENTL).
>

If VACUUM FULL and CLUSTER do the same, why not have a single command ?

--VACUUM FULL would be
RECREATE TABLE [ CONCURRENTLY ] table_name
--CLUSTER would be
RECREATE TABLE [ CONCURRENTLY ] table_name CLUSTERED [ON index_name]
--Maybe someday reordering fields would be
RECREATE TABLE [ CONCURRENTLY ] table_name CLUSTERED [ON index_name] [USING
FIELDS (FLD4,FLD5,FLD3,FLD1,FLD2)]

regards
Marcos

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-11 14:01:54
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2025-01-09 Th 8:35 AM, Alvaro Herrera wrote:
>
> I'm not happy with the idea of having this new command be VACUUM (FULL
> CONCURRENTLY). It's a bit of an absurd name if you ask me. Heck, even
> VACUUM (FULL) seems a bit absurd nowadays.
>
> Maybe we should have a new toplevel command. Some ideas that have been
> thrown around:
>
> - RETABLE (it's like REINDEX, but for tables)
> - ALTER TABLE <tab> SQUEEZE
> - SQUEEZE <table>
> - VACUUM (SQUEEZE)
> - VACUUM (COMPACT)
> - MAINTAIN <tab> COMPACT
> - MAINTAIN <tab> SQUEEZE
>

My $0.02:

COMPACT tablename ...

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

From:	Sami Imseih <samimseih(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-11 15:01:16
Message-ID:	CAA5RZ0v3tV-Oveanzkcg_E0GpVaJMJYAtCbSeMb-CiaghCtF7g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

> > - RETABLE (it's like REINDEX, but for tables)
> > - ALTER TABLE <tab> SQUEEZE
> > - SQUEEZE <table>
> > - VACUUM (SQUEEZE)
> > - VACUUM (COMPACT)
> > - MAINTAIN <tab> COMPACT
> > - MAINTAIN <tab> SQUEEZE
> >
>
> My $0.02:
>
> COMPACT tablename ...

+1 to "COMPACT tablename"

Also the current pg_stat_progress_cluster also tracks
progress for both CLUSTER and VACUUM FULL. pg_stat_progress_vacuum
tracks progress for VACUUM. I suggest we create a new progress
view for VACUUM FULL ( or the new command agreed upon ).

Regards,

Sami

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-13 13:48:31
Message-ID:	6250.1736776111@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2025-Jan-09, Antonin Houska wrote:
>
> > It seems you accidentally fixed another problem :-) I was referring to the
> > 'lockmode' argument of make_new_heap(). I can try to write a patch for that
> > but ...
> >
> > > Meanwhile the patch 0004 has some seemingly trivial conflicts. If you
> > > want to rebase, I'd appreciate that. In the meantime I'll give a look
> > > at the next two other API changes.

This is the patch series rebased on top of the commit cc811f92ba.

I haven't addressed the problem of a new command yet - for that I'd like to
see some sort of consensus, so that I do not have to do all the related
changes many times.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v07-0002-Move-progress-related-fields-from-PgBackendStatus-to.patch	text/x-diff	6.6 KB
v07-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v07-0004-Add-CONCURRENTLY-option-to-both-VACUUM-FULL-and-CLUS.patch	text/plain	171.4 KB
v07-0005-Preserve-visibility-information-of-the-concurrent-da.patch	text/x-diff	39.7 KB
v07-0006-Add-regression-tests.patch	text/x-diff	10.4 KB
v07-0007-Introduce-cluster_max_xlock_time-configuration-varia.patch	text/x-diff	20.4 KB

From:	Michael Banck <mbanck(at)gmx(dot)net>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-13 13:55:48
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Sat, Jan 11, 2025 at 09:01:54AM -0500, Andrew Dunstan wrote:
> On 2025-01-09 Th 8:35 AM, Alvaro Herrera wrote:
> > Maybe we should have a new toplevel command. Some ideas that have been
> > thrown around:
> >
> > - RETABLE (it's like REINDEX, but for tables)
> > - ALTER TABLE <tab> SQUEEZE
> > - SQUEEZE <table>
> > - VACUUM (SQUEEZE)
> > - VACUUM (COMPACT)
> > - MAINTAIN <tab> COMPACT
> > - MAINTAIN <tab> SQUEEZE

I don't like any of them a lot :-/

> COMPACT tablename ...

That sounds like it would compress content rather than just rewrite it
normally to get rid of bloat.

I think REORG (or REPACK, but that has not history elsewhere) would fit
best, we don't need to emulate the myriad of DB2 options...

Michael

From:	Robert Treat <rob(at)xzilla(dot)net>
To:	Michael Banck <mbanck(at)gmx(dot)net>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-15 16:13:34
Message-ID:	CABV9wwPmHGchWyZ-DGWyvXdeiJa0A9b21aqJxNH22yy=kfjLUQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 13, 2025 at 8:56 AM Michael Banck <mbanck(at)gmx(dot)net> wrote:
> Hi,
> On Sat, Jan 11, 2025 at 09:01:54AM -0500, Andrew Dunstan wrote:
> > On 2025-01-09 Th 8:35 AM, Alvaro Herrera wrote:
> > > Maybe we should have a new toplevel command. Some ideas that have been
> > > thrown around:
> > >
> > > - RETABLE (it's like REINDEX, but for tables)
> > > - ALTER TABLE <tab> SQUEEZE
> > > - SQUEEZE <table>
> > > - VACUUM (SQUEEZE)
> > > - VACUUM (COMPACT)
> > > - MAINTAIN <tab> COMPACT
> > > - MAINTAIN <tab> SQUEEZE
>
> I don't like any of them a lot :-/
>

Agreed, though I do believe there would be a positive gain from
eliminating the overloaded CLUSTER term.

> > COMPACT tablename ...
>
> That sounds like it would compress content rather than just rewrite it
> normally to get rid of bloat.
>
> I think REORG (or REPACK, but that has not history elsewhere) would fit
> best, we don't need to emulate the myriad of DB2 options...
>

I would like REPACK if I didn't believe it would lead to confusion
with pg_repack (which, afaict, seems to have better performance
characteristics, so will probably hang around).

Actually, I wonder if we are too focused on the idea this is a
vaccum/bloat related tool. The original idea behind CLUSTER was not
related to vacuum or bloat management, but performance. There are
other reasons to want to rewrite a table as well (think dropped
columns or new column defaults). Is ALTER TABLE <table> REWRITE an
option? Current needed options would be for clustering or running
concurrently, but even without those options sometimes you just want
to rewrite the table, and this is probably the most straightforward
than making something up.

Robert Treat
https://xzilla.net

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Robert Treat <rob(at)xzilla(dot)net>, Michael Banck <mbanck(at)gmx(dot)net>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-15 16:45:32
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2025-01-15 We 11:13 AM, Robert Treat wrote:
> On Mon, Jan 13, 2025 at 8:56 AM Michael Banck <mbanck(at)gmx(dot)net> wrote:
>> Hi,
>> On Sat, Jan 11, 2025 at 09:01:54AM -0500, Andrew Dunstan wrote:
>>> On 2025-01-09 Th 8:35 AM, Alvaro Herrera wrote:
>>>> Maybe we should have a new toplevel command. Some ideas that have been
>>>> thrown around:
>>>>
>>>> - RETABLE (it's like REINDEX, but for tables)
>>>> - ALTER TABLE <tab> SQUEEZE
>>>> - SQUEEZE <table>
>>>> - VACUUM (SQUEEZE)
>>>> - VACUUM (COMPACT)
>>>> - MAINTAIN <tab> COMPACT
>>>> - MAINTAIN <tab> SQUEEZE
>> I don't like any of them a lot :-/
>>
> Agreed, though I do believe there would be a positive gain from
> eliminating the overloaded CLUSTER term.
>
>>> COMPACT tablename ...
>> That sounds like it would compress content rather than just rewrite it
>> normally to get rid of bloat.
>>
>> I think REORG (or REPACK, but that has not history elsewhere) would fit
>> best, we don't need to emulate the myriad of DB2 options...
>>
> I would like REPACK if I didn't believe it would lead to confusion
> with pg_repack (which, afaict, seems to have better performance
> characteristics, so will probably hang around).
>
> Actually, I wonder if we are too focused on the idea this is a
> vaccum/bloat related tool. The original idea behind CLUSTER was not
> related to vacuum or bloat management, but performance. There are
> other reasons to want to rewrite a table as well (think dropped
> columns or new column defaults). Is ALTER TABLE <table> REWRITE an
> option? Current needed options would be for clustering or running
> concurrently, but even without those options sometimes you just want
> to rewrite the table, and this is probably the most straightforward
> than making something up.
>
>

I really don't like any of the ALTER TABLE variants, because that's
about changing the table's definition, and this operation doesn't do
that. I could live with REORG as a top level verb if you don't like COMPACT.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

From:	Michael Banck <mbanck(at)gmx(dot)net>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-30 08:34:01
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Mon, Jan 13, 2025 at 02:48:31PM +0100, Antonin Houska wrote:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > On 2025-Jan-09, Antonin Houska wrote:
> > > It seems you accidentally fixed another problem :-) I was referring to the
> > > 'lockmode' argument of make_new_heap(). I can try to write a patch for that
> > > but ...
> > >
> > > > Meanwhile the patch 0004 has some seemingly trivial conflicts. If you
> > > > want to rebase, I'd appreciate that. In the meantime I'll give a look
> > > > at the next two other API changes.
>
> This is the patch series rebased on top of the commit cc811f92ba.
>
> I haven't addressed the problem of a new command yet - for that I'd like to
> see some sort of consensus, so that I do not have to do all the related
> changes many times.

Well, looks like this patch-set is blocked on the bikeshedding part?

Somebody should call a shot here, then.

Michael

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Michael Banck <mbanck(at)gmx(dot)net>
Cc:	Antonin Houska <ah(at)cybertec(dot)at>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-30 15:29:35
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2025-Jan-30, Michael Banck wrote:

> > I haven't addressed the problem of a new command yet - for that I'd like to
> > see some sort of consensus, so that I do not have to do all the related
> > changes many times.
>
> Well, looks like this patch-set is blocked on the bikeshedding part?
>
> Somebody should call a shot here, then.

A bunch of people discussed this patch in today's developer meeting in
Brussels. There's pretty much a consensus on using the verb REPACK
CONCURRENTLY for this new command -- where unadorned REPACK would be
VACUUM FULL, and we'd have something like REPACK WITH INDEX or maybe
REPACK USING INDEX to take the CLUSTER place.

For the record, there was an observation that 1) if logical decoding is
not enabled, REPACK CONCURRENTLY would not work, and 2) that sites being
forced to enable logical decoding (even if transiently) to allow this,
might take a considerable performance hit, and that we shouldn't
entangle our features in that way. I don't have an opinion on these
things at this point; knowing more about exactly what the performance
impact is would be good. Regarding logical decoding, the conversation
continued that maybe it'd be good if the feature can be automatically
enabled transiently for that particular table for as long as needed, and
disabled afterwards. But like with the previous concern, I don't really
have an opinion without understanding it more deeply.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-31 06:27:50
Message-ID:	10749.1738304870@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2025-Jan-30, Michael Banck wrote:
>
> > > I haven't addressed the problem of a new command yet - for that I'd like to
> > > see some sort of consensus, so that I do not have to do all the related
> > > changes many times.
> >
> > Well, looks like this patch-set is blocked on the bikeshedding part?
> >
> > Somebody should call a shot here, then.
>
> A bunch of people discussed this patch in today's developer meeting in
> Brussels. There's pretty much a consensus on using the verb REPACK
> CONCURRENTLY for this new command -- where unadorned REPACK would be
> VACUUM FULL, and we'd have something like REPACK WITH INDEX or maybe
> REPACK USING INDEX to take the CLUSTER place.

Thanks for discussing the patch.

I assume the patch should mark CLUSTER deprecated rather than removing it
immediately.

> For the record, there was an observation that 1) if logical decoding is
> not enabled, REPACK CONCURRENTLY would not work, and 2) that sites being
> forced to enable logical decoding (even if transiently) to allow this,
> might take a considerable performance hit, and that we shouldn't
> entangle our features in that way. I don't have an opinion on these
> things at this point; knowing more about exactly what the performance
> impact is would be good. Regarding logical decoding, the conversation
> continued that maybe it'd be good if the feature can be automatically
> enabled transiently for that particular table for as long as needed, and
> disabled afterwards. But like with the previous concern, I don't really
> have an opinion without understanding it more deeply.

Enabling the logical decoding transiently makes sense to me.

I also agree that tables not being REPACKed should be treated as not being
logically decoded, i.e. the logical decoding specific information should not
be written to WAL for them. Neither time nor energy should be wasted :-)

I'll try to implement these requirements the next version.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-31 09:32:00
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2025-Jan-31, Antonin Houska wrote:

> I assume the patch should mark CLUSTER deprecated rather than removing it
> immediately.

Yeah, we should certainly not make any statements fail that work today.
Same goes for VACUUM FULL.

> I also agree that tables not being REPACKed should be treated as not being
> logically decoded, i.e. the logical decoding specific information should not
> be written to WAL for them. Neither time nor energy should be wasted :-)

Cool.

Something that Robert Haas just mentioned to me is handling of row
locks: if concurrent transactions are keeping rows in the original table
locked (especially SELECT FOR KEY SHARE, since that's not considered by
logical decoding at present and it would be possible to break foreign
keys if we just do nothing), them we need these to be "transferred" to
the new table somehow.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"¿Qué importan los años? Lo que realmente importa es comprobar que
a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-31 10:15:24
Message-ID:	24725.1738318524@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2025-Jan-31, Antonin Houska wrote:
>
> Something that Robert Haas just mentioned to me is handling of row
> locks: if concurrent transactions are keeping rows in the original table
> locked (especially SELECT FOR KEY SHARE, since that's not considered by
> logical decoding at present and it would be possible to break foreign
> keys if we just do nothing), them we need these to be "transferred" to
> the new table somehow.

The current implementation acquires AccessExclusiveLock on the table
(supposedly for very short time) so it can swap the table and index
files. Once we have that lock, I think the transactions holding the row locks
should no longer be running. Or can the row lock "survive" the table lock
somehow?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-31 10:29:22
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2025-Jan-31, Antonin Houska wrote:

> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> > Something that Robert Haas just mentioned to me is handling of row
> > locks: if concurrent transactions are keeping rows in the original table
> > locked (especially SELECT FOR KEY SHARE, since that's not considered by
> > logical decoding at present and it would be possible to break foreign
> > keys if we just do nothing), them we need these to be "transferred" to
> > the new table somehow.
>
> The current implementation acquires AccessExclusiveLock on the table
> (supposedly for very short time) so it can swap the table and index
> files. Once we have that lock, I think the transactions holding the row locks
> should no longer be running. Or can the row lock "survive" the table lock
> somehow?

Oh right, I forgot about this step. That seems like it should be
sufficient to protect against that problem.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Al principio era UNIX, y UNIX habló y dijo: "Hello world\n".
No dijo "Hello New Jersey\n", ni "Hello USA\n".

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-31 10:38:57
Message-ID:	CAEze2Wjt40zdD+2+YTkJBxcFgAVURmvv1uP8wpF=tGLcWiEmfw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 30 Jan 2025 at 16:29, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2025-Jan-30, Michael Banck wrote:
>
> > > I haven't addressed the problem of a new command yet - for that I'd like to
> > > see some sort of consensus, so that I do not have to do all the related
> > > changes many times.
> >
> > Well, looks like this patch-set is blocked on the bikeshedding part?
> >
> > Somebody should call a shot here, then.
>
> A bunch of people discussed this patch in today's developer meeting in
> Brussels. There's pretty much a consensus on using the verb REPACK
> CONCURRENTLY for this new command -- where unadorned REPACK would be
> VACUUM FULL, and we'd have something like REPACK WITH INDEX or maybe
> REPACK USING INDEX to take the CLUSTER place.
>
> For the record, there was an observation that [...]

Further observations:

First, due to the XLog-based change detection this feature can't work
for unlogged tables without first changing them to logged (which
implies first writing the whole table to XLog, to not cause issues on
any replicas). However, documentation for this limitation seems to be
missing from the patches, and I hope a solution can be found without
requiring LOGGED.

Second, I'm concerned about long-running snapshots: While I've not
read the patches fully, I think they work something like the
following:

1. Mark some start LSN as start for decoding changes
2. Do the usual REPACK operations, but with reduced locking
3. Apply the decoded changes
4. Switch the relfilenodes over

For (2), I think the scan needs a snapshot to guarantee we keep the
original tuples of updates around, wich will hold back any other
VACUUM activity in the database. For CIC/RIC, a solution is being
created [0], but I'm not sure the same can be applied to this REPACK
CONCURRENTLY: while CIC/RIC doesn't care much about cross-page update
chains (it's only interested in TID+field values for possibly-live
tuples), REPACK seems to require access to the fields of the old
versions of updated tuples to correctly apply updates, thus requiring
a single snapshot for the full scan.

Maybe that's something that can be further improved upon, maybe not.
REPACK CONCURRENTLY is an improvement over the current situation
w.r.t. locks, but it'd be nice if this new system does not impact the
visibility horizons of the cluster by more than the current.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://www.postgresql.org/message-id/flat/CANtu0oiLc-%2B7h9zfzOVy2cv2UuYk_5MUReVLnVbOay6OgD_KGg%40mail.gmail.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-31 12:32:29
Message-ID:	26221.1738326749@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:

> Further observations:
>
> First, due to the XLog-based change detection this feature can't work
> for unlogged tables without first changing them to logged (which
> implies first writing the whole table to XLog, to not cause issues on
> any replicas). However, documentation for this limitation seems to be
> missing from the patches, and I hope a solution can be found without
> requiring LOGGED.

Currently I've got no idea how to handle UNLOGGED table. I'll at least fix the
documentation.

> Second, I'm concerned about long-running snapshots: While I've not
> read the patches fully, I think they work something like the
> following:
>
> 1. Mark some start LSN as start for decoding changes
> 2. Do the usual REPACK operations, but with reduced locking
> 3. Apply the decoded changes
> 4. Switch the relfilenodes over
>
> For (2), I think the scan needs a snapshot to guarantee we keep the
> original tuples of updates around, wich will hold back any other
> VACUUM activity in the database. For CIC/RIC, a solution is being
> created [0], but I'm not sure the same can be applied to this REPACK
> CONCURRENTLY: while CIC/RIC doesn't care much about cross-page update
> chains (it's only interested in TID+field values for possibly-live
> tuples), REPACK seems to require access to the fields of the old
> versions of updated tuples to correctly apply updates, thus requiring
> a single snapshot for the full scan.
>
> Maybe that's something that can be further improved upon, maybe not.
> REPACK CONCURRENTLY is an improvement over the current situation
> w.r.t. locks, but it'd be nice if this new system does not impact the
> visibility horizons of the cluster by more than the current.

A single snapshot is used because there is a single stream of decoded data
changes. Thus a new version of a tuple is either visible to the snapshot or it
appears in the stream, but not both.

If part of the table was scanned using one snapshot, and another part with
another one, it'd be difficult to "put things together". For example, if the
first scan does not see a tuple for which the corresponding stream contains an
UPDATE change (because the old version is in the not-yet-scanned part of the
table), that UPDATE needs to be moved to the stream associated with another
snapshot. But that snapshot might not see that tuple either because it was
either deleted in between, or should be found by yet another scan.

Doing the repacking in several steps might be interesting, but I admit I
haven't yet thought that far.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-31 13:34:48
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2025-Jan-31, Antonin Houska wrote:

> Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:

> > First, due to the XLog-based change detection this feature can't work
> > for unlogged tables without first changing them to logged (which
> > implies first writing the whole table to XLog, to not cause issues on
> > any replicas). However, documentation for this limitation seems to be
> > missing from the patches, and I hope a solution can be found without
> > requiring LOGGED.
>
> Currently I've got no idea how to handle UNLOGGED table. I'll at least fix the
> documentation.

Yeah, I think it should be possible, but it's going to require
complicated additional changes to support. I suggest that in the first
version we leave this out, and we can implement it afterwards.

> > For (2), I think the scan needs a snapshot to guarantee we keep the
> > original tuples of updates around, wich will hold back any other
> > VACUUM activity in the database.

> A single snapshot is used because there is a single stream of decoded data
> changes. Thus a new version of a tuple is either visible to the snapshot or it
> appears in the stream, but not both.

I agree with Matthias that this is going to be a problem. In fact, if
we need to keep the snapshot for long enough (depending on how long it
takes to scan the table), then the snapshot that it needs to keep would
disrupt vacuuming on all the other tables, causing more bloat. If it's
bad enough (say because the table is big enough to take hours to repack
and recreate the indexes on), the bloat situation might be worse after
REPACK has completed than it was before.

But -- again -- I think we need to limit the complexity of this patch,
or otherwise we're never going to get it done. So I propose that in our
first implementation we continue to use a single snapshot, and we can
try to find ways to grab fresh snapshots from time to time as a later
improvement on the patch. Customers in situations so bad that they
can't use REPACK to fix their bloat in 18, are already unable to fix it
in earlier versions, so this would not be a regression.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"No tengo por qué estar de acuerdo con lo que pienso"
(Carlos Caszeli)

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-01-31 20:02:47
Message-ID:	26884.1738353767@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2025-Jan-31, Antonin Houska wrote:
>
> > Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:
>
> > > First, due to the XLog-based change detection this feature can't work
> > > for unlogged tables without first changing them to logged (which
> > > implies first writing the whole table to XLog, to not cause issues on
> > > any replicas). However, documentation for this limitation seems to be
> > > missing from the patches, and I hope a solution can be found without
> > > requiring LOGGED.
> >
> > Currently I've got no idea how to handle UNLOGGED table. I'll at least fix the
> > documentation.
>
> Yeah, I think it should be possible, but it's going to require
> complicated additional changes to support. I suggest that in the first
> version we leave this out, and we can implement it afterwards.
>
> > > For (2), I think the scan needs a snapshot to guarantee we keep the
> > > original tuples of updates around, wich will hold back any other
> > > VACUUM activity in the database.
>
> > A single snapshot is used because there is a single stream of decoded data
> > changes. Thus a new version of a tuple is either visible to the snapshot or it
> > appears in the stream, but not both.
>
> I agree with Matthias that this is going to be a problem. In fact, if
> we need to keep the snapshot for long enough (depending on how long it
> takes to scan the table), then the snapshot that it needs to keep would
> disrupt vacuuming on all the other tables, causing more bloat. If it's
> bad enough (say because the table is big enough to take hours to repack
> and recreate the indexes on), the bloat situation might be worse after
> REPACK has completed than it was before.
>
> But -- again -- I think we need to limit the complexity of this patch,
> or otherwise we're never going to get it done. So I propose that in our
> first implementation we continue to use a single snapshot, and we can
> try to find ways to grab fresh snapshots from time to time as a later
> improvement on the patch. Customers in situations so bad that they
> can't use REPACK to fix their bloat in 18, are already unable to fix it
> in earlier versions, so this would not be a regression.

I thought about it more during the afternoon. I think that in this case
(i.e. snapshot created by the logical replication system), the xmin horizon is
controlled by the xmin of the replication slot rather than that of the
snapshot. And I think that the slot we use for REPACK can have xmin set to
invalid (unlike catalog_xmin) as long as we ensure that (even "lazy") VACUUM
ignores table that is being processed by REPACK. In other words, REPACK does
not have to disrupt vacuuming of the other tables. Please correct me if I'm
wrong.

Since the current proposal of REPACK already stores the relation OID in the
shared memory (so that all backends know that they should write enough
information to WAL when doing changes in the table), disabling VACUUM for that
table should not be difficult.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-02-02 13:21:33
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

> From bf2ec8c5d753de340140839f1b061044ec4c1149 Mon Sep 17 00:00:00 2001
> From: Antonin Houska <ah(at)cybertec(dot)at>
> Date: Mon, 13 Jan 2025 14:29:54 +0100
> Subject: [PATCH 4/8] Add CONCURRENTLY option to both VACUUM FULL and CLUSTER
> commands.

> @@ -950,8 +1412,46 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb

> + if (concurrent)
> + {
> + PgBackendProgress progress;
> +
> + /*
> + * Command progress reporting gets terminated at subtransaction
> + * end. Save the status so it can be eventually restored.
> + */
> + memcpy(&progress, &MyBEEntry->st_progress,
> + sizeof(PgBackendProgress));
> +
> + /* Release the locks by aborting the subtransaction. */
> + RollbackAndReleaseCurrentSubTransaction();
> +
> + /* Restore the progress reporting status. */
> + pgstat_progress_restore_state(&progress);
> +
> + CurrentResourceOwner = oldowner;
> + }

I was looking at 0002 to see if it'd make sense to commit it ahead of a
fuller review of the rest, and I find that the reason for that patch is
this hunk you have here in copy_table_data -- you want to avoid a
subtransaction abort (which you use to release planner lock) clobbering
the status. I think this a bad idea. It might be better to handle this
in a different way, for instance

1) maybe have a flag that says "do not reset progress status during
subtransaction abort"; REPACK would set that flag, so it'd be able to
continue its business without having to memcpy the current status (which
seems like quite a hack) or restoring it afterwards.

2) maybe subtransaction abort is not the best way to release the
planning locks anyway. I think it might be better to have a
ResourceOwner that owns those locks, and we do ResourceOwnerRelease()
which would release them. I think this would be a novel usage of
ResourceOwner so it needs more research. But if this works, then we
don't need the subtransaction at all, and therefore we don't need
backend progress restore at all either.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-02-03 07:01:45
Message-ID:	2879.1738566105@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

>
>
> > From bf2ec8c5d753de340140839f1b061044ec4c1149 Mon Sep 17 00:00:00 2001
> > From: Antonin Houska <ah(at)cybertec(dot)at>
> > Date: Mon, 13 Jan 2025 14:29:54 +0100
> > Subject: [PATCH 4/8] Add CONCURRENTLY option to both VACUUM FULL and CLUSTER
> > commands.
>
> > @@ -950,8 +1412,46 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
>
> > + if (concurrent)
> > + {
> > + PgBackendProgress progress;
> > +
> > + /*
> > + * Command progress reporting gets terminated at subtransaction
> > + * end. Save the status so it can be eventually restored.
> > + */
> > + memcpy(&progress, &MyBEEntry->st_progress,
> > + sizeof(PgBackendProgress));
> > +
> > + /* Release the locks by aborting the subtransaction. */
> > + RollbackAndReleaseCurrentSubTransaction();
> > +
> > + /* Restore the progress reporting status. */
> > + pgstat_progress_restore_state(&progress);
> > +
> > + CurrentResourceOwner = oldowner;
> > + }
>
> I was looking at 0002 to see if it'd make sense to commit it ahead of a
> fuller review of the rest, and I find that the reason for that patch is
> this hunk you have here in copy_table_data -- you want to avoid a
> subtransaction abort (which you use to release planner lock) clobbering
> the status. I think this a bad idea. It might be better to handle this
> in a different way, for instance
>
> 1) maybe have a flag that says "do not reset progress status during
> subtransaction abort"; REPACK would set that flag, so it'd be able to
> continue its business without having to memcpy the current status (which
> seems like quite a hack) or restoring it afterwards.
>
> 2) maybe subtransaction abort is not the best way to release the
> planning locks anyway. I think it might be better to have a
> ResourceOwner that owns those locks, and we do ResourceOwnerRelease()
> which would release them. I think this would be a novel usage of
> ResourceOwner so it needs more research. But if this works, then we
> don't need the subtransaction at all, and therefore we don't need
> backend progress restore at all either.

If this needs change, I prefer 2) because it's less invasive: 1) still affects
the progress monitoring code. I'll look at it.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-02-03 13:07:55
Message-ID:	20612.1738588075@antos
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> >
> >
> > > From bf2ec8c5d753de340140839f1b061044ec4c1149 Mon Sep 17 00:00:00 2001
> > > From: Antonin Houska <ah(at)cybertec(dot)at>
> > > Date: Mon, 13 Jan 2025 14:29:54 +0100
> > > Subject: [PATCH 4/8] Add CONCURRENTLY option to both VACUUM FULL and CLUSTER
> > > commands.
> >
> > > @@ -950,8 +1412,46 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
> >
> > > + if (concurrent)
> > > + {
> > > + PgBackendProgress progress;
> > > +
> > > + /*
> > > + * Command progress reporting gets terminated at subtransaction
> > > + * end. Save the status so it can be eventually restored.
> > > + */
> > > + memcpy(&progress, &MyBEEntry->st_progress,
> > > + sizeof(PgBackendProgress));
> > > +
> > > + /* Release the locks by aborting the subtransaction. */
> > > + RollbackAndReleaseCurrentSubTransaction();
> > > +
> > > + /* Restore the progress reporting status. */
> > > + pgstat_progress_restore_state(&progress);
> > > +
> > > + CurrentResourceOwner = oldowner;
> > > + }
> >
> > I was looking at 0002 to see if it'd make sense to commit it ahead of a
> > fuller review of the rest, and I find that the reason for that patch is
> > this hunk you have here in copy_table_data -- you want to avoid a
> > subtransaction abort (which you use to release planner lock) clobbering
> > the status. I think this a bad idea. It might be better to handle this
> > in a different way, for instance
> >
> > 1) maybe have a flag that says "do not reset progress status during
> > subtransaction abort"; REPACK would set that flag, so it'd be able to
> > continue its business without having to memcpy the current status (which
> > seems like quite a hack) or restoring it afterwards.
> >
> > 2) maybe subtransaction abort is not the best way to release the
> > planning locks anyway. I think it might be better to have a
> > ResourceOwner that owns those locks, and we do ResourceOwnerRelease()
> > which would release them. I think this would be a novel usage of
> > ResourceOwner so it needs more research. But if this works, then we
> > don't need the subtransaction at all, and therefore we don't need
> > backend progress restore at all either.
>
> If this needs change, I prefer 2) because it's less invasive: 1) still affects
> the progress monitoring code. I'll look at it.

Below is what I suggest now. It resembles the use of PortalData.resowner in
the sense that it's a resource owner separate from the resource owner of the
transaction.

Although it's better to use a resource owner than a subtransaction here, we
still need to restore the progress state in
cluster_decode_concurrent_changes() (see v07-0004-) because a subtransaction
aborts that clear it can take place during the decoding.

My preference would still be to save and restore the progress state in this
case (although a new function like pgstat_progress_save_state() would be
better than memcpy()). What do you think?

@@ -950,8 +1412,48 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
* provided, else plain seqscan.
*/
if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
+ {
+ ResourceOwner oldowner = NULL;
+ ResourceOwner resowner = NULL;
+
+ /*
+ * In the CONCURRENT case, use a dedicated resource owner so we don't
+ * leave any additional locks behind us that we cannot release easily.
+ */
+ if (concurrent)
+ {
+ Assert(CheckRelationLockedByMe(OldHeap, ShareUpdateExclusiveLock,
+ false));
+ Assert(CheckRelationLockedByMe(OldIndex, ShareUpdateExclusiveLock,
+ false));
+
+ resowner = ResourceOwnerCreate(CurrentResourceOwner,
+ "plan_cluster_use_sort");
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = resowner;
+ }
+
use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
RelationGetRelid(OldIndex));
+
+ if (concurrent)
+ {
+ CurrentResourceOwner = oldowner;
+
+ /*
+ * We are primarily concerned about locks, but if the planner
+ * happened to allocate any other resources, we should release
+ * them too because we're going to delete the whole resowner.
+ */
+ ResourceOwnerRelease(resowner, RESOURCE_RELEASE_BEFORE_LOCKS,
+ false, false);
+ ResourceOwnerRelease(resowner, RESOURCE_RELEASE_LOCKS,
+ false, false);
+ ResourceOwnerRelease(resowner, RESOURCE_RELEASE_AFTER_LOCKS,
+ false, false);
+ ResourceOwnerDelete(resowner);
+ }
+ }
else
use_sort = false;

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Michael Banck <mbanck(at)gmx(dot)net>, Antonin Houska <ah(at)cybertec(dot)at>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-02-03 20:27:07
Message-ID:	Z6Emm1Sj8Xz1hoFO@nathan
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 30, 2025 at 04:29:35PM +0100, Alvaro Herrera wrote:
> A bunch of people discussed this patch in today's developer meeting in
> Brussels. There's pretty much a consensus on using the verb REPACK
> CONCURRENTLY for this new command -- where unadorned REPACK would be
> VACUUM FULL, and we'd have something like REPACK WITH INDEX or maybe
> REPACK USING INDEX to take the CLUSTER place.

One small thing I thought of after the meeting was that this effectively
forces users to always specify an index if they want to REPACK WITH INDEX.
Today, CLUSTER will use the same index as before if one is not specified.
IMHO requiring users to specify the index is entirely reasonable, but I
figured I'd at least note the behavior change.

--
nathan

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-02-19 17:08:02
Message-ID:	63871.1739984882@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

This is a patch that adds the REPACK command (w/o CONCURRENTLY). I'll
incorporate it into the patch series but it'd be great if this part was a
little bit stable before I start to rebase the depending patches. Thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

*E-Mail Disclaimer*
Der Inhalt dieser E-Mail ist ausschliesslich fuer den
bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat
dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte,
dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung oder
Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich
in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen.

*CONFIDENTIALITY NOTICE & DISCLAIMER
*This message and any attachment are
confidential and may be privileged or otherwise protected from disclosure
and solely for the use of the person(s) or entity to whom it is intended.
If you have received this message in error and are not the intended
recipient, please notify the sender immediately and delete this message and
any attachment from your system. If you are not the intended recipient, be
advised that any use of this message is prohibited and may be unlawful, and
you must not copy this message or attachment or disclose the contents to
any other person.

Attachment	Content-Type	Size
0001-Add-REPACK-command.patch	text/x-diff	89.7 KB

From:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-02-19 17:40:32
Message-ID:	CAB-JLwYSy+oquVkiwubhqs-rHfjvF6p5PRmQHRg4_1vtaL_E7A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

You mentioned fillfactor only on cluster notes, would be good to mention it
on refsynopsisdiv, I think.

+ <para>
+ <command>REPACK</command> reclaims storage occupied by dead
+ tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+ entire contents of the table specified
+ by <replaceable class="parameter">table_name</replaceable> into a new
disk

- file with no extra space, allowing unused space to be returned to the
+ file with no extra space, obeying fillfactor settings, allowing unused
space to be returned to the

+ operating
+ system.
+ </para>

regards
Marcos

Em qua., 19 de fev. de 2025 às 14:08, Antonin Houska <ah(at)cybertec(dot)at>
escreveu:

> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> > On 2025-Jan-30, Michael Banck wrote:
> >
> > > > I haven't addressed the problem of a new command yet - for that I'd
> like to
> > > > see some sort of consensus, so that I do not have to do all the
> related
> > > > changes many times.
> > >
> > > Well, looks like this patch-set is blocked on the bikeshedding part?
> > >
> > > Somebody should call a shot here, then.
> >
> > A bunch of people discussed this patch in today's developer meeting in
> > Brussels. There's pretty much a consensus on using the verb REPACK
> > CONCURRENTLY for this new command -- where unadorned REPACK would be
> > VACUUM FULL, and we'd have something like REPACK WITH INDEX or maybe
> > REPACK USING INDEX to take the CLUSTER place.
>
> This is a patch that adds the REPACK command (w/o CONCURRENTLY). I'll
> incorporate it into the patch series but it'd be great if this part was a
> little bit stable before I start to rebase the depending patches. Thanks.
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>
>
> --
>
> *E-Mail Disclaimer*
> Der Inhalt dieser E-Mail ist ausschliesslich fuer den
> bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat
> dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte,
> dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung
> oder
> Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich
> in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen.
>
> *CONFIDENTIALITY NOTICE & DISCLAIMER
> *This message and any attachment are
> confidential and may be privileged or otherwise protected from disclosure
> and solely for the use of the person(s) or entity to whom it is intended.
> If you have received this message in error and are not the intended
> recipient, please notify the sender immediately and delete this message
> and
> any attachment from your system. If you are not the intended recipient, be
> advised that any use of this message is prohibited and may be unlawful,
> and
> you must not copy this message or attachment or disclose the contents to
> any other person.
>

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-02-26 08:48:08
Message-ID:	127361.1740559688@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> wrote:

> You mentioned fillfactor only on cluster notes, would be good to mention it
> on refsynopsisdiv, I think.

ok, I've added a note to the first paragraph.

Attached here is the REPACK command as well as the patch set that adds the
CONCURRENTLY option. The new symbols have been renamed so they resemble REPACK
rather than CLUSTER.

Please note that 0008 is a new part which makes the setting wal_leve=logical
unnecessary.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v08-0001-Add-REPACK-command.patch	text/x-diff	89.7 KB
v08-0002-Move-progress-related-fields-from-PgBackendStatus-to.patch	text/x-diff	7.8 KB
v08-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v08-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch	text/plain	168.4 KB
v08-0005-Preserve-visibility-information-of-the-concurrent-da.patch	text/x-diff	39.6 KB
v08-0006-Add-regression-tests.patch	text/x-diff	10.6 KB
v08-0007-Introduce-repack_max_xlock_time-configuration-variab.patch	text/x-diff	20.3 KB
v08-0008-Enable-logical-decoding-transiently-only-for-REPACK-.patch	text/x-diff	24.2 KB
v08-0009-Call-logical_rewrite_heap_tuple-when-applying-concur.patch	text/x-diff	26.1 KB

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-03 18:07:00
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2025-Feb-26, Antonin Houska wrote:

> @@ -403,39 +381,38 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
> * would work in most respects, but the index would only get marked as
> * indisclustered in the current database, leading to unexpected behavior
> * if CLUSTER were later invoked in another database.
> + *
> + * REPACK does not set indisclustered. XXX Not sure I understand the
> + * comment above: how can an attribute be set "only in the current
> + * database"?
> */

Regarding this XXX comment, what's going on here is this: a CLUSTER
command needs to remember the index that a table is clustered on. We
keep track of this in pg_index.indisclustered. But pg_index is a local
relation, not shared across databases -- so the current CLUSTER command
can effect the update on the current database's pg_index only, not on
other databases. So if the user were to run CLUSTER on one database
specifying an index, then connect to another one and expect CLUSTER
without specifying an index to honor the previously specified index,
that would not work. Naturally this is only a problem for shared
catalogs. Not being able to handle this for shared catalogs is not a
big loss.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-03 18:40:27
Message-ID:	49792.1741027227@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2025-Feb-26, Antonin Houska wrote:
>
> > @@ -403,39 +381,38 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
> > * would work in most respects, but the index would only get marked as
> > * indisclustered in the current database, leading to unexpected behavior
> > * if CLUSTER were later invoked in another database.
> > + *
> > + * REPACK does not set indisclustered. XXX Not sure I understand the
> > + * comment above: how can an attribute be set "only in the current
> > + * database"?
> > */
>
> Regarding this XXX comment, what's going on here is this: a CLUSTER
> command needs to remember the index that a table is clustered on. We
> keep track of this in pg_index.indisclustered. But pg_index is a local
> relation, not shared across databases -- so the current CLUSTER command
> can effect the update on the current database's pg_index only, not on
> other databases. So if the user were to run CLUSTER on one database
> specifying an index, then connect to another one and expect CLUSTER
> without specifying an index to honor the previously specified index,
> that would not work. Naturally this is only a problem for shared
> catalogs. Not being able to handle this for shared catalogs is not a
> big loss.

Thanks for explanation. The reason I failed to understand this was probably
that I tried to imagine something worse.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-20 17:32:02
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

I rebased this patch series; here's v09. No substantive changes from v08.
I made sure the tree still compiles after each commit.

I did look at 0002 again (and renamed the members of the new struct by
adding a p_ prefix, as well as fixing the references to the old names
that were in a few code comments here and there; I don't think these
changes are "substantive"), and ended up wondering why do we need that
change in the first place. According to the comment where the progress
restore function is called, it's because reorderbuffer.c uses a
subtransaction internally. But I went to look at reorderbuffer.c and
noticed that the subtransaction is only used "when using the SQL
function interface, because that creates a transaction already". So
maybe we should look into making REPACK use reorderbuffer without having
to open a transaction block.

I didn't do anything about that, in particular I didn't actually try to
run REPACK to see whether the transaction is needed. I'll be looking at
that in the next couple of days.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Once again, thank you and all of the developers for your hard work on
PostgreSQL. This is by far the most pleasant management experience of
any database I've worked on." (Dan Harris)
http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php

Attachment	Content-Type	Size
v09-0001-Add-REPACK-command.patch	text/x-diff	89.8 KB
v09-0002-Move-progress-related-fields-from-PgBackendStatu.patch	text/x-diff	9.9 KB
v09-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch	text/x-diff	5.4 KB
v09-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch	text/x-diff	166.8 KB
v09-0005-Preserve-visibility-information-of-the-concurren.patch	text/x-diff	39.6 KB
v09-0006-Add-regression-tests.patch	text/x-diff	10.6 KB
v09-0007-Introduce-repack_max_xlock_time-configuration-va.patch	text/x-diff	20.4 KB
v09-0008-Enable-logical-decoding-transiently-only-for-REP.patch	text/x-diff	24.2 KB
v09-0009-Call-logical_rewrite_heap_tuple-when-applying-co.patch	text/x-diff	26.1 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Antonin Houska <ah(at)cybertec(dot)at>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-20 17:45:59
Message-ID:	CA+TgmobtxK1sNDULBJjNRF3bb3NPA_kwQK=ZctUf3i-3gkt1Sw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Mar 20, 2025 at 1:32 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> I rebased this patch series; here's v09. No substantive changes from v08.
> I made sure the tree still compiles after each commit.
>
> I did look at 0002 again (and renamed the members of the new struct by
> adding a p_ prefix, as well as fixing the references to the old names
> that were in a few code comments here and there; I don't think these
> changes are "substantive"), and ended up wondering why do we need that
> change in the first place. According to the comment where the progress
> restore function is called, it's because reorderbuffer.c uses a
> subtransaction internally. But I went to look at reorderbuffer.c and
> noticed that the subtransaction is only used "when using the SQL
> function interface, because that creates a transaction already". So
> maybe we should look into making REPACK use reorderbuffer without having
> to open a transaction block.
>
> I didn't do anything about that, in particular I didn't actually try to
> run REPACK to see whether the transaction is needed. I'll be looking at
> that in the next couple of days.

Is there a README or a long comment in here someplace that is a good
place to read to understand the overall design of this feature?

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-20 18:09:53
Message-ID:	128545.1742494193@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Is there a README or a long comment in here someplace that is a good
> place to read to understand the overall design of this feature?

I tried to explain how it works in the commit messages. The one in 0004 is
probably the most important one.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-20 18:32:49
Message-ID:	CA+TgmobUZ0g==SZv-OSFCQTGFPis5Qz1UsiMn18HGOWzsiyOLQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Mar 20, 2025 at 2:09 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Is there a README or a long comment in here someplace that is a good
> > place to read to understand the overall design of this feature?
>
> I tried to explain how it works in the commit messages. The one in 0004 is
> probably the most important one.

Thanks. A couple of comments/questions:

- I don't understand why this commit message seems to think that we
can't acquire a stronger lock while already holding a weaker one. We
can do that, and in some cases we do precisely that. Such locking
patterns can result in deadlock e.g. if I take AccessShareLock and you
take AccessShareLock and then I tried to upgrade to
AccessExclusiveLock and then you try to upgrade to
AccessExclusiveLock, somebody is going to have to ERROR out. But that
doesn't keep us from doing that in some places where it seems better
than the alternatives, and the alternative chosen by the patch
(possibly discovering at the very end that all our work has been in
vain) does not seem better than risking a deadlock.

- On what basis do you make the statement in the last paragraph that
the decoding-related lag should not exceed one WAL segment? I guess
logical decoding probably keeps up pretty well most of the time but
this seems like a very strong guarantee for something I didn't know we
had any kind of guarantee about.

- What happens if we crash?

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-22 09:43:29
Message-ID:	7221.1742636609@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Mar 20, 2025 at 2:09 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > Is there a README or a long comment in here someplace that is a good
> > > place to read to understand the overall design of this feature?
> >
> > I tried to explain how it works in the commit messages. The one in 0004 is
> > probably the most important one.
>
> Thanks. A couple of comments/questions:
>
> - I don't understand why this commit message seems to think that we
> can't acquire a stronger lock while already holding a weaker one. We
> can do that, and in some cases we do precisely that.

Can you please give me an example? I don't recall seeing a lock upgrade in the
tree. That's the reason I tried rather hard to avoid that.

> Such locking
> patterns can result in deadlock e.g. if I take AccessShareLock and you
> take AccessShareLock and then I tried to upgrade to
> AccessExclusiveLock and then you try to upgrade to
> AccessExclusiveLock, somebody is going to have to ERROR out. But that
> doesn't keep us from doing that in some places where it seems better
> than the alternatives, and the alternative chosen by the patch
> (possibly discovering at the very end that all our work has been in
> vain) does not seem better than risking a deadlock.

I see. Only the backends that do upgrade their lock are exposed to the risk of
deadlock, e.g. two backends running REPACK CONCURRENTLY on the same table, and
that should not happen too often.

I'll consider your objection - it should make the patch a bit simpler.

> - On what basis do you make the statement in the last paragraph that
> the decoding-related lag should not exceed one WAL segment? I guess
> logical decoding probably keeps up pretty well most of the time but
> this seems like a very strong guarantee for something I didn't know we
> had any kind of guarantee about.

The patch itself does guarantee that by checking the amount of unprocessed WAL
regularly when it's copying the data into the new table. If too much WAL
appears to be unprocessed, it enforces the decoding before the copying is
resumed.

The WAL decoding during the "initial load" phase can actually be handled by a
background worker (not sure it's necessary in the initial implementation),
which would make a significant lag rather unlikely. But even then we should
probably enforce certain limit on the lag (e.g. because background worker is
not guaranteed to start).

> - What happens if we crash?

The replication slot we create is RS_TEMPORARY, so it disappears after
restart. Everything else is as if the current implementation of CLUSTER ends
due to crash.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-22 10:14:14
Message-ID:	9622.1742638454@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> I rebased this patch series; here's v09. No substantive changes from v08.
> I made sure the tree still compiles after each commit.

Thanks.

> I did look at 0002 again (and renamed the members of the new struct by
> adding a p_ prefix, as well as fixing the references to the old names
> that were in a few code comments here and there; I don't think these
> changes are "substantive"), and ended up wondering why do we need that
> change in the first place. According to the comment where the progress
> restore function is called, it's because reorderbuffer.c uses a
> subtransaction internally. But I went to look at reorderbuffer.c and
> noticed that the subtransaction is only used "when using the SQL
> function interface, because that creates a transaction already". So
> maybe we should look into making REPACK use reorderbuffer without having
> to open a transaction block.

Which part of reorderbuffer.c do you mean? ISTM that the use of subransactions
is more extensive. At least ReorderBufferImmediateInvalidation() appears to
rely on it, which in turn is called by xact_decode().

(I don't claim that saving and restoring the progress state is perfect, but I
don't have better idea right now.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-25 14:32:46
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2025-Mar-22, Antonin Houska wrote:

> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> > I rebased this patch series; here's v09. No substantive changes from v08.
> > I made sure the tree still compiles after each commit.

I rebased again, fixing a compiler warning reported by CI and applying
pgindent to each individual patch. I'm slowly starting to become more
familiar with the whole of this new code.

> > I did look at 0002 again [...], and ended up wondering why do we need that
> > change in the first place. According to the comment where the progress
> > restore function is called, it's because reorderbuffer.c uses a
> > subtransaction internally. But I went to look at reorderbuffer.c and
> > noticed that the subtransaction is only used "when using the SQL
> > function interface, because that creates a transaction already". So
> > maybe we should look into making REPACK use reorderbuffer without having
> > to open a transaction block.
>
> Which part of reorderbuffer.c do you mean? ISTM that the use of subransactions
> is more extensive. At least ReorderBufferImmediateInvalidation() appears to
> rely on it, which in turn is called by xact_decode().

Ah, right, I was not looking hard enough. Something to keep in mind --
though I'm still not convinced that it's best to achieve this by
introducng a mechanism to restore progress state. Maybe allowing a
transaction to abort without clobbering the progress state somehow (not
trivial to implement at present though, because of layers of functions
you need to traverse with such a flag; maybe have a global in xact.c
that you set by calling a function? Not sure -- might be worse.) Not a
super critical consideration, but this point prevents me from pushing
patch 0002 here, as it may turn out that it's not needed.

But nothing prevents me from pushing 0003, so I'll see about doing that
soon, unless I see some other problem.

I also noticed that CI is complaining of a problem in Windows, which is
easily reproducible in non-Windows by defining EXEC_BACKEND. The
backtrace is this:

#0 0x000055d4fc24fe96 in hash_search (hashp=0x5606dc2a8c88, keyPtr=0x7ffeab341928, action=HASH_FIND, foundPtr=0x0)
at ../../../../../../../../../pgsql/source/master/src/backend/utils/hash/dynahash.c:960
960 return hash_search_with_hash_value(hashp,
(gdb) bt
#0 0x000055d4fc24fe96 in hash_search (hashp=0x5606dc2a8c88, keyPtr=0x7ffeab341928, action=HASH_FIND, foundPtr=0x0)
at ../../../../../../../../../pgsql/source/master/src/backend/utils/hash/dynahash.c:960
#1 0x000055d4fbea0a46 in is_concurrent_repack_in_progress (relid=21973)
at ../../../../../../../../pgsql/source/master/src/backend/commands/cluster.c:2729
#2 is_concurrent_repack_in_progress (relid=relid(at)entry=2964)
at ../../../../../../../../pgsql/source/master/src/backend/commands/cluster.c:2706
#3 0x000055d4fc237a87 in RelationBuildDesc (targetRelId=2964, insertIt=insertIt(at)entry=true)
at ../../../../../../../../../pgsql/source/master/src/backend/utils/cache/relcache.c:1257
#4 0x000055d4fc239456 in RelationIdGetRelation (relationId=<optimized out>, relationId(at)entry=2964)
at ../../../../../../../../../pgsql/source/master/src/backend/utils/cache/relcache.c:2105

So apparently we're trying to dereference a hash table which isn't
properly set up in the child process.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Los dioses no protegen a los insensatos. Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)

Attachment	Content-Type	Size
v10-0001-Add-REPACK-command.patch	text/x-diff	89.8 KB
v10-0002-Move-progress-related-fields-from-PgBackendStatu.patch	text/x-diff	10.3 KB
v10-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch	text/x-diff	5.4 KB
v10-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch	text/x-diff	167.6 KB
v10-0005-Preserve-visibility-information-of-the-concurren.patch	text/x-diff	39.7 KB
v10-0006-Add-regression-tests.patch	text/x-diff	10.6 KB
v10-0007-Introduce-repack_max_xlock_time-configuration-va.patch	text/x-diff	20.4 KB
v10-0008-Enable-logical-decoding-transiently-only-for-REP.patch	text/x-diff	24.7 KB
v10-0009-Call-logical_rewrite_heap_tuple-when-applying-co.patch	text/x-diff	26.1 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-25 18:47:52
Message-ID:	CA+TgmoY2YtMZQ+OeDz8_crFou132nPSGa+6=x7QHsKMK46tZ2A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Mar 22, 2025 at 5:43 AM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Can you please give me an example? I don't recall seeing a lock upgrade in the
> tree. That's the reason I tried rather hard to avoid that.

VACUUM has to upgrade the lock in order to truncate away pages at the
end of the table.

Or just:
BEGIN;
SELECT * FROM sometable;
VACUUM FULL sometable;
COMMIT;

I don't think we should commit something that handles locking the way
this patch does. I mean, it would be one thing if you had a strategy
for avoiding erroring out when a deadlock would otherwise occur by
doing something clever. But it seems like you would just need to
detect the same problem in a different way. Doing something
non-standard doesn't make sense unless we get a clear benefit from it.
(Even then it might be unsafe, of course, but at least then you have a
motivation to take the risk.)

> > - On what basis do you make the statement in the last paragraph that
> > the decoding-related lag should not exceed one WAL segment? I guess
> > logical decoding probably keeps up pretty well most of the time but
> > this seems like a very strong guarantee for something I didn't know we
> > had any kind of guarantee about.
>
> The patch itself does guarantee that by checking the amount of unprocessed WAL
> regularly when it's copying the data into the new table. If too much WAL
> appears to be unprocessed, it enforces the decoding before the copying is
> resumed.

Hmm. If the source table is not locked against writes, it seems like
we could always get into a situation where this doesn't converge --
you just need to modify the table faster than those changes can be
decoded and applied. Maybe that's different from what we're talking
about here, though.

> > - What happens if we crash?
>
> The replication slot we create is RS_TEMPORARY, so it disappears after
> restart. Everything else is as if the current implementation of CLUSTER ends
> due to crash.

Cool.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-26 11:39:39
Message-ID:	80297.1742989179@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2025-Mar-22, Antonin Houska wrote:
>
> > Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > > I rebased this patch series; here's v09. No substantive changes from v08.
> > > I made sure the tree still compiles after each commit.
>
> I rebased again, fixing a compiler warning reported by CI and applying
> pgindent to each individual patch. I'm slowly starting to become more
> familiar with the whole of this new code.

Thanks.

> > > I did look at 0002 again [...], and ended up wondering why do we need that
> > > change in the first place. According to the comment where the progress
> > > restore function is called, it's because reorderbuffer.c uses a
> > > subtransaction internally. But I went to look at reorderbuffer.c and
> > > noticed that the subtransaction is only used "when using the SQL
> > > function interface, because that creates a transaction already". So
> > > maybe we should look into making REPACK use reorderbuffer without having
> > > to open a transaction block.
> >
> > Which part of reorderbuffer.c do you mean? ISTM that the use of subransactions
> > is more extensive. At least ReorderBufferImmediateInvalidation() appears to
> > rely on it, which in turn is called by xact_decode().
>
> Ah, right, I was not looking hard enough. Something to keep in mind --
> though I'm still not convinced that it's best to achieve this by
> introducng a mechanism to restore progress state. Maybe allowing a
> transaction to abort without clobbering the progress state somehow (not
> trivial to implement at present though, because of layers of functions
> you need to traverse with such a flag; maybe have a global in xact.c
> that you set by calling a function? Not sure -- might be worse.)

The problem seems to be specific to the use of BeginInternalSubTransaction():
if it is not called at given code paths, it means that there is no top-level
transaction. However REPACK CONCURRENTLY should always run in a transaction,
so it should always run BeginInternalSubTransaction(). Thus I think we can use
this function to set the flag.

> I also noticed that CI is complaining of a problem in Windows, which is
> easily reproducible in non-Windows by defining EXEC_BACKEND. The
> backtrace is this:
>
> #0 0x000055d4fc24fe96 in hash_search (hashp=0x5606dc2a8c88, keyPtr=0x7ffeab341928, action=HASH_FIND, foundPtr=0x0)
> at ../../../../../../../../../pgsql/source/master/src/backend/utils/hash/dynahash.c:960
> 960 return hash_search_with_hash_value(hashp,
> (gdb) bt
> #0 0x000055d4fc24fe96 in hash_search (hashp=0x5606dc2a8c88, keyPtr=0x7ffeab341928, action=HASH_FIND, foundPtr=0x0)
> at ../../../../../../../../../pgsql/source/master/src/backend/utils/hash/dynahash.c:960
> #1 0x000055d4fbea0a46 in is_concurrent_repack_in_progress (relid=21973)
> at ../../../../../../../../pgsql/source/master/src/backend/commands/cluster.c:2729
> #2 is_concurrent_repack_in_progress (relid=relid(at)entry=2964)
> at ../../../../../../../../pgsql/source/master/src/backend/commands/cluster.c:2706
> #3 0x000055d4fc237a87 in RelationBuildDesc (targetRelId=2964, insertIt=insertIt(at)entry=true)
> at ../../../../../../../../../pgsql/source/master/src/backend/utils/cache/relcache.c:1257
> #4 0x000055d4fc239456 in RelationIdGetRelation (relationId=<optimized out>, relationId(at)entry=2964)
> at ../../../../../../../../../pgsql/source/master/src/backend/utils/cache/relcache.c:2105
>
>
> So apparently we're trying to dereference a hash table which isn't
> properly set up in the child process.

I'll fix that.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-27 07:20:30
Message-ID:	6059.1743060030@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

I'm trying to reflect Robert's suggestions about locking [1]. The next version
should be a bit simpler, so maybe wait for it before you continue studying the
code.

[1] https://www.postgresql.org/message-id/CA%2BTgmobUZ0g%3D%3DSZv-OSFCQTGFPis5Qz1UsiMn18HGOWzsiyOLQ%40mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-03-31 14:26:25
Message-ID:	104580.1743431185@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> > On 2025-Mar-22, Antonin Houska wrote:
> >
> > > Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > >
> > > > I rebased this patch series; here's v09. No substantive changes from v08.
> > > > I made sure the tree still compiles after each commit.
> >
> > I rebased again, fixing a compiler warning reported by CI and applying
> > pgindent to each individual patch. I'm slowly starting to become more
> > familiar with the whole of this new code.
>
> I'm trying to reflect Robert's suggestions about locking [1]. The next version
> should be a bit simpler, so maybe wait for it before you continue studying the
> code.

This is it. A few notes:

* Since there's no unlocking during the processing now, the code to check for
catalog changes was removed.

* Some code moved from 0004 to 0005, where it fits better. (Also the commit
messages of these two parts elaborated a bit more.)

* Regarding the progress monitoring, I do in 0004 what I proposed in [1]:
BeginInternalSubTransaction() now sets a flag that avoids clearing of the
progress state in AbortSubTransaction()

* Although I consider 0005 (Preserve visibility information of the concurrent
data changes.) important, it occurs to me now that it might introduce quite
some overhead.

* 0003 is new in the series. I thought it'll be needed, then I realized it's
not. It might be useful as refactoring though. Please let me know if I
should maintain it or drop it.

[1] https://www.postgresql.org/message-id/80297.1742989179%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v11-0001-Add-REPACK-command.patch	text/x-diff	89.8 KB
v11-0002-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v11-0003-Move-the-recheck-branch-to-a-separate-function.patch	text/x-diff	4.8 KB
v11-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch	text/plain	137.0 KB
v11-0005-Preserve-visibility-information-of-the-concurrent-da.patch	text/x-diff	57.3 KB
v11-0006-Add-regression-tests.patch	text/x-diff	10.6 KB
v11-0007-Introduce-repack_max_xlock_time-configuration-variab.patch	text/x-diff	20.4 KB
v11-0008-Enable-logical-decoding-transiently-only-for-REPACK-.patch	text/x-diff	22.6 KB
v11-0009-Call-logical_rewrite_heap_tuple-when-applying-concur.patch	text/x-diff	26.0 KB

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-04-01 13:31:31
Message-ID:	178741.1743514291@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> > Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > > On 2025-Mar-22, Antonin Houska wrote:
> > >
> > > > Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > > >
> > > > > I rebased this patch series; here's v09. No substantive changes from v08.
> > > > > I made sure the tree still compiles after each commit.
> > >
> > > I rebased again, fixing a compiler warning reported by CI and applying
> > > pgindent to each individual patch. I'm slowly starting to become more
> > > familiar with the whole of this new code.
> >
> > I'm trying to reflect Robert's suggestions about locking [1]. The next version
> > should be a bit simpler, so maybe wait for it before you continue studying the
> > code.
>
> This is it.

One more version, hopefully to make cfbot happy (I missed the bug because I
did not set the RELCACHE_FORCE_RELEASE macro in my environment.)

Besides that, it occurred to me that 0005 ("Preserve visibility information of
the concurrent data changes.") will probably introduce significant
overhead. The problem is that the table we're repacking is treated like a
catalog, for reorderbuffer.c to generate snapshots that we need to replay
UPDATE / DELETE commands on the new table.

contrib/test_decoding can be used to demonstrate the difference between
ordinary and catalog tables:

CREATE TABLE t(i int);
SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');
INSERT INTO t SELECT n FROM generate_series(1, 1000000) g(n);
DELETE FROM t;
EXPLAIN ANALYZE SELECT * FROM pg_logical_slot_get_binary_changes('test_slot', null, null);
...
Execution Time: 3521.190 ms

ALTER TABLE t SET (user_catalog_table = true);

INSERT INTO t SELECT n FROM generate_series(1, 1000000) g(n);
DELETE FROM t;
EXPLAIN ANALYZE SELECT * FROM pg_logical_slot_get_binary_changes('test_slot', null, null);
...
Execution Time: 6561.634 ms

I wanted to avoid the "MVCC unsafety" [1], so that both REPACK and REPACK
CONCURRENTLY both work "cleanly". We can try to optimize the logical decoding
for REPACK CONCURRENTLY, or implement 0005 in a different way, but not sure
how much effort that would require. Or implement REPACK CONCURRENTLY as
MVCC-unsafe for now? (The pg_squeeze extension also is not MVCC-safe.)

[1] https://www.postgresql.org/docs/17/mvcc-caveats.html

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v12-0001-Add-REPACK-command.patch	text/x-diff	89.8 KB
v12-0002-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v12-0003-Move-the-recheck-branch-to-a-separate-function.patch	text/x-diff	4.8 KB
v12-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch	text/plain	137.1 KB
v12-0005-Preserve-visibility-information-of-the-concurrent-da.patch	text/x-diff	57.3 KB
v12-0006-Add-regression-tests.patch	text/x-diff	10.6 KB
v12-0007-Introduce-repack_max_xlock_time-configuration-variab.patch	text/x-diff	20.4 KB
v12-0008-Enable-logical-decoding-transiently-only-for-REPACK-.patch	text/x-diff	22.6 KB
v12-0009-Call-logical_rewrite_heap_tuple-when-applying-concur.patch	text/x-diff	26.0 KB

From:	"Euler Taveira" <euler(at)eulerto(dot)com>
To:	"Antonin Houska" <ah(at)cybertec(dot)at>, "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Marcos Pegoraro" <marcos(at)f10(dot)com(dot)br>, "Michael Banck" <mbanck(at)gmx(dot)net>, "Junwang Zhao" <zhjwpku(at)gmail(dot)com>, "Kirill Reshke" <reshkekirill(at)gmail(dot)com>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-04-03 14:38:04
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Apr 1, 2025, at 10:31 AM, Antonin Houska wrote:
> One more version, hopefully to make cfbot happy (I missed the bug because I
> did not set the RELCACHE_FORCE_RELEASE macro in my environment.)

I started reviewing this patch. It was in my radar to review it but I didn't
have spare time until now.

The syntax is fine for me. It is really close to CLUSTER syntax but it adds one
detail: INDEX keyword. It is a good approach because USING clause can be
expanded in the future if required.

+ <refnamediv>
+ <refname>REPACK</refname>
+ <refpurpose>cluster a table according to an index</refpurpose>
+ </refnamediv>

This description is not accurate because the index is optional. It means if the
index is not specified, it is a "rewrite" instead of a "cluster". One
suggestion is to use "rewrite" because it says nothing about the tuple order. A
"rewrite a table" or "rewrite a table to reclaim space" are good candidates. On
the other hand, the command is called "repack" and it should be a strong
candidate for the verb in this description. (I'm surprised that repack is not a
recent term [1]). It seems a natural choice.

+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX<replaceable class="parameter">index_name</replaceable> ] ]

You missed a space after INDEX.

+ <para>
+ Because the planner records statistics about the ordering of tables, it is
+ advisable to
+ run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+ newly repacked table. Otherwise, the planner might make poor choices of
+ query plans.
+ </para>

If we decide for another term (other than "repacked") then it should reflect
here and in some other parts ("repacking" is also used) too.

+ <para>
+ When an index scan or a sequential scan without sort is used, a temporary
+ copy of the table is created that contains the table data in the index
+ order.

That's true for CLUSTER but not for REPACK. Index is optional.

+ Prints a progress report as each table is clustered
+ at <literal>INFO</literal> level.

s/clustered/repacked/?

+ Repacking a partitioned table repacks each of its partitions. If an index
+ is specified, each partition is clustered using the partition of that
+ index. <command>REPACK</command> on a partitioned table cannot be executed
+ inside a transaction block.

Ditto.

+ <para>
+ Cluster the table <literal>employees</literal> on the basis of its
+ index <literal>employees_ind</literal>:
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+ </para>

It sounds strange to use "Repack" in the other examples but this one it says
"Cluster". Let's use the same terminology.

+
+ <warning>
+ <para>
+ The <command>FULL</command> parameter is deprecated in favor of
+ <xref linkend="sql-repack"/>.
+ </para>
+ </warning>
+

The warnings, notes, and tips are usually placed *after* the description.

--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,

Is it worth to rename table_relation_copy_for_cluster() and
heapam_relation_copy_for_cluster() to replace cluster with repack?

+ SELECT
+ S.pid AS pid,
+ S.datid AS datid,
+ D.datname AS datname,
+ S.relid AS relid,
+ CASE S.param1 WHEN 1 THEN 'REPACK'
+ END AS command,

Do you really need command? IIUC REPACK is the only command that will used by
this view. There is no need to differentiate commands here.

+ *
+ * 'cmd' indicates which commands is being executed. REPACK should be the only
+ * caller of this function in the future.

command.

+ *
+ * REPACK does not set indisclustered. XXX Not sure I understand the
+ * comment above: how can an attribute be set "only in the current
+ * database"?
*/

pg_index is a local catalog. To be consistent while clustering a shared
catalog, it should set indisclustered in all existent databases because in each
pg_index table there is a tuple for the referred index. As the comment says it
is not possible.

euler=# select relname, relkind, pg_relation_filepath(oid) from pg_class where relname = 'pg_index';
relname | relkind | pg_relation_filepath
----------+---------+----------------------
pg_index | r | base/16424/2610
(1 row)

euler=# select indexrelid::regclass, indexrelid::regclass, indisclustered from pg_index where indrelid = 'pg_database'::regclass;
indexrelid | indexrelid | indisclustered
---------------------------+---------------------------+----------------
pg_database_datname_index | pg_database_datname_index | f
pg_database_oid_index | pg_database_oid_index | f
(2 rows)

euler=# \c postgres
You are now connected to database "postgres" as user "euler".
postgres=# select relname, relkind, pg_relation_filepath(oid) from pg_class where relname = 'pg_index';
relname | relkind | pg_relation_filepath
----------+---------+----------------------
pg_index | r | base/5/2610
(1 row)

postgres=# select indexrelid::regclass, indexrelid::regclass, indisclustered from pg_index where indrelid = 'pg_database'::regclass;
indexrelid | indexrelid | indisclustered
---------------------------+---------------------------+----------------
pg_database_datname_index | pg_database_datname_index | f
pg_database_oid_index | pg_database_oid_index | f
(2 rows)

- if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+ if (cmd == CLUSTER_COMMAND_CLUSTER && OldHeap->rd_rel->relisshared)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot cluster a shared catalog")));
+ errmsg("cannot %s a shared catalog", cmd_str)));

I'm confused about this change. Why is it required?

If it prints this message only for CLUSTER command, you don't need to have a
generic message. This kind of message is not good for translation. If you need
multiple verbs here, I advise you to break it into multiple messages.

- {
- if (OidIsValid(indexOid))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot cluster temporary tables of other sessions")));
- else
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot vacuum temporary tables of other sessions")));
- }
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot %s temporary tables of other sessions",
+ cmd_str)));

Ditto.

- CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+ CheckTableNotInUse(OldHeap, asc_toupper(cmd_str, strlen(cmd_str)));

If the idea is to remove CLUSTER and VACUUM from this routine in the future, I
wouldn't include formatting.h just for asc_toupper(). Instead, I would use an
if condition. I think it will be easy to remove this code path when the time
comes.

- errmsg("cannot cluster on index \"%s\" because access method does not support clustering",
- RelationGetRelationName(OldIndex))));
+ errmsg("cannot %s on index \"%s\" because access method does not support clustering",
+ cmd_str, RelationGetRelationName(OldIndex))));

Ditto. I don't think check_index_is_clusterable() should be changed. The action
is "cluster" independently of the command. You can keep "cluster" until we
completely remove CLUSTER command and then we can replace this term with
"repack". It also applies to cluster_is_permitted_for_relation().

- errmsg("cannot cluster on partial index \"%s\"",
+ errmsg("cannot %s on partial index \"%s\"",
+ cmd_str,
RelationGetRelationName(OldIndex))));

Ditto.

- errmsg("cannot cluster on invalid index \"%s\"",
- RelationGetRelationName(OldIndex))));
+ errmsg("cannot %s on invalid index \"%s\"",
+ cmd_str, RelationGetRelationName(OldIndex))));

Ditto.

- (errmsg("clustering \"%s.%s\" using index scan on \"%s\"",
+ (errmsg("%sing \"%s.%s\" using index scan on \"%s\"",
+ cmd_str,
nspname,
RelationGetRelationName(OldHeap),
RelationGetRelationName(OldIndex))));

This is bad for translation. Use complete sentences.

- (errmsg("clustering \"%s.%s\" using sequential scan and sort",
+ (errmsg("%sing \"%s.%s\" using sequential scan and sort",
+ cmd_str,
nspname,
RelationGetRelationName(OldHeap))));

Ditto.

- (errmsg("vacuuming \"%s.%s\"",
+ (errmsg("%sing \"%s.%s\"",
+ cmd_str,
nspname,
RelationGetRelationName(OldHeap))));

Ditto.

/*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Like get_tables_to_cluster(), but do not care about indexes.
+ */

Since the goal is to remove CLUSTER in the future, provide a comment that
doesn't mention routines that will certainly be removed. Hence, there is no
need to fix them in the future.

+ /*
+ * Get all indexes that have indisclustered set and that the current user
+ * has the appropriate privileges for.
+ */

This comment is not true.

ereport(WARNING,
- (errmsg("permission denied to cluster \"%s\", skipping it",
+ (errmsg("permission denied to %s \"%s\", skipping it",
+ CLUSTER_COMMAND_STR(cmd),
get_rel_name(relid))));

Fix for translation.

+ if (stmt->relation != NULL)
+ {
+ rel = process_single_relation(stmt->relation, stmt->indexname,
+ CLUSTER_COMMAND_REPACK, &params,
+ &indexOid);
+ if (rel == NULL)
+ return;
+ }

This code path is confusing. It took me some time (after reading
process_single_relation() that could have a better name) to understand it. I
don't have a good suggestion but it should have at least one comment explaining
what the purpose is.

+/*
+ * REPACK a single relation.
+ *
+ * Return NULL if done, relation reference if the caller needs to process it
+ * (because the relation is partitioned).
+ */

This comment should be expanded. As I said in the previous hunk, there isn't
sufficient information to understand how process_single_relation() works.

+ | REPACK
+ {
+ RepackStmt *n = makeNode(RepackStmt);
+
+ n->relation = NULL;
+ n->indexname = NULL;
+ n->params = NIL;
+ $$ = (Node *) n;
+ }
+
+ | REPACK '(' utility_option_list ')'
+ {
+ RepackStmt *n = makeNode(RepackStmt);
+
+ n->relation = NULL;
+ n->indexname = NULL;
+ n->params = $3;
+ $$ = (Node *) n;
+ }

I'm wondering if there is an easy way to avoid these rules.

PROGRESS_COMMAND_VACUUM,
PROGRESS_COMMAND_ANALYZE,
PROGRESS_COMMAND_CLUSTER,
+ PROGRESS_COMMAND_REPACK,
PROGRESS_COMMAND_CREATE_INDEX,
PROGRESS_COMMAND_BASEBACKUP,
PROGRESS_COMMAND_COPY,

It is just a matter of style but I have the habit to include new stuff at the
end.

+-- Yet another code path: REPACK w/o index.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works

You forgot to remove the USING INDEX here.

I'm still review the other patches (that is basically the implementation of
CONCURRENTLY) and to avoid a long review, I'm sending the 0001 review. Anyway,
0001 is independent of the other patches and should be applied separately.

[1] https://www.merriam-webster.com/dictionary/repack

--
Euler Taveira
EDB https://www.enterprisedb.com/

From:	Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-04-03 20:08:17
Message-ID:	CAFY6G8cQs-SY0Lkv6E7YdZaCpZF0gctKW_SvqmBXB-dTXV-h2g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Tue, Apr 1, 2025 at 10:31 AM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> One more version, hopefully to make cfbot happy (I missed the bug because I
> did not set the RELCACHE_FORCE_RELEASE macro in my environment.)

Thanks for the new version! I'm starting to study this patch series and
I just want to share some points about the documentation on v12-0004:

+ Thus the tuples inserted into the old file during the copying are
+ also stored in separately in a temporary file, so they can eventually

Maybe "stored separately in a temporary file"?

+ <row>
+ <entry><literal>catch-up</literal></entry>
+ <entry>
+ <command>REPACK</command> is currently processing the DML commands that
+ other transactions executed during any of the preceding phase.
+ </entry>
+ </row>

This catch-up phase only happens when CONCURRENTLY is used right? Maybe
it would be good to mention this?

The commit message say:
"Of course, more data changes can take place while we are waiting for
the lock - these will be applied to the new file after we have acquired
the lock, before we swap the files."

But the documentation say:
+ <para>
+ With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+ EXCLUSIVE</literal> lock is only acquired to swap the table and index
+ files. The data changes that took place during the creation of the new
+ table and index files are captured using logical decoding
+ (<xref linkend="logicaldecoding"/>) and applied before
+ the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+ is typically held only for the time needed to swap the files, which
+ should be pretty short.
+ </para>

I've noticed that you've included on 0007 that the ACCESS EXCLUSIVE may
be used to apply changes that occurred while waiting for the lock, but
IIUC this behaviour is implemented on 0004 right? If that's the case I
think that it would be good to move this part of the documentation to
0004 instead of 0007, what do you think?

--
Matheus Alcantara

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-04-04 07:33:54
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2025-Apr-01, Antonin Houska wrote:

> Besides that, it occurred to me that 0005 ("Preserve visibility
> information of the concurrent data changes.") will probably introduce
> significant overhead. The problem is that the table we're repacking is
> treated like a catalog, for reorderbuffer.c to generate snapshots that
> we need to replay UPDATE / DELETE commands on the new table.
>
> contrib/test_decoding can be used to demonstrate the difference
> between ordinary and catalog tables:
>
> [.. ordinary ..]
> Execution Time: 3521.190 ms
> [.. catalog ..]
> Execution Time: 6561.634 ms

Significant indeed. Thinking about the scenarios in which I envision
people using REPACK CONCURRENTLY (mostly, cases where very large tables
have accumulated considerable amounts of bloat) and considering the size
of the patch, I think the case for treating it as concurrent-safe is not
credible, at least not at this stage -- not only because of this
performance impact, but also because of the additional code complexity,
which I'm really doubtful we can address at this stage. I would suggest
to put that patch aside for now, maybe with a doc warning that
"repacking a table would cause visibility information to be lost"; and
then address that aspect later on, after this feature has gone through
some battle-hardening.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Las navajas y los monos deben estar siempre distantes" (Germán Poo)

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-04-04 10:28:36
Message-ID:	13028.1743762516@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2025-Apr-01, Antonin Houska wrote:
>
> > Besides that, it occurred to me that 0005 ("Preserve visibility
> > information of the concurrent data changes.") will probably introduce
> > significant overhead. The problem is that the table we're repacking is
> > treated like a catalog, for reorderbuffer.c to generate snapshots that
> > we need to replay UPDATE / DELETE commands on the new table.
> >
> > contrib/test_decoding can be used to demonstrate the difference
> > between ordinary and catalog tables:
> >
> > [.. ordinary ..]
> > Execution Time: 3521.190 ms
> > [.. catalog ..]
> > Execution Time: 6561.634 ms
>
> Significant indeed. Thinking about the scenarios in which I envision
> people using REPACK CONCURRENTLY (mostly, cases where very large tables
> have accumulated considerable amounts of bloat) and considering the size
> of the patch, I think the case for treating it as concurrent-safe is not
> credible, at least not at this stage -- not only because of this
> performance impact, but also because of the additional code complexity,
> which I'm really doubtful we can address at this stage. I would suggest
> to put that patch aside for now, maybe with a doc warning that
> "repacking a table would cause visibility information to be lost"; and
> then address that aspect later on, after this feature has gone through
> some battle-hardening.

ok, I'll adjust the patch set.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	"Euler Taveira" <euler(at)eulerto(dot)com>
Cc:	"Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, "Marcos Pegoraro" <marcos(at)f10(dot)com(dot)br>, "Michael Banck" <mbanck(at)gmx(dot)net>, "Junwang Zhao" <zhjwpku(at)gmail(dot)com>, "Kirill Reshke" <reshkekirill(at)gmail(dot)com>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-04-04 16:38:54
Message-ID:	55563.1743784734@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Euler Taveira <euler(at)eulerto(dot)com> wrote:

> I started reviewing this patch. It was in my radar to review it but I didn't
> have spare time until now.

Thanks!

> + <refnamediv>
> + <refname>REPACK</refname>
> + <refpurpose>cluster a table according to an index</refpurpose>
> + </refnamediv>
>
> This description is not accurate because the index is optional. It means if the
> index is not specified, it is a "rewrite" instead of a "cluster". One
> suggestion is to use "rewrite" because it says nothing about the tuple order. A
> "rewrite a table" or "rewrite a table to reclaim space" are good candidates. On
> the other hand, the command is called "repack" and it should be a strong
> candidate for the verb in this description. (I'm surprised that repack is not a
> recent term [1]). It seems a natural choice.

This reveals that I used the documentation of CLUSTER for "inspiration" :-) I
prefer "rewrite" because it can help users which don't know what REPACK means
in this context.

> +<synopsis>
> +REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable>
> [ USING INDEX<replaceable class="parameter">index_name</replaceable> ] ]

ok, fixed

> You missed a space after INDEX.
>
> + <para>
> + Because the planner records statistics about the ordering of tables, it is
> + advisable to
> + run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
> + newly repacked table. Otherwise, the planner might make poor choices of
> + query plans.
> + </para>
>
> If we decide for another term (other than "repacked") then it should reflect
> here and in some other parts ("repacking" is also used) too.

I think it's ok to say "repack" as long as we explained it above. The
documentation of CLUSTER command also uses the verb "cluster".

> + <para>
> + When an index scan or a sequential scan without sort is used, a temporary
> + copy of the table is created that contains the table data in the index
> + order.
>
> That's true for CLUSTER but not for REPACK. Index is optional.

ok, removed the mention of index.

> + Prints a progress report as each table is clustered
> + at <literal>INFO</literal> level.
>
> s/clustered/repacked/?

right

> + Repacking a partitioned table repacks each of its partitions. If an index
> + is specified, each partition is clustered using the partition of that
> + index. <command>REPACK</command> on a partitioned table cannot be executed
> + inside a transaction block.
>
> Ditto.

fixed

> + <para>
> + Cluster the table <literal>employees</literal> on the basis of its
> + index <literal>employees_ind</literal>:
> +<programlisting>
> +REPACK employees USING INDEX employees_ind;
> +</programlisting>
> + </para>
>
> It sounds strange to use "Repack" in the other examples but this one it says
> "Cluster". Let's use the same terminology.

It's explained above on the page that if index is specified, it's
clustering. I changed it to "Repack", but added a note that this is
effectively clustering.

> +
> + <warning>
> + <para>
> + The <command>FULL</command> parameter is deprecated in favor of
> + <xref linkend="sql-repack"/>.
> + </para>
> + </warning>
> +
>
> The warnings, notes, and tips are usually placed *after* the description.

You probably mean the subsecions "Notes on Clustering" and "Notes on
Resources". I moved them into the "Notes" section.

> --- a/src/backend/access/heap/heapam_handler.c
> +++ b/src/backend/access/heap/heapam_handler.c
> @@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
>
> Is it worth to rename table_relation_copy_for_cluster() and
> heapam_relation_copy_for_cluster() to replace cluster with repack?

I had thought about it and concluded that it'd make the patch too
invasive. Note that the CLUSTER still uses these functions. We can do the
renaming when removing the CLUSTER command someday.

> + SELECT
> + S.pid AS pid,
> + S.datid AS datid,
> + D.datname AS datname,
> + S.relid AS relid,
> + CASE S.param1 WHEN 1 THEN 'REPACK'
> + END AS command,
>
> Do you really need command? IIUC REPACK is the only command that will used by
> this view. There is no need to differentiate commands here.

REPACK is a regular command, so why shouldn't it have its view? Just like
CLUSTER has one (pg_stat_progress_cluster).

> + *
> + * 'cmd' indicates which commands is being executed. REPACK should be the only
> + * caller of this function in the future.
>
> command.

Not sure I understand this comment.

> + *
> + * REPACK does not set indisclustered. XXX Not sure I understand the
> + * comment above: how can an attribute be set "only in the current
> + * database"?
> */
>
> pg_index is a local catalog. To be consistent while clustering a shared
> catalog, it should set indisclustered in all existent databases because in each
> pg_index table there is a tuple for the referred index. As the comment says it
> is not possible.

Yes, Alvaro already explained this to me [1] :-)

> - if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
> + if (cmd == CLUSTER_COMMAND_CLUSTER && OldHeap->rd_rel->relisshared)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> - errmsg("cannot cluster a shared catalog")));
> + errmsg("cannot %s a shared catalog", cmd_str)));
>
> I'm confused about this change. Why is it required?
>
> If it prints this message only for CLUSTER command, you don't need to have a
> generic message. This kind of message is not good for translation. If you need
> multiple verbs here, I advise you to break it into multiple messages.

Good point, I didn't think about translation. Fixed.

> - {
> - if (OidIsValid(indexOid))
> - ereport(ERROR,
> - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> - errmsg("cannot cluster temporary tables of other sessions")));
> - else
> - ereport(ERROR,
> - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> - errmsg("cannot vacuum temporary tables of other sessions")));
> - }
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot %s temporary tables of other sessions",
> + cmd_str)));
>
> Ditto.

Fixed.

> - CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
> + CheckTableNotInUse(OldHeap, asc_toupper(cmd_str, strlen(cmd_str)));
>
> If the idea is to remove CLUSTER and VACUUM from this routine in the future, I
> wouldn't include formatting.h just for asc_toupper(). Instead, I would use an
> if condition. I think it will be easy to remove this code path when the time
> comes.

Fixed.

> - errmsg("cannot cluster on index \"%s\" because access method does not support clustering",
> - RelationGetRelationName(OldIndex))));
> + errmsg("cannot %s on index \"%s\" because access method does not support clustering",
> + cmd_str, RelationGetRelationName(OldIndex))));
>
> Ditto. I don't think check_index_is_clusterable() should be changed. The action
> is "cluster" independently of the command. You can keep "cluster" until we
> completely remove CLUSTER command and then we can replace this term with
> "repack". It also applies to cluster_is_permitted_for_relation().
>
> - errmsg("cannot cluster on partial index \"%s\"",
> + errmsg("cannot %s on partial index \"%s\"",
> + cmd_str,
> RelationGetRelationName(OldIndex))));
>
> Ditto.
>
> - errmsg("cannot cluster on invalid index \"%s\"",
> - RelationGetRelationName(OldIndex))));
> + errmsg("cannot %s on invalid index \"%s\"",
> + cmd_str, RelationGetRelationName(OldIndex))));
>
> Ditto.
>
> - (errmsg("clustering \"%s.%s\" using index scan on \"%s\"",
> + (errmsg("%sing \"%s.%s\" using index scan on \"%s\"",
> + cmd_str,
> nspname,
> RelationGetRelationName(OldHeap),
> RelationGetRelationName(OldIndex))));
>
> This is bad for translation. Use complete sentences.
>
> - (errmsg("clustering \"%s.%s\" using sequential scan and sort",
> + (errmsg("%sing \"%s.%s\" using sequential scan and sort",
> + cmd_str,
> nspname,
> RelationGetRelationName(OldHeap))));
>
> Ditto.
>
> - (errmsg("vacuuming \"%s.%s\"",
> + (errmsg("%sing \"%s.%s\"",
> + cmd_str,
> nspname,
> RelationGetRelationName(OldHeap))));
>
> Ditto.

fixed

> /*
> - * Given an index on a partitioned table, return a list of RelToCluster for
> + * Like get_tables_to_cluster(), but do not care about indexes.
> + */

> Since the goal is to remove CLUSTER in the future, provide a comment that
> doesn't mention routines that will certainly be removed. Hence, there is no
> need to fix them in the future.

It'd be almost duplicate of the header comment of get_tables_to_cluster() and
I don't like duplication. Let's do that at removal time.

> + /*
> + * Get all indexes that have indisclustered set and that the current user
> + * has the appropriate privileges for.
> + */
>
> This comment is not true.

Fixed.

> ereport(WARNING,
> - (errmsg("permission denied to cluster \"%s\", skipping it",
> + (errmsg("permission denied to %s \"%s\", skipping it",
> + CLUSTER_COMMAND_STR(cmd),
> get_rel_name(relid))));
>
> Fix for translation.

Fixed.

> + if (stmt->relation != NULL)
> + {
> + rel = process_single_relation(stmt->relation, stmt->indexname,
> + CLUSTER_COMMAND_REPACK, &params,
> + &indexOid);
> + if (rel == NULL)
> + return;
> + }
>
> This code path is confusing. It took me some time (after reading
> process_single_relation() that could have a better name) to understand it. I
> don't have a good suggestion but it should have at least one comment explaining
> what the purpose is.

ok, added the comment that I lost when moving the code from cluster() to
process_single_relation().

> +/*
> + * REPACK a single relation.
> + *
> + * Return NULL if done, relation reference if the caller needs to process it
> + * (because the relation is partitioned).
> + */
>
> This comment should be expanded. As I said in the previous hunk, there isn't
> sufficient information to understand how process_single_relation() works.

This function only contains code that I moved from cluster_rel(). The header
comment is and additional information. I tried to rephrase it a bit anyway.

> + | REPACK
> + {
> + RepackStmt *n = makeNode(RepackStmt);
> +
> + n->relation = NULL;
> + n->indexname = NULL;
> + n->params = NIL;
> + $$ = (Node *) n;
> + }
> +
> + | REPACK '(' utility_option_list ')'
> + {
> + RepackStmt *n = makeNode(RepackStmt);
> +
> + n->relation = NULL;
> + n->indexname = NULL;
> + n->params = $3;
> + $$ = (Node *) n;
> + }

Maybe, will think about it.

> I'm wondering if there is an easy way to avoid these rules.
>
> PROGRESS_COMMAND_VACUUM,
> PROGRESS_COMMAND_ANALYZE,
> PROGRESS_COMMAND_CLUSTER,
> + PROGRESS_COMMAND_REPACK,
> PROGRESS_COMMAND_CREATE_INDEX,
> PROGRESS_COMMAND_BASEBACKUP,
> PROGRESS_COMMAND_COPY,
>
> It is just a matter of style but I have the habit to include new stuff at the
> end.

Yes, it seems so. Fixed.

> +-- Yet another code path: REPACK w/o index.
> +REPACK clstr_tst USING INDEX clstr_tst_c;
> +-- Verify that inheritance link still works
>
> You forgot to remove the USING INDEX here.

Good catch. I removed the test because w/o index the output order can be
unstable. (Whether index is used or not should not affect the catalog changes
related to inheritance or FKs anyway.)

> I'm still review the other patches (that is basically the implementation of
> CONCURRENTLY) and to avoid a long review, I'm sending the 0001 review. Anyway,
> 0001 is independent of the other patches and should be applied separately.

Attached is a new version of 0001.

As for the other patches, please skip the parts > 0004 - most of this code
will be removed [2]. I'll try to post the next version of the patch set next
week.

(Regarding next reviews, please try to keep hunk headers in the text.)

[1] https://www.postgresql.org/message-id/[email protected]
[2] https://www.postgresql.org/message-id/13028.1743762516%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
0001-Add-REPACK-command.patch	text/x-diff	78.0 KB

From:	"Euler Taveira" <euler(at)eulerto(dot)com>
To:	"Antonin Houska" <ah(at)cybertec(dot)at>
Cc:	"Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, "Marcos Pegoraro" <marcos(at)f10(dot)com(dot)br>, "Michael Banck" <mbanck(at)gmx(dot)net>, "Junwang Zhao" <zhjwpku(at)gmail(dot)com>, "Kirill Reshke" <reshkekirill(at)gmail(dot)com>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-04-04 18:21:41
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Apr 4, 2025, at 1:38 PM, Antonin Houska wrote:
> Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> > +
> > + <warning>
> > + <para>
> > + The <command>FULL</command> parameter is deprecated in favor of
> > + <xref linkend="sql-repack"/>.
> > + </para>
> > + </warning>
> > +
> >
> > The warnings, notes, and tips are usually placed *after* the description.
>
> You probably mean the subsecions "Notes on Clustering" and "Notes on
> Resources". I moved them into the "Notes" section.

No. I said that it should be put after the <para> not before.

@@ -98,6 +98,14 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
<varlistentry>
<term><literal>FULL</literal></term>
<listitem>
+
+ <warning>
+ <para>
+ The <command>FULL</command> parameter is deprecated in favor of
+ <xref linkend="sql-repack"/>.
+ </para>
+ </warning>
+
<para>
Selects <quote>full</quote> vacuum, which can reclaim more
space, but takes much longer and exclusively locks the table.

> > + SELECT
> > + S.pid AS pid,
> > + S.datid AS datid,
> > + D.datname AS datname,
> > + S.relid AS relid,
> > + CASE S.param1 WHEN 1 THEN 'REPACK'
> > + END AS command,
> >
> > Do you really need command? IIUC REPACK is the only command that will used by
> > this view. There is no need to differentiate commands here.
>
> REPACK is a regular command, so why shouldn't it have its view? Just like
> CLUSTER has one (pg_stat_progress_cluster).

You missed my point. IIRC the command is relevant in the
pg_stat_progress_cluster because there are multiple commands (CLUSTER, VACUUM
FULL). However, in this new view there will be only one command so it is not
necessary to inform it.

> > + *
> > + * 'cmd' indicates which commands is being executed. REPACK should be the only
> > + * caller of this function in the future.
> >
> > command.
>
> Not sure I understand this comment.

Singular form. ... which command is ...

--
Euler Taveira
EDB https://www.enterprisedb.com/

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	"Euler Taveira" <euler(at)eulerto(dot)com>
Cc:	"Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, "Marcos Pegoraro" <marcos(at)f10(dot)com(dot)br>, "Michael Banck" <mbanck(at)gmx(dot)net>, "Junwang Zhao" <zhjwpku(at)gmail(dot)com>, "Kirill Reshke" <reshkekirill(at)gmail(dot)com>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-04-11 09:25:22
Message-ID:	97795.1744363522@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Euler Taveira <euler(at)eulerto(dot)com> wrote:

> On Fri, Apr 4, 2025, at 1:38 PM, Antonin Houska wrote:
>
> Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> > +
> > + <warning>
> > + <para>
> > + The <command>FULL</command> parameter is deprecated in favor of
> > + <xref linkend="sql-repack"/>.
> > + </para>
> > + </warning>
> > +
> >
> > The warnings, notes, and tips are usually placed *after* the description.
>
> You probably mean the subsecions "Notes on Clustering" and "Notes on
> Resources". I moved them into the "Notes" section.
>
> No. I said that it should be put after the <para> not before.
>
> @@ -98,6 +98,14 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
> <varlistentry>
> <term><literal>FULL</literal></term>
> <listitem>
> +
> + <warning>
> + <para>
> + The <command>FULL</command> parameter is deprecated in favor of
> + <xref linkend="sql-repack"/>.
> + </para>
> + </warning>
> +
> <para>
> Selects <quote>full</quote> vacuum, which can reclaim more
> space, but takes much longer and exclusively locks the table.
>
> > + SELECT
> > + S.pid AS pid,
> > + S.datid AS datid,
> > + D.datname AS datname,
> > + S.relid AS relid,
> > + CASE S.param1 WHEN 1 THEN 'REPACK'
> > + END AS command,
> >
> > Do you really need command? IIUC REPACK is the only command that will used by
> > this view. There is no need to differentiate commands here.
>
> REPACK is a regular command, so why shouldn't it have its view? Just like
> CLUSTER has one (pg_stat_progress_cluster).
>
> You missed my point. IIRC the command is relevant in the
> pg_stat_progress_cluster because there are multiple commands (CLUSTER, VACUUM
> FULL). However, in this new view there will be only one command so it is not
> necessary to inform it.
>
> > + *
> > + * 'cmd' indicates which commands is being executed. REPACK should be the only
> > + * caller of this function in the future.
> >
> > command.
>
> Not sure I understand this comment.
>
> Singular form. ... which command is ...
>

This is the next version. It addresses these remaining concerns and also gets
rid of the unnecessary rules in gram.y (which complained about earlier).

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v13-0001-Add-REPACK-command.patch	text/x-diff	81.8 KB
v13-0002-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v13-0003-Move-the-recheck-branch-to-a-separate-function.patch	text/x-diff	4.8 KB
v13-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch	text/plain	150.1 KB
v13-0005-Add-regression-tests.patch	text/x-diff	10.4 KB
v13-0006-Introduce-repack_max_xlock_time-configuration-variab.patch	text/x-diff	20.1 KB
v13-0007-Enable-logical-decoding-transiently-only-for-REPACK-.patch	text/x-diff	34.9 KB

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-04-11 09:28:02
Message-ID:	97992.1744363682@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Matheus Alcantara <matheusssilv97(at)gmail(dot)com> wrote:

> Hi,
>
> On Tue, Apr 1, 2025 at 10:31 AM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > One more version, hopefully to make cfbot happy (I missed the bug because I
> > did not set the RELCACHE_FORCE_RELEASE macro in my environment.)
>
> Thanks for the new version! I'm starting to study this patch series and
> I just want to share some points about the documentation on v12-0004:

Please check the next version [1]. Thanks for your input.

[1] https://www.postgresql.org/message-id/97795.1744363522%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-06-07 14:41:35
Message-ID:	CACJufxFfm=E1PA80Uxj26rqaRFVrHNiTycdADD1z4F+c8nQFDw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Apr 11, 2025 at 5:28 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Please check the next version [1]. Thanks for your input.
>
> [1] https://www.postgresql.org/message-id/97795.1744363522%40localhost
>

Hi, I’ve briefly experimented with v13-0001.

EXPLAIN tab complete:
explain (verbose O
OFF ON

since we already touched the tab-complete for repack.
We can do it similarly.
you may see src/bin/psql/tab-complete.in.c line 4288.

------------------------------
Currently REPACK Synopsis section
looks like the screenshot attached.

make it one line
REPACK [ ( option [, ...] ) ] [ table_name [ USING INDEX index_name ] ]
would look intuitive, IMHO.
------------------------------
+repack_index_specification:
+ USING INDEX name { $$ = $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;

in gram.y line 4685, we have
ExistingIndex: USING INDEX name { $$ = $3; }
;

so here, we can change it to

repack_index_specification:
ExistingIndex
| /*EMPTY*/ { $$ = NULL; }

-------------------------------------------------
+static List *
+get_tables_to_repack(MemoryContext repack_context)
+{
+ relrelation = table_open(RelationRelationId, AccessShareLock);
+ scan = table_beginscan_catalog(relrelation, 0, NULL);
+ while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+ {
+ RelToCluster *rtc;
+ Form_pg_class relrelation = (Form_pg_class) GETSTRUCT(tuple);
+ Oid relid = relrelation->oid;
+
+ /* Only interested in relations. */
+ if (get_rel_relkind(relid) != RELKIND_RELATION)
+ continue;

The doc said (Without a table_name, REPACK processes every table and
materialized view...)
but seems plain ``REPACK(verbose) ; ``
will not process materialized view?

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-06-08 16:05:59
Message-ID:	CACJufxGE4vDVBd_KDFsTnXoMPKobx-Qqk=f8boZYKJNvUVb3AQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

hi.
some more minor comments about v13-0001.

GetCommandLogLevel also needs to specify LogStmtLevel for T_RepackStmt?

/*
* (CLUSTER might change the order of
* rows on disk, which could affect the ordering of pg_dump
* output, but that's not semantically significant.)
*/
do we need adjust this comment in ClassifyUtilityCommandAsReadOnly
for the REPACK statement?

<para>
<productname>PostgreSQL</productname> has the ability to report the
progress of
certain commands during command execution. Currently, the only commands
which support progress reporting are <command>ANALYZE</command>,
<command>CLUSTER</command>,
<command>CREATE INDEX</command>, <command>VACUUM</command>,
<command>COPY</command>,
and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
command that <xref linkend="app-pgbasebackup"/> issues to take
a base backup).
This may be expanded in the future.
</para>
also need to mention <command>REPACK</command>?

"The CLUSTER command is deprecated",
then do we need to say something
in doc/src/sgml/ref/clusterdb.sgml?

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-06-09 10:19:15
Message-ID:	117560.1749464355@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

jian he <jian(dot)universality(at)gmail(dot)com> wrote:

> On Fri, Apr 11, 2025 at 5:28 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >
> > Please check the next version [1]. Thanks for your input.
> >
> > [1] https://www.postgresql.org/message-id/97795.1744363522%40localhost
> >
>
> Hi, I’ve briefly experimented with v13-0001.

Thanks! v14 addresses your comments.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v14-0001-Add-REPACK-command.patch	text/x-diff	82.7 KB
v14-0002-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v14-0003-Move-the-recheck-branch-to-a-separate-function.patch	text/x-diff	4.8 KB
v14-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch	text/plain	150.0 KB
v14-0005-Add-regression-tests.patch	text/x-diff	10.4 KB
v14-0006-Introduce-repack_max_xlock_time-configuration-variab.patch	text/x-diff	20.1 KB
v14-0007-Enable-logical-decoding-transiently-only-for-REPACK-.patch	text/x-diff	34.9 KB

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-06-09 10:35:57
Message-ID:	118471.1749465357@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

jian he <jian(dot)universality(at)gmail(dot)com> wrote:

> hi.
> some more minor comments about v13-0001.
>
> GetCommandLogLevel also needs to specify LogStmtLevel for T_RepackStmt?

Fixed in [1].

> /*
> * (CLUSTER might change the order of
> * rows on disk, which could affect the ordering of pg_dump
> * output, but that's not semantically significant.)
> */
> do we need adjust this comment in ClassifyUtilityCommandAsReadOnly
> for the REPACK statement?

Not sure. The current version does not mention VACUUM. (Note that VACUUM FULL
does almost the same as CLUSTER.) We can adjust the comment during the removal
of CLUSTER sometime in the future.

> <para>
> <productname>PostgreSQL</productname> has the ability to report the
> progress of
> certain commands during command execution. Currently, the only commands
> which support progress reporting are <command>ANALYZE</command>,
> <command>CLUSTER</command>,
> <command>CREATE INDEX</command>, <command>VACUUM</command>,
> <command>COPY</command>,
> and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
> command that <xref linkend="app-pgbasebackup"/> issues to take
> a base backup).
> This may be expanded in the future.
> </para>
> also need to mention <command>REPACK</command>?

Fixed in [1].

> "The CLUSTER command is deprecated",
> then do we need to say something
> in doc/src/sgml/ref/clusterdb.sgml?

I'm not convinced at the moment. The page contains a link to the documentation
of CLUSTER, which does contain the deprecation note.

It's not even clear to me whether this utility must be removed. We can adjust
it so it calls REPACK instead of CLUSTER. And while doing that, we may or may
not rename it.

[1] https://www.postgresql.org/message-id/117560.1749464355%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-06-30 18:22:05
Message-ID:	152010.1751307725@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> > On Fri, Apr 11, 2025 at 5:28 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > >
> > > Please check the next version [1]. Thanks for your input.
> > >
> > > [1] https://www.postgresql.org/message-id/97795.1744363522%40localhost
> > >
> >
> > Hi, I’ve briefly experimented with v13-0001.
>
> Thanks! v14 addresses your comments.

v15 is attached. Just rebased so it applies to the current HEAD.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
v15-0001-Add-REPACK-command.patch	text/x-diff	82.7 KB
v15-0002-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch	text/x-diff	5.4 KB
v15-0003-Move-the-recheck-branch-to-a-separate-function.patch	text/x-diff	4.8 KB
v15-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch	text/plain	150.0 KB
v15-0005-Add-regression-tests.patch	text/x-diff	10.4 KB
v15-0006-Introduce-repack_max_xlock_time-configuration-variab.patch	text/x-diff	20.1 KB
v15-0007-Enable-logical-decoding-transiently-only-for-REPACK-.patch	text/x-diff	35.0 KB

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: why there is not VACUUM FULL CONCURRENTLY?
Date:	2025-07-21 16:48:52
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello

I started to read through 0001 and my first reaction is that I would
like to make a few more breaking changes. It appears that the current
patch tries to keep things unchanged, or to keep some things with the
CLUSTER name. I'm going to try and get rid of that. For instance, in
the grammar, I'm going to change the productions for CLUSTER so that
they produce a RepackStmt of the appropriate form; ClusterStmt as a
parse node should just be removed as no longer useful. Also, the
cluster() function becomes ExecRepack(), and things flow down from
there.

This leads to a small problem: right now you can say "CLUSTER;" which
processes all tables for which a clustered index has been defined.
In the submitted patch, REPACK doesn't support that mode, and we need a
way to distinguish the mode where it VACUUM FULL all tables from the
mode where it does the all-table CLUSTER. So I propose we allow that
mode with simply
CLUSTER USING INDEX;
which is consistent with the idea of "REPACK table USING INDEX foo"
being "CLUSTER table USING foo".

Implementation-wise, I'm toying with adding a new "command" for REPACK
called REPACK_COMMAND_CLUSTER_ALL, supporting this mode of operation.

That leads to a grammar like

> RepackStmt:
> REPACK USING INDEX
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER_ALL;
> n->relation = NULL;
> n->indexname = NULL;
> n->params = NIL;
> $$ = (Node *) n;
> }
> | REPACK qualified_name opt_using_index
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_REPACK;
> n->relation = $2;
> n->indexname = $3;
> n->params = NIL;
> $$ = (Node *) n;
> }
> | REPACK '(' utility_option_list ')' qualified_name opt_using_index
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_REPACK;
> n->relation = $5;
> n->indexname = $6;
> n->params = $3;
> $$ = (Node *) n;
> }
> | CLUSTER '(' utility_option_list ')'
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER_ALL;
> n->relation = NULL;
> n->indexname = NULL;
> n->params = $3;
> $$ = (Node *) n;
> }
> | CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER;
> n->relation = $5;
> n->indexname = $6;
> n->params = $3;
> $$ = (Node *) n;
> }
> /* unparenthesized VERBOSE kept for pre-14 compatibility */
> | CLUSTER opt_verbose qualified_name cluster_index_specification
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER;
> n->relation = $3;
> n->indexname = $4;
> n->params = NIL;
> if ($2)
> n->params = lappend(n->params, makeDefElem("verbose", NULL, @2));
> $$ = (Node *) n;
> }
> /* unparenthesized VERBOSE kept for pre-17 compatibility */
> | CLUSTER opt_verbose
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER_ALL;
> n->relation = NULL;
> n->indexname = NULL;
> n->params = NIL;
> if ($2)
> n->params = lappend(n->params, makeDefElem("verbose", NULL, @2));
> $$ = (Node *) n;
> }
> /* kept for pre-8.3 compatibility */
> | CLUSTER opt_verbose name ON qualified_name
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER;
> n->relation = $5;
> n->indexname = $3;
> n->params = NIL;
> if ($2)
> n->params = lappend(n->params, makeDefElem("verbose", NULL, @2));
> $$ = (Node *) n;
> }
> ;
>
> opt_using_index:
> ExistingIndex { $$ = $1; }
> | /*EMPTY*/ { $$ = NULL; }
> ;
>
> cluster_index_specification:
> USING name { $$ = $2; }
> | /*EMPTY*/ { $$ = NULL; }
> ;

It's a bit weird that CLUSTER uses just "USING" while REPACK uses
"USING INDEX", but of course we cannot change CLUSTER now; and I think
it's better to have the noise word INDEX for REPACK because it allows
the case of not specifying an index name as described above.

In the current patch we don't yet have a way to use REPACK for an
unadorned "VACUUM FULL" (which processes all tables), but I think I'll
add that as well, if only so that we can claim that the REPACK commands
handles all possible legacy command modes, even if they are not useful
in practice. That would probably be unadorned "REPACK;". With this, we
can easily state that all legacy CLUSTER and VACUUM FULL commands have
an equivalent REPACK formulation.

We're of course not going to _remove_ support for any of those legacy
commands. At the same time, we're not planning to add support for
CONCURRENTLY in the all-tables modes (patch 0004), but I don't think
that's a concern. Somebody could later implement that if they wanted
to, but I think it's pretty useless so IMO it's a waste of time.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)