Return pg_control from pg_backup_stop().

Lists: pgsql-hackers
From: David Steele <david(at)pgmasters(dot)net>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Return pg_control from pg_backup_stop().
Date: 2024-05-17 02:46:49
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hackers,

This is greatly simplified implementation of the patch proposed in [1]
and hopefully it addresses the concerns expressed there. Since the
implementation is quite different it seemed like a new thread was
appropriate, especially since the old thread title would be very
misleading regarding the new functionality.

The basic idea is to harden recovery by returning a copy of pg_control
from pg_backup_stop() that has a flag set to prevent recovery if the
backup_label file is missing. Instead of backup software copying
pg_control from PGDATA, it stores an updated version that is returned
from pg_backup_stop(). This is better for the following reasons:

* The user can no longer remove backup_label and get what looks like a
successful recovery (while almost certainly causing corruption). If
backup_label is removed the cluster will not start. The user may try
pg_resetwal, but that tool makes it pretty clear that corruption will
result from its use.

* We don't need to worry about backup software seeing a torn copy of
pg_control, since Postgres can safely read it out of memory and provide
a valid copy via pg_backup_stop(). This solves torn reads without
needing to write pg_control via a temp file, which may affect
performance on a standby.

* For backup from standby, we no longer need to instruct the backup
software to copy pg_control last. In fact the backup software should not
copy pg_control from PGDATA at all.

These changes have no impact on current backup software and they are
free to use the pg_control available from pg_stop_backup() or continue
to use pg_control from PGDATA. Of course they will miss the benefits of
getting a consistent copy of pg_control and the backup_label checking,
but will be no worse off than before.

I'll register this in the July CF.

Regards,
-David

[1]
https://www.postgresql.org/message-id/[email protected]

Attachment Content-Type Size
pgcontrol-from-backupstop-v1.patch text/plain 22.1 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Return pg_control from pg_backup_stop().
Date: 2024-10-02 07:15:55
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 17, 2024 at 12:46:49PM +1000, David Steele wrote:
> This is greatly simplified implementation of the patch proposed in [1] and
> hopefully it addresses the concerns expressed there. Since the
> implementation is quite different it seemed like a new thread was
> appropriate, especially since the old thread title would be very misleading
> regarding the new functionality.

- /* No backup_label file has been found if we are here. */
+ /*
+ * No backup_label file has been found if we are here. Error if the
+ * control file requires backup_label.
+ */
+ if (ControlFile->backupLabelRequired)
+ ereport(FATAL,
+ (errmsg("could not find backup_label required for recovery"),
+ errhint("backup_label must be present for recovery to succeed")));

I thought that this had some similarities with my last fight in this
area, where xlogrecovery.c would fail hard if there was a backup_label
file but no signal files, but nope that's not the case:
https://www.postgresql.org/message-id/flat/ZArVOMifjzE7f8W7%40paquier.xyz

> The basic idea is to harden recovery by returning a copy of pg_control from
> pg_backup_stop() that has a flag set to prevent recovery if the backup_label
> file is missing. Instead of backup software copying pg_control from PGDATA,
> it stores an updated version that is returned from pg_backup_stop().

Hmm, okay. There is also a slight impact for BASE_BACKUP, requiring
basebackup_progress_wait_wal_archive() and do_pg_backup_stop() to be
called earlier when sending the main data directory which is the last
one in the list of tablespaces. As far as I can see, this does not
change the logic because do_pg_backup_stop() does not touch the
control file, but it seems to me that we'd rather keep these two
calls as they are now, and send the control file once we are out of
the for loop that processes all the tablespaces. That seems slightly
cleaner to me, still I agree that both are the same things.

Anyway, finishing do_pg_backup_stop() and then sending the control
file is just a consequence of the implementation choice driven by the
output required for the SQL function so as this is stored in the
backup state to get it down to pg_backup_stop() in xlogfuncs.c, so we
could also take one step back and forget about the SQL function,
setting only the new flag when sending a BASE_BACKUP, or just not use
the backupState to store this data as the exercise involves forcing
one boolean and recalculate a CRC32.

> * We don't need to worry about backup software seeing a torn copy of
> pg_control, since Postgres can safely read it out of memory and provide a
> valid copy via pg_backup_stop(). This solves torn reads without needing to
> write pg_control via a temp file, which may affect performance on a standby.

We're talking about a 8kB file which has a size of 512B
(PG_CONTROL_MAX_SAFE_SIZE) to avoid such issues. So I'm not sure to
see your point here?

> * For backup from standby, we no longer need to instruct the backup software
> to copy pg_control last. In fact the backup software should not copy
> pg_control from PGDATA at all.

Yep. Not from PGDATA, but from the function.

> These changes have no impact on current backup software and they are free to
> use the pg_control available from pg_stop_backup() or continue to use
> pg_control from PGDATA. Of course they will miss the benefits of getting a
> consistent copy of pg_control and the backup_label checking, but will be no
> worse off than before.

There is a large comment block in do_pg_backup_stop() around
backup_stopped_in_recovery. Perhaps it should be improved based on
this patch.

The main concern that I have over this patch is: who is actually going
to use this extension of the SQL stop function? Perhaps existing
backup solutions are good enough risk vs reward is not worth it? The
label_file and the tablespace map are text, this is a series of bytes
that has no visibility for the end-user unless checked on the
client-side. This adds a new step where backups would need to copy
the control file to the data folder.
--
Michael


From: David Steele <david(at)pgbackrest(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Return pg_control from pg_backup_stop().
Date: 2024-10-02 09:03:27
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 10/2/24 10:11, Michael Paquier wrote:
> On Fri, May 17, 2024 at 12:46:49PM +1000, David Steele wrote:
>
>> The basic idea is to harden recovery by returning a copy of pg_control from
>> pg_backup_stop() that has a flag set to prevent recovery if the backup_label
>> file is missing. Instead of backup software copying pg_control from PGDATA,
>> it stores an updated version that is returned from pg_backup_stop().
>
> Hmm, okay. There is also a slight impact for BASE_BACKUP, requiring
> basebackup_progress_wait_wal_archive() and do_pg_backup_stop() to be
> called earlier when sending the main data directory which is the last
> one in the list of tablespaces. As far as I can see, this does not
> change the logic because do_pg_backup_stop() does not touch the
> control file, but it seems to me that we'd rather keep these two
> calls as they are now, and send the control file once we are out of
> the for loop that processes all the tablespaces. That seems slightly
> cleaner to me, still I agree that both are the same things.

Sending pg_control later results in even more code churn because of how
tar files are terminated. I've updated it that way in v2 so you can see
what I mean. I don't have a strong preference, though, so if you prefer
the implementation in v2 then that's fine with me.

> Anyway, finishing do_pg_backup_stop() and then sending the control
> file is just a consequence of the implementation choice driven by the
> output required for the SQL function so as this is stored in the
> backup state to get it down to pg_backup_stop() in xlogfuncs.c, so we
> could also take one step back and forget about the SQL function,
> setting only the new flag when sending a BASE_BACKUP, or just not use
> the backupState to store this data as the exercise involves forcing
> one boolean and recalculate a CRC32.

I can definitely see us making other updates to pg_control so I would
rather keep this logic centralized, even though it is not too
complicated at this point. Still, even 8 lines of code (as it is now)
seems better not to duplicate.

>> * We don't need to worry about backup software seeing a torn copy of
>> pg_control, since Postgres can safely read it out of memory and provide a
>> valid copy via pg_backup_stop(). This solves torn reads without needing to
>> write pg_control via a temp file, which may affect performance on a standby.
>
> We're talking about a 8kB file which has a size of 512B
> (PG_CONTROL_MAX_SAFE_SIZE) to avoid such issues. So I'm not sure to
> see your point here?

Even at 512B it is possible to see tears in pg_control and they happen
in the build farm right now. In fact, this thread [1] trying to fix the
problem was what got me thinking about alternate solutions to preventing
tears in pg_control. Thomas' proposed fixes have not been committed to
my knowledge so the problem remains, but would be fixed by this commit.

> There is a large comment block in do_pg_backup_stop() around
> backup_stopped_in_recovery. Perhaps it should be improved based on
> this patch.

I added a sentence to this comment block in v2.

> The main concern that I have over this patch is: who is actually going
> to use this extension of the SQL stop function?

Primarily existing backup software, I would imagine. The idea is that it
would give them feature parity with pg_basebackup, rather than the new
protections being exclusive to pg_basebackup.

> Perhaps existing
> backup solutions are good enough risk vs reward is not worth it?

I'm not sure I see the risk here. Saving out pg_control is optional so
no changes to current software is required. Of course they miss the
benefit of the protection against tears and missing backup_label, but
that is a choice.

Also, no matter what current backup solutions do, they cannot prevent a
user from removing backup_label after restore. This patch prevents
invalid recovery when that happens.

> The
> label_file and the tablespace map are text, this is a series of bytes
> that has no visibility for the end-user unless checked on the
> client-side. This adds a new step where backups would need to copy
> the control file to the data folder.

Again, optional, but if I was able to manage these saves using the psql
interface in the TAP tests then I'd say it would be pretty easy for
anyone with a normal connection to Postgres. Also, we require users to
treat tabelspace_map and backup_label as binary so not too big a change
here.

[1]
https://www.postgresql.org/message-id/CA%2BhUKG%2Bjig%2BQdBETj_ab%2B%2BVWSoJjbwT3sLNCnk0wFsY_6tRqoA%40mail.gmail.com

Attachment Content-Type Size
pgcontrol-from-backupstop-v2.patch text/plain 23.8 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: David Steele <david(at)pgbackrest(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Return pg_control from pg_backup_stop().
Date: 2024-10-03 04:45:43
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 02, 2024 at 09:03:27AM +0000, David Steele wrote:
> On 10/2/24 10:11, Michael Paquier wrote:
>> Hmm, okay. There is also a slight impact for BASE_BACKUP, requiring
>> basebackup_progress_wait_wal_archive() and do_pg_backup_stop() to be
>> called earlier when sending the main data directory which is the last
>> one in the list of tablespaces. As far as I can see, this does not
>> change the logic because do_pg_backup_stop() does not touch the
>> control file, but it seems to me that we'd rather keep these two
>> calls as they are now, and send the control file once we are out of
>> the for loop that processes all the tablespaces. That seems slightly
>> cleaner to me, still I agree that both are the same things.
>
> Sending pg_control later results in even more code churn because of how tar
> files are terminated. I've updated it that way in v2 so you can see what I
> mean. I don't have a strong preference, though, so if you prefer the
> implementation in v2 then that's fine with me.

It does not make much of a difference, indeed.

>> Anyway, finishing do_pg_backup_stop() and then sending the control
>> file is just a consequence of the implementation choice driven by the
>> output required for the SQL function so as this is stored in the
>> backup state to get it down to pg_backup_stop() in xlogfuncs.c, so we
>> could also take one step back and forget about the SQL function,
>> setting only the new flag when sending a BASE_BACKUP, or just not use
>> the backupState to store this data as the exercise involves forcing
>> one boolean and recalculate a CRC32.
>
> I can definitely see us making other updates to pg_control so I would rather
> keep this logic centralized, even though it is not too complicated at this
> point. Still, even 8 lines of code (as it is now) seems better not to
> duplicate.

I was wondering if the field update should be hidden behind a macro
that uses an offsetof() on ControlFileData, with the name of the field
and a pointer to the value to update to. If you include the CRC32
calculation in that, that makes for less chunks of code when updating
one field of the control file.

The full CRC calculation could also be hidden inside a macro, as there
are a couple of places where we do the same things, like pg_rewind.c,
etc.

>> We're talking about a 8kB file which has a size of 512B
>> (PG_CONTROL_MAX_SAFE_SIZE) to avoid such issues. So I'm not sure to
>> see your point here?
>
> Even at 512B it is possible to see tears in pg_control and they happen in
> the build farm right now. In fact, this thread [1] trying to fix the problem
> was what got me thinking about alternate solutions to preventing tears in
> pg_control. Thomas' proposed fixes have not been committed to my knowledge
> so the problem remains, but would be fixed by this commit.

Ah, right. That rings a bell. Thomas has done some work with
c558e6fd92ff and 63a582222c6b. And we're still not taking the
ControlFileLock while copying it over.. It looks like we should do it
separately, and backpatch. That's not something for this thread to
worry about.

>> Perhaps existing
>> backup solutions are good enough risk vs reward is not worth it?
>
> I'm not sure I see the risk here. Saving out pg_control is optional so no
> changes to current software is required. Of course they miss the benefit of
> the protection against tears and missing backup_label, but that is a choice.
>
> Again, optional, but if I was able to manage these saves using the psql
> interface in the TAP tests then I'd say it would be pretty easy for anyone
> with a normal connection to Postgres. Also, we require users to treat
> tabelspace_map and backup_label as binary so not too big a change here.

Maintenance cost for a limited user impact overall. With incremental
backups being a thing in v18 only available through the replication
protocol, the SQL functions have less advantages these days. My point
would be to see this thread as a two-step process:
1) Update the flag in the control file when sending it across in
replication stream.
2) Do the SQL function thing with the bytea for the control file, if
necessary.

1) is something that has more value than 2), IMO, because there is no
need for a manual step when a backup is taken by the replication
protocol. Well, custom backup solutions that rely on the replication
protocol to copy the data would need to make sure that they have a
backup_label, but that's something they should do anyway and what this
patch wants to protect users from. The SQL part is optional IMO. It
can be done, but it has less impact overall and makes backups more
complicated by requiring the manual copy of the control file.
--
Michael


From: David Steele <david(at)pgbackrest(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Return pg_control from pg_backup_stop().
Date: 2024-10-03 09:11:14
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 10/3/24 07:45, Michael Paquier wrote:
> On Wed, Oct 02, 2024 at 09:03:27AM +0000, David Steele wrote:
>> On 10/2/24 10:11, Michael Paquier wrote:
>
>> I can definitely see us making other updates to pg_control so I would rather
>> keep this logic centralized, even though it is not too complicated at this
>> point. Still, even 8 lines of code (as it is now) seems better not to
>> duplicate.
>
> I was wondering if the field update should be hidden behind a macro
> that uses an offsetof() on ControlFileData, with the name of the field
> and a pointer to the value to update to. If you include the CRC32
> calculation in that, that makes for less chunks of code when updating
> one field of the control file.
>
> The full CRC calculation could also be hidden inside a macro, as there
> are a couple of places where we do the same things, like pg_rewind.c,
> etc.

This seems to be a different case than pg_rewind, especially since we
need a ControlFileLock. I think that is a bit much to do in a macro, so
I split the functionality out into a function instead. This simplifies
the logic in basebackup.c but has little impact elsewhere.

>>> We're talking about a 8kB file which has a size of 512B
>>> (PG_CONTROL_MAX_SAFE_SIZE) to avoid such issues. So I'm not sure to
>>> see your point here?
>>
>> Even at 512B it is possible to see tears in pg_control and they happen in
>> the build farm right now. In fact, this thread [1] trying to fix the problem
>> was what got me thinking about alternate solutions to preventing tears in
>> pg_control. Thomas' proposed fixes have not been committed to my knowledge
>> so the problem remains, but would be fixed by this commit.
>
> Ah, right. That rings a bell. Thomas has done some work with
> c558e6fd92ff and 63a582222c6b. And we're still not taking the
> ControlFileLock while copying it over.. It looks like we should do it
> separately, and backpatch. That's not something for this thread to
> worry about.

I'd be happy to adapt patch 01 to be back-patched (without the new flag)
if we decide it is a good idea. Just locking and making a copy of
pg_control is easy enough, but if we accept the backup_control_file()
function for new versions then we could keep that for the back patch to
reduce churn between versions.

>>> Perhaps existing
>>> backup solutions are good enough risk vs reward is not worth it?
>>
>> I'm not sure I see the risk here. Saving out pg_control is optional so no
>> changes to current software is required. Of course they miss the benefit of
>> the protection against tears and missing backup_label, but that is a choice.
>>
>> Again, optional, but if I was able to manage these saves using the psql
>> interface in the TAP tests then I'd say it would be pretty easy for anyone
>> with a normal connection to Postgres. Also, we require users to treat
>> tabelspace_map and backup_label as binary so not too big a change here.
>
> Maintenance cost for a limited user impact overall. With incremental
> backups being a thing in v18 only available through the replication
> protocol, the SQL functions have less advantages these days. My point
> would be to see this thread as a two-step process:
> 1) Update the flag in the control file when sending it across in
> replication stream.
> 2) Do the SQL function thing with the bytea for the control file, if
> necessary.

OK, I have split the patch into two parts along these lines.

> 1) is something that has more value than 2), IMO, because there is no
> need for a manual step when a backup is taken by the replication
> protocol. Well, custom backup solutions that rely on the replication
> protocol to copy the data would need to make sure that they have a
> backup_label, but that's something they should do anyway and what this
> patch wants to protect users from. The SQL part is optional IMO. It
> can be done, but it has less impact overall and makes backups more
> complicated by requiring the manual copy of the control file.

I don't think having incremental backup in pg_basebackup means alternate
backup solutions are going away or that we should deprecate the SQL
interface. If nothing else, third-party solutions need a way to get an
untorn copy of pg_control and in general I think the new flag will be
universally useful.

Regards,
-David

Attachment Content-Type Size
pgcontrol-flag-v3-01-basebackup.patch text/plain 11.4 KB
pgcontrol-flag-v3-02-sql.patch text/plain 11.3 KB

From: David Steele <david(at)pgbackrest(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Return pg_control from pg_backup_stop().
Date: 2024-11-20 22:44:36
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 10/3/24 05:11, David Steele wrote:
> On 10/3/24 07:45, Michael Paquier wrote:
>
>> 1) is something that has more value than 2), IMO, because there is no
>> need for a manual step when a backup is taken by the replication
>> protocol.  Well, custom backup solutions that rely on the replication
>> protocol to copy the data would need to make sure that they have a
>> backup_label, but that's something they should do anyway and what this
>> patch wants to protect users from.  The SQL part is optional IMO.  It
>> can be done, but it has less impact overall and makes backups more
>> complicated by requiring the manual copy of the control file.
>
> I don't think having incremental backup in pg_basebackup means alternate
> backup solutions are going away or that we should deprecate the SQL
> interface. If nothing else, third-party solutions need a way to get an
> untorn copy of pg_control and in general I think the new flag will be
> universally useful.

I updated this patch to fix an issue with -fsanitize=alignment. I'm not
entirely happy with copying twice but not sure of another way to do it.
As far as I can see VARDATA() will not return aligned data on 64-bit
architectures.

Regards,
-David

Attachment Content-Type Size
pgcontrol-flag-v4-02-sql.patch text/plain 11.4 KB
pgcontrol-flag-v4-01-basebackup.patch text/plain 11.4 KB

From: David Steele <david(at)pgbackrest(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Return pg_control from pg_backup_stop().
Date: 2025-01-24 18:43:02
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 11/20/24 17:44, David Steele wrote:
> On 10/3/24 05:11, David Steele wrote:
>> On 10/3/24 07:45, Michael Paquier wrote:
>>
>>> 1) is something that has more value than 2), IMO, because there is no
>>> need for a manual step when a backup is taken by the replication
>>> protocol.  Well, custom backup solutions that rely on the replication
>>> protocol to copy the data would need to make sure that they have a
>>> backup_label, but that's something they should do anyway and what this
>>> patch wants to protect users from.  The SQL part is optional IMO.  It
>>> can be done, but it has less impact overall and makes backups more
>>> complicated by requiring the manual copy of the control file.
>>
>> I don't think having incremental backup in pg_basebackup means
>> alternate backup solutions are going away or that we should deprecate
>> the SQL interface. If nothing else, third-party solutions need a way
>> to get an untorn copy of pg_control and in general I think the new
>> flag will be universally useful.
>
> I updated this patch to fix an issue with -fsanitize=alignment. I'm not
> entirely happy with copying twice but not sure of another way to do it.
> As far as I can see VARDATA() will not return aligned data on 64-bit
> architectures.

Rebased and improved a comment and an error.

Regards,
-David

Attachment Content-Type Size
pgcontrol-flag-v5-02-sql.patch text/plain 11.4 KB
pgcontrol-flag-v5-01-basebackup.patch text/plain 11.4 KB