Commit d2729f2

nathan-bossart authored and Commitfest Bot committed
pg_upgrade: Add --swap for faster file transfer.
This new option instructs pg_upgrade to move the data directories from the old cluster to the new cluster and then to replace the catalog files with those generated for the new cluster. This mode can outperform --link, --clone, --copy, and --copy-file-range, especially on clusters with many relations.

However, this mode creates many garbage files in the old cluster, which can prolong the file synchronization step. To handle that, we use "initdb --sync-only --no-sync-data-files" for file synchronization, and we synchronize the catalog files as they are transferred. We assume that the database files transferred from the old cluster were synchronized prior to upgrade.

This mode also complicates reverting to the old cluster, so we recommend restoring from backup upon failure during or after file transfer. We did consider teaching pg_upgrade how to generate a revert script for such failures, but we decided against it due to the rarity of failing during file transfer, the complexity of generating the script, and the potential for misusing the script.

The new mode is limited to clusters located in the same file system. With some effort, we could probably support upgrades between different file systems, but this mode is unlikely to offer much benefit if we have to copy the files across file system boundaries.

It is also limited to upgrades from version 10 or newer. There are a few known obstacles for using swap mode to upgrade from older versions. For example, the visibility map format changed in v9.6, and the sequence tuple format changed in v10. In fact, swap mode omits the --sequence-data option in its uses of pg_dump and instead reuses the old cluster's sequence data files. While teaching swap mode to deal with these kinds of changes is surely possible (and we may have to deal with similar problems in the future, anyway), it doesn't seem worth the effort to support upgrades from long-unsupported versions.

Reviewed-by: Greg Sabino Mullane <[email protected]>
Reviewed-by: Bruce Momjian <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
1 parent 93bdf66 commit d2729f2

14 files changed: +531 -34 lines changed

doc/src/sgml/ref/pgupgrade.sgml

+58-1
@@ -244,7 +244,8 @@ PostgreSQL documentation
       <listitem>
        <para>
         Copy files to the new cluster. This is the default. (See also
-        <option>--link</option> and <option>--clone</option>.)
+        <option>--link</option>, <option>--clone</option>,
+        <option>--copy-file-range</option>, and <option>--swap</option>.)
        </para>
       </listitem>
      </varlistentry>
@@ -262,6 +263,32 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>

+     <varlistentry>
+      <term><option>--swap</option></term>
+      <listitem>
+       <para>
+        Move the data directories from the old cluster to the new cluster.
+        Then, replace the catalog files with those generated for the new
+        cluster. This mode can outperform <option>--link</option>,
+        <option>--clone</option>, <option>--copy</option>, and
+        <option>--copy-file-range</option>, especially on clusters with many
+        relations.
+       </para>
+       <para>
+        However, this mode creates many garbage files in the old cluster, which
+        can prolong the file synchronization step if
+        <option>--sync-method=syncfs</option> is used. Therefore, it is
+        recommended to use <option>--sync-method=fsync</option> with
+        <option>--swap</option>.
+       </para>
+       <para>
+        Additionally, once the file transfer step begins, the old cluster will
+        be destructively modified and therefore will no longer be safe to
+        start. See <xref linkend="pgupgrade-step-revert"/> for details.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--sync-method=</option><replaceable>method</replaceable></term>
       <listitem>
@@ -530,6 +557,10 @@ NET STOP postgresql-&majorversion;
      is started. Clone mode also requires that the old and new data
      directories be in the same file system. This mode is only available
      on certain operating systems and file systems.
+     Swap mode may be the fastest if there are many relations, but you will not
+     be able to access your old cluster once the file transfer step begins.
+     Swap mode also requires that the old and new cluster data directories be
+     in the same file system.
     </para>

     <para>
@@ -889,6 +920,32 @@ psql --username=postgres --file=script.sql postgres

        </itemizedlist></para>
       </listitem>
+
+      <listitem>
+       <para>
+        If the <option>--swap</option> option was used, the old cluster might
+        be destructively modified:
+
+        <itemizedlist>
+         <listitem>
+          <para>
+           If <command>pg_upgrade</command> aborts before reporting that the
+           old cluster is no longer safe to start, the old cluster was
+           unmodified; it can be restarted.
+          </para>
+         </listitem>
+
+         <listitem>
+          <para>
+           If <command>pg_upgrade</command> has reported that the old cluster
+           is no longer safe to start, the old cluster was destructively
+           modified. The old cluster will need to be restored from backup in
+           this case.
+          </para>
+         </listitem>
+        </itemizedlist>
+       </para>
+      </listitem>
      </itemizedlist></para>
     </step>
    </procedure>

src/bin/pg_upgrade/TESTING

+3-3
@@ -20,13 +20,13 @@ export oldinstall=...otherversion/ (old version's install base path)
 See DETAILS below for more information about creation of the dump.

 You can also test the different transfer modes (--copy, --link,
---clone, --copy-file-range) by setting the environment variable
+--clone, --copy-file-range, --swap) by setting the environment variable
 PG_TEST_PG_UPGRADE_MODE to the respective command-line option, like

     make check PG_TEST_PG_UPGRADE_MODE=--link

-The default is --copy. Note that the other modes are not supported on
-all operating systems.
+The default is --copy. Note that not all modes are supported on all
+operating systems.

 DETAILS
 -------

src/bin/pg_upgrade/check.c

+28-1
@@ -709,7 +709,34 @@ check_new_cluster(void)
             check_copy_file_range();
             break;
         case TRANSFER_MODE_LINK:
-            check_hard_link();
+            check_hard_link(TRANSFER_MODE_LINK);
+            break;
+        case TRANSFER_MODE_SWAP:
+
+            /*
+             * We do the hard link check for --swap, too, since it's an easy
+             * way to verify the clusters are in the same file system. This
+             * allows us to take some shortcuts in the file synchronization
+             * step. With some more effort, we could probably support the
+             * separate-file-system use case, but this mode is unlikely to
+             * offer much benefit if we have to copy the files across file
+             * system boundaries.
+             */
+            check_hard_link(TRANSFER_MODE_SWAP);
+
+            /*
+             * There are a few known issues with using --swap to upgrade from
+             * versions older than 10. For example, the sequence tuple format
+             * changed in v10, and the visibility map format changed in 9.6.
+             * While such problems are not insurmountable (and we may have to
+             * deal with similar problems in the future, anyway), it doesn't
+             * seem worth the effort to support swap mode for upgrades from
+             * long-unsupported versions.
+             */
+            if (GET_MAJOR_VERSION(old_cluster.major_version) < 1000)
+                pg_fatal("Swap mode can only upgrade clusters from PostgreSQL version %s and later.",
+                         "10");
+
             break;
     }
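
The version gate in this hunk compares `GET_MAJOR_VERSION(...)` against 1000. The macro's definition is not part of this diff, so the sketch below assumes it divides the stored version number by 100. It also assumes `major_version` holds values such as 90600 for 9.6 and 100000 for 10. The helper name `swap_mode_supported` is hypothetical and exists only to make the comparison easy to try out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors a macro from pg_upgrade.h that this diff does not show. */
#define GET_MAJOR_VERSION(v) ((v) / 100)

/* Hypothetical helper: true when the old cluster is new enough for --swap. */
static bool
swap_mode_supported(uint32_t old_major_version)
{
    /* 9.6 is stored as 90600 -> 906; 10 is stored as 100000 -> 1000. */
    return GET_MAJOR_VERSION(old_major_version) >= 1000;
}
```

Under these assumptions, `swap_mode_supported(90600)` is false and `swap_mode_supported(100000)` is true, matching the "version 10 and later" wording of the error message.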

src/bin/pg_upgrade/controldata.c

+14-7
@@ -751,7 +751,7 @@ check_control_data(ControlData *oldctrl,


 void
-disable_old_cluster(void)
+disable_old_cluster(transferMode transfer_mode)
 {
     char        old_path[MAXPGPATH],
                 new_path[MAXPGPATH];
@@ -766,10 +766,17 @@ disable_old_cluster(void)
                  old_path, new_path);
     check_ok();

-    pg_log(PG_REPORT, "\n"
-           "If you want to start the old cluster, you will need to remove\n"
-           "the \".old\" suffix from %s/global/pg_control.old.\n"
-           "Because \"link\" mode was used, the old cluster cannot be safely\n"
-           "started once the new cluster has been started.",
-           old_cluster.pgdata);
+    if (transfer_mode == TRANSFER_MODE_LINK)
+        pg_log(PG_REPORT, "\n"
+               "If you want to start the old cluster, you will need to remove\n"
+               "the \".old\" suffix from %s/global/pg_control.old.\n"
+               "Because \"link\" mode was used, the old cluster cannot be safely\n"
+               "started once the new cluster has been started.",
+               old_cluster.pgdata);
+    else if (transfer_mode == TRANSFER_MODE_SWAP)
+        pg_log(PG_REPORT, "\n"
+               "Because \"swap\" mode was used, the old cluster can no longer be\n"
+               "safely started.");
+    else
+        pg_fatal("unrecognized transfer mode");
 }

src/bin/pg_upgrade/dump.c

+3-1
@@ -52,9 +52,11 @@ generate_old_dump(void)
         snprintf(log_file_name, sizeof(log_file_name), DB_DUMP_LOG_FILE_MASK, old_db->db_oid);

         parallel_exec_prog(log_file_name, NULL,
-                           "\"%s/pg_dump\" %s --no-data %s --sequence-data --quote-all-identifiers "
+                           "\"%s/pg_dump\" %s --no-data %s %s --quote-all-identifiers "
                            "--binary-upgrade --format=custom %s --no-sync --file=\"%s/%s\" %s",
                            new_cluster.bindir, cluster_conn_opts(&old_cluster),
+                           (user_opts.transfer_mode == TRANSFER_MODE_SWAP) ?
+                           "" : "--sequence-data",
                            log_opts.verbose ? "--verbose" : "",
                            user_opts.do_statistics ? "" : "--no-statistics",
                            log_opts.dumpdir,
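
The change above turns a hard-coded `--sequence-data` flag into a `%s` placeholder that receives either the flag or an empty string. The pattern can be sketched in isolation as follows. The helper `build_dump_command` and its shortened command line are hypothetical and much simpler than the real `parallel_exec_prog()` call.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a simplified pg_dump command line into buf.
 * In swap mode the sequence data files are reused, so the flag is dropped.
 * Returns the value reported by snprintf().
 */
static int
build_dump_command(char *buf, size_t buflen, const char *bindir, bool swap_mode)
{
    return snprintf(buf, buflen,
                    "\"%s/pg_dump\" --no-data %s --binary-upgrade",
                    bindir,
                    swap_mode ? "" : "--sequence-data");
}
```

When the flag is omitted, the result contains two consecutive spaces, which a shell treats the same as one.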

src/bin/pg_upgrade/file.c

+11-3
@@ -434,7 +434,7 @@ check_copy_file_range(void)
 }

 void
-check_hard_link(void)
+check_hard_link(transferMode transfer_mode)
 {
     char        existing_file[MAXPGPATH];
     char        new_link_file[MAXPGPATH];
@@ -444,8 +444,16 @@ check_hard_link(void)
     unlink(new_link_file);      /* might fail */

     if (link(existing_file, new_link_file) < 0)
-        pg_fatal("could not create hard link between old and new data directories: %m\n"
-                 "In link mode the old and new data directories must be on the same file system.");
+    {
+        if (transfer_mode == TRANSFER_MODE_LINK)
+            pg_fatal("could not create hard link between old and new data directories: %m\n"
+                     "In link mode the old and new data directories must be on the same file system.");
+        else if (transfer_mode == TRANSFER_MODE_SWAP)
+            pg_fatal("could not create hard link between old and new data directories: %m\n"
+                     "In swap mode the old and new data directories must be on the same file system.");
+        else
+            pg_fatal("unrecognized transfer mode");
+    }

     unlink(new_link_file);
 }
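
`check_hard_link()` doubles as a same-file-system probe: a hard link can only be created when both paths live on one file system. A stripped-down version of that probe might look like the sketch below. The name `can_hard_link` and the boolean return are assumptions made for illustration, whereas the function in this diff reports failure through `pg_fatal()`.

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns true if a hard link to existing_file can be
 * created at probe_path.  The probe link is removed again on success.
 */
static bool
can_hard_link(const char *existing_file, const char *probe_path)
{
    unlink(probe_path);         /* might fail if no stale probe exists */

    if (link(existing_file, probe_path) < 0)
        return false;           /* e.g. paths on different file systems */

    unlink(probe_path);
    return true;
}
```

A caller would pass a file known to exist in the old data directory and a scratch path inside the new one, then pick the mode-specific error message when the probe returns false.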

src/bin/pg_upgrade/info.c

+3-1
@@ -490,7 +490,7 @@ get_rel_infos_query(void)
                         " FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n "
                         " ON c.relnamespace = n.oid "
                         " WHERE relkind IN (" CppAsString2(RELKIND_RELATION) ", "
-                        CppAsString2(RELKIND_MATVIEW) ") AND "
+                        CppAsString2(RELKIND_MATVIEW) "%s) AND "
     /* exclude possible orphaned temp tables */
                         " ((n.nspname !~ '^pg_temp_' AND "
                         " n.nspname !~ '^pg_toast_temp_' AND "
@@ -499,6 +499,8 @@ get_rel_infos_query(void)
                         " c.oid >= %u::pg_catalog.oid) OR "
                         " (n.nspname = 'pg_catalog' AND "
                         " relname IN ('pg_largeobject') ))), ",
+                        (user_opts.transfer_mode == TRANSFER_MODE_SWAP) ?
+                        ", " CppAsString2(RELKIND_SEQUENCE) : "",
                         FirstNormalObjectId);

     /*

src/bin/pg_upgrade/option.c

+7
@@ -62,6 +62,7 @@ parseCommandLine(int argc, char *argv[])
         {"sync-method", required_argument, NULL, 4},
         {"no-statistics", no_argument, NULL, 5},
         {"set-char-signedness", required_argument, NULL, 6},
+        {"swap", no_argument, NULL, 7},

         {NULL, 0, NULL, 0}
     };
@@ -228,6 +229,11 @@ parseCommandLine(int argc, char *argv[])
                 else
                     pg_fatal("invalid argument for option %s", "--set-char-signedness");
                 break;
+
+            case 7:
+                user_opts.transfer_mode = TRANSFER_MODE_SWAP;
+                break;
+
             default:
                 fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
                         os_info.progname);
@@ -325,6 +331,7 @@ usage(void)
     printf(_(" --no-statistics do not import statistics from old cluster\n"));
     printf(_(" --set-char-signedness=OPTION set new cluster char signedness to \"signed\" or\n"
              " \"unsigned\"\n"));
+    printf(_(" --swap move data directories to new cluster\n"));
     printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
     printf(_(" -?, --help show this help, then exit\n"));
     printf(_("\n"

src/bin/pg_upgrade/pg_upgrade.c

+10-6
@@ -170,12 +170,14 @@ main(int argc, char **argv)

     /*
      * Most failures happen in create_new_objects(), which has completed at
-     * this point. We do this here because it is just before linking, which
-     * will link the old and new cluster data files, preventing the old
-     * cluster from being safely started once the new cluster is started.
+     * this point. We do this here because it is just before file transfer,
+     * which for --link will make it unsafe to start the old cluster once the
+     * new cluster is started, and for --swap will make it unsafe to start the
+     * old cluster at all.
      */
-    if (user_opts.transfer_mode == TRANSFER_MODE_LINK)
-        disable_old_cluster();
+    if (user_opts.transfer_mode == TRANSFER_MODE_LINK ||
+        user_opts.transfer_mode == TRANSFER_MODE_SWAP)
+        disable_old_cluster(user_opts.transfer_mode);

     transfer_all_new_tablespaces(&old_cluster.dbarr, &new_cluster.dbarr,
                                  old_cluster.pgdata, new_cluster.pgdata);
@@ -212,8 +214,10 @@ main(int argc, char **argv)
     {
         prep_status("Sync data directory to disk");
         exec_prog(UTILITY_LOG_FILE, NULL, true, true,
-                  "\"%s/initdb\" --sync-only \"%s\" --sync-method %s",
+                  "\"%s/initdb\" --sync-only %s \"%s\" --sync-method %s",
                   new_cluster.bindir,
+                  (user_opts.transfer_mode == TRANSFER_MODE_SWAP) ?
+                  "--no-sync-data-files" : "",
                   new_cluster.pgdata,
                   user_opts.sync_method);
         check_ok();

src/bin/pg_upgrade/pg_upgrade.h

+3-2
@@ -262,6 +262,7 @@ typedef enum
     TRANSFER_MODE_COPY,
     TRANSFER_MODE_COPY_FILE_RANGE,
     TRANSFER_MODE_LINK,
+    TRANSFER_MODE_SWAP,
 } transferMode;

 /*
@@ -391,7 +392,7 @@ void create_script_for_old_cluster_deletion(char **deletion_script_file_name);

 void get_control_data(ClusterInfo *cluster);
 void check_control_data(ControlData *oldctrl, ControlData *newctrl);
-void disable_old_cluster(void);
+void disable_old_cluster(transferMode transfer_mode);


 /* dump.c */
@@ -423,7 +424,7 @@ void rewriteVisibilityMap(const char *fromfile, const char *tofile,
                           const char *schemaName, const char *relName);
 void check_file_clone(void);
 void check_copy_file_range(void);
-void check_hard_link(void);
+void check_hard_link(transferMode transfer_mode);

 /* fopen_priv() is no longer different from fopen() */
 #define fopen_priv(path, mode) fopen(path, mode)
