
Commit fb0c90f

Commitfest Bot committed
[CF 5369] v9 - optimize file transfer in pg_upgrade
This branch was automatically generated by a robot using patches from an email thread registered at:

https://commitfest.postgresql.org/patch/5369

The branch will be overwritten each time a new patch version is posted to the thread, and also periodically to check for bitrot caused by changes on the master branch.

Patch(es): https://www.postgresql.org/message-id/Z9x5MVjWO7zVwrJ0@nathan
Author(s): Nathan Bossart
2 parents ca3067c + d2729f2 commit fb0c90f


25 files changed: +642 -80 lines

doc/src/sgml/ref/initdb.sgml

+27
@@ -527,6 +527,33 @@ PostgreSQL documentation
 </listitem>
 </varlistentry>

+<varlistentry id="app-initdb-option-no-sync-data-files">
+<term><option>--no-sync-data-files</option></term>
+<listitem>
+<para>
+By default, <command>initdb</command> safely writes all database files
+to disk. This option instructs <command>initdb</command> to skip
+synchronizing all files in the individual database directories, the
+database directories themselves, and the tablespace directories, i.e.,
+everything in the <filename>base</filename> subdirectory and any other
+tablespace directories. Other files, such as those in
+<literal>pg_wal</literal> and <literal>pg_xact</literal>, will still be
+synchronized unless the <option>--no-sync</option> option is also
+specified.
+</para>
+<para>
+Note that if <option>--no-sync-data-files</option> is used in
+conjunction with <option>--sync-method=syncfs</option>, some or all of
+the aforementioned files and directories will be synchronized because
+<literal>syncfs</literal> processes entire file systems.
+</para>
+<para>
+This option is primarily intended for internal use by tools that
+separately ensure the skipped files are synchronized to disk.
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry id="app-initdb-option-no-instructions">
 <term><option>--no-instructions</option></term>
 <listitem>
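To make the documented scope of --no-sync-data-files concrete, here is a hypothetical path filter: a minimal sketch that assumes paths are given relative to the data directory. The names `has_dir_prefix` and `skip_when_no_sync_data_files` are invented for illustration; this is not how `sync_pgdata()` is implemented.

```c
#include <stdbool.h>
#include <string.h>

/* True if `path` is `dir` itself or lies below it ("base", "base/1/..."). */
static bool
has_dir_prefix(const char *path, const char *dir)
{
	size_t		len = strlen(dir);

	/* Require an exact match or a '/' after the prefix, so "basement" fails. */
	return strncmp(path, dir, len) == 0 &&
		(path[len] == '\0' || path[len] == '/');
}

/*
 * Hypothetical filter (not PostgreSQL code): given a path relative to the
 * data directory, return true if --no-sync-data-files is documented to skip
 * it, i.e. anything in "base" or in a tablespace reached via pg_tblspc/.
 */
static bool
skip_when_no_sync_data_files(const char *relpath, bool sync_data_files)
{
	if (sync_data_files)
		return false;			/* default: synchronize everything */

	if (has_dir_prefix(relpath, "base"))
		return true;

	/* Entries below pg_tblspc/ lead into tablespace directories. */
	if (strncmp(relpath, "pg_tblspc/", strlen("pg_tblspc/")) == 0)
		return true;

	/* Everything else, e.g. pg_wal, pg_xact, global, is still synced. */
	return false;
}
```

A per-path filter like this only makes sense for per-file synchronization (fsync). As the syncfs note in the documentation says, syncfs flushes whole file systems, so the skipped files may be written out anyway.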

doc/src/sgml/ref/pg_dump.sgml

+11
@@ -1298,6 +1298,17 @@ PostgreSQL documentation
 </listitem>
 </varlistentry>

+<varlistentry>
+<term><option>--sequence-data</option></term>
+<listitem>
+<para>
+Include sequence data in the dump. This is the default behavior except
+when <option>--no-data</option>, <option>--schema-only</option>, or
+<option>--statistics-only</option> is specified.
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry>
 <term><option>--serializable-deferrable</option></term>
 <listitem>
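The documented default for --sequence-data can be read as a small decision rule. This sketch assumes the four booleans stand in for pg_dump's parsed command-line options; `dumps_sequence_data` is a hypothetical name, not a pg_dump function.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the documented --sequence-data behavior, not
 * pg_dump's actual code. Each argument is true if the corresponding
 * option was given on the command line.
 */
static bool
dumps_sequence_data(bool sequence_data, bool no_data,
					bool schema_only, bool statistics_only)
{
	/* An explicit --sequence-data always includes sequence data. */
	if (sequence_data)
		return true;

	/* Otherwise it is the default unless one of these options is used. */
	return !(no_data || schema_only || statistics_only);
}
```

Under this rule, the pg_dump invocation in pg_upgrade's dump.c, which always passes --no-data, dumps sequence data only when --sequence-data is passed as well.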

doc/src/sgml/ref/pgupgrade.sgml

+58-1
@@ -244,7 +244,8 @@ PostgreSQL documentation
 <listitem>
 <para>
 Copy files to the new cluster. This is the default. (See also
-<option>--link</option> and <option>--clone</option>.)
+<option>--link</option>, <option>--clone</option>,
+<option>--copy-file-range</option>, and <option>--swap</option>.)
 </para>
 </listitem>
 </varlistentry>
@@ -262,6 +263,32 @@ PostgreSQL documentation
 </listitem>
 </varlistentry>

+<varlistentry>
+<term><option>--swap</option></term>
+<listitem>
+<para>
+Move the data directories from the old cluster to the new cluster.
+Then, replace the catalog files with those generated for the new
+cluster. This mode can outperform <option>--link</option>,
+<option>--clone</option>, <option>--copy</option>, and
+<option>--copy-file-range</option>, especially on clusters with many
+relations.
+</para>
+<para>
+However, this mode creates many garbage files in the old cluster, which
+can prolong the file synchronization step if
+<option>--sync-method=syncfs</option> is used. Therefore, it is
+recommended to use <option>--sync-method=fsync</option> with
+<option>--swap</option>.
+</para>
+<para>
+Additionally, once the file transfer step begins, the old cluster will
+be destructively modified and therefore will no longer be safe to
+start. See <xref linkend="pgupgrade-step-revert"/> for details.
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry>
 <term><option>--sync-method=</option><replaceable>method</replaceable></term>
 <listitem>
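Moving directories instead of copying them is cheap because rename(2) only rewrites directory entries. The following is a simplified sketch of the three moves the --swap description implies, assuming a single database directory and a caller-supplied list of catalog file names. `swap_database_dir` and the `.swap` staging suffix are hypothetical; this is not pg_upgrade's implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical sketch, not pg_upgrade's code:
 * 1. Move the new cluster's freshly created database directory aside.
 * 2. Move the old cluster's database directory into its place.
 * 3. Move the new catalog files from the staged copy over the old ones.
 */
static bool
swap_database_dir(const char *old_dbdir, const char *new_dbdir,
				  const char *const *catalog_files, int ncatalogs)
{
	char		staged[4096];
	char		src[4096];
	char		dst[4096];

	snprintf(staged, sizeof(staged), "%s.swap", new_dbdir);

	if (rename(new_dbdir, staged) != 0)
		return false;
	if (rename(old_dbdir, new_dbdir) != 0)
		return false;

	for (int i = 0; i < ncatalogs; i++)
	{
		snprintf(src, sizeof(src), "%s/%s", staged, catalog_files[i]);
		snprintf(dst, sizeof(dst), "%s/%s", new_dbdir, catalog_files[i]);
		/* POSIX rename() atomically replaces an existing file at dst. */
		if (rename(src, dst) != 0)
			return false;
	}
	return true;
}
```

Because rename() cannot cross file systems (it fails with EXDEV), an approach built on it also needs both data directories on one file system.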
@@ -530,6 +557,10 @@ NET STOP postgresql-&majorversion;
 is started. Clone mode also requires that the old and new data
 directories be in the same file system. This mode is only available
 on certain operating systems and file systems.
+Swap mode may be the fastest if there are many relations, but you will not
+be able to access your old cluster once the file transfer step begins.
+Swap mode also requires that the old and new cluster data directories be
+in the same file system.
 </para>

 <para>
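One way to test the same-file-system requirement is to attempt a hard link between the two directories: link(2) fails with EXDEV when source and target are on different file systems. The check.c changes in this commit reuse pg_upgrade's hard-link check for this purpose in swap mode. A standalone sketch of the idea; the function name `same_file_system` and the `PG_VERSION.linktest` probe name are assumptions for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch: return true if a hard link from `existing_file`
 * into `target_dir` can be created, which implies both live on the same
 * file system. On failure, *err receives errno (EXDEV means "different
 * file systems"; other values mean the probe itself failed).
 */
static bool
same_file_system(const char *existing_file, const char *target_dir, int *err)
{
	char		probe[4096];

	snprintf(probe, sizeof(probe), "%s/PG_VERSION.linktest", target_dir);
	(void) unlink(probe);		/* remove a stale probe; may fail */

	if (link(existing_file, probe) != 0)
	{
		*err = errno;			/* capture before any other call */
		return false;
	}
	(void) unlink(probe);		/* the probe link is no longer needed */
	*err = 0;
	return true;
}
```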
@@ -889,6 +920,32 @@ psql --username=postgres --file=script.sql postgres

 </itemizedlist></para>
 </listitem>
+
+<listitem>
+<para>
+If the <option>--swap</option> option was used, the old cluster might
+be destructively modified:
+
+<itemizedlist>
+<listitem>
+<para>
+If <command>pg_upgrade</command> aborts before reporting that the
+old cluster is no longer safe to start, the old cluster was
+unmodified; it can be restarted.
+</para>
+</listitem>
+
+<listitem>
+<para>
+If <command>pg_upgrade</command> has reported that the old cluster
+is no longer safe to start, the old cluster was destructively
+modified. The old cluster will need to be restored from backup in
+this case.
+</para>
+</listitem>
+</itemizedlist>
+</para>
+</listitem>
 </itemizedlist></para>
 </step>
 </procedure>

src/bin/initdb/initdb.c

+8-2
@@ -168,6 +168,7 @@ static bool data_checksums = true;
 static char *xlog_dir = NULL;
 static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
 static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static bool sync_data_files = true;


 /* internal vars */
@@ -2566,6 +2567,7 @@ usage(const char *progname)
 printf(_(" -L DIRECTORY where to find the input files\n"));
 printf(_(" -n, --no-clean do not clean up after errors\n"));
 printf(_(" -N, --no-sync do not wait for changes to be written safely to disk\n"));
+printf(_(" --no-sync-data-files do not sync files within database directories\n"));
 printf(_(" --no-instructions do not print instructions for next steps\n"));
 printf(_(" -s, --show show internal settings, then exit\n"));
 printf(_(" --sync-method=METHOD set method for syncing files to disk\n"));
@@ -3208,6 +3210,7 @@ main(int argc, char *argv[])
 {"icu-rules", required_argument, NULL, 18},
 {"sync-method", required_argument, NULL, 19},
 {"no-data-checksums", no_argument, NULL, 20},
+{"no-sync-data-files", no_argument, NULL, 21},
 {NULL, 0, NULL, 0}
 };

@@ -3402,6 +3405,9 @@ main(int argc, char *argv[])
 case 20:
 data_checksums = false;
 break;
+case 21:
+sync_data_files = false;
+break;
 default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -3453,7 +3459,7 @@ main(int argc, char *argv[])

 fputs(_("syncing data to disk ... "), stdout);
 fflush(stdout);
-sync_pgdata(pg_data, PG_VERSION_NUM, sync_method);
+sync_pgdata(pg_data, PG_VERSION_NUM, sync_method, sync_data_files);
 check_ok();
 return 0;
 }
@@ -3516,7 +3522,7 @@ main(int argc, char *argv[])
 {
 fputs(_("syncing data to disk ... "), stdout);
 fflush(stdout);
-sync_pgdata(pg_data, PG_VERSION_NUM, sync_method);
+sync_pgdata(pg_data, PG_VERSION_NUM, sync_method, sync_data_files);
 check_ok();
 }
 else

src/bin/initdb/t/001_initdb.pl

+1
@@ -76,6 +76,7 @@
 'checksums are enabled in control file');

 command_ok([ 'initdb', '--sync-only', $datadir ], 'sync only');
+command_ok([ 'initdb', '--sync-only', '--no-sync-data-files', $datadir ], '--no-sync-data-files');
 command_fails([ 'initdb', $datadir ], 'existing data directory');

 if ($supports_syncfs)

src/bin/pg_basebackup/pg_basebackup.c

+1-1
@@ -2310,7 +2310,7 @@ BaseBackup(char *compression_algorithm, char *compression_detail,
 }
 else
 {
-(void) sync_pgdata(basedir, serverVersion, sync_method);
+(void) sync_pgdata(basedir, serverVersion, sync_method, true);
 }
 }

src/bin/pg_checksums/pg_checksums.c

+1-1
@@ -633,7 +633,7 @@ main(int argc, char *argv[])
 if (do_sync)
 {
 pg_log_info("syncing data directory");
-sync_pgdata(DataDir, PG_VERSION_NUM, sync_method);
+sync_pgdata(DataDir, PG_VERSION_NUM, sync_method, true);
 }

 pg_log_info("updating control file");

src/bin/pg_combinebackup/pg_combinebackup.c

+1-1
@@ -424,7 +424,7 @@ main(int argc, char *argv[])
 else
 {
 pg_log_debug("recursively fsyncing \"%s\"", opt.output);
-sync_pgdata(opt.output, version * 10000, opt.sync_method);
+sync_pgdata(opt.output, version * 10000, opt.sync_method, true);
 }
 }

src/bin/pg_dump/pg_dump.c

+2-8
@@ -518,6 +518,7 @@ main(int argc, char **argv)
 {"sync-method", required_argument, NULL, 15},
 {"filter", required_argument, NULL, 16},
 {"exclude-extension", required_argument, NULL, 17},
+{"sequence-data", no_argument, &dopt.sequence_data, 1},

 {NULL, 0, NULL, 0}
 };
@@ -801,14 +802,6 @@ main(int argc, char **argv)
 if (dopt.column_inserts && dopt.dump_inserts == 0)
 dopt.dump_inserts = DUMP_DEFAULT_ROWS_PER_INSERT;

-/*
-* Binary upgrade mode implies dumping sequence data even in schema-only
-* mode. This is not exposed as a separate option, but kept separate
-* internally for clarity.
-*/
-if (dopt.binary_upgrade)
-dopt.sequence_data = 1;
-
 if (data_only && schema_only)
 pg_fatal("options -s/--schema-only and -a/--data-only cannot be used together");
 if (schema_only && statistics_only)
@@ -1275,6 +1268,7 @@ help(const char *progname)
 printf(_(" --quote-all-identifiers quote all identifiers, even if not key words\n"));
 printf(_(" --rows-per-insert=NROWS number of rows per INSERT; implies --inserts\n"));
 printf(_(" --section=SECTION dump named section (pre-data, data, or post-data)\n"));
+printf(_(" --sequence-data include sequence data in dump\n"));
 printf(_(" --serializable-deferrable wait until the dump can run without anomalies\n"));
 printf(_(" --snapshot=SNAPSHOT use given snapshot for the dump\n"));
 printf(_(" --statistics-only dump only the statistics, not schema or data\n"));

src/bin/pg_dump/t/002_pg_dump.pl

+1
@@ -66,6 +66,7 @@
 '--file' => "$tempdir/binary_upgrade.dump",
 '--no-password',
 '--no-data',
+'--sequence-data',
 '--binary-upgrade',
 '--dbname' => 'postgres', # alternative way to specify database
 ],

src/bin/pg_rewind/file_ops.c

+1-1
@@ -296,7 +296,7 @@ sync_target_dir(void)
 if (!do_sync || dry_run)
 return;

-sync_pgdata(datadir_target, PG_VERSION_NUM, sync_method);
+sync_pgdata(datadir_target, PG_VERSION_NUM, sync_method, true);
 }


src/bin/pg_upgrade/TESTING

+3-3
@@ -20,13 +20,13 @@ export oldinstall=...otherversion/ (old version's install base path)
 See DETAILS below for more information about creation of the dump.

 You can also test the different transfer modes (--copy, --link,
---clone, --copy-file-range) by setting the environment variable
+--clone, --copy-file-range, --swap) by setting the environment variable
 PG_TEST_PG_UPGRADE_MODE to the respective command-line option, like

 make check PG_TEST_PG_UPGRADE_MODE=--link

-The default is --copy. Note that the other modes are not supported on
-all operating systems.
+The default is --copy. Note that not all modes are supported on all
+operating systems.

 DETAILS
 -------

src/bin/pg_upgrade/check.c

+28-1
@@ -709,7 +709,34 @@ check_new_cluster(void)
 check_copy_file_range();
 break;
 case TRANSFER_MODE_LINK:
-check_hard_link();
+check_hard_link(TRANSFER_MODE_LINK);
+break;
+case TRANSFER_MODE_SWAP:
+
+/*
+* We do the hard link check for --swap, too, since it's an easy
+* way to verify the clusters are in the same file system. This
+* allows us to take some shortcuts in the file synchronization
+* step. With some more effort, we could probably support the
+* separate-file-system use case, but this mode is unlikely to
+* offer much benefit if we have to copy the files across file
+* system boundaries.
+*/
+check_hard_link(TRANSFER_MODE_SWAP);
+
+/*
+* There are a few known issues with using --swap to upgrade from
+* versions older than 10. For example, the sequence tuple format
+* changed in v10, and the visibility map format changed in 9.6.
+* While such problems are not insurmountable (and we may have to
+* deal with similar problems in the future, anyway), it doesn't
+* seem worth the effort to support swap mode for upgrades from
+* long-unsupported versions.
+*/
+if (GET_MAJOR_VERSION(old_cluster.major_version) < 1000)
+pg_fatal("Swap mode can only upgrade clusters from PostgreSQL version %s and later.",
+"10");
+
 break;
 }
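The version gate is easier to read with the numbers worked out. Assuming pg_upgrade encodes a major version as `major * 10000 + minor * 100` for pre-10 releases and as `major * 10000` from 10 onward, and that `GET_MAJOR_VERSION(v)` is `(v) / 100`, a 9.6 cluster maps to 90600 / 100 = 906 (rejected) and a 10 cluster to 100000 / 100 = 1000 (accepted). A small sketch; `encode_major_version` and `swap_mode_allowed` are hypothetical helpers.

```c
#include <stdbool.h>

/* Assumed to match pg_upgrade.h: strip the last two digits. */
#define GET_MAJOR_VERSION(v)	((v) / 100)

/*
 * Hypothetical helper: encode a major version the way pg_upgrade is assumed
 * to. Pre-10 releases have two parts (9.6 -> 90600); from 10 onward the
 * major version is a single number (10 -> 100000).
 */
static unsigned int
encode_major_version(int v1, int v2)
{
	if (v1 < 10)
		return (unsigned int) (v1 * 10000 + v2 * 100);
	return (unsigned int) (v1 * 10000);
}

/* Mirrors the check.c test: swap mode needs an old cluster of 10 or later. */
static bool
swap_mode_allowed(unsigned int major_version)
{
	return GET_MAJOR_VERSION(major_version) >= 1000;
}
```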

src/bin/pg_upgrade/controldata.c

+14-7
@@ -751,7 +751,7 @@ check_control_data(ControlData *oldctrl,


 void
-disable_old_cluster(void)
+disable_old_cluster(transferMode transfer_mode)
 {
 char old_path[MAXPGPATH],
 new_path[MAXPGPATH];
@@ -766,10 +766,17 @@ disable_old_cluster(void)
 old_path, new_path);
 check_ok();

-pg_log(PG_REPORT, "\n"
-"If you want to start the old cluster, you will need to remove\n"
-"the \".old\" suffix from %s/global/pg_control.old.\n"
-"Because \"link\" mode was used, the old cluster cannot be safely\n"
-"started once the new cluster has been started.",
-old_cluster.pgdata);
+if (transfer_mode == TRANSFER_MODE_LINK)
+pg_log(PG_REPORT, "\n"
+"If you want to start the old cluster, you will need to remove\n"
+"the \".old\" suffix from %s/global/pg_control.old.\n"
+"Because \"link\" mode was used, the old cluster cannot be safely\n"
+"started once the new cluster has been started.",
+old_cluster.pgdata);
+else if (transfer_mode == TRANSFER_MODE_SWAP)
+pg_log(PG_REPORT, "\n"
+"Because \"swap\" mode was used, the old cluster can no longer be\n"
+"safely started.");
+else
+pg_fatal("unrecognized transfer mode");
 }

src/bin/pg_upgrade/dump.c

+3-1
@@ -52,9 +52,11 @@ generate_old_dump(void)
 snprintf(log_file_name, sizeof(log_file_name), DB_DUMP_LOG_FILE_MASK, old_db->db_oid);

 parallel_exec_prog(log_file_name, NULL,
-"\"%s/pg_dump\" %s --no-data %s --quote-all-identifiers "
+"\"%s/pg_dump\" %s --no-data %s %s --quote-all-identifiers "
 "--binary-upgrade --format=custom %s --no-sync --file=\"%s/%s\" %s",
 new_cluster.bindir, cluster_conn_opts(&old_cluster),
+(user_opts.transfer_mode == TRANSFER_MODE_SWAP) ?
+"" : "--sequence-data",
 log_opts.verbose ? "--verbose" : "",
 user_opts.do_statistics ? "" : "--no-statistics",
 log_opts.dumpdir,
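The dump.c change adds one `%s` slot to the pg_dump command template and fills it with either `--sequence-data` or an empty string, so the flag is omitted only in swap mode, presumably because swap mode moves the old cluster's sequence relation files into the new cluster. A reduced sketch of the same pattern; the enum, the function name `build_dump_command`, and the shortened template are hypothetical, and the real template passes more options.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for pg_upgrade's transfer modes. */
typedef enum
{
	MODE_COPY,
	MODE_LINK,
	MODE_SWAP
} transfer_mode;

/*
 * Hypothetical, shortened version of the pg_dump command built in dump.c.
 * In swap mode the "--sequence-data" slot is left empty. Returns the
 * length snprintf reports for the full command.
 */
static int
build_dump_command(char *buf, size_t buflen, const char *bindir,
				   transfer_mode mode, int verbose)
{
	return snprintf(buf, buflen,
					"\"%s/pg_dump\" --no-data %s %s --quote-all-identifiers "
					"--binary-upgrade --format=custom --no-sync",
					bindir,
					(mode == MODE_SWAP) ? "" : "--sequence-data",
					verbose ? "--verbose" : "");
}
```

With everything else equal, the copy-mode command comes out exactly strlen("--sequence-data") = 15 characters longer than the swap-mode one.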
