Commit dbc3c05

MasaoFujii authored and Commitfest Bot committed
Improve error message when standby does not accept connections.
Even after reaching the minimum recovery point, if there are long-lived write transactions with more than 64 subtransactions on the primary, the recovery snapshot may not yet be ready for hot standby, delaying read-only connections on the standby. Previously, when read-only connections were not accepted due to this condition, the following error message was logged:

    FATAL: the database system is not yet accepting connections
    DETAIL: Consistent recovery state has not been yet reached.

This DETAIL message was misleading because the following message was already logged in this case:

    LOG: consistent recovery state reached

This contradiction, i.e., indicating that the recovery state was consistent while also stating it wasn’t, caused confusion.

This commit improves the error message to better reflect the actual state:

    FATAL: the database system is not yet accepting connections
    DETAIL: Recovery snapshot is not yet ready for hot standby.
    HINT: To enable hot standby, close write transactions with more than 64 subtransactions on the primary server.

To implement this, the commit introduces a new postmaster signal, PMSIGNAL_RECOVERY_CONSISTENT. When the startup process reaches a consistent recovery state, it sends this signal to the postmaster, allowing it to correctly recognize that state.

Since this is not a clear bug, the change is applied only to the master branch and is not back-patched.

Author: Atsushi Torikoshi <[email protected]>
Co-authored-by: Fujii Masao <[email protected]>
Reviewed-by: Yugo Nagata <[email protected]>
Discussion: https://postgr.es/m/[email protected]
1 parent 2fd3e2f commit dbc3c05

6 files changed, +38 -13 lines changed

doc/src/sgml/high-availability.sgml

+8 -4
@@ -1535,7 +1535,8 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 <para>
 When the <xref linkend="guc-hot-standby"/> parameter is set to true on a
 standby server, it will begin accepting connections once the recovery has
-brought the system to a consistent state. All such connections are
+brought the system to a consistent state and be ready for hot standby.
+All such connections are
 strictly read-only; not even temporary tables may be written.
 </para>
 
@@ -1974,9 +1975,12 @@ LOG: database system is ready to accept read-only connections
 Consistency information is recorded once per checkpoint on the primary.
 It is not possible to enable hot standby when reading WAL
 written during a period when <varname>wal_level</varname> was not set to
-<literal>replica</literal> or <literal>logical</literal> on the primary. Reaching
-a consistent state can also be delayed in the presence of both of these
-conditions:
+<literal>replica</literal> or <literal>logical</literal> on the primary.
+Even after reaching a consistent state, the recovery snapshot may not
+be ready for hot standby if both of the following conditions are met,
+delaying accepting read-only connections. To enable hot standby,
+long-lived write transactions with more than 64 subtransactions
+need to be closed on the primary.
 
 <itemizedlist>
 <listitem>

src/backend/access/transam/xlogrecovery.c

+6
@@ -291,6 +291,11 @@ static bool backupEndRequired = false;
  * Consistent state means that the system is internally consistent, all
  * the WAL has been replayed up to a certain point, and importantly, there
  * is no trace of later actions on disk.
+ *
+ * This flag is used only by the startup process and postmaster. When
+ * minRecoveryPoint is reached, the startup process sets it to true and
+ * sends a PMSIGNAL_RECOVERY_CONSISTENT signal to the postmaster,
+ * which then sets it to true upon receiving the signal.
  */
 bool reachedConsistency = false;
 
@@ -2248,6 +2253,7 @@ CheckRecoveryConsistency(void)
         CheckTablespaceDirectory();
 
         reachedConsistency = true;
+        SendPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT);
         ereport(LOG,
                 (errmsg("consistent recovery state reached at %X/%X",
                         LSN_FORMAT_ARGS(lastReplayedEndRecPtr))));

src/backend/postmaster/postmaster.c

+9 -3
@@ -1825,8 +1825,7 @@ canAcceptConnections(BackendType backend_type)
         else if (!FatalError && pmState == PM_STARTUP)
             return CAC_STARTUP; /* normal startup */
         else if (!FatalError && pmState == PM_RECOVERY)
-            return CAC_NOTCONSISTENT; /* not yet at consistent recovery
-                                       * state */
+            return CAC_NOTHOTSTANDBY; /* not yet ready for hot standby */
         else
             return CAC_RECOVERY; /* else must be crash recovery */
     }
@@ -3699,6 +3698,7 @@ process_pm_pmsignal(void)
         /* WAL redo has started. We're out of reinitialization. */
         FatalError = false;
         AbortStartTime = 0;
+        reachedConsistency = false;
 
         /*
          * Start the archiver if we're responsible for (re-)archiving received
@@ -3724,8 +3724,14 @@ process_pm_pmsignal(void)
         UpdatePMState(PM_RECOVERY);
     }
 
-    if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
+    if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT) &&
         pmState == PM_RECOVERY && Shutdown == NoShutdown)
+    {
+        reachedConsistency = true;
+    }
+
+    if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
+        (pmState == PM_RECOVERY && Shutdown == NoShutdown))
     {
         ereport(LOG,
                 (errmsg("database system is ready to accept read-only connections")));
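
The postmaster-side changes above amount to a small state machine around its copy of `reachedConsistency`: the flag is cleared when recovery starts, set when the startup process reports consistency, and left untouched when hot standby begins. The following is a rough sketch of that lifecycle under simplifying assumptions. It is not the postmaster code: the `Sketch*` types and `sketch_handle_signal` are hypothetical names, and the `Shutdown == NoShutdown` checks and all other side effects are omitted.

```c
#include <stdbool.h>

/* Hypothetical subset of the signal reasons, redeclared for this sketch. */
typedef enum
{
    SIG_RECOVERY_STARTED,
    SIG_RECOVERY_CONSISTENT,
    SIG_BEGIN_HOT_STANDBY
} SketchSignal;

/* Hypothetical subset of the postmaster states. */
typedef enum
{
    ST_STARTUP,
    ST_RECOVERY,
    ST_HOT_STANDBY
} SketchState;

typedef struct
{
    SketchState state;
    bool        reached_consistency;    /* postmaster's copy of the flag */
} SketchPostmaster;

/* Apply one signal from the startup process to the modeled postmaster. */
static void
sketch_handle_signal(SketchPostmaster *pm, SketchSignal sig)
{
    switch (sig)
    {
        case SIG_RECOVERY_STARTED:
            if (pm->state == ST_STARTUP)
            {
                /* A new recovery cycle starts: forget any earlier consistency. */
                pm->reached_consistency = false;
                pm->state = ST_RECOVERY;
            }
            break;
        case SIG_RECOVERY_CONSISTENT:
            /* Only meaningful while recovery is in progress. */
            if (pm->state == ST_RECOVERY)
                pm->reached_consistency = true;
            break;
        case SIG_BEGIN_HOT_STANDBY:
            if (pm->state == ST_RECOVERY)
                pm->state = ST_HOT_STANDBY;
            break;
    }
}
```

The window the commit cares about is the one where the modeled state is still `ST_RECOVERY` but `reached_consistency` is already true: consistency has been reached, yet hot standby has not begun.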

src/backend/tcop/backend_startup.c

+13 -5
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/xlog.h"
+#include "access/xlogrecovery.h"
 #include "common/ip.h"
 #include "common/string.h"
 #include "libpq/libpq.h"
@@ -306,17 +307,24 @@ BackendInitialize(ClientSocket *client_sock, CAC_state cac)
                         (errcode(ERRCODE_CANNOT_CONNECT_NOW),
                          errmsg("the database system is starting up")));
             break;
-        case CAC_NOTCONSISTENT:
-            if (EnableHotStandby)
+        case CAC_NOTHOTSTANDBY:
+            if (!EnableHotStandby)
+                ereport(FATAL,
+                        (errcode(ERRCODE_CANNOT_CONNECT_NOW),
+                         errmsg("the database system is not accepting connections"),
+                         errdetail("Hot standby mode is disabled.")));
+            else if (reachedConsistency)
                 ereport(FATAL,
                         (errcode(ERRCODE_CANNOT_CONNECT_NOW),
                          errmsg("the database system is not yet accepting connections"),
-                         errdetail("Consistent recovery state has not been yet reached.")));
+                         errdetail("Recovery snapshot is not yet ready for hot standby."),
+                         errhint("To enable hot standby, close write transactions with more than %d subtransactions on the primary server.",
+                                 PGPROC_MAX_CACHED_SUBXIDS)));
             else
                 ereport(FATAL,
                         (errcode(ERRCODE_CANNOT_CONNECT_NOW),
-                         errmsg("the database system is not accepting connections"),
-                         errdetail("Hot standby mode is disabled.")));
+                         errmsg("the database system is not yet accepting connections"),
+                         errdetail("Consistent recovery state has not been yet reached.")));
             break;
         case CAC_SHUTDOWN:
             ereport(FATAL,
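
The reworked `CAC_NOTHOTSTANDBY` branch above is a three-way decision on two booleans. As a sketch of that decision table only, the function below returns the message, detail, and hint as plain strings instead of calling `ereport`. `SketchRejection` and `sketch_not_hot_standby_error` are hypothetical names, and the hint hard-codes 64 where the real code formats PGPROC_MAX_CACHED_SUBXIDS with `%d`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the three parts of the FATAL report. */
typedef struct
{
    const char *message;
    const char *detail;
    const char *hint;           /* NULL when the branch emits no hint */
} SketchRejection;

/* Pick the report for a connection rejected with CAC_NOTHOTSTANDBY. */
static SketchRejection
sketch_not_hot_standby_error(bool enable_hot_standby, bool reached_consistency)
{
    SketchRejection r = {NULL, NULL, NULL};

    if (!enable_hot_standby)
    {
        /* Hot standby is off, so waiting will not help. */
        r.message = "the database system is not accepting connections";
        r.detail = "Hot standby mode is disabled.";
    }
    else if (reached_consistency)
    {
        /* Consistent, but the recovery snapshot is still pending. */
        r.message = "the database system is not yet accepting connections";
        r.detail = "Recovery snapshot is not yet ready for hot standby.";
        r.hint = "To enable hot standby, close write transactions with more "
            "than 64 subtransactions on the primary server.";
    }
    else
    {
        /* Recovery has not reached a consistent state yet. */
        r.message = "the database system is not yet accepting connections";
        r.detail = "Consistent recovery state has not been yet reached.";
    }
    return r;
}
```

Checking `EnableHotStandby` first means the "disabled" message no longer depends on the consistency flag, and only the middle branch carries a hint.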

src/include/storage/pmsignal.h

+1
@@ -33,6 +33,7 @@
 typedef enum
 {
     PMSIGNAL_RECOVERY_STARTED, /* recovery has started */
+    PMSIGNAL_RECOVERY_CONSISTENT, /* recovery has reached consistent state */
     PMSIGNAL_BEGIN_HOT_STANDBY, /* begin Hot Standby */
     PMSIGNAL_ROTATE_LOGFILE, /* send SIGUSR1 to syslogger to rotate logfile */
     PMSIGNAL_START_AUTOVAC_LAUNCHER, /* start an autovacuum launcher */

src/include/tcop/backend_startup.h

+1 -1
@@ -36,7 +36,7 @@ typedef enum CAC_state
     CAC_STARTUP,
     CAC_SHUTDOWN,
     CAC_RECOVERY,
-    CAC_NOTCONSISTENT,
+    CAC_NOTHOTSTANDBY,
     CAC_TOOMANY,
 } CAC_state;
