summaryrefslogtreecommitdiff
path: root/src/backend/commands
diff options
context:
space:
mode:
authorAndres Freund2015-02-26 11:50:07 +0000
committerAndres Freund2015-02-26 11:50:07 +0000
commitfd6a3f3ad4067f1b8fc28e9de6e99e5936d82161 (patch)
treeeb7c137e6943818ce1176feacf1130d5e20e656e /src/backend/commands
parenta7920b872fff36668a2d33157609024b851b5c2e (diff)
Reconsider when to wait for WAL flushes/syncrep during commit.
Up to now RecordTransactionCommit() waited for WAL to be flushed (if synchronous_commit != off) and to be synchronously replicated (if enabled), even if a transaction did not have a xid assigned. The primary reason for that is that sequence's nextval() did not assign a xid, but are worthwhile to wait for on commit. This can be problematic because sometimes read only transactions do write WAL, e.g. HOT page prune records. That then could lead to read only transactions having to wait during commit. Not something people expect in a read only transaction. This lead to such strange symptoms as backends being seemingly stuck during connection establishment when all synchronous replicas are down. Especially annoying when said stuck connection is the standby trying to reconnect to allow syncrep again... This behavior also is involved in a rather complicated <= 9.4 bug where the transaction started by catchup interrupt processing waited for syncrep using latches, but didn't get the wakeup because it was already running inside the same overloaded signal handler. Fix the issue here doesn't properly solve that issue, merely papers over the problems. In 9.5 catchup interrupts aren't processed out of signal handlers anymore. To fix all this, make nextval() acquire a top level xid, and only wait for transaction commit if a transaction both acquired a xid and emitted WAL records. If only a xid has been assigned we don't uselessly want to wait just because of writes to temporary/unlogged tables; if only WAL has been written we don't want to wait just because of HOT prunes. The xid assignment in nextval() is unlikely to cause overhead in real-world workloads. For one it only happens SEQ_LOG_VALS/32 values anyway, for another only usage of nextval() without using the result in an insert or similar is affected. Discussion: [email protected], 369698E947874884A77849D8FE3680C2@maumau, 5CF4ABBA67674088B3941894E22A0D25@maumau Per complaint from maumau and Thom Brown Backpatch all the way back; 9.0 doesn't have syncrep, but it seems better to be consistent behavior across all maintained branches.
Diffstat (limited to 'src/backend/commands')
-rw-r--r--src/backend/commands/sequence.c23
1 files changed, 23 insertions, 0 deletions
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 622ccf75184..0070c4f34ef 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -17,6 +17,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
+#include "access/xact.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "access/xlogutils.h"
@@ -358,6 +359,10 @@ fill_seq_with_data(Relation rel, HeapTuple tuple)
tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
ItemPointerSet(&tuple->t_data->t_ctid, 0, FirstOffsetNumber);
+ /* check the comment above nextval_internal()'s equivalent call. */
+ if (RelationNeedsWAL(rel))
+ GetTopTransactionId();
+
START_CRIT_SECTION();
MarkBufferDirty(buf);
@@ -438,6 +443,10 @@ AlterSequence(AlterSeqStmt *stmt)
/* Note that we do not change the currval() state */
elm->cached = elm->last;
+ /* check the comment above nextval_internal()'s equivalent call. */
+ if (RelationNeedsWAL(seqrel))
+ GetTopTransactionId();
+
/* Now okay to update the on-disk tuple */
START_CRIT_SECTION();
@@ -679,6 +688,16 @@ nextval_internal(Oid relid)
last_used_seq = elm;
+ /*
+ * If something needs to be WAL logged, acquire an xid, so this
+ * transaction's commit will trigger a WAL flush and wait for
+ * syncrep. It's sufficient to ensure the toplevel transaction has a xid,
+ * no need to assign xids subxacts, that'll already trigger a appropriate
+ * wait. (Have to do that here, so we're outside the critical section)
+ */
+ if (logit && RelationNeedsWAL(seqrel))
+ GetTopTransactionId();
+
/* ready to change the on-disk (or really, in-buffer) tuple */
START_CRIT_SECTION();
@@ -867,6 +886,10 @@ do_setval(Oid relid, int64 next, bool iscalled)
/* In any case, forget any future cached numbers */
elm->cached = elm->last;
+ /* check the comment above nextval_internal()'s equivalent call. */
+ if (RelationNeedsWAL(seqrel))
+ GetTopTransactionId();
+
/* ready to change the on-disk (or really, in-buffer) tuple */
START_CRIT_SECTION();