Commit ebbb399

Commitfest Bot authored and committed
[CF 5664] v3 - Fix slot synchronization with two_phase decoding enabled
This branch was automatically generated by a robot using patches from an
email thread registered at:

https://commitfest.postgresql.org/patch/5664

The branch will be overwritten each time a new patch version is posted to
the thread, and also periodically to check for bitrot caused by changes
on the master branch.

Patch(es): https://www.postgresql.org/message-id/OS0PR01MB571616DB432287BA89A79A1694812@OS0PR01MB5716.jpnprd01.prod.outlook.com
Author(s): Zhijie Hou
2 parents 73e7361 + c5df465 commit ebbb399

1 file changed: +24 -12 lines changed
src/backend/replication/logical/slotsync.c

@@ -196,14 +196,14 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		 * restart_lsn or the initial xmin_horizon computed for the local slot
 		 * is ahead of the remote slot.
 		 *
-		 * If the slot is persistent, restart_lsn of the synced slot could
-		 * still be ahead of the remote slot. Since we use slot advance
-		 * functionality to keep snapbuild/slot updated, it is possible that
-		 * the restart_lsn is advanced to a later position than it has on the
-		 * primary. This can happen when slot advancing machinery finds
-		 * running xacts record after reaching the consistent state at a later
-		 * point than the primary where it serializes the snapshot and updates
-		 * the restart_lsn.
+		 * If the slot is persistent, both restart_lsn and catalog_xmin of the
+		 * synced slot could still be ahead of the remote slot. Since we use
+		 * slot advance functionality to keep snapbuild/slot updated, it is
+		 * possible that the restart_lsn and catalog_xmin are advanced to a
+		 * later position than it has on the primary. This can happen when
+		 * slot advancing machinery finds running xacts record after reaching
+		 * the consistent state at a later point than the primary where it
+		 * serializes the snapshot and updates the restart_lsn.
 		 *
 		 * We LOG the message if the slot is temporary as it can help the user
 		 * to understand why the slot is not sync-ready. In the case of a
@@ -221,16 +221,28 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 
 		if (remote_slot_precedes)
 			*remote_slot_precedes = true;
+
+		/*
+		 * Return immediately without updating the configuration. This is
+		 * crucial when two-phase commit is enabled on the remote slot. In
+		 * this scenario, syncing only two_phase_at, without the latest
+		 * confirmed_lsn, could cause transactions between the old
+		 * confirmed_lsn and two_phase_at to be unexpectedly decoded and sent
+		 * to the subscriber following a promotion. Therefore, we delay
+		 * syncing both the latest confirmed_lsn and two_phase_at until the
+		 * remote slot's restart_lsn and catalog_xmin are ahead.
+		 */
+		return false;
 	}
 
 	/*
 	 * Attempt to sync LSNs and xmins only if remote slot is ahead of local
 	 * slot.
 	 */
-	else if (remote_slot->confirmed_lsn > slot->data.confirmed_flush ||
-			 remote_slot->restart_lsn > slot->data.restart_lsn ||
-			 TransactionIdFollows(remote_slot->catalog_xmin,
-								  slot->data.catalog_xmin))
+	if (remote_slot->confirmed_lsn > slot->data.confirmed_flush ||
+		remote_slot->restart_lsn > slot->data.restart_lsn ||
+		TransactionIdFollows(remote_slot->catalog_xmin,
+							 slot->data.catalog_xmin))
 	{
 		/*
 		 * We can't directly copy the remote slot's LSN or xmin unless there
