
doc: Mention clock synchronization recommendation for hot_standby_feedback

Lists:	pgsql-hackers

From:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2024-12-05 09:43:41
Message-ID:	CAKZiRmwBcALLrDgCyEhHP1enUxtPMjyNM_d1A2Lng3_6Rf4Qfw@mail.gmail.com
Lists:	pgsql-hackers

One of our customers ran into a very odd case where hot standby feedback
(backend_xmin propagation) stopped working due to major (hours/days) clock
time shifts on hypervisor-managed VMs. This happens (and is fully
reproducible) e.g. when the standby connects while its own VM has a clock
set in the future (relative to the primary) and that time later goes back
to "normal". In such a situation the "sending hot standby feedback xmin"
messages stop being sent, e.g.:

2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015230 flush 6/E9015230 apply 6/E9015230
2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
<-- clock readjustment and no further "sending hot standby feedback"
2024-12-04 14:18:54 UTC [6002]: db=,user=,app=,client= DEBUG: sendtime 2024-12-04 14:18:51.836936+00 receipttime 2024-12-04 14:18:54.199223+00 replication apply delay 0 ms transfer latency 2363 ms
2024-12-04 14:18:54 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015258 flush 6/E9015230 apply 6/E9015230
2024-12-04 14:18:54 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015258 flush 6/E9015258 apply 6/E9015258
2024-12-04 14:18:54 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015258 flush 6/E9015258 apply 6/E9015258
2024-12-04 14:18:55 UTC [6002]: db=,user=,app=,client= DEBUG: sendtime 2024-12-04 14:18:53.136738+00 receipttime 2024-12-04 14:18:55.498946+00 replication apply delay 0 ms transfer latency 2363 ms
2024-12-04 14:18:55 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015280 flush 6/E9015258 apply 6/E9015258
2024-12-04 14:18:55 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015280 flush 6/E9015280 apply 6/E9015280

I can share reproduction steps if anyone is interested. This basically
happens due to the use of TimestampDifferenceExceeds() in
XLogWalRcvSendHSFeedback(), but I bet there are other similar scenarios.
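To make the mechanism concrete, here is a minimal C sketch of the PG13-style throttling described above. It is a simplified model, not the walreceiver code: it assumes TimestampTz is a signed 64-bit count of microseconds (as in PostgreSQL) and that TimestampDifferenceExceeds() returns true only when stop_time - start_time is at least msec milliseconds. The model_* names and the single static send-time variable are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microseconds, like PostgreSQL's TimestampTz (assumption for this model). */
typedef int64_t TimestampTz;

#define USECS_PER_SEC INT64_C(1000000)

/*
 * Same semantics as TimestampDifferenceExceeds(): true only if stop_time
 * is at least msec milliseconds after start_time. A negative difference
 * (the clock went backward) never exceeds the threshold.
 */
static bool
model_timestamp_difference_exceeds(TimestampTz start_time,
                                   TimestampTz stop_time, int msec)
{
    TimestampTz diff = stop_time - start_time;

    return diff >= (TimestampTz) msec * 1000;
}

/* Hypothetical throttle state: local time the last feedback was sent. */
static TimestampTz model_send_time = 0;

/*
 * Returns true if a feedback message would be sent at local time "now",
 * given a status interval in seconds (cf. wal_receiver_status_interval).
 */
static bool
model_maybe_send_feedback(TimestampTz now, int status_interval_s)
{
    assert(status_interval_s > 0);
    if (!model_timestamp_difference_exceeds(model_send_time, now,
                                            status_interval_s * 1000))
        return false;           /* too early (or clock went backward) */
    model_send_time = now;
    return true;
}
```

In this model, once a message has been sent while the standby's clock was 12 hours ahead, stepping the clock back makes every later difference negative, so no feedback is sent until the local clock catches up with the stored send time, roughly 12 hours later.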

What surprised me was the lack of a recommendation to keep the primary's
and standby's clocks synchronized when using hot_standby_feedback, even
though such a recommendation exists for recovery_min_apply_delay. So I
would like to add at least one sentence to hot_standby_feedback to warn
about this too; patch attached.

-J.

Attachment	Content-Type	Size
v1-0001-doc-Mention-clock-synchronization-recommendation-.patch	application/octet-stream	1.2 KB

From:	"Euler Taveira" <euler(at)eulerto(dot)com>
To:	"Jakub Wartak" <jakub(dot)wartak(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2024-12-05 15:06:41
Message-ID:	[email protected]
Lists:	pgsql-hackers

On Thu, Dec 5, 2024, at 6:43 AM, Jakub Wartak wrote:
> One of our customers ran into a very odd case, where hot standby feedback backend_xmin propagation stopped working due to major (hours/days) clock time shifts on hypervisor-managed VMs. This happens (and is fully reproducible) e.g. in scenarios where standby connects and its own VM is having time from the future (relative to primary) and then that time goes back to "normal". In such situation "sends hot_standby_feedback xmin" timestamp messages are stopped being transferred, e.g.:

Is it worth a WARNING message if there is a "huge" time difference
between the servers? We already have the reply time in the message so
it is a matter of defining the "huge" interval plus a roundtrip. We also
need to avoid spamming the log.
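A rough C sketch of the kind of check suggested here, assuming the sender's timestamp and the local receipt time are both available as microsecond TimestampTz values (like the sendtime/receipttime pair in the DEBUG output earlier in the thread). The SkewCheck struct, the thresholds and the rate-limiting policy are all hypothetical; nothing like this exists in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* microseconds, as in PostgreSQL (assumed) */

/* Hypothetical configuration and state for a clock-skew warning. */
typedef struct SkewCheck
{
    TimestampTz huge_us;        /* tolerated skew (the "huge" interval) */
    TimestampTz warn_every_us;  /* minimum gap between two warnings */
    bool        warned;         /* has a warning been emitted yet? */
    TimestampTz last_warned;    /* local time of the last warning */
} SkewCheck;

/*
 * Returns true if a WARNING should be logged. remote_send is the sender's
 * timestamp from the message, local_receipt is our clock when it arrived,
 * and roundtrip_us is the allowance for network and processing delay.
 */
static bool
skew_should_warn(SkewCheck *sc, TimestampTz remote_send,
                 TimestampTz local_receipt, TimestampTz roundtrip_us)
{
    TimestampTz diff = local_receipt - remote_send;

    assert(sc->huge_us >= 0 && roundtrip_us >= 0);
    if (diff < 0)
        diff = -diff;           /* skew in either direction counts */
    if (diff <= sc->huge_us + roundtrip_us)
        return false;
    /* Rate-limit so that a persistent skew does not spam the log. */
    if (sc->warned && local_receipt - sc->last_warned < sc->warn_every_us)
        return false;
    sc->warned = true;
    sc->last_warned = local_receipt;
    return true;
}
```

Note that the rate limiter itself reads the local clock, so a backward jump would also suppress warnings; a real implementation would have to account for that too.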

Your patch looks good to me. Should it be converted into a
<note>...</note>? (See synchronous_standby_names [1] for an example.)

[1] https://www.postgresql.org/docs/current/runtime-config-replication.html

--
Euler Taveira
EDB https://www.enterprisedb.com/

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Euler Taveira <euler(at)eulerto(dot)com>
Cc:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2024-12-05 15:17:12
Message-ID:	qric5wndzfsjhrpzzkav7ehjlzbjfojbmxkhwz5ewfch4vnple@bfozwm5rb6zc
Lists:	pgsql-hackers

On 2024-12-05 12:06:41 -0300, Euler Taveira wrote:
> Is it worth a WARNING message if there is a "huge" time difference
> between the servers? We already have the reply time in the message so
> it is a matter of defining the "huge" interval plus a roundtrip. We also
> need to avoid spamming the log.

IME folks who have huge time differences between the servers are not going to
look at the log carefully enough to see such a warning.

From:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To:	Euler Taveira <euler(at)eulerto(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2024-12-09 10:52:16
Message-ID:	CAKZiRmzkeoGV6zcCex8R_FG_Lmq64ZGg1zJ-3y_JTdccaTYq_Q@mail.gmail.com
Lists:	pgsql-hackers

Hi Euler!

On Thu, Dec 5, 2024 at 4:07 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:

> On Thu, Dec 5, 2024, at 6:43 AM, Jakub Wartak wrote:
>
> One of our customers ran into a very odd case, where hot standby feedback
> backend_xmin propagation stopped working due to major (hours/days) clock
> time shifts on hypervisor-managed VMs. This happens (and is fully
> reproducible) e.g. in scenarios where standby connects and its own VM is
> having time from the future (relative to primary) and then that time goes
> back to "normal". In such situation "sends hot_standby_feedback xmin"
> timestamp messages are stopped being transferred, e.g.:
>
>
> Is it worth a WARNING message if there is a "huge" time difference
> between the servers? We already have the reply time in the message so
> it is a matter of defining the "huge" interval plus a roundtrip. We also
> need to avoid spamming the log.
>

I'm trying to stay consistent with what was done for recovery_min_apply_delay
(there is a warning in the docs, but no warning in the code), and I just
wanted to have a pointer in the documentation so that anyone using
hot_standby_feedback would at least be warned beforehand. Given how rare
this is, I don't think we need additional code (plus what Andres has noted).

> Your patch looks good to me. Should it be converted into a
> <note>...</note>? (See synchronous_standby_names [1] for an example.)
>

Fine with me, but we would also have to convert the recovery_min_apply_delay
note in the same way, right?

-J.

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2024-12-18 09:33:33
Message-ID:	CAA4eK1JG1R4c7DDEdr7QAiQ1sFjb-EkQmp1H=dSKguoKX7PZDg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
>
> One of our customers ran into a very odd case, where hot standby feedback backend_xmin propagation stopped working due to major (hours/days) clock time shifts on hypervisor-managed VMs. This happens (and is fully reproducible) e.g. in scenarios where standby connects and its own VM is having time from the future (relative to primary) and then that time goes back to "normal". In such situation "sends hot_standby_feedback xmin" timestamp messages are stopped being transferred, e.g.:
>
> 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
> 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015230 flush 6/E9015230 apply 6/E9015230
> 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
> <-- clock readjustment and no further "sending hot standby feedback"
...
>
> I can share reproduction steps if anyone is interested. This basically happens due to usage of TimestampDifferenceExceeds() in XLogWalRcvSendHSFeedback(), but I bet there are other similar scenarios.
>

We started to use a different mechanism in HEAD. See XLogWalRcvSendHSFeedback().

> What I was kind of surprised about was the lack of recommendation for having primary/standby to have clocks synced when using hot_standby_feedback, but such a thing is mentioned for recovery_min_apply_delay. So I would like to add at least one sentence to hot_standby_feedback to warn about this too, patch attached.
>

IIUC, this issue doesn't occur because the primary and standby clocks
are not synchronized. It happened because the clock on standby moved
backward. This is quite unlike the 'recovery_min_apply_delay' where
non-synchronization of clocks between primary and standby can lead to
unexpected results. This is because we don't compare any time on the
primary with the time on standby. If this understanding is correct
then the wording proposed by your patch should be changed accordingly.
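The distinction can be sketched with two simplified C predicates (hypothetical names, not PostgreSQL functions). The apply-delay check compares a primary-side commit timestamp against the standby's clock, so a constant offset between the two clocks shifts the effective delay; the feedback check compares two readings of the standby's own clock, so a constant offset cancels out and only a jump of the standby clock matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* microseconds (assumed, as in PostgreSQL) */

/*
 * recovery_min_apply_delay-style check (simplified): a record committed at
 * primary_commit_ts, read from the PRIMARY's clock, may be applied once the
 * STANDBY's clock reaches primary_commit_ts + delay_us.
 */
static bool
model_apply_delay_elapsed(TimestampTz primary_commit_ts,
                          TimestampTz standby_now, TimestampTz delay_us)
{
    return standby_now >= primary_commit_ts + delay_us;
}

/*
 * Feedback throttling (simplified): both timestamps come from the standby's
 * own clock, so any constant offset versus the primary cancels out.
 */
static bool
model_feedback_due(TimestampTz standby_last_send, TimestampTz standby_now,
                   TimestampTz interval_us)
{
    assert(interval_us > 0);
    return standby_now - standby_last_send >= interval_us;
}
```

With a standby clock 30 minutes ahead and a one-hour delay, the first predicate lets a record be applied after only 30 real minutes, while the second gives the same answer with or without the offset.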

--
With Regards,
Amit Kapila.

From:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2025-01-08 12:49:38
Message-ID:	CAKZiRmyC8r5Yc7dhUrxMmYF4+_QitFNDf4fwGp6p1ER+3UydUw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Dec 18, 2024 at 10:33 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

Hi Amit!

> On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak
> <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> >
> > One of our customers ran into a very odd case, where hot standby feedback backend_xmin propagation stopped working due to major (hours/days) clock time shifts on hypervisor-managed VMs. This happens (and is fully reproducible) e.g. in scenarios where standby connects and its own VM is having time from the future (relative to primary) and then that time goes back to "normal". In such situation "sends hot_standby_feedback xmin" timestamp messages are stopped being transferred, e.g.:
> >
> > 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
> > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015230 flush 6/E9015230 apply 6/E9015230
> > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
> > <-- clock readjustment and no further "sending hot standby feedback"
> ...
> >
> > I can share reproduction steps if anyone is interested. This basically happens due to usage of TimestampDifferenceExceeds() in XLogWalRcvSendHSFeedback(), but I bet there are other similar scenarios.
> >
>
> We started to use a different mechanism in HEAD. See XLogWalRcvSendHSFeedback().

Yes, you are partly correct, because I was looking at REL13_STABLE,
but I've taken a fresh quick look at 05a7be93558 and tested it too.
Sadly, PG17 still shows the same lack of proper backend_xmin
propagation (it stops sending hot standby feedback once the time on the
standby jumps forward). I believe this is because the walreceiver
schedules the next wakeup far in the future (when the clock / now() is
way ahead, see WalRcvComputeNextWakeup()), so when the clock is back to
normal (reset back by -X hours/days), the next wakeup appears to be +X
hours/days ahead.
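A simplified C model of the deadline-style scheduling described here, assuming the next hot standby feedback time is stored as an absolute timestamp (now + interval) and feedback is sent only once the current time reaches it. The struct and function names are hypothetical; as we understand it, the real WalRcvComputeNextWakeup() tracks several wakeup reasons and the deadline also drives how long the walreceiver sleeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* microseconds (assumed, as in PostgreSQL) */

#define USECS_PER_SEC INT64_C(1000000)

/* Hypothetical walreceiver state: absolute deadline on the standby clock. */
typedef struct
{
    TimestampTz next_hsfeedback;
} ModelWalRcvState;

/* Schedule the next feedback as "now + interval". */
static void
model_compute_next_wakeup(ModelWalRcvState *st, TimestampTz now,
                          int interval_s)
{
    st->next_hsfeedback = now + (TimestampTz) interval_s * USECS_PER_SEC;
}

/* Send feedback only when "now" has reached the stored deadline. */
static bool
model_send_feedback_if_due(ModelWalRcvState *st, TimestampTz now,
                           int interval_s)
{
    assert(interval_s > 0);
    if (now < st->next_hsfeedback)
        return false;           /* deadline not reached yet */
    model_compute_next_wakeup(st, now, interval_s);
    return true;
}
```

If the clock jumps X hours forward, feedback is sent and the deadline lands about X hours plus one interval in the (real) future; after the clock is stepped back, the comparison stays false for about X hours, which matches the behavior reported above.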

> > What I was kind of surprised about was the lack of recommendation for having primary/standby to have clocks synced when using hot_standby_feedback, but such a thing is mentioned for recovery_min_apply_delay. So I would like to add at least one sentence to hot_standby_feedback to warn about this too, patch attached.
> >
>
> IIUC, this issue doesn't occur because the primary and standby clocks
> are not synchronized. It happened because the clock on standby moved
> backward.

In PG17 it would be because the clock on the standby moved too far
forward. I don't know how it happened to that customer, but it was
probably done somehow by the hypervisor in that scenario (so, AFAIR, it
wasn't ntpd slewing the time improperly; an edge case, I know...)

> This is quite unlike the 'recovery_min_apply_delay' where
> non-synchronization of clocks between primary and standby can lead to
> unexpected results. This is because we don't compare any time on the
> primary with the time on standby. If this understanding is correct
> then the wording proposed by your patch should be changed accordingly.

...if my understanding is correct, it is both, depending on the version :^)
I was thinking about backpatching the docs (what is the recommended
policy here? Only update the docs for new releases?), so I'm proposing
something more generic than earlier, but it takes your point into
account. Would something like the below be good enough?

- <para>
- Using this option requires the primary and standby(s) to have system
- clocks synchronized, otherwise it may lead to prolonged risk of not
- removing dead rows on primary for extended periods of time as the
- feedback mechanism is based on timestamps exchanged between primary
- and standby(s).
- </para>

+ <para>
+ Using this option requires the primary and standby(s) to have system
+ clocks synchronized (without big time jumps), otherwise it may lead to
+ prolonged risk of not removing dead rows on primary for extended periods
+ of time as the feedback mechanism implementation is timestamp based.
+ </para>

-J.

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2025-03-03 05:26:42
Message-ID:	CAA4eK1+sM11DVgtfjTyLA4dBox109z94iQe0wPJWNtZm4MqXCg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 8, 2025 at 6:20 PM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
>
> On Wed, Dec 18, 2024 at 10:33 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Hi Amit!
>
> > On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak
> > <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> > >
> > > One of our customers ran into a very odd case, where hot standby feedback backend_xmin propagation stopped working due to major (hours/days) clock time shifts on hypervisor-managed VMs. This happens (and is fully reproducible) e.g. in scenarios where standby connects and its own VM is having time from the future (relative to primary) and then that time goes back to "normal". In such situation "sends hot_standby_feedback xmin" timestamp messages are stopped being transferred, e.g.:
> > >
> > > 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
> > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015230 flush 6/E9015230 apply 6/E9015230
> > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
> > > <-- clock readjustment and no further "sending hot standby feedback"
> > ...
> > >
> > > I can share reproduction steps if anyone is interested. This basically happens due to usage of TimestampDifferenceExceeds() in XLogWalRcvSendHSFeedback(), but I bet there are other similar scenarios.
> > >
> >
> > We started to use a different mechanism in HEAD. See XLogWalRcvSendHSFeedback().
>
> Yes, you are correct somewhat because I was looking on REL13_STABLE,
> but I've taken a fresh quick look at 05a7be93558 and tested it too.
> Sadly, PG17 still maintains the same behavior of lack of proper
> backend_xmin propagation (it stops sending hot standby feedback once
> time on standby jumps forward). I believe this is the case because
> walreceiver schedules next wakeup in far far future (when the clock /
> now() is way ahead, see WalRcvComputeNextWakeup()), so when the clock
> is back to normal (reset back -X hours/days), the next wakeup seems
> to be +X hours/days ahead.
>
> > > What I was kind of surprised about was the lack of recommendation for having primary/standby to have clocks synced when using hot_standby_feedback, but such a thing is mentioned for recovery_min_apply_delay. So I would like to add at least one sentence to hot_standby_feedback to warn about this too, patch attached.
> > >
> >
> > IIUC, this issue doesn't occur because the primary and standby clocks
> > are not synchronized. It happened because the clock on standby moved
> > backward.
>
> In PG17 it would be because the clock moved way forward too much on
> the standby. I don't know how it happened to that customer, but it was
> probably done somehow by the hypervisor in that scenario (so time
> wasn't slewed improperly by ntpd AFAIR, edge case, I know...)
>
> > This is quite unlike the 'recovery_min_apply_delay' where
> > non-synchronization of clocks between primary and standby can lead to
> > unexpected results. This is because we don't compare any time on the
> > primary with the time on standby. If this understanding is correct
> > then the wording proposed by your patch should be changed accordingly.
>
> .. if my understanding is correct, it is both depending on version :^)
>

AFAICS, it doesn't depend on the version. I checked the code of PG13,
and it uses a similar implementation. I am referring to the below code
in PG13:
if (!immed)
{
    /*
     * Send feedback at most once per wal_receiver_status_interval.
     */
    if (!TimestampDifferenceExceeds(sendTime, now,
                                    wal_receiver_status_interval * 1000))
        return;
    sendTime = now;
}

> I was thinking about backpatching docs (of what is the recommended
> policy here? to just update new-release docs?), so I'm proposing
> something more generic than earlier, but it takes Your point into
> account - would something like below be good enough?
>
> - <para>
> - Using this option requires the primary and standby(s) to have system
> - clocks synchronized, otherwise it may lead to prolonged risk of not
> - removing dead rows on primary for extended periods of time as the
> - feedback mechanism is based on timestamps exchanged between primary
> - and standby(s).
> - </para>
>
> + <para>
> + Using this option requires the primary and standby(s) to have system
> + clocks synchronized (without big time jumps), otherwise it may lead to
> + prolonged risk of not removing dead rows on primary for extended periods
> + of time as the feedback mechanism implementation is timestamp based.
> + </para>
>

How about something like: "Note that if the clock on standby is moved
ahead or backward, the feedback message may not be sent at the
required interval. This can lead to prolonged risk of not removing
dead rows on primary for extended periods as the feedback mechanism is
based on timestamp."

--
With Regards,
Amit Kapila.

From:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2025-03-03 07:35:58
Message-ID:	CAKZiRmyEBkR5tfwrzXaoC6D29Gp6g_fD6Bd_k58DjZV1=rbKdQ@mail.gmail.com
Lists:	pgsql-hackers

Hi Amit,

On Mon, Mar 3, 2025 at 6:26 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
[..]

OK, sure.

> How about something like: "Note that if the clock on standby is moved
> ahead or backward, the feedback message may not be sent at the
> required interval. This can lead to prolonged risk of not removing
> dead rows on primary for extended periods as the feedback mechanism is
> based on timestamp."

Sure thing. I've just added '(..) In the extreme cases this can..' as
it is pretty rare to hit it. Patch attached.

-J.

Attachment	Content-Type	Size
v2-0001-doc-Mention-clock-synchronization-recommendation-.patch	application/octet-stream	1.3 KB

From:	Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2025-03-03 09:48:41
Message-ID:	[email protected]
Lists:	pgsql-hackers

On 2025/03/03 16:35, Jakub Wartak wrote:
> Hi Amit,
>
> On Mon, Mar 3, 2025 at 6:26 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> [..]
>
> OK, sure.
>
>> How about something like: "Note that if the clock on standby is moved
>> ahead or backward, the feedback message may not be sent at the
>> required interval. This can lead to prolonged risk of not removing
>> dead rows on primary for extended periods as the feedback mechanism is
>> based on timestamp."
>
> Sure thing. I've just added '(..) In the extreme cases this can..' as
> it is pretty rare to hit it. Patch attached.

When the clock moves forward or backward, couldn't it affect
not only the standby but also the primary? I’m wondering
because TimestampDifferenceExceeds() seems to be used
in several places in addition to hot standby feedback.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2025-03-04 03:59:21
Message-ID:	CAA4eK1LqShf_kaPMKEi8U6ooTSRCxu_HOfE=hN5RughRQbSe6w@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Mar 3, 2025 at 3:18 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2025/03/03 16:35, Jakub Wartak wrote:
> > Hi Amit,
> >
> > On Mon, Mar 3, 2025 at 6:26 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > [..]
> >
> > OK, sure.
> >
> >> How about something like: "Note that if the clock on standby is moved
> >> ahead or backward, the feedback message may not be sent at the
> >> required interval. This can lead to prolonged risk of not removing
> >> dead rows on primary for extended periods as the feedback mechanism is
> >> based on timestamp."
> >
> > Sure thing. I've just added '(..) In the extreme cases this can..' as
> > it is pretty rare to hit it. Patch attached.
>
> When the clock moves forward or backward, couldn't it affect
> not only the standby but also the primary? I’m wondering
> because TimestampDifferenceExceeds() seems to be used
> in several places in addition to hot standby feedback.
>

Right, it could impact other places as well, like background WAL flush
being delayed. So, what should we do about this? Shall we leave this
as is, make a general statement, find all cases and make a note about
them in docs, do it for the important ones where the impact is more,
or something else?

--
With Regards,
Amit Kapila.

From:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2025-03-04 11:14:22
Message-ID:	CAKZiRmxH7McJVJGss-i9JkWbRpZw0RnKU_K_vDCrFNVqd4FDEw@mail.gmail.com
Lists:	pgsql-hackers

Hi,

On Tue, Mar 4, 2025 at 4:59 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> > >
> > > Sure thing. I've just added '(..) In the extreme cases this can..' as
> > > it is pretty rare to hit it. Patch attached.
> >
> > When the clock moves forward or backward, couldn't it affect
> > not only the standby but also the primary? I’m wondering
> > because TimestampDifferenceExceeds() seems to be used
> > in several places in addition to hot standby feedback.
> >
>
> Right, it could impact other places as well, like background WAL flush
> being delayed. So, what should we do about this? Shall we leave this
> as is, make a general statement, find all cases and make a note about
> them in docs, do it for the important ones where the impact is more,
> or something else?

Given that such conditions almost never occur, we could just open a
separate doc thread/CF entry if somebody is concerned, and add a general
statement (in some installation section) that OS time should not jump
too much and should instead be slewed (gradually adjusted). If someone
has time jumping back and forth on his box and something stops working,
I still think he has bigger issues (e.g. now() reporting wrong data). I
would stay as vague as possible, because every installation seems to use
something different (hypervisor, kernel modules, ntpd vs ntpd -x and so
on).
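To illustrate why slewing avoids the problem, here is a small C model comparing a stepped and a slewed correction of a clock that is running ahead. It assumes a maximum slew rate of 500 ppm (a commonly cited ntpd limit; treat it as an assumption) and a hypothetical model_clock_reading() helper; it is not tied to any particular OS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* microseconds (assumed, as in PostgreSQL) */

/*
 * Local clock reading at true time true_us (0 = when the correction
 * starts) for a clock that was offset_us ahead. A stepped correction
 * removes the offset at once; a slewed one removes at most slew_ppm
 * microseconds per second of true time, so the reading never goes back.
 */
static TimestampTz
model_clock_reading(TimestampTz true_us, TimestampTz offset_us,
                    bool slew, int64_t slew_ppm)
{
    TimestampTz corrected;

    assert(offset_us >= 0 && slew_ppm > 0);
    if (true_us < 0)
        return true_us + offset_us;     /* before the correction starts */
    if (!slew)
        return true_us;                 /* stepped: whole offset removed */
    corrected = true_us * slew_ppm / 1000000;   /* slewed away so far */
    if (corrected > offset_us)
        corrected = offset_us;
    return true_us + offset_us - corrected;
}
```

With a one-hour offset, a stepped clock read 9 s after the correction is an hour behind the last send time, whereas the slewed clock keeps advancing (999,500 us per real second here), so a 10 s feedback interval still elapses. The trade-off is speed: at 500 ppm, removing a one-hour offset takes 3600 s / 0.0005 = 7,200,000 s, about 83 days, which is one reason large offsets tend to be stepped rather than slewed.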

The problem here was that the standby was degrading the primary (so you
couldn't easily see on the primary what was causing this), so IMHO the
patch is fine as it stands: it just adds another little-known reason to
the pool of knowledge about why backend_xmin might stop propagating.

-J.

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2025-03-05 06:15:58
Message-ID:	CAA4eK1KwLM84n7UnGsCrG6rROO4jM-QrhACqXwhQYRx6yyTGsg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Mar 4, 2025 at 4:44 PM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
>
> On Tue, Mar 4, 2025 at 4:59 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > > >
> > > > Sure thing. I've just added '(..) In the extreme cases this can..' as
> > > > it is pretty rare to hit it. Patch attached.
> > >
> > > When the clock moves forward or backward, couldn't it affect
> > > not only the standby but also the primary? I’m wondering
> > > because TimestampDifferenceExceeds() seems to be used
> > > in several places in addition to hot standby feedback.
> > >
> >
> > Right, it could impact other places as well, like background WAL flush
> > being delayed. So, what should we do about this? Shall we leave this
> > as is, make a general statement, find all cases and make a note about
> > them in docs, do it for the important ones where the impact is more,
> > or something else?
>
> Given the occurrence of such conditions is almost close to 0, we could
> just open a new separate doc thread/cfentry if somebody is concerned
> and add some general statement that OS time should not jump too much
> (in some installation section), that it should be slewed (gradually
> adjusted) instead. If someone has time jumping on his box back and
> forth and something stops working , I still think he has bigger issues
> (e.g. now() reflecting wrong data). I would stay vague as much as
> possible, because every installation seems to use something different
> (hypervisor, kernel modules, ntpd vs ntpd -x and so on).
>
> The problem here was that standby was deteriorating primary (so you
> couldn't see easily on primary what could be causing this), so IMHO
> patch is fine as it stands, it just adds another not so known reason
> to the pool of knowledge why backend_xmin might stop propagating.
>

I can go with the last patch as you observed that in a real-world
case, and we can look at others (if any) on a case-to-case basis.
Fujii-San, others, do you have any opinion on this?

--
With Regards,
Amit Kapila.

From:	vignesh C <vignesh21(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2025-03-14 10:31:13
Message-ID:	CALDaNm2pwt8pA=6G3C=7TcF6iW5OYJKGzq4-tZmi_MtMkUNYew@mail.gmail.com
Lists:	pgsql-hackers

On Wed, 5 Mar 2025 at 11:46, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Mar 4, 2025 at 4:44 PM Jakub Wartak
>
> I can go with the last patch as you observed that in a real-world
> case, and we can look at others (if any) on a case-to-case basis.
> Fujii-San, others, do you have any opinion on this?

+1 for this approach.

Regards,
Vignesh

From:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To:	vignesh C <vignesh21(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2025-03-24 09:24:30
Message-ID:	CAKZiRmzuJv9va__=C6wjVwjFQOjzb+g_E053R8mD5rxtUsU8aA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Mar 14, 2025 at 11:31 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, 5 Mar 2025 at 11:46, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Mar 4, 2025 at 4:44 PM Jakub Wartak
> >
> > I can go with the last patch as you observed that in a real-world
> > case, and we can look at others (if any) on a case-to-case basis.
> > Fujii-San, others, do you have any opinion on this?
>
> +1 for this approach.
>

OK, so I have set this as Ready for Committer (I assume everybody who
wanted to voice an opinion already did).

-J.

From:	Peter Eisentraut <peter(at)eisentraut(dot)org>
To:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Date:	2025-03-31 14:58:35
Message-ID:	[email protected]
Lists:	pgsql-hackers

On 24.03.25 10:24, Jakub Wartak wrote:
> On Fri, Mar 14, 2025 at 11:31 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>>
>> On Wed, 5 Mar 2025 at 11:46, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>
>>> On Tue, Mar 4, 2025 at 4:44 PM Jakub Wartak
>>>
>>> I can go with the last patch as you observed that in a real-world
>>> case, and we can look at others (if any) on a case-to-case basis.
>>> Fujii-San, others, do you have any opinion on this?
>>
>> +1 for this approach.
>>
>
> OK, so I have set this as Ready for Committer (I assume everybody who
> wanted to take voice already did).

committed