proposal: add 'waiting for replication' to pg_stat_activity.state

Lists: pgsql-hackers
From: Julian Schauder <julian(dot)schauder(at)credativ(dot)de>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2015-12-02 20:22:07
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello Hackers,

I recently analyzed an incident where a major lag in synchronous replication
blocked a number of synchronous backends. I found myself looking at backends
that, according to pg_stat_activity, were neither waiting nor idle but yet they
didn't finish their work.

As it turns out, the major waiting loop for syncrep updates the processtitle,
but is silent within postgres and stat_activity. It seems misleading that
commited but waiting backends are 'active' although there is little done apart
from waiting.

> # select pid, waiting, state, substr(query,1,6) from pg_stat_activity ;
> pid | waiting | state | substr
> -------+---------+--------+--------
> 26294 | f | active | END;
> 26318 | f | active | create
> 26323 | f | active | insert
> 26336 | f | active | insert
(output of waiting statements [vanilla])

While 'active' is technically correct for a backend that is commited but waiting
for replication in terms of 'not beeing available for new tasks', it also
implies that a backend is dealing with the issue at hand. The remote host
however is out of our clusters control, hence all signs should be pointing to
the standby-host.

I suggest adding a new state to pg_stat_activity.state for backends that are
waiting for their synchronous commit to be flushed on the remote host.
I chose 'waiting for synchronous replication' for now.

One should refrain from the waiting flag at this point as there is no waiting
done on internal processes. Instead the backend waits for factors beyond our
clusters control to change.

> # select pid, waiting, state, substr(query,1,6) from pg_stat_activity ;
> pid | waiting | state | substr
> ------+---------+-------------------------------------+--------
> 3360 | f | waiting for synchronous replication | END;
> 3465 | f | waiting for synchronous replication | create
> 3477 | f | waiting for synchronous replication | insert
> 3489 | f | waiting for synchronous replication | insert
(output of waiting statements [patched])

patch attached

regards,

Julian Schauder

Attachment Content-Type Size
patch text/plain 2.6 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Julian Schauder <julian(dot)schauder(at)credativ(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2015-12-03 00:00:03
Message-ID: CAMsr+YHMEw4SeqDpO2jzV+fzBUR0EgkFCmMA0-DRsvr7mr0GyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3 December 2015 at 04:22, Julian Schauder <julian(dot)schauder(at)credativ(dot)de>
wrote:

> I suggest adding a new state to pg_stat_activity.state for backends that
> are
>
waiting for their synchronous commit to be flushed on the remote host.
>
>
Excellent idea. Anything that improves management and visibility into what
the system is doing like this is really valuable.

I notice that you don't set the 'waiting' flag. 'waiting' is presently
documented as:

<entry>True if this backend is currently waiting on a lock</entry>

... but I'm inclined to just widen its definition and set it here, since we
most certainly are waiting, and the column isn't named 'waiting_on_a_lock'.
It shouldn't upset various canned lock monitoring queries people have since
they generally do an inner join on pg_locks anyway.

There are no test changes in this patch, but that's reasonable because we
don't currently have a way to automate tests of sync rep.

I've attached a patch with minor wording/formatting changes, but I think
I'd like to change 'waiting' to true as well. Opinions?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Add-waiting-for-replication-state-to-pg_stat_activit.patch text/x-patch 3.0 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Julian Schauder <julian(dot)schauder(at)credativ(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2015-12-03 00:40:06
Message-ID: CAMsr+YErDUUXQGNRG6zYm+x7Auv6KDhxHwmbQKjSvB2yXpu3Bw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3 December 2015 at 04:22, Julian Schauder <julian(dot)schauder(at)credativ(dot)de>
wrote:

>
> I suggest adding a new state to pg_stat_activity.state for backends that
> are
> waiting for their synchronous commit to be flushed on the remote host.
> I chose 'waiting for synchronous replication' for now.
>
>
I've added this to the next CF:

https://commitfest.postgresql.org/8/436/

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, Julian Schauder <julian(dot)schauder(at)credativ(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2015-12-03 01:32:41
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/2/15 7:00 PM, Craig Ringer wrote:
> I notice that you don't set the 'waiting' flag. 'waiting' is presently
> documented as:
>
> <entry>True if this backend is currently waiting on a lock</entry>
>
> ... but I'm inclined to just widen its definition and set it here, since
> we most certainly are waiting, and the column isn't named
> 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring
> queries people have since they generally do an inner join on pg_locks
> anyway.

I'm not so sure about that assumption.


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Julian Schauder <julian(dot)schauder(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2015-12-03 03:32:20
Message-ID: CAMsr+YFiUNqz5FC0f8q1c=VrFvpofisWa8aW+uZDtM5vMsPk5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3 December 2015 at 09:32, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> On 12/2/15 7:00 PM, Craig Ringer wrote:
> > I notice that you don't set the 'waiting' flag. 'waiting' is presently
> > documented as:
> >
> > <entry>True if this backend is currently waiting on a lock</entry>
> >
> > ... but I'm inclined to just widen its definition and set it here, since
> > we most certainly are waiting, and the column isn't named
> > 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring
> > queries people have since they generally do an inner join on pg_locks
> > anyway.
>
> I'm not so sure about that assumption.
>

Even if it's an outer join, the worst that'll happen is that they'll get
entries with nulls in pg_locks. I don't think it's worth worrying about too
much.

We could always mitigate it by adding a pg_lock_status view to the system
catalogs with a decent canned query over pg_stat_activity and pg_locks, so
people can stop copying & pasting from the wiki or using buggy homebrew
queries ;)

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Julian Schauder <julian(dot)schauder(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2015-12-03 14:58:08
Message-ID: CAA4eK1JpVasak2NzygXkocQMg2L7VVkh_PXzaa7RBKyBt0+Ovg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
> On 3 December 2015 at 09:32, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>>
>> On 12/2/15 7:00 PM, Craig Ringer wrote:
>> > I notice that you don't set the 'waiting' flag. 'waiting' is presently
>> > documented as:
>> >
>> > <entry>True if this backend is currently waiting on a
lock</entry>
>> >
>> > ... but I'm inclined to just widen its definition and set it here,
since
>> > we most certainly are waiting, and the column isn't named
>> > 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring
>> > queries people have since they generally do an inner join on pg_locks
>> > anyway.
>>
>> I'm not so sure about that assumption.
>
>
> Even if it's an outer join, the worst that'll happen is that they'll get
entries with nulls in pg_locks. I don't think it's worth worrying about too
much.
>

That can be one way of dealing with it and another is that we
keep the current column as it is and add another view related
wait stats, anyway we need something like that for other purposes
like lwlock waits etc. Basically something on lines what we
is being discussed in thread [1]

[1] -
http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=LwAJ+w@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Julian Schauder <julian(dot)schauder(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2015-12-04 08:48:55
Message-ID: CAMsr+YHcd3ucC8GA7i25p_LC=QFKRywpNWDoE3BE+3EsGLNCxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3 December 2015 at 22:58, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer <craig(at)2ndquadrant(dot)com>
> wrote:
> >
> > On 3 December 2015 at 09:32, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> >>
> >> On 12/2/15 7:00 PM, Craig Ringer wrote:
> >> > I notice that you don't set the 'waiting' flag. 'waiting' is
> presently
> >> > documented as:
> >> >
> >> > <entry>True if this backend is currently waiting on a
> lock</entry>
> >> >
> >> > ... but I'm inclined to just widen its definition and set it here,
> since
> >> > we most certainly are waiting, and the column isn't named
> >> > 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring
> >> > queries people have since they generally do an inner join on pg_locks
> >> > anyway.
> >>
> >> I'm not so sure about that assumption.
> >
> >
> > Even if it's an outer join, the worst that'll happen is that they'll get
> entries with nulls in pg_locks. I don't think it's worth worrying about too
> much.
> >
>
> That can be one way of dealing with it and another is that we
> keep the current column as it is and add another view related
> wait stats, anyway we need something like that for other purposes
> like lwlock waits etc. Basically something on lines what we
> is being discussed in thread [1]
>
> [1] -
> http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=LwAJ+w@mail.gmail.com
>

Good point. I'm not sure this shouldn't set 'waiting' anyway, though.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Julian Schauder <julian(dot)schauder(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2015-12-04 09:12:46
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On December 4, 2015 9:48:55 AM GMT+01:00, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>On 3 December 2015 at 22:58, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>wrote:
>
>> On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer <craig(at)2ndquadrant(dot)com>
>> wrote:

>http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=LwAJ+w@mail.gmail.com
>>
>Good point. I'm not sure this shouldn't set 'waiting' anyway, though.

No. Waiting corresponds to pg locks -setting it to true for other things would be even more confusing...

Andres

---
Please excuse brevity and formatting - I am writing this on my mobile phone.


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Julian Schauder <julian(dot)schauder(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2016-01-18 17:43:18
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund wrote:
> On December 4, 2015 9:48:55 AM GMT+01:00, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> >On 3 December 2015 at 22:58, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> >wrote:
> >
> >> On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer <craig(at)2ndquadrant(dot)com>
> >> wrote:
>
> >http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=LwAJ+w@mail.gmail.com
> >>
> >Good point. I'm not sure this shouldn't set 'waiting' anyway, though.
>
> No. Waiting corresponds to pg locks -setting it to true for other things would be even more confusing...

Robert's opinion elsewhere is relevant for this patch, and contradicts
Andres' opinion here:
http://www.postgresql.org/message-id/CA+Tgmoaj+EPoO9qobvFH18F5kJg2tAEXhQ8_-VayWHUr-ZATMw@mail.gmail.com

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Julian Schauder <julian(dot)schauder(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2016-01-29 02:19:23
Message-ID: CAEepm=2k8nM=Bg_OTt2C7dDrUoB5MNKA5mceN6cxVRy3mTA7qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 3, 2015 at 1:00 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 3 December 2015 at 04:22, Julian Schauder <julian(dot)schauder(at)credativ(dot)de>
> wrote:
>
>>
>> I suggest adding a new state to pg_stat_activity.state for backends that
>> are
>>
>> waiting for their synchronous commit to be flushed on the remote host.
>>
>
> Excellent idea. Anything that improves management and visibility into what
> the system is doing like this is really valuable.
>
> I notice that you don't set the 'waiting' flag. 'waiting' is presently
> documented as:
>
> <entry>True if this backend is currently waiting on a lock</entry>
>
> ... but I'm inclined to just widen its definition and set it here, since we
> most certainly are waiting, and the column isn't named 'waiting_on_a_lock'.
> It shouldn't upset various canned lock monitoring queries people have since
> they generally do an inner join on pg_locks anyway.
>
> There are no test changes in this patch, but that's reasonable because we
> don't currently have a way to automate tests of sync rep.
>
> I've attached a patch with minor wording/formatting changes, but I think I'd
> like to change 'waiting' to true as well. Opinions?

A couple of minor things I noticed:

+ <literal>waiting for synchronous replication</>: The
backend is waiting for its transaction to be flushed on a synchronous
standby.

I wouldn't say "flushed" here. If you set synchronous_commit =
remote_write, then it's not actually waiting for it to be flushed (and
there may eventually be other levels too). Maybe just "The backend is
waiting for a response from a synchronous standby"?

+#include <pgstat.h>

I think this should use "" instead of <> and should come after
#include "miscadmin.h".

Thinking of other patches in flight, I think I'd want the proposed
N-sync standbys feature to be able to explain in more detail what it's
waiting for (and likewise my causal reads proposal which does that via
the process title), though I realise that the details of how/where to
do that are changing in the "replace pg_stat_activity.waiting" thread
which this patch is waiting on.

--
Thomas Munro
http://www.enterprisedb.com


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Julian Schauder <julian(dot)schauder(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: add 'waiting for replication' to pg_stat_activity.state
Date: 2016-02-01 22:39:30
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thomas Munro wrote:

> Thinking of other patches in flight, I think I'd want the proposed
> N-sync standbys feature to be able to explain in more detail what it's
> waiting for (and likewise my causal reads proposal which does that via
> the process title), though I realise that the details of how/where to
> do that are changing in the "replace pg_stat_activity.waiting" thread
> which this patch is waiting on.

Right, interesting thought. I think that for the time being we should
close this one as returned with feedback, since there's no point in
keeping it open in commitfest. I encourage the author (and everyone
else) to look at the other patch and help there, which is going to
ultimately achieve the outcome desired with this patch.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services