identifying the backend that owns a temporary schema

Lists: pgsql-hackers
From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: identifying the backend that owns a temporary schema
Date: 2022-08-15 20:58:11
Message-ID: 20220815205811.GA250990@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi hackers,

As Greg Stark noted elsewhere [0], it is presently very difficult to
identify the PID of the session using a temporary schema, which is
particularly unfortunate when a temporary table is putting a cluster in
danger of transaction ID wraparound. I noted [1] that the following query
can be used to identify the PID for a given backend ID:

SELECT bid, pg_stat_get_backend_pid(bid) AS pid FROM pg_stat_get_backend_idset() bid;

But on closer inspection, this is just plain wrong. The backend IDs
returned by pg_stat_get_backend_idset() might initially bear some
resemblance to the backend IDs stored in PGPROC, so my suggested query
might work some of the time, but the pg_stat_get_backend_* backend IDs
typically diverge from the PGPROC backend IDs as sessions connect and
disconnect.

I think it would be nice to have a reliable way to discover the PID for a
given temporary schema via SQL. The other thread [2] introduces a helpful
log message that indicates the PID for temporary tables that are in danger
of causing transaction ID wraparound, and I intend for this proposal to be
complementary to that work.

At first, I thought about adding a new function for retrieving the PGPROC
backend IDs, but I am worried that having two sets of backend IDs would be
even more confusing than having one set that can't reliably be used for
temporary schemas. Instead, I tried adjusting the pg_stat_get_backend_*()
suite of functions to use the PGPROC backend IDs. This ended up being
simpler than anticipated. I added a backend_id field to the
LocalPgBackendStatus struct (which is populated within
pgstat_read_current_status()), and I changed pgstat_fetch_stat_beentry() to
bsearch() for the entry with the given PGPROC backend ID.

This does result in a small behavior change. Currently,
pg_stat_get_backend_idset() simply returns a range of numbers (1 to the
number of active backends). With the attached patch, this function will
still return a set of numbers, but there might be gaps between the IDs, and
the maximum backend ID will usually be greater than the number of active
backends. I suppose this might break some existing uses, but I'm not sure
how much we should worry about that. IMO uniting the backend IDs is a net
improvement.

Thoughts?

[0] https://postgr.es/m/CAM-w4HPCOuJDs4fdkgNdA8FFMeYMULPCAxjPpsOgvCO24KOAVg%40mail.gmail.com
[1] https://postgr.es/m/DDF0D1BC-261D-45C2-961C-5CBDBB41EE71%40amazon.com
[2] https://commitfest.postgresql.org/39/3358/

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v1-0001-Adjust-pg_stat_get_backend_-to-use-backends-PGPRO.patch text/x-diff 7.1 KB

From: Jeremy Schneider <schnjere(at)amazon(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-08-15 21:47:25
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 8/15/22 1:58 PM, Nathan Bossart wrote:
> Hi hackers,
>
> As Greg Stark noted elsewhere [0], it is presently very difficult to
> identify the PID of the session using a temporary schema, which is
> particularly unfortunate when a temporary table is putting a cluster in
> danger of transaction ID wraparound. I noted [1] that the following query
> can be used to identify the PID for a given backend ID:
>
> SELECT bid, pg_stat_get_backend_pid(bid) AS pid FROM pg_stat_get_backend_idset() bid;
>
> But on closer inspection, this is just plain wrong. The backend IDs
> returned by pg_stat_get_backend_idset() might initially bear some
> resemblance to the backend IDs stored in PGPROC, so my suggested query
> might work some of the time, but the pg_stat_get_backend_* backend IDs
> typically diverge from the PGPROC backend IDs as sessions connect and
> disconnect.

I didn't review the patch itself yet, but I'd like to chime in with a
big "+1" for the idea. I've had several past experiences getting called
to help in situations where a database was getting close to wraparound
and the culprit was a temp table blocking vacuum. I went down this same
trail of pg_stat_get_backend_idset() and I can attest that it did work
once or twice, but it didn't work other times.

AFAIK, in PostgreSQL today, there's really no way to reliably get the
PID of the session holding particular temp tables. (The idea of
iterating through backends with gdb and trying to find & dump some
obscure data structure seems completely impractical for regular
production ops.)

I'll take a look at the patch if I can... and I'm hopeful that we're
able to move this idea forward and get this little gap in PG filled once
and for all!

-Jeremy

--
Jeremy Schneider
Database Engineer
Amazon Web Services


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Jeremy Schneider <schnjere(at)amazon(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-08-16 23:04:23
Message-ID: 20220816230423.GA341431@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 15, 2022 at 02:47:25PM -0700, Jeremy Schneider wrote:
> I'll take a look at the patch if I can... and I'm hopeful that we're
> able to move this idea forward and get this little gap in PG filled once
> and for all!

Thanks!

I noticed that the "result" variable in pg_stat_get_backend_idset() is kind
of pointless after my patch is applied, so here is a v2 with it removed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v2-0001-Adjust-pg_stat_get_backend_-to-use-backends-PGPRO.patch text/x-diff 7.4 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: nathandbossart(at)gmail(dot)com
Cc: schnjere(at)amazon(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-08-23 09:19:37
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 16 Aug 2022 16:04:23 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> On Mon, Aug 15, 2022 at 02:47:25PM -0700, Jeremy Schneider wrote:
> > I'll take a look at the patch if I can... and I'm hopeful that we're
> > able to move this idea forward and get this little gap in PG filled once
> > and for all!
>
> Thanks!
>
> I noticed that the "result" variable in pg_stat_get_backend_idset() is kind
> of pointless after my patch is applied, so here is a v2 with it removed.

It seems to be a sensible way to expose the PGPROC backend ids to SQL
interface. It inserts bsearch into relatively frequently-called
function but (I believe) that doesn't seem to matter much (comparing
to, for example, the size of id->position translation table).

I don't think pgstat_fetch_stat_beentry needs to check for
out-of-range ids anymore. That case is a kind of rare and bsearch
properly handles it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Greg Stark <stark(at)mit(dot)edu>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-08-23 09:29:05
Message-ID: CAM-w4HMZ03AbQjBbfN_mTxct68HX3NPRQVfLaRxLGJTejJ8xbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Having this function would be great (I admit I never responded because
I never figured out if your suggestion was right or not:). But should
it also be added to the pg_stat_activity view? Perhaps even just in
the SQL view using the function.

Alternately should pg_stat_activity show the actual temp schema name
instead of the id? I don't recall if it's visible outside the backend
but if it is, could pg_stat_activity show whether the temp schema is
actually attached or not?


From: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-08-23 11:30:57
Message-ID: CAMsGm5cPjXbk+-15vrc6yjZoSUj6egcUeHZTK5sey9RHcHDooQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 23 Aug 2022 at 05:29, Greg Stark <stark(at)mit(dot)edu> wrote:

> Having this function would be great (I admit I never responded because
> I never figured out if your suggestion was right or not:). But should
> it also be added to the pg_stat_activity view? Perhaps even just in
> the SQL view using the function.
>
> Alternately should pg_stat_activity show the actual temp schema name
> instead of the id? I don't recall if it's visible outside the backend
> but if it is, could pg_stat_activity show whether the temp schema is
> actually attached or not?
>

Would it work to cast the schema oid to type regnamespace? Then the actual
data (numeric oid) would be present in the view, but it would display as
text.


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-08-29 23:07:57
Message-ID: 20220829230757.GA542616@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 23, 2022 at 10:29:05AM +0100, Greg Stark wrote:
> Having this function would be great (I admit I never responded because
> I never figured out if your suggestion was right or not:). But should
> it also be added to the pg_stat_activity view? Perhaps even just in
> the SQL view using the function.
>
> Alternately should pg_stat_activity show the actual temp schema name
> instead of the id? I don't recall if it's visible outside the backend
> but if it is, could pg_stat_activity show whether the temp schema is
> actually attached or not?

I'm open to adding the backend ID or the temp schema name to
pg_stat_activity, but I wouldn't be surprised to learn that others aren't.
It'd be great to hear a few more opinions on the idea before I spend too
much time on the patches. IMO we should still adjust the
pg_stat_get_backend_*() functions even if we do end up adjusting
pg_stat_activity.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-09-24 17:41:38
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> On Tue, Aug 23, 2022 at 10:29:05AM +0100, Greg Stark wrote:
>> Alternately should pg_stat_activity show the actual temp schema name
>> instead of the id? I don't recall if it's visible outside the backend
>> but if it is, could pg_stat_activity show whether the temp schema is
>> actually attached or not?

> I'm open to adding the backend ID or the temp schema name to
> pg_stat_activity, but I wouldn't be surprised to learn that others aren't.

FWIW, I'd vote against adding the temp schema per se. We can see from
outside whether the corresponding temp schema exists, but we can't readily
tell whether the session has decided to use it, so attributing it to the
session is a bit dangerous. Maybe there is an argument for having
sessions report it to pgstats when they do adopt a temp schema, but I
think there'd be race conditions, rollback after error, and other issues
to contend with there.

The proposed patch seems like an independent first step in any case.

One thing I don't like about it documentation-wise is that it leaves
the concept of backend ID pretty much completely undefined.

regards, tom lane


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-09-26 16:08:22
Message-ID: 20220926160822.GA1298186@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 24, 2022 at 01:41:38PM -0400, Tom Lane wrote:
> One thing I don't like about it documentation-wise is that it leaves
> the concept of backend ID pretty much completely undefined.

How specific do you think this definition ought to be? All I've come up
with so far is "internal identifier for the backend that is independent
from its PID," which is what I used in the attached patch. Do we want to
mention its uses in more detail (e.g., temp schema name), or should we keep
it vague?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v3-0001-Adjust-pg_stat_get_backend_-to-use-backends-PGPRO.patch text/x-diff 7.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-09-26 19:50:09
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> On Sat, Sep 24, 2022 at 01:41:38PM -0400, Tom Lane wrote:
>> One thing I don't like about it documentation-wise is that it leaves
>> the concept of backend ID pretty much completely undefined.

> How specific do you think this definition ought to be?

Fairly specific, I think, so that people can reason about how it behaves.
Notably, it seems absolutely critical to be clear that the IDs recycle
over short time frames. Maybe like

These access functions use the session's backend ID number, which is
a small integer that is distinct from the backend ID of any concurrent
session, although an ID can be recycled as soon as the session exits.
The backend ID is used, among other things, to identify the session's
temporary schema if it has one.

I'd prefer to use the terminology "session" than "backend" in the
definition. I suppose we can't get away with actually calling it
a "session ID" given that "backend ID" is used in so many places;
but I think people have a clearer handle on what a session is.

regards, tom lane


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-09-26 20:11:13
Message-ID: 20220926201113.GA1340606@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 26, 2022 at 03:50:09PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
>> On Sat, Sep 24, 2022 at 01:41:38PM -0400, Tom Lane wrote:
>>> One thing I don't like about it documentation-wise is that it leaves
>>> the concept of backend ID pretty much completely undefined.
>
>> How specific do you think this definition ought to be?
>
> Fairly specific, I think, so that people can reason about how it behaves.
> Notably, it seems absolutely critical to be clear that the IDs recycle
> over short time frames. Maybe like
>
> These access functions use the session's backend ID number, which is
> a small integer that is distinct from the backend ID of any concurrent
> session, although an ID can be recycled as soon as the session exits.
> The backend ID is used, among other things, to identify the session's
> temporary schema if it has one.
>
> I'd prefer to use the terminology "session" than "backend" in the
> definition. I suppose we can't get away with actually calling it
> a "session ID" given that "backend ID" is used in so many places;
> but I think people have a clearer handle on what a session is.

Thanks for the suggestion. I used it in v4 of the patch.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v4-0001-Adjust-pg_stat_get_backend_-to-use-backends-PGPRO.patch text/x-diff 7.6 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-09-28 22:56:20
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> Thanks for the suggestion. I used it in v4 of the patch.

I reviewed this and made some changes, some cosmetic some less so.

Notably, I was bemused that of the four calls of
pgstat_fetch_stat_local_beentry, three tested for a NULL result even
though they cannot get one, while the fourth (pg_stat_get_backend_idset)
*is* at hazard of a NULL result but lacked a check. I changed
pg_stat_get_backend_idset so that it too cannot get a NULL, and deleted
the dead code from the other callers.

A point that still bothers me a bit about pg_stat_get_backend_idset is
that it could miss or duplicate some backend IDs if the user calls
pg_stat_clear_snapshot() partway through the SRF's run, and we reload
a different set of backend entries than we had before. I added a comment
about that, with an argument why it's not worth working harder, but
is the argument convincing? If not, what should we do?

Also, I realized that the functions we're changing here are mostly
not exercised in the current regression tests :-(. So I added a
small test case.

I think this is probably committable if you agree with my changes.

regards, tom lane

Attachment Content-Type Size
v5-0001-use-real-backend-id.patch text/x-diff 13.2 KB

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-09-29 02:40:38
Message-ID: 20220929024038.GA4945@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 28, 2022 at 06:56:20PM -0400, Tom Lane wrote:
> I reviewed this and made some changes, some cosmetic some less so.

Thanks for the detailed review.

> A point that still bothers me a bit about pg_stat_get_backend_idset is
> that it could miss or duplicate some backend IDs if the user calls
> pg_stat_clear_snapshot() partway through the SRF's run, and we reload
> a different set of backend entries than we had before. I added a comment
> about that, with an argument why it's not worth working harder, but
> is the argument convincing? If not, what should we do?

Isn't this an existing problem? Granted, it'd manifest differently with
this patch, but ISTM we could already end up with too many or too few
backend IDs if there's a refresh partway through. I don't know if there's
an easy way to avoid mismatches altogether besides perhaps ERROR-ing if
there's a concurrent refresh.

> - if (beid < 1 || beid > localNumBackends)
> - return NULL;

The reason I'd kept this in was because I was worried about overflow in the
comparator function. Upon further inspection, I don't think there's
actually any way that will produce incorrect results. And I'm not sure we
should worry about such cases too much, anyway.

Overall, LGTM.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-09-29 14:47:06
Message-ID: [email protected]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> On Wed, Sep 28, 2022 at 06:56:20PM -0400, Tom Lane wrote:
>> A point that still bothers me a bit about pg_stat_get_backend_idset is
>> that it could miss or duplicate some backend IDs if the user calls
>> pg_stat_clear_snapshot() partway through the SRF's run, and we reload
>> a different set of backend entries than we had before. I added a comment
>> about that, with an argument why it's not worth working harder, but
>> is the argument convincing? If not, what should we do?

> Isn't this an existing problem? Granted, it'd manifest differently with
> this patch, but ISTM we could already end up with too many or too few
> backend IDs if there's a refresh partway through.

Right. I'd been thinking the current code wouldn't generate duplicate IDs
--- but it might produce duplicate query output anyway, in case a given
backend's entry is later in the array than it was before. So really
there's not much guarantees here in any case.

>> - if (beid < 1 || beid > localNumBackends)
>> - return NULL;

> The reason I'd kept this in was because I was worried about overflow in the
> comparator function. Upon further inspection, I don't think there's
> actually any way that will produce incorrect results. And I'm not sure we
> should worry about such cases too much, anyway.

Ah, I see: if the user passes in a "backend ID" that is close to INT_MIN,
then the comparator's subtraction could overflow. We could fix that by
writing out the comparator code longhand ("if (a < b) return -1;" etc),
but I don't really think it's necessary. bsearch is guaranteed to
correctly report that such a key is not present, even if it takes a
strange search path through the array due to inconsistent comparator
results. So the test quoted above just serves to fail a bit more quickly,
but we certainly shouldn't be optimizing for the case of a bad ID.

> Overall, LGTM.

OK. Will push shortly.

regards, tom lane


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: identifying the backend that owns a temporary schema
Date: 2022-09-29 16:23:32
Message-ID: 20220929162332.GA69039@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 29, 2022 at 10:47:06AM -0400, Tom Lane wrote:
> OK. Will push shortly.

Thanks!

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com