subscriptionCheck failures on nightjar

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: subscriptionCheck failures on nightjar
Date: 2019-02-11 06:31:23
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

nightjar just did this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2019-02-11%2004%3A33%3A07

The critical bit seems to be that the publisher side of the
010_truncate.pl test failed like so:

2019-02-10 23:55:58.765 EST [40771] sub3 LOG: statement: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ
2019-02-10 23:55:58.765 EST [40771] sub3 LOG: received replication command: CREATE_REPLICATION_SLOT "sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2019-02-10 23:55:58.798 EST [40728] sub1 PANIC: could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory
2019-02-10 23:55:58.800 EST [40771] sub3 LOG: logical decoding found consistent point at 0/160B578
2019-02-10 23:55:58.800 EST [40771] sub3 DETAIL: There are no running transactions.

I'm not sure what to make of that, but I notice that nightjar has
failed subscriptionCheck seven times since mid-December, and every one
of those shows this same PANIC. Meanwhile, no other buildfarm member
has produced such a failure. It smells like a race condition with
a rather tight window, but that's just a guess.

So: (1) what's causing the failure? (2) could we respond with
something less than take-down-the-whole-database when a failure
happens in this area?

regards, tom lane


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 11:55:59
Message-ID: CAEepm=1pbie9C_PtojGum7qXAAU1hB8JtA6v_9dQFPgay3PcZg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 11, 2019 at 7:31 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> 2019-02-10 23:55:58.798 EST [40728] sub1 PANIC: could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory

<pokes at totally unfamiliar code>

They get atomically renamed into place, which seems kosher even if
snapshots for the same LSN are created concurrently by different
backends (and tracing syscalls confirms that that does occasionally
happen). It's hard to believe that nightjar's rename() ceased to be
atomic a couple of months ago. It looks like the only way for files
to get unlinked after that is by CheckPointSnapBuild() deciding they
are too old.

Hmm. Could this be relevant, and cause a well timed checkpoint to
unlink files too soon?

2019-02-12 21:52:58.304 EST [22922] WARNING: out of logical
replication worker slots

--
Thomas Munro
http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 16:57:32
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> On Mon, Feb 11, 2019 at 7:31 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 2019-02-10 23:55:58.798 EST [40728] sub1 PANIC: could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory

> <pokes at totally unfamiliar code>

> They get atomically renamed into place, which seems kosher even if
> snapshots for the same LSN are created concurrently by different
> backends (and tracing syscalls confirms that that does occasionally
> happen). It's hard to believe that nightjar's rename() ceased to be
> atomic a couple of months ago. It looks like the only way for files
> to get unlinked after that is by CheckPointSnapBuild() deciding they
> are too old.

> Hmm. Could this be relevant, and cause a well timed checkpoint to
> unlink files too soon?
> 2019-02-12 21:52:58.304 EST [22922] WARNING: out of logical
> replication worker slots

I've managed to reproduce this locally, and obtained this PANIC:

log/010_truncate_publisher.log:2019-02-13 11:29:04.759 EST [9973] sub3 PANIC: could not open file "pg_logical/snapshots/0-16067B0.snap": No such file or directory

with this stack trace:

#0 0x0000000801ab610c in kill () from /lib/libc.so.7
#1 0x0000000801ab4d3b in abort () from /lib/libc.so.7
#2 0x000000000089202e in errfinish (dummy=Variable "dummy" is not available.
) at elog.c:552
#3 0x000000000075d339 in fsync_fname_ext (
fname=0x7fffffffba20 "pg_logical/snapshots/0-16067B0.snap", isdir=Variable "isdir" is not available.
)
at fd.c:3372
#4 0x0000000000730c75 in SnapBuildSerialize (builder=0x80243c118,
lsn=23095216) at snapbuild.c:1669
#5 0x0000000000731297 in SnapBuildProcessRunningXacts (builder=0x80243c118,
lsn=23095216, running=0x8024424f0) at snapbuild.c:1110
#6 0x0000000000722eac in LogicalDecodingProcessRecord (ctx=0x802414118,
record=0x8024143d8) at decode.c:318
#7 0x0000000000736f30 in XLogSendLogical () at walsender.c:2826
#8 0x00000000007389c7 in WalSndLoop (send_data=0x736ed0 <XLogSendLogical>)
at walsender.c:2184
#9 0x0000000000738c3d in StartLogicalReplication (cmd=Variable "cmd" is not available.
) at walsender.c:1118
#10 0x0000000000739895 in exec_replication_command (
cmd_string=0x80240e118 "START_REPLICATION SLOT \"sub3\" LOGICAL 0/0 (proto_version '1', publication_names '\"pub3\"')") at walsender.c:1536
#11 0x000000000078b272 in PostgresMain (argc=-14464, argv=Variable "argv" is not available.
) at postgres.c:4252
#12 0x00000000007113fa in PostmasterMain (argc=-14256, argv=0x7fffffffcc60)
at postmaster.c:4382

So the problem seems to boil down to "somebody removed the snapshot
file between the time we renamed it into place and the time we tried
to fsync it".

I do find messages like the one you mention, but they are on the
subscriber not the publisher side, so I'm not sure if this matches
the scenario you have in mind?

log/010_truncate_subscriber.log:2019-02-13 11:29:02.343 EST [9970] WARNING: out of logical replication worker slots
log/010_truncate_subscriber.log:2019-02-13 11:29:02.344 EST [9970] WARNING: out of logical replication worker slots
log/010_truncate_subscriber.log:2019-02-13 11:29:03.401 EST [9970] WARNING: out of logical replication worker slots

Anyway, I think we might be able to fix this along the lines of

CloseTransientFile(fd);

+ /* ensure snapshot file is down to stable storage */
+ fsync_fname(tmppath, false);
fsync_fname("pg_logical/snapshots", true);

/*
* We may overwrite the work from some other backend, but that's ok, our
* snapshot is valid as well, we'll just have done some superfluous work.
*/
if (rename(tmppath, path) != 0)
{
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not rename file \"%s\" to \"%s\": %m",
tmppath, path)));
}

- /* make sure we persist */
- fsync_fname(path, false);
+ /* ensure we persist the file rename */
fsync_fname("pg_logical/snapshots", true);

The existing code here seems simply wacky/unsafe to me regardless
of this race condition: couldn't it potentially result in a corrupt
snapshot file appearing with a valid name, if the system crashes
after persisting the rename but before it's pushed the data out?

I also wonder why bother with the directory sync just before the
rename.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 17:11:01
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> I've managed to reproduce this locally, and obtained this PANIC:

Cool. How exactly?

Nice catch.

> Anyway, I think we might be able to fix this along the lines of
>
> CloseTransientFile(fd);
>
> + /* ensure snapshot file is down to stable storage */
> + fsync_fname(tmppath, false);
> fsync_fname("pg_logical/snapshots", true);
>
> /*
> * We may overwrite the work from some other backend, but that's ok, our
> * snapshot is valid as well, we'll just have done some superfluous work.
> */
> if (rename(tmppath, path) != 0)
> {
> ereport(ERROR,
> (errcode_for_file_access(),
> errmsg("could not rename file \"%s\" to \"%s\": %m",
> tmppath, path)));
> }
>
> - /* make sure we persist */
> - fsync_fname(path, false);
> + /* ensure we persist the file rename */
> fsync_fname("pg_logical/snapshots", true);

Hm, but that's not the same? On some filesystems one needs the directory
fsync, on some the file fsync, and I think both in some cases. ISTM we
should just open the file before the rename, and then use fsync() on the
filehandle rather than the filename. Then a concurrent rename ought not
to hurt us?

> The existing code here seems simply wacky/unsafe to me regardless
> of this race condition: couldn't it potentially result in a corrupt
> snapshot file appearing with a valid name, if the system crashes
> after persisting the rename but before it's pushed the data out?

What do you mean precisely with "before it's pushed the data out"?

> I also wonder why bother with the directory sync just before the
> rename.

Because on some FS/OS combinations the size of the renamed-into-place
file isn't guaranteed to be durable unless the directory was
fsynced. Otherwise there's the possibility to have incomplete data if we
crash after renaming, but before fsyncing.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 17:37:35
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
>> I've managed to reproduce this locally, and obtained this PANIC:

> Cool. How exactly?

Andrew told me that nightjar is actually running in a qemu VM,
so I set up freebsd 9.0 in a qemu VM, and boom. It took a bit
of fiddling with qemu parameters, but for such a timing-sensitive
problem, that's not surprising.

>> Anyway, I think we might be able to fix this along the lines of
>> [ fsync the data before renaming not after ]

> Hm, but that's not the same? On some filesystems one needs the directory
> fsync, on some the file fsync, and I think both in some cases.

Now that I look at it, there's a pg_fsync() just above this, so
I wonder why we need a second fsync on the file at all. fsync'ing
the directory is needed to ensure the directory entry is on disk;
but the file data should be out already, or else the kernel is
simply failing to honor fsync.

>> The existing code here seems simply wacky/unsafe to me regardless
>> of this race condition: couldn't it potentially result in a corrupt
>> snapshot file appearing with a valid name, if the system crashes
>> after persisting the rename but before it's pushed the data out?

> What do you mean precisely with "before it's pushed the data out"?

Given the previous pg_fsync, this isn't an issue.

>> I also wonder why bother with the directory sync just before the
>> rename.

> Because on some FS/OS combinations the size of the renamed-into-place
> file isn't guaranteed to be durable unless the directory was
> fsynced.

Bleah. But in any case, the rename should not create a situation
in which we need to fsync the file data again.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 17:41:51
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> >> I've managed to reproduce this locally, and obtained this PANIC:
>
> > Cool. How exactly?
>
> Andrew told me that nightjar is actually running in a qemu VM,
> so I set up freebsd 9.0 in a qemu VM, and boom. It took a bit
> of fiddling with qemu parameters, but for such a timing-sensitive
> problem, that's not surprising.

Ah.

> >> I also wonder why bother with the directory sync just before the
> >> rename.
>
> > Because on some FS/OS combinations the size of the renamed-into-place
> > file isn't guaranteed to be durable unless the directory was
> > fsynced.
>
> Bleah. But in any case, the rename should not create a situation
> in which we need to fsync the file data again.

Well, it's not super well defined which of either you need to make the
rename durable, and it appears to differ between OSs. Any argument
against fixing it up like I suggested, by using an fd from before the
rename?

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 17:59:19
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
>> Bleah. But in any case, the rename should not create a situation
>> in which we need to fsync the file data again.

> Well, it's not super well defined which of either you need to make the
> rename durable, and it appears to differ between OSs. Any argument
> against fixing it up like I suggested, by using an fd from before the
> rename?

I'm unimpressed. You're speculating about the filesystem having random
deviations from POSIX behavior, and using that weak argument to justify a
totally untested technique having its own obvious portability hazards.
Who's to say that an fsync on a file opened before a rename is going to do
anything good after the rename? (On, eg, NFS there are obvious reasons
why it might not.)

Also, I wondered why this is coming out as a PANIC. I thought originally
that somebody must be causing this code to run in a critical section,
but it looks like the real issue is just that fsync_fname() uses
data_sync_elevel, which is

int
data_sync_elevel(int elevel)
{
return data_sync_retry ? elevel : PANIC;
}

I really really don't want us doing questionably-necessary fsyncs with a
PANIC as the result. Perhaps more to the point, the way this was coded,
the PANIC applies to open() failures in fsync_fname_ext() not just fsync()
failures; that's painting with too broad a brush isn't it?

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 18:12:25
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
> >> Bleah. But in any case, the rename should not create a situation
> >> in which we need to fsync the file data again.
>
> > Well, it's not super well defined which of either you need to make the
> > rename durable, and it appears to differ between OSs. Any argument
> > against fixing it up like I suggested, by using an fd from before the
> > rename?
>
> I'm unimpressed. You're speculating about the filesystem having random
> deviations from POSIX behavior, and using that weak argument to justify a
> totally untested technique having its own obvious portability hazards.

Uhm, we've reproduced failures due to the lack of such fsyncs at some
point. And not some fringe OS, but ext4 (albeit with data=writeback).

I don't think POSIX has yet figured out what they actually think is
required:
http://austingroupbugs.net/view.php?id=672

I guess we could just ignore ENOENT in this case, that ought to be just
as safe as using the old fd.

> Also, I wondered why this is coming out as a PANIC. I thought originally
> that somebody must be causing this code to run in a critical section,
> but it looks like the real issue is just that fsync_fname() uses
> data_sync_elevel, which is
>
> int
> data_sync_elevel(int elevel)
> {
> return data_sync_retry ? elevel : PANIC;
> }
>
> I really really don't want us doing questionably-necessary fsyncs with a
> PANIC as the result.

Well, given the 'failed fsync throws dirty data away' issue, I don't
quite see what we can do otherwise. But:

> Perhaps more to the point, the way this was coded, the PANIC applies
> to open() failures in fsync_fname_ext() not just fsync() failures;
> that's painting with too broad a brush isn't it?

That indeed seems wrong. Thomas? I'm not quite sure how to best fix
this though - I guess we could rename fsync_fname_ext's eleval parameter
to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
to bad?

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 18:24:03
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
>> Perhaps more to the point, the way this was coded, the PANIC applies
>> to open() failures in fsync_fname_ext() not just fsync() failures;
>> that's painting with too broad a brush isn't it?

> That indeed seems wrong. Thomas? I'm not quite sure how to best fix
> this though - I guess we could rename fsync_fname_ext's eleval parameter
> to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
> to bad?

Perhaps fsync_fname() should pass the elevel through as-is, and
then fsync_fname_ext() apply the data_sync_elevel() promotion only
to the fsync() call not the open()? I'm not sure.

The bigger picture here is that there are probably some call sites where
PANIC on open() failure is appropriate too. So having fsync_fname[_ext]
deciding what to do on its own is likely to be inadequate.

If we fix this by allowing ENOENT to not be an error in this particular
call case, that's going to involve an fsync_fname_ext API change anyway...

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 18:33:03
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2019-02-13 13:24:03 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
> >> Perhaps more to the point, the way this was coded, the PANIC applies
> >> to open() failures in fsync_fname_ext() not just fsync() failures;
> >> that's painting with too broad a brush isn't it?
>
> > That indeed seems wrong. Thomas? I'm not quite sure how to best fix
> > this though - I guess we could rename fsync_fname_ext's eleval parameter
> > to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
> > to bad?
>
> Perhaps fsync_fname() should pass the elevel through as-is, and
> then fsync_fname_ext() apply the data_sync_elevel() promotion only
> to the fsync() call not the open()? I'm not sure.

Yea, that's probably not a bad plan. It'd address your:

> The bigger picture here is that there are probably some call sites where
> PANIC on open() failure is appropriate too. So having fsync_fname[_ext]
> deciding what to do on its own is likely to be inadequate.

Seems to me we ought to do this regardless of the bug discussed
here. But we'd nee dto be careful that we'd take the "more severe" error
between the passed in elevel and data_sync_elevel(). Otherwise we'd end
up downgrading errors...

> If we fix this by allowing ENOENT to not be an error in this particular
> call case, that's going to involve an fsync_fname_ext API change anyway...

I was kinda pondering just open coding it. I am not yet convinced that
my idea of just using an open FD isn't the least bad approach for the
issue at hand. What precisely is the NFS issue you're concerned about?

Right now fsync_fname_ext isn't exposed outside fd.c...

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 19:11:26
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> I was kinda pondering just open coding it. I am not yet convinced that
> my idea of just using an open FD isn't the least bad approach for the
> issue at hand. What precisely is the NFS issue you're concerned about?

I'm not sure that fsync-on-FD after the rename will work, considering that
the issue here is that somebody might've unlinked the file altogether
before we get to doing the fsync. I don't have a hard time believing that
that might result in a failure report on NFS or similar. Yeah, it's
hypothetical, but the argument that we need a repeat fsync at all seems
equally hypothetical.

> Right now fsync_fname_ext isn't exposed outside fd.c...

Mmm. That makes it easier to consider changing its API.

regards, tom lane


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 20:52:33
Message-ID: CAEepm=0wB7vgztC5sg2nmJ-H3bnrBT5GQfhUzP+Ffq-WT3g8VA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 14, 2019 at 8:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I was kinda pondering just open coding it. I am not yet convinced that
> > my idea of just using an open FD isn't the least bad approach for the
> > issue at hand. What precisely is the NFS issue you're concerned about?
>
> I'm not sure that fsync-on-FD after the rename will work, considering that
> the issue here is that somebody might've unlinked the file altogether
> before we get to doing the fsync. I don't have a hard time believing that
> that might result in a failure report on NFS or similar. Yeah, it's
> hypothetical, but the argument that we need a repeat fsync at all seems
> equally hypothetical.
>
> > Right now fsync_fname_ext isn't exposed outside fd.c...
>
> Mmm. That makes it easier to consider changing its API.

Just to make sure I understand: it's OK for the file not to be there
when we try to fsync it by name, because a concurrent checkpoint can
remove it, having determined that we don't need it anymore? In other
words, we really needed either missing_ok=true semantics, or to use
the fd we already had instead of the name?

I found 3 examples of this failing with an ERROR (though not turning
the BF red, so nobody noticed) before the PANIC patch went in:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&amp;dt=2018-09-10%2020%3A54%3A21&amp;stg=subscription-check
2018-09-10 17:20:09.247 EDT [23287] sub1 ERROR: could not open file
"pg_logical/snapshots/0-161D778.snap": No such file or directory
2018-09-10 17:20:09.247 EDT [23285] ERROR: could not receive data
from WAL stream: ERROR: could not open file
"pg_logical/snapshots/0-161D778.snap": No such file or directory

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&amp;dt=2018-08-31%2023%3A25%3A41&amp;stg=subscription-check
2018-08-31 19:52:06.634 EDT [52724] sub1 ERROR: could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-31 19:52:06.634 EDT [52721] ERROR: could not receive data
from WAL stream: ERROR: could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&amp;dt=2018-08-22%2021%3A49%3A18&amp;stg=subscription-check
2018-08-22 18:10:29.422 EDT [44208] sub1 ERROR: could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-22 18:10:29.422 EDT [44206] ERROR: could not receive data
from WAL stream: ERROR: could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory

--
Thomas Munro
http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 21:42:20
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> I found 3 examples of this failing with an ERROR (though not turning
> the BF red, so nobody noticed) before the PANIC patch went in:

Yeah, I suspected that had happened before with less-obvious consequences.
Now that we know where the problem is, you could probably make it highly
reproducible by inserting a sleep of a few msec between the rename and the
second fsync.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-02-13 21:51:47
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:
> On Thu, Feb 14, 2019 at 8:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > I was kinda pondering just open coding it. I am not yet convinced that
> > > my idea of just using an open FD isn't the least bad approach for the
> > > issue at hand. What precisely is the NFS issue you're concerned about?
> >
> > I'm not sure that fsync-on-FD after the rename will work, considering that
> > the issue here is that somebody might've unlinked the file altogether
> > before we get to doing the fsync. I don't have a hard time believing that
> > that might result in a failure report on NFS or similar. Yeah, it's
> > hypothetical, but the argument that we need a repeat fsync at all seems
> > equally hypothetical.
> >
> > > Right now fsync_fname_ext isn't exposed outside fd.c...
> >
> > Mmm. That makes it easier to consider changing its API.
>
> Just to make sure I understand: it's OK for the file not to be there
> when we try to fsync it by name, because a concurrent checkpoint can
> remove it, having determined that we don't need it anymore? In other
> words, we really needed either missing_ok=true semantics, or to use
> the fd we already had instead of the name?

I'm not yet sure that that's actually something that's supposed to
happen, I got to spend some time analysing how this actually
happens. Normally the contents of the slot should actually prevent it
from being removed (as they're newer than
ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
a bug in the drop logic in newer releases.

Greetings,

Andres Freund


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-03-10 18:40:15
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers


On 2/13/19 1:12 PM, Andres Freund wrote:
> Hi,
>
> On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>>> On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
>>>> Bleah. But in any case, the rename should not create a situation
>>>> in which we need to fsync the file data again.
>>> Well, it's not super well defined which of either you need to make the
>>> rename durable, and it appears to differ between OSs. Any argument
>>> against fixing it up like I suggested, by using an fd from before the
>>> rename?
>> I'm unimpressed. You're speculating about the filesystem having random
>> deviations from POSIX behavior, and using that weak argument to justify a
>> totally untested technique having its own obvious portability hazards.
> Uhm, we've reproduced failures due to the lack of such fsyncs at some
> point. And not some fringe OS, but ext4 (albeit with data=writeback).
>
> I don't think POSIX has yet figured out what they actually think is
> required:
> http://austingroupbugs.net/view.php?id=672
>
> I guess we could just ignore ENOENT in this case, that ought to be just
> as safe as using the old fd.
>
>
>> Also, I wondered why this is coming out as a PANIC. I thought originally
>> that somebody must be causing this code to run in a critical section,
>> but it looks like the real issue is just that fsync_fname() uses
>> data_sync_elevel, which is
>>
>> int
>> data_sync_elevel(int elevel)
>> {
>> return data_sync_retry ? elevel : PANIC;
>> }
>>
>> I really really don't want us doing questionably-necessary fsyncs with a
>> PANIC as the result.
> Well, given the 'failed fsync throws dirty data away' issue, I don't
> quite see what we can do otherwise. But:
>
>
>> Perhaps more to the point, the way this was coded, the PANIC applies
>> to open() failures in fsync_fname_ext() not just fsync() failures;
>> that's painting with too broad a brush isn't it?
> That indeed seems wrong. Thomas? I'm not quite sure how to best fix
> this though - I guess we could rename fsync_fname_ext's eleval parameter
> to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
> to bad?
>

Thread seems to have gone quiet ...

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-06-27 01:10:49
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:
>> Just to make sure I understand: it's OK for the file not to be there
>> when we try to fsync it by name, because a concurrent checkpoint can
>> remove it, having determined that we don't need it anymore? In other
>> words, we really needed either missing_ok=true semantics, or to use
>> the fd we already had instead of the name?

> I'm not yet sure that that's actually something that's supposed to
> happen, I got to spend some time analysing how this actually
> happens. Normally the contents of the slot should actually prevent it
> from being removed (as they're newer than
> ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
> a bug in the drop logic in newer releases.

My animal dromedary just reproduced this failure, which we've previously
only seen on nightjar.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-26%2023%3A57%3A45

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-06-28 15:17:45
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> My animal dromedary just reproduced this failure, which we've previously
> only seen on nightjar.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-26%2023%3A57%3A45

Twice:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-28%2006%3A50%3A41

Since this is a live (if old) system, not some weird qemu emulation,
we can no longer pretend that it might not be reachable in the field.
I've added an open item.

regards, tom lane


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-08-13 08:04:35
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 13, 2019 at 01:51:47PM -0800, Andres Freund wrote:
> I'm not yet sure that that's actually something that's supposed to
> happen, I got to spend some time analysing how this actually
> happens. Normally the contents of the slot should actually prevent it
> from being removed (as they're newer than
> ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
> a bug in the drop logic in newer releases.

In the same context, could it be a consequence of 9915de6c which has
introduced a conditional variable to control slot operations? This
could have exposed more easily a pre-existing race condition.
--
Michael


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-08-26 13:29:04
Message-ID: 20190826132904.3ayuw36qzl2c4ktr@development
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 13, 2019 at 05:04:35PM +0900, Michael Paquier wrote:
>On Wed, Feb 13, 2019 at 01:51:47PM -0800, Andres Freund wrote:
>> I'm not yet sure that that's actually something that's supposed to
>> happen, I got to spend some time analysing how this actually
>> happens. Normally the contents of the slot should actually prevent it
>> from being removed (as they're newer than
>> ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
>> a bug in the drop logic in newer releases.
>
>In the same context, could it be a consequence of 9915de6c which has
>introduced a conditional variable to control slot operations? This
>could have exposed more easily a pre-existing race condition.
>--

This is one of the remaining open items, and we don't seem to be moving
forward with it :-(

I'm willing to take a stab at it, but to do that I need a way to
reproduce it. Tom, you mentioned you've managed to reproduce it in a
qemu instance, but that it took some fiddling with qemu parmeters or
something. Can you share what exactly was necessary?

An observation about the issue - while we started to notice this after
Decemeber, that's mostly because the PANIC patch went it shortly before.
We've however seen the issue before, as Thomas Munro mentioned in [1].

Those reports are from August, so it's quite possible something in the
first CF upset the code. And there's only a single commit in 2018-07
that seems related to logical decoding / snapshots [2], i.e. f49a80c:

commit f49a80c481f74fa81407dce8e51dea6956cb64f8
Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Date: Tue Jun 26 16:38:34 2018 -0400

Fix "base" snapshot handling in logical decoding

...

The other reason to suspect this is related is that the fix also made it
to REL_11_STABLE at that time, and if you check the buildfarm data [3],
you'll see 11 fails on nightjar too, from time to time.

This means it's not a 12+ only issue, it's a live issue on 11. I don't
know if f49a80c is the culprit, or if it simply uncovered a pre-existing
bug (e.g. due to timing).

[1] https://www.postgresql.org/message-id/CAEepm%3D0wB7vgztC5sg2nmJ-H3bnrBT5GQfhUzP%2BFfq-WT3g8VA%40mail.gmail.com

[2] https://commitfest.postgresql.org/18/1650/

[3] https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=nightjar&br=REL_11_STABLE

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-08-26 15:01:20
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> I'm willing to take a stab at it, but to do that I need a way to
> reproduce it. Tom, you mentioned you've managed to reproduce it in a
> qemu instance, but that it took some fiddling with qemu parmeters or
> something. Can you share what exactly was necessary?

I don't recall exactly what I did anymore, and it was pretty fiddly
anyway. Upthread I suggested

>> Now that we know where the problem is, you could probably make it highly
>> reproducible by inserting a sleep of a few msec between the rename and the
>> second fsync.

so why not try that first?

regards, tom lane


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-08-26 20:23:25
Message-ID: 20190826202325.soqe6gmgnenbmzee@development
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 26, 2019 at 11:01:20AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> I'm willing to take a stab at it, but to do that I need a way to
>> reproduce it. Tom, you mentioned you've managed to reproduce it in a
>> qemu instance, but that it took some fiddling with qemu parmeters or
>> something. Can you share what exactly was necessary?
>
>I don't recall exactly what I did anymore, and it was pretty fiddly
>anyway. Upthread I suggested
>
>>> Now that we know where the problem is, you could probably make it highly
>>> reproducible by inserting a sleep of a few msec between the rename and the
>>> second fsync.
>
>so why not try that first?
>

Ah, right. I'll give that a try.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-17 15:54:45
Message-ID: CA+TgmoaNOMG9+Ho9d3CX+-10O7+nqqvmSpXb1m0F3dqWB4C-8g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> This is one of the remaining open items, and we don't seem to be moving
> forward with it :-(

Why exactly is this an open item, anyway?

I don't find any discussion on the thread which makes a clear argument
that this problem originated with v12. If it didn't, it's still a bug
and it still ought to be fixed at some point, but it's not a
release-blocker.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-17 16:39:33
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> This is one of the remaining open items, and we don't seem to be moving
>> forward with it :-(

> Why exactly is this an open item, anyway?

The reason it's still here is that Andres expressed a concern that
there might be more than meets the eye in this. What meets the eye
is that PANICing on file-not-found is not appropriate here, but Andres
seemed to think that the file not being present might reflect an
actual bug not just an expectable race condition [1].

Personally I'd be happy just to treat it as an expectable case and
fix the code to not PANIC on file-not-found.

In either case, it probably belongs in the "older bugs" section;
nightjar is showing the same failure on v11 from time to time.

regards, tom lane

[1] https://www.postgresql.org/message-id/20190213215147.cjbymfojf6xndr4t%40alap3.anarazel.de


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-17 19:45:10
Message-ID: 20190917194510.iqwyl3be62pz7l27@development
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 17, 2019 at 12:39:33PM -0400, Tom Lane wrote:
>Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>> This is one of the remaining open items, and we don't seem to be moving
>>> forward with it :-(
>
>> Why exactly is this an open item, anyway?
>
>The reason it's still here is that Andres expressed a concern that
>there might be more than meets the eye in this. What meets the eye
>is that PANICing on file-not-found is not appropriate here, but Andres
>seemed to think that the file not being present might reflect an
>actual bug not just an expectable race condition [1].
>
>Personally I'd be happy just to treat it as an expectable case and
>fix the code to not PANIC on file-not-found.
>

FWIW I agree with Andres that there probably is an actual bug. The file
should not just disappear like this, it's clearly unexpected so the
PANIC does not seem entirely inappropriate.

I've tried reproducing the issue on my local systems, with the extra
sleeps between fsyncs and so on, but I haven't managed to trigger it so
far :-(

>In either case, it probably belongs in the "older bugs" section;
>nightjar is showing the same failure on v11 from time to time.
>

Yes, it should be moved to the older section - it's clearly a v11 bug.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-18 00:58:15
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 17, 2019 at 09:45:10PM +0200, Tomas Vondra wrote:
> FWIW I agree with Andres that there probably is an actual bug. The file
> should not just disappear like this, it's clearly unexpected so the
> PANIC does not seem entirely inappropriate.

Agreed.

> I've tried reproducing the issue on my local systems, with the extra
> sleeps between fsyncs and so on, but I haven't managed to trigger it so
> far :-(

On my side, I have let this thing run for a couple of hours with a
patched version to include a sleep between the rename and the sync but
I could not reproduce it either:
#!/bin/bash
attempt=0
while true; do
attempt=$((attempt+1))
echo "Attempt $attempt"
cd $HOME/postgres/src/test/recovery/
PROVE_TESTS=t/006_logical_decoding.pl make check > /dev/null 2>&1
ERRNUM=$?
if [ $ERRNUM != 0 ]; then
echo "Failed at attempt $attempt"
exit $ERRNUM
fi
done
> Yes, it should be moved to the older section - it's clearly a v11 bug.

And agreed.
--
Michael


From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-18 10:55:14
Message-ID: CAGz5QCJv5JbRDsATDTkJqq7h9F7u0QLnNnLHfxR1nEOa4DnkJQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hello Michael,

On Wed, Sep 18, 2019 at 6:28 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On my side, I have let this thing run for a couple of hours with a
> patched version to include a sleep between the rename and the sync but
> I could not reproduce it either:
> #!/bin/bash
> attempt=0
> while true; do
> attempt=$((attempt+1))
> echo "Attempt $attempt"
> cd $HOME/postgres/src/test/recovery/
> PROVE_TESTS=t/006_logical_decoding.pl make check > /dev/null 2>&1
> ERRNUM=$?
> if [ $ERRNUM != 0 ]; then
> echo "Failed at attempt $attempt"
> exit $ERRNUM
> fi
> done
I think the failing test is src/test/subscription/t/010_truncate.pl.
I've tried to reproduce the same failure using your script in OS X
10.14 and Ubuntu 18.04.2 (Linux version 5.0.0-23-generic), but
couldn't reproduce the same.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-18 21:58:08
Message-ID: 20190918215808.yonxqgycme6pbctp@development
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 18, 2019 at 04:25:14PM +0530, Kuntal Ghosh wrote:
>Hello Michael,
>
>On Wed, Sep 18, 2019 at 6:28 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>>
>> On my side, I have let this thing run for a couple of hours with a
>> patched version to include a sleep between the rename and the sync but
>> I could not reproduce it either:
>> #!/bin/bash
>> attempt=0
>> while true; do
>> attempt=$((attempt+1))
>> echo "Attempt $attempt"
>> cd $HOME/postgres/src/test/recovery/
>> PROVE_TESTS=t/006_logical_decoding.pl make check > /dev/null 2>&1
>> ERRNUM=$?
>> if [ $ERRNUM != 0 ]; then
>> echo "Failed at attempt $attempt"
>> exit $ERRNUM
>> fi
>> done
>I think the failing test is src/test/subscription/t/010_truncate.pl.
>I've tried to reproduce the same failure using your script in OS X
>10.14 and Ubuntu 18.04.2 (Linux version 5.0.0-23-generic), but
>couldn't reproduce the same.
>

I kinda suspect it might be just a coincidence that it fails during that
particular test. What likely plays a role here is a checkpoint timing
(AFAICS that's the thing removing the file). On most systems the tests
complete before any checkpoint is triggered, hence no issue.

Maybe aggressively triggering checkpoints on the running cluter from
another session would do the trick ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-19 04:23:05
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 18, 2019 at 11:58:08PM +0200, Tomas Vondra wrote:
> I kinda suspect it might be just a coincidence that it fails during that
> particular test. What likely plays a role here is a checkpoint timing
> (AFAICS that's the thing removing the file). On most systems the tests
> complete before any checkpoint is triggered, hence no issue.
>
> Maybe aggressively triggering checkpoints on the running cluter from
> another session would do the trick ...

Now that I recall, another thing I forgot to mention on this thread is
that I patched guc.c to reduce the minimum of checkpoint_timeout to
1s.
--
Michael


From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-19 11:50:15
Message-ID: CAGz5QC+5_mPFoDj7ZSMV0gwvMY+kdOp4t1w=TTDpzuV9F2-X6g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hello hackers,

It seems there is a pattern how the error is occurring in different
systems. Following are the relevant log snippets:

nightjar:
sub3 LOG: received replication command: CREATE_REPLICATION_SLOT
"sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
sub3 LOG: logical decoding found consistent point at 0/160B578
sub1 PANIC: could not open file
"pg_logical/snapshots/0-160B578.snap": No such file or directory

dromedary scenario 1:
sub3_16414_sync_16399 LOG: received replication command:
CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
pgoutput USE_SNAPSHOT
sub3_16414_sync_16399 LOG: logical decoding found consistent point at 0/15EA694
sub2 PANIC: could not open file
"pg_logical/snapshots/0-15EA694.snap": No such file or directory

dromedary scenario 2:
sub3_16414_sync_16399 LOG: received replication command:
CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
pgoutput USE_SNAPSHOT
sub3_16414_sync_16399 LOG: logical decoding found consistent point at 0/15EA694
sub1 PANIC: could not open file
"pg_logical/snapshots/0-15EA694.snap": No such file or directory

While subscription 3 is created, it eventually reaches to a consistent
snapshot point and prints the WAL location corresponding to it. It
seems sub1/sub2 immediately fails to serialize the snapshot to the
.snap file having the same WAL location.

Is this helpful?

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 01:01:55
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 19, 2019 at 05:20:15PM +0530, Kuntal Ghosh wrote:
> While subscription 3 is created, it eventually reaches to a consistent
> snapshot point and prints the WAL location corresponding to it. It
> seems sub1/sub2 immediately fails to serialize the snapshot to the
> .snap file having the same WAL location.
>
> Is this helpful?

It looks like you are pointing to something here.
--
Michael


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 15:30:48
Message-ID: 20190920153048.wpe27ht5lcs3zlrm@development
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 19, 2019 at 01:23:05PM +0900, Michael Paquier wrote:
>On Wed, Sep 18, 2019 at 11:58:08PM +0200, Tomas Vondra wrote:
>> I kinda suspect it might be just a coincidence that it fails during that
>> particular test. What likely plays a role here is a checkpoint timing
>> (AFAICS that's the thing removing the file). On most systems the tests
>> complete before any checkpoint is triggered, hence no issue.
>>
>> Maybe aggressively triggering checkpoints on the running cluter from
>> another session would do the trick ...
>
>Now that I recall, another thing I forgot to mention on this thread is
>that I patched guc.c to reduce the minimum of checkpoint_timeout to
>1s.

But even with that change you haven't managed to reproduce the issue,
right? Or am I misunderstanding?

regarss

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 17:08:31
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2019-09-19 17:20:15 +0530, Kuntal Ghosh wrote:
> It seems there is a pattern how the error is occurring in different
> systems. Following are the relevant log snippets:
>
> nightjar:
> sub3 LOG: received replication command: CREATE_REPLICATION_SLOT
> "sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
> sub3 LOG: logical decoding found consistent point at 0/160B578
> sub1 PANIC: could not open file
> "pg_logical/snapshots/0-160B578.snap": No such file or directory
>
> dromedary scenario 1:
> sub3_16414_sync_16399 LOG: received replication command:
> CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
> pgoutput USE_SNAPSHOT
> sub3_16414_sync_16399 LOG: logical decoding found consistent point at 0/15EA694
> sub2 PANIC: could not open file
> "pg_logical/snapshots/0-15EA694.snap": No such file or directory
>
>
> dromedary scenario 2:
> sub3_16414_sync_16399 LOG: received replication command:
> CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
> pgoutput USE_SNAPSHOT
> sub3_16414_sync_16399 LOG: logical decoding found consistent point at 0/15EA694
> sub1 PANIC: could not open file
> "pg_logical/snapshots/0-15EA694.snap": No such file or directory
>
> While subscription 3 is created, it eventually reaches to a consistent
> snapshot point and prints the WAL location corresponding to it. It
> seems sub1/sub2 immediately fails to serialize the snapshot to the
> .snap file having the same WAL location.

Since now a number of people (I tried as well), failed to reproduce this
locally, I propose that we increase the log-level during this test on
master. And perhaps expand the set of debugging information. With the
hope that the additional information on the cases encountered on the bf
helps us build a reproducer or, even better, diagnose the issue
directly. If people agree, I'll come up with a patch.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 20:25:21
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Since now a number of people (I tried as well), failed to reproduce this
> locally, I propose that we increase the log-level during this test on
> master. And perhaps expand the set of debugging information. With the
> hope that the additional information on the cases encountered on the bf
> helps us build a reproducer or, even better, diagnose the issue
> directly. If people agree, I'll come up with a patch.

I recreated my freebsd-9-under-qemu setup and I can still reproduce
the problem, though not with high reliability (order of 1 time in 10).
Anything particular you want logged?

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 21:26:03
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2019-09-20 16:25:21 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > Since now a number of people (I tried as well), failed to reproduce this
> > locally, I propose that we increase the log-level during this test on
> > master. And perhaps expand the set of debugging information. With the
> > hope that the additional information on the cases encountered on the bf
> > helps us build a reproducer or, even better, diagnose the issue
> > directly. If people agree, I'll come up with a patch.
>
> I recreated my freebsd-9-under-qemu setup and I can still reproduce
> the problem, though not with high reliability (order of 1 time in 10).
> Anything particular you want logged?

A DEBUG2 log would help a fair bit, because it'd log some information
about what changes the "horizons" determining when data may be removed.

Perhaps with the additional elogs attached? I lowered some messages to
DEBUG2 so we don't have to suffer the noise of the ipc.c DEBUG3
messages.

If I use a TEMP_CONFIG setting log_min_messages=DEBUG2 with the patches
applied, the subscription tests still pass.

I hope they still fail on your setup, even though the increased logging
volume probably changes timing somewhat.

Greetings,

Andres Freund

Attachment Content-Type Size
verbose.diff text/x-diff 3.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 21:49:27
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-09-20 16:25:21 -0400, Tom Lane wrote:
>> I recreated my freebsd-9-under-qemu setup and I can still reproduce
>> the problem, though not with high reliability (order of 1 time in 10).
>> Anything particular you want logged?

> A DEBUG2 log would help a fair bit, because it'd log some information
> about what changes the "horizons" determining when data may be removed.

Actually, what I did was as attached [1], and I am getting traces like
[2]. The problem seems to occur only when there are two or three
processes concurrently creating the same snapshot file. It's not
obvious from the debug trace, but the snapshot file *does* exist
after the music stops.

It is very hard to look at this trace and conclude anything other
than "rename(2) is broken, it's not atomic". Nothing in our code
has deleted the file: no checkpoint has started, nor do we see
the DEBUG1 output that CheckPointSnapBuild ought to produce.
But fsync_fname momentarily can't see it (and then later another
process does see it).

It is now apparent why we're only seeing this on specific ancient
platforms. I looked around for info about rename(2) not being
atomic, and I found this info about FreeBSD:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=94849

The reported symptom there isn't quite the same, so probably there
is another issue, but there is plenty of reason to be suspicious
that UFS rename(2) is buggy in this release. As for dromedary's
ancient version of macOS, Apple is exceedinly untransparent about
their bugs, but I found

http://www.weirdnet.nl/apple/rename.html

In short, what we got here is OS bugs that have probably been
resolved years ago.

The question is what to do next. Should we just retire these
specific buildfarm critters, or do we want to push ahead with
getting rid of the PANIC here?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 21:51:06
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Sigh, forgot about attaching the attachments ...

regards, tom lane

Attachment Content-Type Size
more-logging.patch text/x-diff 1.3 KB
010_truncate_publisher.log text/plain 38.4 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 22:03:32
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2019-09-20 17:49:27 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2019-09-20 16:25:21 -0400, Tom Lane wrote:
> >> I recreated my freebsd-9-under-qemu setup and I can still reproduce
> >> the problem, though not with high reliability (order of 1 time in 10).
> >> Anything particular you want logged?
>
> > A DEBUG2 log would help a fair bit, because it'd log some information
> > about what changes the "horizons" determining when data may be removed.
>
> Actually, what I did was as attached [1], and I am getting traces like
> [2]. The problem seems to occur only when there are two or three
> processes concurrently creating the same snapshot file. It's not
> obvious from the debug trace, but the snapshot file *does* exist
> after the music stops.
>
> It is very hard to look at this trace and conclude anything other
> than "rename(2) is broken, it's not atomic". Nothing in our code
> has deleted the file: no checkpoint has started, nor do we see
> the DEBUG1 output that CheckPointSnapBuild ought to produce.
> But fsync_fname momentarily can't see it (and then later another
> process does see it).

Yikes. No wondering most of us weren't able to reproduce the
problem. And that staring at our code didn't point to a bug.

Nice catch.

> In short, what we got here is OS bugs that have probably been
> resolved years ago.
>
> The question is what to do next. Should we just retire these
> specific buildfarm critters, or do we want to push ahead with
> getting rid of the PANIC here?

Hm. Given that the fsync failing is actually an issue, I'm somewhat
disinclined to remove the PANIC. It's not like only raising an ERROR
actually solves anything, except making the problem even harder to
diagnose? Or that we otherwise are ok, with renames not being atomic?

So I'd be tentatively in favor of either upgrading, replacing the
filesystem (perhaps ZFS isn't buggy in the same way?), or retiring
those animals.

Greetings,

Andres Freund


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 22:06:20
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 2019-Sep-20, Tom Lane wrote:

> Actually, what I did was as attached [1], and I am getting traces like
> [2]. The problem seems to occur only when there are two or three
> processes concurrently creating the same snapshot file. It's not
> obvious from the debug trace, but the snapshot file *does* exist
> after the music stops.

Uh .. I didn't think it was possible that we would build the same
snapshot file more than once. Isn't that a waste of time anyway? Maybe
we can fix the symptom by just not doing that in the first place?
I don't have a strategy to do that, but seems worth considering before
retiring the bf animals.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>,Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>,Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>,Michael Paquier <michael(at)paquier(dot)xyz>,Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>,Robert Haas <robertmhaas(at)gmail(dot)com>,Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>,PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 22:11:22
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On September 20, 2019 3:06:20 PM PDT, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>On 2019-Sep-20, Tom Lane wrote:
>
>> Actually, what I did was as attached [1], and I am getting traces
>like
>> [2]. The problem seems to occur only when there are two or three
>> processes concurrently creating the same snapshot file. It's not
>> obvious from the debug trace, but the snapshot file *does* exist
>> after the music stops.
>
>Uh .. I didn't think it was possible that we would build the same
>snapshot file more than once. Isn't that a waste of time anyway?
>Maybe
>we can fix the symptom by just not doing that in the first place?
>I don't have a strategy to do that, but seems worth considering before
>retiring the bf animals.

We try to avoid it, but the check is racy. Check comments in SnapBuildSerialize. We could introduce locking etc to avoid that, but that seems overkill, given that were really just dealing with a broken os.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 22:17:02
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Uh .. I didn't think it was possible that we would build the same
> snapshot file more than once. Isn't that a waste of time anyway? Maybe
> we can fix the symptom by just not doing that in the first place?
> I don't have a strategy to do that, but seems worth considering before
> retiring the bf animals.

The comment near the head of SnapBuildSerialize says

* there is an obvious race condition here between the time we stat(2) the
* file and us writing the file. But we rename the file into place
* atomically and all files created need to contain the same data anyway,
* so this is perfectly fine, although a bit of a resource waste. Locking
* seems like pointless complication.

which seems like a reasonable argument. Also, this is hardly the only
place where we expect rename(2) to be atomic. So I tend to agree with
Andres that we should consider OSes with such a bug to be unsupported.

Dromedary is running the last release of macOS that supports 32-bit
hardware, so if we decide to kick that to the curb, I'd either shut
down the box or put some newer Linux or BSD variant on it.

regards, tom lane


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-21 02:16:59
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 20, 2019 at 05:30:48PM +0200, Tomas Vondra wrote:
> But even with that change you haven't managed to reproduce the issue,
> right? Or am I misunderstanding?

No, I was not able to see it on my laptop running Debian.
--
Michael


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-23 20:12:45
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers


On 9/20/19 6:17 PM, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> Uh .. I didn't think it was possible that we would build the same
>> snapshot file more than once. Isn't that a waste of time anyway? Maybe
>> we can fix the symptom by just not doing that in the first place?
>> I don't have a strategy to do that, but seems worth considering before
>> retiring the bf animals.
> The comment near the head of SnapBuildSerialize says
>
> * there is an obvious race condition here between the time we stat(2) the
> * file and us writing the file. But we rename the file into place
> * atomically and all files created need to contain the same data anyway,
> * so this is perfectly fine, although a bit of a resource waste. Locking
> * seems like pointless complication.
>
> which seems like a reasonable argument. Also, this is hardly the only
> place where we expect rename(2) to be atomic. So I tend to agree with
> Andres that we should consider OSes with such a bug to be unsupported.
>
> Dromedary is running the last release of macOS that supports 32-bit
> hardware, so if we decide to kick that to the curb, I'd either shut
> down the box or put some newer Linux or BSD variant on it.
>
>

Well, nightjar is on FBSD 9.0 which is oldish. I can replace it before
long with an 11-stable instance if that's appropriate.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-10-01 01:40:06
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> On 9/20/19 6:17 PM, Tom Lane wrote:
>> Dromedary is running the last release of macOS that supports 32-bit
>> hardware, so if we decide to kick that to the curb, I'd either shut
>> down the box or put some newer Linux or BSD variant on it.

> Well, nightjar is on FBSD 9.0 which is oldish. I can replace it before
> long with an 11-stable instance if that's appropriate.

FYI, I've installed FreeBSD 12.0/i386 on that machine and it's
now running buildfarm member florican, using clang with -msse2
(a configuration we had no buildfarm coverage of before, AFAIK).

I can still boot the macOS installation if anyone is interested
in specific tests in that environment, but I don't intend to run
dromedary on a regular basis anymore.

regards, tom lane