Re: [External]Re: Postgres Crash Issue

Lists: pgsql-general
From: "Sankar, Uma (Uma)" <usankar(at)avaya(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Cc: "Menon, Deepak (Deepak)" <menond(at)avaya(dot)com>
Subject: Postgres Crash Issue
Date: 2022-03-15 07:20:57
Message-ID: PH0PR15MB484717889C3D8F193D1CFA02AA109@PH0PR15MB4847.namprd15.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general

Hi All,

We have a database crash issue last Friday and it's a patroni HA-based Postgres database running, we have checked the pg logs and it shows error as shared memory corruptions. Can someone please check the attached logs and share your suggestions and feedback on the crash issues. Thanks in advance.

Regards,
Uma Sankar

Attachment Content-Type Size
postgres-Fri_11_03.log application/octet-stream 1.0 MB
postgres-Thu_10_03.log application/octet-stream 1.0 MB
postgres-Tue_08_03.log application/octet-stream 63.6 KB
postgres-Wed_09_03.log application/octet-stream 22.8 KB

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: "Sankar, Uma (Uma)" <usankar(at)avaya(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, "Menon, Deepak (Deepak)" <menond(at)avaya(dot)com>
Subject: Re: Postgres Crash Issue
Date: 2022-03-15 13:27:47
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general

On Tue, Mar 15, 2022 at 07:20:57AM +0000, Sankar, Uma (Uma) wrote:
> Hi All,
>
> We have a database crash issue last Friday and it's a patroni HA-based
> Postgres database running, we have checked the pg logs and it shows
> error as shared memory corruptions. Can someone please check the
> attached logs and share your suggestions and feedback on the crash
> issues. Thanks in advance.

It wasn't crash.

Logs show:

#v+
2022-03-11 08:58:42.956 UTC [27799] LOG: received immediate shutdown request
2022-03-11 08:58:42.956 UTC [17115] WARNING: terminating connection because of crash of another server process
2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-11 08:58:42.956 UTC [17115] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-03-11 08:58:42.957 UTC [14618] WARNING: terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [14618] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-11 08:58:42.957 UTC [14618] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-03-11 08:58:42.957 UTC [16396] WARNING: terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [16396] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-11 08:58:42.957 UTC [16396] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-03-11 08:58:42.957 UTC [12873] WARNING: terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [12873] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-11 08:58:42.957 UTC [12873] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-03-11 08:58:42.958 UTC [16564] WARNING: terminating connection because of crash of another server process
2022-03-11 08:58:42.958 UTC [16564] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-11 08:58:42.958 UTC [16564] HINT: In a moment you should be able to reconnect to the database and repeat your command.
#v-

and

#v+
2022-03-11 08:58:45.969 UTC [17354] LOG: consistent recovery state reached at A/E1836620
2022-03-11 08:58:45.969 UTC [17347] LOG: database system is ready to accept read only connections
2022-03-11 08:59:01.234 UTC [17347] LOG: received fast shutdown request
2022-03-11 08:59:01.234 UTC [17347] LOG: aborting any active transactions
2022-03-11 08:59:01.234 UTC [17412] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.234 UTC [17415] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.234 UTC [17393] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.235 UTC [17399] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.235 UTC [17408] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.236 UTC [17394] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.236 UTC [17389] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.237 UTC [17400] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.237 UTC [17378] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.238 UTC [17362] LOG: shutting down
2022-03-11 08:59:01.254 UTC [17347] LOG: database system is shut down
2022-03-11 08:59:01.419 UTC [17444] LOG: database system was shut down in recovery at 2022-03-11 08:59:01 UTC
2022-03-11 08:59:01.419 UTC [17444] LOG: entering standby mode
2022-03-11 08:59:01.420 UTC [17444] LOG: redo starts at A/E17CB690
#v-

Something/someone stopped the pg, twice, ~ 20 seconds apart.
What/how/why - can't tell, but "received ... shutdown request" is not
crash. It is someone/something doing "pg_ctl stop"

depesz


From: "Menon, Deepak (Deepak)" <menond(at)avaya(dot)com>
To: "depesz(at)depesz(dot)com" <depesz(at)depesz(dot)com>, "Sankar, Uma (Uma) **CTR**" <usankar(at)avaya(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: RE: [External]Re: Postgres Crash Issue
Date: 2022-03-15 13:30:25
Message-ID: CO1PR15MB5020FB13294468079C705A5ED8109@CO1PR15MB5020.namprd15.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general

Hi Depesz,

Then what does this message mean ?

2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Deepak Menon| Avaya Managed Services-Delivery|+91 9899012875| menond(at)avaya(dot)com
Leave Alert :

-----Original Message-----
From: depesz(at)depesz(dot)com <depesz(at)depesz(dot)com>
Sent: Tuesday, March 15, 2022 6:58 PM
To: Sankar, Uma (Uma) **CTR** <usankar(at)avaya(dot)com>
Cc: pgsql-general(at)postgresql(dot)org; Menon, Deepak (Deepak) <menond(at)avaya(dot)com>
Subject: [External]Re: Postgres Crash Issue

[External Sender]

On Tue, Mar 15, 2022 at 07:20:57AM +0000, Sankar, Uma (Uma) wrote:
> Hi All,
>
> We have a database crash issue last Friday and it's a patroni HA-based
> Postgres database running, we have checked the pg logs and it shows
> error as shared memory corruptions. Can someone please check the
> attached logs and share your suggestions and feedback on the crash
> issues. Thanks in advance.

It wasn't crash.

Logs show:

#v+
2022-03-11 08:58:42.956 UTC [27799] LOG: received immediate shutdown request
2022-03-11 08:58:42.956 UTC [17115] WARNING: terminating connection because of crash of another server process
2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-11 08:58:42.956 UTC [17115] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-03-11 08:58:42.957 UTC [14618] WARNING: terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [14618] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-11 08:58:42.957 UTC [14618] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-03-11 08:58:42.957 UTC [16396] WARNING: terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [16396] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-11 08:58:42.957 UTC [16396] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-03-11 08:58:42.957 UTC [12873] WARNING: terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [12873] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-11 08:58:42.957 UTC [12873] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-03-11 08:58:42.958 UTC [16564] WARNING: terminating connection because of crash of another server process
2022-03-11 08:58:42.958 UTC [16564] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-03-11 08:58:42.958 UTC [16564] HINT: In a moment you should be able to reconnect to the database and repeat your command.
#v-

and

#v+
2022-03-11 08:58:45.969 UTC [17354] LOG: consistent recovery state reached at A/E1836620
2022-03-11 08:58:45.969 UTC [17347] LOG: database system is ready to accept read only connections
2022-03-11 08:59:01.234 UTC [17347] LOG: received fast shutdown request
2022-03-11 08:59:01.234 UTC [17347] LOG: aborting any active transactions
2022-03-11 08:59:01.234 UTC [17412] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.234 UTC [17415] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.234 UTC [17393] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.235 UTC [17399] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.235 UTC [17408] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.236 UTC [17394] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.236 UTC [17389] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.237 UTC [17400] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.237 UTC [17378] FATAL: terminating connection due to administrator command
2022-03-11 08:59:01.238 UTC [17362] LOG: shutting down
2022-03-11 08:59:01.254 UTC [17347] LOG: database system is shut down
2022-03-11 08:59:01.419 UTC [17444] LOG: database system was shut down in recovery at 2022-03-11 08:59:01 UTC
2022-03-11 08:59:01.419 UTC [17444] LOG: entering standby mode
2022-03-11 08:59:01.420 UTC [17444] LOG: redo starts at A/E17CB690
#v-

Something/someone stopped the pg, twice, ~ 20 seconds apart.
What/how/why - can't tell, but "received ... shutdown request" is not crash. It is someone/something doing "pg_ctl stop"

depesz


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: "Menon, Deepak (Deepak)" <menond(at)avaya(dot)com>
Cc: "Sankar, Uma (Uma) **CTR**" <usankar(at)avaya(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [External]Re: Postgres Crash Issue
Date: 2022-03-15 13:32:13
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general

On Tue, Mar 15, 2022 at 01:30:25PM +0000, Menon, Deepak (Deepak) wrote:
> Hi Depesz,
>
> Then what does this message mean ?
>
> 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

When someone issues pg_ctl -m immediate stop (which is the immediate
shutdown). pg basically shutd backends in their head. This, in turn,
means that all other have to exit to avoid shared memory corruption.

Sometimes such exit happens before pg_ctl will kill next backend,
sometimes not. So you sometimes can get such messages.

depesz


From: "Menon, Deepak (Deepak)" <menond(at)avaya(dot)com>
To: "depesz(at)depesz(dot)com" <depesz(at)depesz(dot)com>
Cc: "Sankar, Uma (Uma) **CTR**" <usankar(at)avaya(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: RE: [External]Re: Postgres Crash Issue
Date: 2022-03-15 13:38:04
Message-ID: CO1PR15MB502027E5C91F1E1677D374E8D8109@CO1PR15MB5020.namprd15.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general

Thanks Depesz. Is there anyway to check the source of the command as this HA setup is managed by using Patroni

Deepak Menon| Avaya Managed Services-Delivery|+91 9899012875| menond(at)avaya(dot)com
Leave Alert :

-----Original Message-----
From: depesz(at)depesz(dot)com <depesz(at)depesz(dot)com>
Sent: Tuesday, March 15, 2022 7:02 PM
To: Menon, Deepak (Deepak) <menond(at)avaya(dot)com>
Cc: Sankar, Uma (Uma) **CTR** <usankar(at)avaya(dot)com>; pgsql-general(at)postgresql(dot)org
Subject: Re: [External]Re: Postgres Crash Issue

On Tue, Mar 15, 2022 at 01:30:25PM +0000, Menon, Deepak (Deepak) wrote:
> Hi Depesz,
>
> Then what does this message mean ?
>
> 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

When someone issues pg_ctl -m immediate stop (which is the immediate shutdown). pg basically shutd backends in their head. This, in turn, means that all other have to exit to avoid shared memory corruption.

Sometimes such exit happens before pg_ctl will kill next backend, sometimes not. So you sometimes can get such messages.

depesz


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: "Menon, Deepak (Deepak)" <menond(at)avaya(dot)com>
Cc: "Sankar, Uma (Uma) **CTR**" <usankar(at)avaya(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [External]Re: Postgres Crash Issue
Date: 2022-03-15 13:42:00
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general

On Tue, Mar 15, 2022 at 01:38:04PM +0000, Menon, Deepak (Deepak) wrote:
> Thanks Depesz. Is there anyway to check the source of the command as this HA setup is managed by using Patroni

From pg perspective it got immediate stop request. There is no way to
tell why/how.

depesz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: depesz(at)depesz(dot)com
Cc: "Menon, Deepak (Deepak)" <menond(at)avaya(dot)com>, "Sankar, Uma (Uma) **CTR**" <usankar(at)avaya(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [External]Re: Postgres Crash Issue
Date: 2022-03-15 14:17:46
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general

hubert depesz lubaczewski <depesz(at)depesz(dot)com> writes:
> On Tue, Mar 15, 2022 at 01:30:25PM +0000, Menon, Deepak (Deepak) wrote:
>> Then what does this message mean ?
>>
>> 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

> When someone issues pg_ctl -m immediate stop (which is the immediate
> shutdown). pg basically shutd backends in their head. This, in turn,
> means that all other have to exit to avoid shared memory corruption.

FWIW, v14 and later produce more on-point error reporting in this
case. The code path used to be the same as for a backend crash,
but now clients see something like

WARNING: terminating connection due to immediate shutdown command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

and the server log isn't spammed with copies of that, it just
looks like

2022-03-15 10:07:48.241 EDT [315456] LOG: received immediate shutdown request
2022-03-15 10:07:48.244 EDT [315456] LOG: database system is shut down

None of this gets us any closer to understanding where the signal
came from, unfortunately.

regards, tom lane


From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: "Menon, Deepak (Deepak)" <menond(at)avaya(dot)com>, "depesz(at)depesz(dot)com" <depesz(at)depesz(dot)com>, "Sankar, Uma (Uma) **CTR**" <usankar(at)avaya(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [External]Re: Postgres Crash Issue
Date: 2022-03-15 17:40:49
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general

On 3/15/22 6:30 AM, Menon, Deepak (Deepak) wrote:
> Hi Depesz,
>
> Then what does this message mean ?

From here, assuming someone/thing did pg_ctl -m immediate stop:

https://www.postgresql.org/docs/current/app-pg-ctl.html

"“Immediate” mode will abort all server processes immediately, without a
clean shutdown. This choice will lead to a crash-recovery cycle during
the next server start."

>
> 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
>
> Deepak Menon| Avaya Managed Services-Delivery|+91 9899012875| menond(at)avaya(dot)com
> Leave Alert :

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com