Lists: | pgsql-hackers |
---|
From: | "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | FSM doesn't recover after zeroing damaged page. |
Date: | 2025-02-07 00:15:17 |
Message-ID: | a61efc0b-9cfc-4f24-ac5d-ea6600d9ccbf@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi!
At the current master i found that if not the last page of
the FSM bottom layer was corrupted it is not restored after zeroing.
Here is reproduction like that:
1) Create a table with FSM of 4 pages:
create table t (int) as select * from generate_series(1, 1E6);
delete from t where ctid in (select ctid from t tablesample bernoulli (20));
SELECT pg_relation_filepath('t'); -- to know the filename with FSM
vacuum t;
2) Do checkpoint and stop the server.
3) Corrupt a byte in the third page. For instance, the lower byte of the CRC:
printf '\xAA' | dd of=/usr/local/pg12252-vanm/data/base/5/<filename_fsm> bs=1 seek=$((2*8192+8)) count=1 conv=notrunc
4) start server and execute: vacuum t; twice: to ensure that corrupted page
is fixed in memory, zeroed and a new header was written on it.
postgres=# vacuum t;
WARNING: page verification failed, calculated checksum 13869 but expected 13994
WARNING: invalid page in block 2 of relation base/5/16384_fsm; zeroing out page
VACUUM
postgres=# vacuum t; -- without warnings
VACUUM
5) Do checkpoint and restart the server. After vacuum t; the warnings appeared again:
postgres=# vacuum t;
WARNING: page verification failed, calculated checksum 13869 but expected 13994
WARNING: invalid page in block 2 of relation base/5/16384_fsm; zeroing out page
VACUUM
I noticed that the updated page is not written to disk because the
buffer where it is located is not marked dirty. Moreover MarkBufferDirtyHint(),
which is called for modified FSM pages, seems is not suitable here,
since as i suppose the corrupted page must be rewritten certainly, not for hint.
Therefore, maybe mark it dirty immediately after writing the new header?
Here is a small patch that does it and eliminates multiple warnings.
Would be glad if you take a look on it.
With the best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-recovering-damaged-FSM-pages.patch | text/x-patch | 906 bytes |
From: | Stepan Neretin <slpmcf(at)gmail(dot)com> |
---|---|
To: | "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: FSM doesn't recover after zeroing damaged page. |
Date: | 2025-03-10 07:58:24 |
Message-ID: | CA+Yyo5R-5A3+dWsS651vEKuGeTW2i5OMMYCfpS2=W8q4s_spng@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Feb 7, 2025 at 7:15 AM Anton A. Melnikov <a(dot)melnikov(at)postgrespro(dot)ru>
wrote:
> Hi!
>
> At the current master i found that if not the last page of
> the FSM bottom layer was corrupted it is not restored after zeroing.
>
> Here is reproduction like that:
> 1) Create a table with FSM of 4 pages:
> create table t (int) as select * from generate_series(1, 1E6);
> delete from t where ctid in (select ctid from t tablesample bernoulli
> (20));
> SELECT pg_relation_filepath('t'); -- to know the filename with FSM
> vacuum t;
>
> 2) Do checkpoint and stop the server.
>
> 3) Corrupt a byte in the third page. For instance, the lower byte of the
> CRC:
> printf '\xAA' | dd of=/usr/local/pg12252-vanm/data/base/5/<filename_fsm>
> bs=1 seek=$((2*8192+8)) count=1 conv=notrunc
>
> 4) start server and execute: vacuum t; twice: to ensure that corrupted page
> is fixed in memory, zeroed and a new header was written on it.
>
> postgres=# vacuum t;
> WARNING: page verification failed, calculated checksum 13869 but expected
> 13994
> WARNING: invalid page in block 2 of relation base/5/16384_fsm; zeroing
> out page
> VACUUM
> postgres=# vacuum t; -- without warnings
> VACUUM
>
> 5) Do checkpoint and restart the server. After vacuum t; the warnings
> appeared again:
> postgres=# vacuum t;
> WARNING: page verification failed, calculated checksum 13869 but expected
> 13994
> WARNING: invalid page in block 2 of relation base/5/16384_fsm; zeroing
> out page
> VACUUM
>
> I noticed that the updated page is not written to disk because the
> buffer where it is located is not marked dirty. Moreover
> MarkBufferDirtyHint(),
> which is called for modified FSM pages, seems is not suitable here,
> since as i suppose the corrupted page must be rewritten certainly, not for
> hint.
> Therefore, maybe mark it dirty immediately after writing the new header?
> Here is a small patch that does it and eliminates multiple warnings.
> Would be glad if you take a look on it.
>
> With the best regards,
>
> --
> Anton A. Melnikov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
Hi, I reproduce the problem step by step. Patch fixes this problem. Looks
good to me.
Best Regards, Stepan Neretin.
From: | Kirill Reshke <reshkekirill(at)gmail(dot)com> |
---|---|
To: | "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: FSM doesn't recover after zeroing damaged page. |
Date: | 2025-03-10 13:56:44 |
Message-ID: | CALdSSPjg0hBUvPASkVt799z7+JOYdcE0j_WznbxjVB0H05M+qw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi!
On Fri, 7 Feb 2025 at 05:15, Anton A. Melnikov
<a(dot)melnikov(at)postgrespro(dot)ru> wrote:
>
> Hi!
> I noticed that the updated page is not written to disk because the
> buffer where it is located is not marked dirty. Moreover MarkBufferDirtyHint(),
> which is called for modified FSM pages, seems is not suitable here,
> since as i suppose the corrupted page must be rewritten certainly, not for hint.
Could you please elaborate? FSM changes are never wal-logged, so it is
possible to read torn pages from a disk (their checksum will mismatch
with the header), so
MarkBufferDirtyHint seems to be completely fine here. I don't think
MarkBufferDirty provides something different from MarkBufferDirtyHint
in the FSM case (because FSM changed are not persistent). At the end
of the day, you just should write a page on a disk sooner or later,
and that's it.
--
Best regards,
Kirill Reshke
From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: FSM doesn't recover after zeroing damaged page. |
Date: | 2025-03-16 13:03:07 |
Message-ID: | CALDaNm2gYZxHK=aG7r8t7K=Wq3ewB3LV5Lh-X89Lms-b19dojA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, 7 Feb 2025 at 05:45, Anton A. Melnikov
<a(dot)melnikov(at)postgrespro(dot)ru> wrote:
>
> Here is a small patch that does it and eliminates multiple warnings.
> Would be glad if you take a look on it.
I noticed that Kirill's comments from [1] are not yet addressed, I
have changed the status of commitfest entry to Waiting on Author,
kindly address them and update the status to Needs review.
[1] - https://www.postgresql.org/message-id/CALdSSPjg0hBUvPASkVt799z7%2BJOYdcE0j_WznbxjVB0H05M%2Bqw%40mail.gmail.com
Regards,
Vignesh
From: | "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: FSM doesn't recover after zeroing damaged page. |
Date: | 2025-03-24 08:36:22 |
Message-ID: | 745f09fd-7c4e-4525-8cd9-4aff582c96e7@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi!
On 16.03.2025 16:03, vignesh C wrote:
> I noticed that Kirill's comments from [1] are not yet addressed, I
> have changed the status of commitfest entry to Waiting on Author,
> kindly address them and update the status to Needs review.
> [1] - https://www.postgresql.org/message-id/CALdSSPjg0hBUvPASkVt799z7%2BJOYdcE0j_WznbxjVB0H05M%2Bqw%40mail.gmail.com
Thanks for reminding me!
On 10.03.2025 16:56, Kirill Reshke wrote:
>> since as i suppose the corrupted page must be rewritten certainly, not for hint.
>
>
> Could you please elaborate? FSM changes are never wal-logged, so it is
> possible to read torn pages from a disk (their checksum will mismatch
> with the header), so
> MarkBufferDirtyHint seems to be completely fine here. I don't think
> MarkBufferDirty provides something different from MarkBufferDirtyHint
> in the FSM case (because FSM changed are not persistent).
Sorry for the long delay in replying.
The problem turned out to be not so simple as i firstly thought.
If we break down the reproduction of the issue from the first email [1]
from the perspective of a database user, the following is occurring:
1) He sees a message about a corrupted page and that it has been zeroed out.
This means the issue with torn page is resolved.
> At the end
> of the day, you just should write a page on a disk sooner or later,
> and that's it.
2) This is exactly what happens after a checkpoint is executed.
Now the user is confident that the page is in a normal state on disk.
3) However, after a server restart, he sees that the same page is corrupted again.
This means the page was not saved to disk. And so on infinitely.
IMO, this is annoying and very much like a bug.
That said, the fix initially proposed seems incorrect and overly crude to me,
as this behavior does not occur with every FSM page but only under specific conditions.
E. g., the error will not recur if it was the last incomplete FSM page.
I think firstly it is necessary to understand the reasons for this difference in behavior.
So i plan to dig deeper into the FSM algorithm and come up with a more targeted fix.
With the best wishes,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company