Lists: | pgsql-hackers |
---|
From: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-15 01:51:33 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
I found that the consume_xids function incorrectly advances XIDs as
shown:
postgres=# select txid_current();
txid_current
--------------
746
(1 row)
postgres=# select consume_xids('100');
consume_xids
--------------
847
(1 row)
In the example, the consume_xids function consumes 100 XIDs when XID =
746, so the desired outcome from consume_xids should be 746 + 100 = 846,
which differs from the actual outcome, 847.
Behavior inside a transaction block:
postgres=# select txid_current();
txid_current
--------------
1410
(1 row)
postgres=# begin;
BEGIN
postgres=*# select consume_xids('100');
consume_xids
--------------
1511
(1 row)
postgres=*# select consume_xids('100');
consume_xids
--------------
1521
(1 row)
Here, the first call inside the transaction block consumes 100+1 XIDs
(incorrect), while the second call consumes exactly 100 XIDs (as
expected)
Summary:
The function performs incorrectly when:
- Outside of a transaction block
- The first call inside a transaction block
But works correctly when:
- After the second call inside a transaction block
The issue arises because consume_xids does not correctly count the
consumed XIDs when it calls the GetTopTransactionId function, which
allocates a new XID when none has been assigned.
With the attached patch, the consume_xids works as expected, both inside
and outside of transaction blocks as shown below:
postgres=# select txid_current();
txid_current
--------------
1546
(1 row)
postgres=# select consume_xids('100');
consume_xids
--------------
1646
(1 row)
postgres=# begin;
BEGIN
postgres=*# select consume_xids('100');
consume_xids
--------------
1746
(1 row)
postgres=*# select consume_xids('100');
consume_xids
--------------
1846
(1 row)
Regards,
Yushi
Attachment | Content-Type | Size |
---|---|---|
skip_id_correctly.diff | text/x-diff | 909 bytes |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-15 22:26:48 |
Message-ID: | CAD21AoATuDuqSr=V3vCSbHbOg6g9zKL61EEbe4ZPG3xOUa0Pcw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
Thank you for the report.
On Mon, Oct 14, 2024 at 6:51 PM Yushi Ogiwara
<btogiwarayuushi(at)oss(dot)nttdata(dot)com> wrote:
>
> Hi,
>
> I found that the consume_xids function incorrectly advances XIDs as
> shown:
>
> postgres=# select txid_current();
> txid_current
> --------------
> 746
> (1 row)
>
> postgres=# select consume_xids('100');
> consume_xids
> --------------
> 847
> (1 row)
>
> In the example, the consume_xids function consumes 100 XIDs when XID =
> 746, so the desired outcome from consume_xids should be 746 + 100 = 846,
> which differs from the actual outcome, 847.
>
> Behavior inside a transaction block:
>
> postgres=# select txid_current();
> txid_current
> --------------
> 1410
> (1 row)
>
> postgres=# begin;
> BEGIN
> postgres=*# select consume_xids('100');
> consume_xids
> --------------
> 1511
> (1 row)
> postgres=*# select consume_xids('100');
> consume_xids
> --------------
> 1521
> (1 row)
>
> Here, the first call inside the transaction block consumes 100+1 XIDs
> (incorrect), while the second call consumes exactly 100 XIDs (as
> expected)
>
> Summary:
>
> The function performs incorrectly when:
> - Outside of a transaction block
> - The first call inside a transaction block
> But works correctly when:
> - After the second call inside a transaction block
>
> The issue arises because consume_xids does not correctly count the
> consumed XIDs when it calls the GetTopTransactionId function, which
> allocates a new XID when none has been assigned.
I agree with your analysis.
I have one comment on the patch:
- (void) GetTopTransactionId();
+ if(!FullTransactionIdIsValid(GetTopFullTransactionIdIfAny()))
+ {
+ (void) GetTopTransactionId();
+ consumed++;
+ }
If we count this case as consumed too, I think we need to set the
returned value of GetTopTranasctionId() to lastxid. Because otherwise,
the return value when passing 1 to the function would be the latest
XID at the time but not the last XID consumed by the function. Passing
1 to this function is very unrealistic case, though.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-16 05:06:14 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
Thank you for your comment.
Regarding the first patch, I believe it works correctly when
consume_xids(1) is called. This is because the lastxid variable in the
consume_xids_common function is initialized as lastxid =
ReadNextFullTransactionId(), where the ReadNextFullTransactionId
function returns the (current XID) + 1.
Separately, I found that consume_xids(0) does not behave as expected.
Below is an example:
postgres=# select txid_current();
txid_current
--------------
45496
(1 row)
postgres=# select consume_xids(0);
consume_xids
--------------
45497
(1 row)
postgres=# select consume_xids(0);
consume_xids
--------------
45497
(1 row)
In the example, the argument to consume_xids is 0, meaning it should not
consume any XIDs. However, the first invocation of consume_xids(0) looks
like unexpectedly consuming 1 XID though it's not consuming actually.
This happens because consume_xids(0) returns the value from
ReadNextFullTransactionId.
I have updated the patch (skip.diff, attached to this e-mail) to address
this issue. Now, when consume_xids(0) is called, it returns
ReadNextFullTransactionId().value - 1, ensuring no XID is consumed as
shown below:
postgres=# select txid_current();
txid_current
--------------
45498
(1 row)
postgres=# select consume_xids(0);
consume_xids
--------------
45498
(1 row)
Regards,
Yushi
On 2024-10-16 07:26, Masahiko Sawada wrote:
> Hi,
>
> Thank you for the report.
>
> On Mon, Oct 14, 2024 at 6:51 PM Yushi Ogiwara
> <btogiwarayuushi(at)oss(dot)nttdata(dot)com> wrote:
>>
>> Hi,
>>
>> I found that the consume_xids function incorrectly advances XIDs as
>> shown:
>>
>> postgres=# select txid_current();
>> txid_current
>> --------------
>> 746
>> (1 row)
>>
>> postgres=# select consume_xids('100');
>> consume_xids
>> --------------
>> 847
>> (1 row)
>>
>> In the example, the consume_xids function consumes 100 XIDs when XID =
>> 746, so the desired outcome from consume_xids should be 746 + 100 =
>> 846,
>> which differs from the actual outcome, 847.
>>
>> Behavior inside a transaction block:
>>
>> postgres=# select txid_current();
>> txid_current
>> --------------
>> 1410
>> (1 row)
>>
>> postgres=# begin;
>> BEGIN
>> postgres=*# select consume_xids('100');
>> consume_xids
>> --------------
>> 1511
>> (1 row)
>> postgres=*# select consume_xids('100');
>> consume_xids
>> --------------
>> 1521
>> (1 row)
>>
>> Here, the first call inside the transaction block consumes 100+1 XIDs
>> (incorrect), while the second call consumes exactly 100 XIDs (as
>> expected)
>>
>> Summary:
>>
>> The function performs incorrectly when:
>> - Outside of a transaction block
>> - The first call inside a transaction block
>> But works correctly when:
>> - After the second call inside a transaction block
>>
>> The issue arises because consume_xids does not correctly count the
>> consumed XIDs when it calls the GetTopTransactionId function, which
>> allocates a new XID when none has been assigned.
>
> I agree with your analysis.
>
> I have one comment on the patch:
>
> - (void) GetTopTransactionId();
> + if(!FullTransactionIdIsValid(GetTopFullTransactionIdIfAny()))
> + {
> + (void) GetTopTransactionId();
> + consumed++;
> + }
>
> If we count this case as consumed too, I think we need to set the
> returned value of GetTopTranasctionId() to lastxid. Because otherwise,
> the return value when passing 1 to the function would be the latest
> XID at the time but not the last XID consumed by the function. Passing
> 1 to this function is very unrealistic case, though.
>
> Regards,
Attachment | Content-Type | Size |
---|---|---|
skip.diff | text/x-diff | 946 bytes |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-16 17:58:50 |
Message-ID: | CAD21AoAr92L0Yd-vF908He4B94F0W6Y1XC2ZUpm3WWoCLvONBw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Oct 15, 2024 at 10:06 PM Yushi Ogiwara
<btogiwarayuushi(at)oss(dot)nttdata(dot)com> wrote:
>
> Hi,
>
> Thank you for your comment.
>
> Regarding the first patch, I believe it works correctly when
> consume_xids(1) is called. This is because the lastxid variable in the
> consume_xids_common function is initialized as lastxid =
> ReadNextFullTransactionId(), where the ReadNextFullTransactionId
> function returns the (current XID) + 1.
But it's possible that concurrent transactions consume XIDs in meanwhile, no?
>
> Separately, I found that consume_xids(0) does not behave as expected.
> Below is an example:
>
> postgres=# select txid_current();
> txid_current
> --------------
> 45496
> (1 row)
>
> postgres=# select consume_xids(0);
> consume_xids
> --------------
> 45497
> (1 row)
>
> postgres=# select consume_xids(0);
> consume_xids
> --------------
> 45497
> (1 row)
>
> In the example, the argument to consume_xids is 0, meaning it should not
> consume any XIDs. However, the first invocation of consume_xids(0) looks
> like unexpectedly consuming 1 XID though it's not consuming actually.
> This happens because consume_xids(0) returns the value from
> ReadNextFullTransactionId.
>
> I have updated the patch (skip.diff, attached to this e-mail) to address
> this issue. Now, when consume_xids(0) is called, it returns
> ReadNextFullTransactionId().value - 1, ensuring no XID is consumed as
> shown below:
>
> postgres=# select txid_current();
> txid_current
> --------------
> 45498
> (1 row)
>
> postgres=# select consume_xids(0);
> consume_xids
> --------------
> 45498
> (1 row)
Hmm, I think if we expect this function to return the last XID that
the function actually consumed, calling consume_xids with 0 should
raise an error instead. Even if it returns
ReadNextFullTransactionId().value - 1 as you proposed, other
concurrent transactions might consume XIDs between txid_current() and
consume_xids(0), resulting in consume_xids() appearing to have
consumed XIDs.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-18 05:57:36 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
Thank you for your suggestion. Let me organize the discussion as
follows:
When I created the patch, I did not account for situations where other
transactions concurrently consume XIDs. If we establish the rule that
"consume_xids should not return an XID that is consumed by any other
transaction," I believe it will simplify the discussion.
Case: consume_xids(1)
Let's consider a scenario where transaction 1 (t1) and transaction 2
(t2) run concurrently. t1 calls txid_current() and consume_xids(1),
while t2 calls txid_current().
The schedule above illustates when t1 and t2 run concurrently. The "next
XID" column shows how the TransamVariables->nextXid global variable
changes over time. The /**/ blocks shows the value of each variable and
outcome of a called function.
In my previous patch, the schedule above could occur, violating the rule
because consume_xids does not update lastxid when GetTopTransactionId is
called.
If your suggestion is applied, consume_xids would no longer violate the
rule, as it would update lastxid when generating a new XID.
Case: consume_xids(0)
Now, consider the case where transaction 1 consumes 0 XIDs.
In this example, both t1 and t2 consume XID = 20, which violates the
rule.
Using ReadNextFullTransactionId() - 1 also does not resolve the issue,
as demonstrated above. This is because consume_xids(0) does not allocate
any XIDs.
In conclusion, I agree that:
* Update lastxid with GetTopTransactionId().
* consume_xids with 0 should raise an error.
I have attached a new patch that incorporates your suggestions.
Regards,
Yushi
On 2024-10-17 02:58, Masahiko Sawada wrote:
> On Tue, Oct 15, 2024 at 10:06 PM Yushi Ogiwara
> <btogiwarayuushi(at)oss(dot)nttdata(dot)com> wrote:
>
>> Hi,
>>
>> Thank you for your comment.
>>
>> Regarding the first patch, I believe it works correctly when
>> consume_xids(1) is called. This is because the lastxid variable in
>> the
>> consume_xids_common function is initialized as lastxid =
>> ReadNextFullTransactionId(), where the ReadNextFullTransactionId
>> function returns the (current XID) + 1.
>
> But it's possible that concurrent transactions consume XIDs in
> meanwhile, no?
>
>> Separately, I found that consume_xids(0) does not behave as expected.
>> Below is an example:
>>
>> postgres=# select txid_current();
>> txid_current
>> --------------
>> 45496
>> (1 row)
>>
>> postgres=# select consume_xids(0);
>> consume_xids
>> --------------
>> 45497
>> (1 row)
>>
>> postgres=# select consume_xids(0);
>> consume_xids
>> --------------
>> 45497
>> (1 row)
>>
>> In the example, the argument to consume_xids is 0, meaning it should
>> not
>> consume any XIDs. However, the first invocation of consume_xids(0)
>> looks
>> like unexpectedly consuming 1 XID though it's not consuming actually.
>> This happens because consume_xids(0) returns the value from
>> ReadNextFullTransactionId.
>>
>> I have updated the patch (skip.diff, attached to this e-mail) to
>> address
>> this issue. Now, when consume_xids(0) is called, it returns
>> ReadNextFullTransactionId().value - 1, ensuring no XID is consumed as
>> shown below:
>>
>> postgres=# select txid_current();
>> txid_current
>> --------------
>> 45498
>> (1 row)
>>
>> postgres=# select consume_xids(0);
>> consume_xids
>> --------------
>> 45498
>> (1 row)
>
> Hmm, I think if we expect this function to return the last XID that
> the function actually consumed, calling consume_xids with 0 should
> raise an error instead. Even if it returns
> ReadNextFullTransactionId().value - 1 as you proposed, other
> concurrent transactions might consume XIDs between txid_current() and
> consume_xids(0), resulting in consume_xids() appearing to have
> consumed XIDs.
>
> Regards,
Attachment | Content-Type | Size |
---|---|---|
consume_xid_fix.diff | text/x-diff | 975 bytes |
From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-23 06:47:28 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2024/10/18 14:57, Yushi Ogiwara wrote:
> In conclusion, I agree that:
>
> * Update lastxid with GetTopTransactionId().
> * consume_xids with 0 should raise an error.
>
> I have attached a new patch that incorporates your suggestions.
One concern in this discussion is that the return value of this function isn't
entirely clear. To address this, how about adding a comment at the beginning of
consume_xids() like: "Returns the last XID assigned by consume_xids()"?
* the cache overflows, but beyond that, we don't keep track of the
* consumed XIDs.
*/
- (void) GetTopTransactionId();
+ if(!FullTransactionIdIsValid(GetTopFullTransactionIdIfAny()))
+ {
+ lastxid = GetTopTransactionId();
+ consumed++;
+ }
How about appending the following to the end of the first paragraph in
the above source comments?
If no top-level XID is assigned, a new one is obtained,
and the consumed XID counter is incremented.
if (xids_left > 2000 &&
consumed - last_reported_at < REPORT_INTERVAL &&
MyProc->subxidStatus.overflowed)
{
int64 consumed_by_shortcut = consume_xids_shortcut();
if (consumed_by_shortcut > 0)
{
consumed += consumed_by_shortcut;
continue;
}
}
By the way, this isn't directly related to the proposed patch, but while reading
the xid_wraparound code, I noticed that consume_xids_common() could potentially
return an unexpected XID if consume_xids_shortcut() returns a value greater
than 2000. Based on the current logic, consume_xids_common() should always return
a value less than 2000, so the issue I'm concerned about shouldn't occur. Still,
would it be worth adding an assertion to ensure that consume_xids_common() never
returns a value greater than 2000?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
Cc: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-23 20:23:43 |
Message-ID: | CAD21AoD8wJ0OiQbq=pc-TUCkEm1ohu6Et8G-Ntc8TFmPk3TuTA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Oct 22, 2024 at 11:47 PM Fujii Masao
<masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2024/10/18 14:57, Yushi Ogiwara wrote:
> > In conclusion, I agree that:
> >
> > * Update lastxid with GetTopTransactionId().
> > * consume_xids with 0 should raise an error.
> >
> > I have attached a new patch that incorporates your suggestions.
>
> One concern in this discussion is that the return value of this function isn't
> entirely clear. To address this, how about adding a comment at the beginning of
> consume_xids() like: "Returns the last XID assigned by consume_xids()"?
Agreed. That value is what I expected this function to return.
>
>
> * the cache overflows, but beyond that, we don't keep track of the
> * consumed XIDs.
> */
> - (void) GetTopTransactionId();
> + if(!FullTransactionIdIsValid(GetTopFullTransactionIdIfAny()))
> + {
> + lastxid = GetTopTransactionId();
> + consumed++;
> + }
>
> How about appending the following to the end of the first paragraph in
> the above source comments?
>
> If no top-level XID is assigned, a new one is obtained,
> and the consumed XID counter is incremented.
Agreed.
>
>
>
> if (xids_left > 2000 &&
> consumed - last_reported_at < REPORT_INTERVAL &&
> MyProc->subxidStatus.overflowed)
> {
> int64 consumed_by_shortcut = consume_xids_shortcut();
>
> if (consumed_by_shortcut > 0)
> {
> consumed += consumed_by_shortcut;
> continue;
> }
> }
>
> By the way, this isn't directly related to the proposed patch, but while reading
> the xid_wraparound code, I noticed that consume_xids_common() could potentially
> return an unexpected XID if consume_xids_shortcut() returns a value greater
> than 2000. Based on the current logic, consume_xids_common() should always return
> a value less than 2000, so the issue I'm concerned about shouldn't occur.
Good point. Yes, the function doesn't return a value greater than 2000
as long as we do "distance = Min(distance, COMMIT_TS_XACTS_PER_PAGE -
rem);". But it's true with <= 16KB block sizes.
> Still,
> would it be worth adding an assertion to ensure that consume_xids_common() never
> returns a value greater than 2000?
While adding an assertion makes sense to me, another idea is to set
last_xid even in the shortcut path. That way, it works even with 32KB
block size.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-24 14:55:22 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2024/10/24 5:23, Masahiko Sawada wrote:
>> if (xids_left > 2000 &&
>> consumed - last_reported_at < REPORT_INTERVAL &&
>> MyProc->subxidStatus.overflowed)
>> {
>> int64 consumed_by_shortcut = consume_xids_shortcut();
>>
>> if (consumed_by_shortcut > 0)
>> {
>> consumed += consumed_by_shortcut;
>> continue;
>> }
>> }
>>
>> By the way, this isn't directly related to the proposed patch, but while reading
>> the xid_wraparound code, I noticed that consume_xids_common() could potentially
>> return an unexpected XID if consume_xids_shortcut() returns a value greater
>> than 2000. Based on the current logic, consume_xids_common() should always return
>> a value less than 2000, so the issue I'm concerned about shouldn't occur.
>
> Good point. Yes, the function doesn't return a value greater than 2000
> as long as we do "distance = Min(distance, COMMIT_TS_XACTS_PER_PAGE -
> rem);". But it's true with <= 16KB block sizes.
>
>> Still,
>> would it be worth adding an assertion to ensure that consume_xids_common() never
>> returns a value greater than 2000?
>
> While adding an assertion makes sense to me, another idea is to set
> last_xid even in the shortcut path. That way, it works even with 32KB
> block size.
Agreed on making xid_wraparound compatible with a 32k block size. As you pointed out,
with 32k blocks, XidSkip() can return values greater than 2000. So, the first step is
to adjust the following if-condition by increasing "2000" to a higher value.
Since XidSkip() can return up to 3276 (based on COMMIT_TS_XACTS_PER_PAGE (BLCKSZ / 10) with 32k blocks),
we could, for instance, update "2000" to "4000."
if (xids_left > 2000 &&
consumed - last_reported_at < REPORT_INTERVAL &&
MyProc->subxidStatus.overflowed)
To ensure lastxid is set even in the shortcut case, what about modifying XidSkip()
so it returns "distance - 1" instead of "distance"? This change would make
consume_xids_common() run at least one more loop cycle,
triggering GetNewTransactionId() and setting lastxid correctly.
consumed = XidSkip(nextXid);
if (consumed > 0)
TransamVariables->nextXid.value += (uint64) consumed;
BTW, the code snippet above in consume_xids_shortcut() could potentially set
the next XID to a value below FirstNormalTransactionId? If yes, we should account for
FirstNormalFullTransactionId when increasing the next XID, similar to
how FullTransactionIdAdvance() handles it.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
Cc: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-25 21:31:38 |
Message-ID: | CAD21AoB0uW34gHg23CoS=sQjgf29Fnj_7hF=VaCfcqexeBemMg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Oct 24, 2024 at 7:55 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2024/10/24 5:23, Masahiko Sawada wrote:
> >> if (xids_left > 2000 &&
> >> consumed - last_reported_at < REPORT_INTERVAL &&
> >> MyProc->subxidStatus.overflowed)
> >> {
> >> int64 consumed_by_shortcut = consume_xids_shortcut();
> >>
> >> if (consumed_by_shortcut > 0)
> >> {
> >> consumed += consumed_by_shortcut;
> >> continue;
> >> }
> >> }
> >>
> >> By the way, this isn't directly related to the proposed patch, but while reading
> >> the xid_wraparound code, I noticed that consume_xids_common() could potentially
> >> return an unexpected XID if consume_xids_shortcut() returns a value greater
> >> than 2000. Based on the current logic, consume_xids_common() should always return
> >> a value less than 2000, so the issue I'm concerned about shouldn't occur.
> >
> > Good point. Yes, the function doesn't return a value greater than 2000
> > as long as we do "distance = Min(distance, COMMIT_TS_XACTS_PER_PAGE -
> > rem);". But it's true with <= 16KB block sizes.
> >
> >> Still,
> >> would it be worth adding an assertion to ensure that consume_xids_common() never
> >> returns a value greater than 2000?
> >
> > While adding an assertion makes sense to me, another idea is to set
> > last_xid even in the shortcut path. That way, it works even with 32KB
> > block size.
>
> Agreed on making xid_wraparound compatible with a 32k block size. As you pointed out,
> with 32k blocks, XidSkip() can return values greater than 2000. So, the first step is
> to adjust the following if-condition by increasing "2000" to a higher value.
> Since XidSkip() can return up to 3276 (based on COMMIT_TS_XACTS_PER_PAGE (BLCKSZ / 10) with 32k blocks),
> we could, for instance, update "2000" to "4000."
>
> if (xids_left > 2000 &&
> consumed - last_reported_at < REPORT_INTERVAL &&
> MyProc->subxidStatus.overflowed)
>
>
> To ensure lastxid is set even in the shortcut case, what about modifying XidSkip()
> so it returns "distance - 1" instead of "distance"? This change would make
> consume_xids_common() run at least one more loop cycle,
> triggering GetNewTransactionId() and setting lastxid correctly.
Increasing "2000" to "4000" makes sense to me. I think that with this
change we don't necessarily need to change XidSkip() to return
"distance - 1'. What do you think?
>
>
> consumed = XidSkip(nextXid);
> if (consumed > 0)
> TransamVariables->nextXid.value += (uint64) consumed;
>
> BTW, the code snippet above in consume_xids_shortcut() could potentially set
> the next XID to a value below FirstNormalTransactionId? If yes, we should account for
> FirstNormalFullTransactionId when increasing the next XID, similar to
> how FullTransactionIdAdvance() handles it.
Good catch. I agree with you. We need to do something similar to what
FullTransactionIdAdvance() does so that it does not appear as a
special 32-bit XID.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-29 09:00:24 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
>>
>>
>> consumed = XidSkip(nextXid);
>> if (consumed > 0)
>> TransamVariables->nextXid.value += (uint64) consumed;
>>
>> BTW, the code snippet above in consume_xids_shortcut() could
>> potentially set
>> the next XID to a value below FirstNormalTransactionId? If yes, we
>> should account for
>> FirstNormalFullTransactionId when increasing the next XID, similar to
>> how FullTransactionIdAdvance() handles it.
>
> Good catch. I agree with you. We need to do something similar to what
> FullTransactionIdAdvance() does so that it does not appear as a
> special 32-bit XID.
>
> Regards,
I think this is not the case since XidSkip returns min(UINT32_MAX - 5 -
low, *), which prevents the wrap-around of nextXid.
Regards,
Yushi Ogiwara
From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-29 18:09:12 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2024/10/29 18:00, Yushi Ogiwara wrote:
>>>
>>>
>>> consumed = XidSkip(nextXid);
>>> if (consumed > 0)
>>> TransamVariables->nextXid.value += (uint64) consumed;
>>>
>>> BTW, the code snippet above in consume_xids_shortcut() could potentially set
>>> the next XID to a value below FirstNormalTransactionId? If yes, we should account for
>>> FirstNormalFullTransactionId when increasing the next XID, similar to
>>> how FullTransactionIdAdvance() handles it.
>>
>> Good catch. I agree with you. We need to do something similar to what
>> FullTransactionIdAdvance() does so that it does not appear as a
>> special 32-bit XID.
>>
>> Regards,
>
> I think this is not the case since XidSkip returns min(UINT32_MAX - 5 - low, *), which prevents the wrap-around of nextXid.
Yes, you're right. Thanks for the correction!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-30 07:00:52 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I made a new patch (skip_xid_correctly.diff) that incorporates the
points we discussed:
1. Fix the issue that consume_xids consumes nxids+1 XIDs.
2. Update lastxid when calling GetTopFullTransactionId() to support
nxids==1 case.
3. Forbid consume_xids when nxids==0.
4. Add comments explaining the return values of consume_xids and
consume_xids_until, and the rationale for incrementing consumed++ when
GetTopFullTransactionId() is called.
Also, I made another patch (support_blksz_32k.diff) that supports the
block size == 32K case.
Best,
Yushi Ogiwara
Attachment | Content-Type | Size |
---|---|---|
skip_xid_correctly.diff | text/x-diff | 1.6 KB |
support_blksz_32k.diff | text/x-diff | 1.6 KB |
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-10-31 18:53:34 |
Message-ID: | CAD21AoCthHcSQ5zeeivNpiz7HMi_FPG-dtwDDNYUx2oKG36bCQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Oct 30, 2024 at 12:00 AM Yushi Ogiwara
<btogiwarayuushi(at)oss(dot)nttdata(dot)com> wrote:
>
> I made a new patch (skip_xid_correctly.diff) that incorporates the
> points we discussed:
>
> 1. Fix the issue that consume_xids consumes nxids+1 XIDs.
> 2. Update lastxid when calling GetTopFullTransactionId() to support
> nxids==1 case.
> 3. Forbid consume_xids when nxids==0.
> 4. Add comments explaining the return values of consume_xids and
> consume_xids_until, and the rationale for incrementing consumed++ when
> GetTopFullTransactionId() is called.
>
> Also, I made another patch (support_blksz_32k.diff) that supports the
> block size == 32K case.
>
Thank you for making patches! Here are review comments.
skip_xid_correctly.diff:
- if (nxids < 0)
+ if (nxids <= 0)
elog(ERROR, "invalid nxids argument: %lld", (long long) nxids);
-
- if (nxids == 0)
- lastxid = ReadNextFullTransactionId();
else
lastxid = consume_xids_common(InvalidFullTransactionId, (uint64) nxids);
After calling elog(ERROR) we don't return here, so the "else" is not
necessary. That is, we can rewrite it to:
if (nxids <= 0)
elog(ERROR, "invalid nxids argument: %lld", (long long) nxids);
lastxid = consume_xids_common(InvalidFullTransactionId, (uint64) nxids);
---
+ *
+ * If no top-level XID is assigned, a new one is obtained,
+ * and the consumed XID counter is incremented.
I'm not sure this comment is appropriate since what we do here is
obvious and the comment doesn't explain why we do that. IMO we don't
need to update these comments.
support_blksz_32k.diff:
- if (xids_left > 2000 &&
+ if (xids_left > COMMIT_TS_XACTS_PER_PAGE &&
I think it's better to just replace 2000 with 4000 and explain why
4000 is sufficient. Probably we can define '#define
XID_SHORTCUT_THRESHOLD 4000" and put an assertion to XidSkip() or
consume_xids_common() to check if the number of consumed XIDs doesn't
exceed XID_SHORTCUT_THRESHOLD.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From: | Kirill Reshke <reshkekirill(at)gmail(dot)com> |
---|---|
To: | Yushi Ogiwara <btogiwarayuushi(at)oss(dot)nttdata(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix for consume_xids advancing XIDs incorrectly |
Date: | 2024-12-03 09:28:33 |
Message-ID: | CALdSSPix0UFXTHwxWUXPJkjXJt9diGG6jwJNBSPxk5_5Ai4yEw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, 30 Oct 2024 at 12:01, Yushi Ogiwara
<btogiwarayuushi(at)oss(dot)nttdata(dot)com> wrote:
>
> I made a new patch (skip_xid_correctly.diff) that incorporates the
> points we discussed:
>
> 1. Fix the issue that consume_xids consumes nxids+1 XIDs.
> 2. Update lastxid when calling GetTopFullTransactionId() to support
> nxids==1 case.
> 3. Forbid consume_xids when nxids==0.
> 4. Add comments explaining the return values of consume_xids and
> consume_xids_until, and the rationale for incrementing consumed++ when
> GetTopFullTransactionId() is called.
>
> Also, I made another patch (support_blksz_32k.diff) that supports the
> block size == 32K case.
>
> Best,
> Yushi Ogiwara
>
Hi!
There is review comments that need to be addressed in [1]
Patch status now is waiting on author
--
Best regards,
Kirill Reshke