Re: POC and rebased patch for CSN based snapshots

Lists: pgsql-hackers
From: Movead Li <movead(dot)li(at)highgo(dot)ca>
To: "pgsql-hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: POC and rebased patch for CSN based snapshots
Date: 2020-05-08 12:43:50
Message-ID: [email protected]

Hello hackers,

I have read the community mail from 'postgrespro' linked below ①. To summarize the patch: it generates a CSN from a timestamp when a transaction commits, assigns a special CSN value to aborted transactions, and records them in a CSN SLRU file. With this we can judge whether an xid is visible to a snapshot using a single CSN value instead of xmin, xmax, and the xip array, so a snapshot can be held as just a CSN, which is easy to export and import.

CSN may be the right direction and an important building block for a distributed PostgreSQL, because it requires exchanging very little snapshot data across nodes, so the patch is meant to support some research.
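The visibility rule described above can be sketched roughly as follows. This is a hypothetical Python illustration of the general CSN-snapshot idea, not the patch's actual code; the names (`csnlog`, `xid_visible_in_snapshot`, the sentinel values) are invented for the example:

```python
# Hypothetical sketch of CSN-based visibility, not the patch's actual code.
ABORTED = None            # special CSN value recorded for aborted transactions
IN_PROGRESS = "in-progress"

def xid_visible_in_snapshot(csnlog, xid, snapshot_csn):
    """A tuple written by `xid` is visible to a snapshot taken at
    `snapshot_csn` iff the xid committed with a CSN <= snapshot_csn."""
    csn = csnlog.get(xid, IN_PROGRESS)
    if csn is ABORTED or csn == IN_PROGRESS:
        return False
    return csn <= snapshot_csn

# xid 100 committed at CSN 50, xid 101 aborted, xid 102 still running
csnlog = {100: 50, 101: ABORTED}
```

The point is that the snapshot reduces to a single number, which is why it is cheap to export and import across nodes compared with shipping an xmin/xmax/xip array.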

We want to implement Clock-SI based on this patch. However, the patch is quite old, so I rebased the infrastructure part onto a recent commit (7dc37ccea85).

The original patch does not keep CSNs alive across a database restart, because it cleans the csnlog on every restart. That works until a prepared transaction appears, since the CSN of the prepared transaction is wiped by the restart. So I added WAL support for the csnlog, so that CSNs survive restarts, and moved the csnlog cleanup into autovacuum.

This raises another issue: we cannot switch from xid-based snapshots to CSN-based snapshots if a prepared transaction exists, because no CSN can be found for a prepared transaction produced while xid-based snapshots were in use. To solve this, when the database restarts with the snapshot mode changed to CSN-based, I record an 'xmin_for_csn' marking where CSN-based checking starts.

Some known issues with the current patch:

1. The CSN snapshot supports the repeatable read isolation level only; we should try to support the other isolation levels.
2. We cannot switch smoothly from xid-based to CSN-based snapshots if there are prepared transactions in the database.

What do you think about it? I want to test and improve the patch step by step.

①https://www.postgresql.org/message-id/21BC916B-80A1-43BF-8650-3363CCDAE09C%40postgrespro.ru 

-----------
Regards,

Highgo Software (Canada/China/Pakistan)

URL : http://www.highgo.ca/

EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachment Content-Type Size
0001-CSN-base-snapshot.patch application/octet-stream 49.0 KB
0002-Wal-for-csn.patch application/octet-stream 26.8 KB

From: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
To: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-06-12 09:41:15
Message-ID: [email protected]

Hello hackers,

Currently, I have made some changes based on the last version:
1. Caught up to the current commit (c2bd1fec32ab54).
2. Added regression tests and documentation.
3. Added support for switching from an xid-based snapshot to a CSN-based snapshot, and the same on the standby side.

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachment Content-Type Size
0001-CSN-base-snapshot.patch application/octet-stream 49.4 KB
0002-Wal-for-csn.patch application/octet-stream 27.2 KB
0003-snapshot-switch.patch application/octet-stream 26.4 KB

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-06-13 06:44:44
Message-ID: CAA4eK1+kGOusSir609P=6Hcf7tmqXAipuCkJbeozVcU0CNV0Lg@mail.gmail.com

On Fri, Jun 12, 2020 at 3:11 PM movead(dot)li(at)highgo(dot)ca <movead(dot)li(at)highgo(dot)ca> wrote:
>
> Hello hackers,
>
> Currently, I do some changes based on the last version:
> 1. Catch up to the current commit (c2bd1fec32ab54).
> 2. Add regression and document.
> 3. Add support to switch from xid-base snapshot to csn-base snapshot,
> and the same with standby side.
>

AFAIU, this patch is meant to improve scalability and will also be helpful
for the Global Snapshots work, is that right? If so, how much
performance/scalability benefit will this patch have after Andres's
recent work on scalability [1]?

[1] - https://www.postgresql.org/message-id/20200301083601.ews6hz5dduc3w2se%40alap3.anarazel.de

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-06-15 07:48:16
Message-ID: [email protected]

On 2020/06/12 18:41, movead(dot)li(at)highgo(dot)ca wrote:
> Hello hackers,
>
> Currently, I do some changes based on the last version:
> 1. Catch up to the current  commit (c2bd1fec32ab54).
> 2. Add regression and document.
> 3. Add support to switch from xid-base snapshot to csn-base snapshot,
> and the same with standby side.

Andrey also seems to be proposing a similar patch [1] that introduces CSN
into core. Could you tell me the difference between his patch and yours?
If they are almost the same, shouldn't we focus on one together rather than
working separately?

Regards,

[1]
https://www.postgresql.org/message-id/[email protected]

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-06-18 11:57:13
Message-ID: [email protected]

On 2020/06/15 16:48, Fujii Masao wrote:
>
>
> On 2020/06/12 18:41, movead(dot)li(at)highgo(dot)ca wrote:
>> Hello hackers,
>>
>> Currently, I do some changes based on the last version:
>> 1. Catch up to the current  commit (c2bd1fec32ab54).
>> 2. Add regression and document.
>> 3. Add support to switch from xid-base snapshot to csn-base snapshot,
>> and the same with standby side.

Probably it's not time for a code review yet, but when I glanced at the patch,
one question came up.

0002 patch changes GenerateCSN() so that it generates CSN-related WAL records
(and inserts it into WAL buffers). Which means that new WAL record is generated
whenever CSN is assigned, e.g., in GetSnapshotData(). Is this WAL generation
really necessary for CSN?

BTW, GenerateCSN() is called while holding ProcArrayLock, and it inserts the
new WAL record in WriteXidCsnXlogRec() while holding a spinlock. Firstly, this
is not acceptable because spinlocks are intended for *very* short-term locks.
Secondly, I don't think WAL generation while holding ProcArrayLock is a good
design, because ProcArrayLock is likely to be a bottleneck and should be held
only briefly for performance.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
To: "Fujii Masao" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-06-19 03:12:12
Message-ID: [email protected]


Thanks for the reply.

>Probably it's not time to do the code review yet, but when I glanced the patch,
>I came up with one question.
>0002 patch changes GenerateCSN() so that it generates CSN-related WAL records
>(and inserts it into WAL buffers). Which means that new WAL record is generated
>whenever CSN is assigned, e.g., in GetSnapshotData(). Is this WAL generation
>really necessary for CSN?
This is designed for crash recovery: we record the most recently generated CSN
in WAL so that a stale value is not handed out after a restart. It is not written
to WAL on every assignment, but at intervals; see the CSNAddByNanosec() function.
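The gap-based WAL writing described here can be sketched as follows. This is a hypothetical Python model of the idea, not the patch's C code; `CsnWal`, `gap`, and `logged_upto` are invented names. A WAL record is emitted only when a newly assigned CSN crosses the last value covered by WAL, and the record stores a value one gap ahead so recovery can safely resume past any CSN that was handed out:

```python
# Hypothetical sketch of gap-based CSN WAL-logging (invented names).
class CsnWal:
    def __init__(self, gap):
        self.gap = gap
        self.logged_upto = 0     # highest CSN covered by WAL so far
        self.records = 0         # number of WAL records written

    def assign(self, csn):
        # Only log when the assigned CSN reaches the WAL-covered bound;
        # the record covers `gap` future values in one write.
        if csn >= self.logged_upto:
            self.logged_upto = csn + self.gap   # pretend: emit a WAL record
            self.records += 1
        return csn

wal = CsnWal(gap=10)
for csn in range(1, 101):        # 100 CSN assignments
    wal.assign(csn)
```

With a gap of 10, the 100 assignments above cost only 10 WAL records, which is the trade-off the patch seems to be aiming for: after a crash, recovery restarts from `logged_upto`, slightly ahead of the last CSN actually used.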

>BTW, GenerateCSN() is called while holding ProcArrayLock. Also it inserts new
>WAL record in WriteXidCsnXlogRec() while holding spinlock. Firstly this is not
>acceptable because spinlocks are intended for *very* short-term locks.
>Secondly, I don't think that WAL generation during ProcArrayLock is good
>design because ProcArrayLock is likely to be bottleneck and its term should
>be short for performance gain.
Thanks for pointing this out; it helps a lot. I will reconsider that.

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


From: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
To: "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-06-19 03:29:49
Message-ID: [email protected]


Thanks for the reply.

>AFAIU, this patch is to improve scalability and also will be helpful
>for Global Snapshots stuff, is that right? If so, how much
>performance/scalability benefit this patch will have after Andres's
>recent work on scalability [1]?
The patch is focused on being infrastructure for the sharding feature; according
to my tests, performance is almost the same with and without the patch.

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-06-19 03:56:16
Message-ID: [email protected]

On 2020/06/19 12:12, movead(dot)li(at)highgo(dot)ca wrote:
>
> Thanks for reply.
>
> >Probably it's not time to do the code review yet, but when I glanced the patch,
>>I came up with one question.
>>0002 patch changes GenerateCSN() so that it generates CSN-related WAL records
>>(and inserts it into WAL buffers). Which means that new WAL record is generated
>>whenever CSN is assigned, e.g., in GetSnapshotData(). Is this WAL generation
>>really necessary for CSN?
> This is designed for crash recovery, here we record our most new lsn in wal so it
> will not use a history lsn after a restart. It will not write into wal every time, but with
> a gap which you can see CSNAddByNanosec() function.

You mean that the last generated CSN needs to be WAL-logged because any smaller
CSN than the last one should not be reused after crash recovery. Right?

If so, that WAL-logging seems unnecessary, because the CSN mechanism assumes
CSN increases monotonically. IOW, even without that WAL-logging, a CSN after
crash recovery must be larger than any before it. No?

>>BTW, GenerateCSN() is called while holding ProcArrayLock. Also it inserts new
>>WAL record in WriteXidCsnXlogRec() while holding spinlock. Firstly this is not
>>acceptable because spinlocks are intended for *very* short-term locks.
>>Secondly, I don't think that WAL generation during ProcArrayLock is good
>>design because ProcArrayLock is likely to be bottleneck and its term should
>>be short for performance gain.
> Thanks for point out which may help me deeply, I will reconsider that.

Thanks for working on this!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
To: "Fujii Masao" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-06-19 04:36:44
Message-ID: [email protected]


>You mean that the last generated CSN needs to be WAL-logged because any smaller
>CSN than the last one should not be reused after crash recovery. Right?
Yes, that's it.

>If right, that WAL-logging seems not necessary because CSN mechanism assumes
>CSN is increased monotonically. IOW, even without that WAL-logging, CSN afer
>crash recovery must be larger than that before. No?

In this patch the CSN is derived from the system time, but the system time is not
always reliable. The patch is also designed for a Global CSN (for sharding), where
a node may receive a CSN generated on another machine.

So monotonicity is not guaranteed, and a node needs to keep its largest CSN in WAL
in case of a crash.
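The combination of an unreliable clock and a WAL-logged maximum implies a clamp on generation, which can be sketched like this. This is a hypothetical illustration (the function and field names are invented, not the patch's symbols): the generated CSN never regresses below the last one handed out, and after a crash the WAL-recovered maximum becomes the new floor even if the clock reads lower:

```python
# Hypothetical sketch: CSN generation clamped to stay monotonic even if
# the system clock steps backwards; after a crash, the maximum CSN
# recovered from WAL becomes the new floor.
def generate_csn(clock_ns, state):
    csn = max(clock_ns, state["last_max_csn"] + 1)   # never reuse or regress
    state["last_max_csn"] = csn
    return csn

state = {"last_max_csn": 0}
a = generate_csn(1000, state)    # normal case: follows the clock
b = generate_csn(900, state)     # clock stepped back: clamped to a + 1

# after a crash, restart from the WAL-logged maximum, not the clock
recovered = {"last_max_csn": 5000}
c = generate_csn(900, recovered)
```

Without the WAL-logged floor, the third call would hand out 900 and break the "any smaller CSN must not be reused" invariant discussed above.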

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-06-19 05:54:26
Message-ID: [email protected]

On 2020/06/19 13:36, movead(dot)li(at)highgo(dot)ca wrote:
>
> >You mean that the last generated CSN needs to be WAL-logged because any smaller
>>CSN than the last one should not be reused after crash recovery. Right?
> Yes that's it.
>
>>If right, that WAL-logging seems not necessary because CSN mechanism assumes
>>CSN is increased monotonically. IOW, even without that WAL-logging, CSN afer
>>crash recovery must be larger than that before. No?
>
> CSN collected based on time of  system in this patch, but time is not reliable all the
> time. And it designed for Global CSN(for sharding) where it may rely on CSN from
> other node , which generated from other machine.
>
> So monotonically is not reliable and it need to keep it's largest CSN in wal in case
> of crash.

Thanks for the explanation! Understood.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-06-29 07:47:38
Message-ID: [email protected]

On 6/12/20 2:41 PM, movead(dot)li(at)highgo(dot)ca wrote:
> Hello hackers,
>
> Currently, I do some changes based on the last version:
> 1. Catch up to the current  commit (c2bd1fec32ab54).
> 2. Add regression and document.
> 3. Add support to switch from xid-base snapshot to csn-base snapshot,
> and the same with standby side.

Some remarks on your patch:
1. The variable last_max_csn could be an atomic variable.
2. GenerateCSN() routine: the case where csn < csnState->last_max_csn
is the case where someone has changed the system clock. I think a
WARNING should be written to the log file. (Maybe we could also
synchronize with a time server.)
3. What about the global snapshot xmin? In the pgpro version of the patch we
had a GlobalSnapshotMapXmin() routine maintaining a circular buffer of
oldestXmins for several seconds into the past. This buffer allows shifting
oldestXmin into the past when a backend imports a global transaction.
Otherwise old tuple versions needed by that transaction
can be recycled by other processes (vacuum, HOT, etc.).
How do you implement protection from local pruning? I saw
SNAP_DESYNC_COMPLAIN, but it is not used anywhere.
4. The current version of the patch does not apply cleanly to current
master.
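Remark 1 amounts to advancing last_max_csn with a compare-and-swap loop instead of under a lock. A hypothetical sketch of that pattern (in C this would presumably be a 64-bit atomic such as pg_atomic_uint64; Python has no CAS primitive, so a tiny lock stands in for the hardware instruction here, and all names are invented):

```python
import threading

# Hypothetical sketch of lock-free-style advancement of last_max_csn.
# The small lock below only simulates a hardware compare-and-swap.
class AtomicMax:
    def __init__(self):
        self._v = 0
        self._cas_lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Stand-in for an atomic CAS instruction.
        with self._cas_lock:
            if self._v == expected:
                self._v = new
                return True
            return False

    def advance_to(self, csn):
        """Monotonically raise the stored maximum to at least `csn`."""
        while True:
            cur = self._v
            if cur >= csn:
                return cur           # another backend already went further
            if self.compare_and_swap(cur, csn):
                return csn

m = AtomicMax()
threads = [threading.Thread(target=m.advance_to, args=(v,)) for v in (5, 3, 9)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whatever the interleaving, the stored maximum only ever moves forward, which is the property GenerateCSN() needs from last_max_csn.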

--
regards,
Andrey Lepikhov
Postgres Professional


From: Movead Li <movead(dot)li(at)highgo(dot)ca>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: "pgsql-hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-07-02 14:31:56
Message-ID: [email protected]

Thanks for the remarks.

>Some remarks on your patch:
>1. The variable last_max_csn can be an atomic variable.
Yes, I will consider that.

>2. GenerateCSN() routine: in the case than csn < csnState->last_max_csn
>This is the case when someone changed the value of the system clock. I
>think it is needed to write a WARNING to the log file. (May be we can do
>synchronization with a time server.
Good point. I will work out a way to report the warning; it should report
with some gap rather than on every CSN generation.

Do we really need the correct time? What is the downside if a node
generates its CSNs by monotonic increase alone?

>3. That about global snapshot xmin? In the pgpro version of the patch we
>had GlobalSnapshotMapXmin() routine to maintain circular buffer of
>oldestXmins for several seconds in past. This buffer allows to shift
>oldestXmin in the past when backend is importing global transaction.
>Otherwise old versions of tuples that were needed for this transaction
>can be recycled by other processes (vacuum, HOT, etc).
>How do you implement protection from local pruning? I saw
>SNAP_DESYNC_COMPLAIN, but it is not used anywhere.
I have studied your patch, which is great. In it, only data older than
'global_snapshot_defer_time' can be vacuumed, and it keeps dead tuples
even if no snapshot is ever imported, right?

I am thinking about a way to start retaining dead tuples only just before
we import a CSN snapshot.

Based on the Clock-SI paper, we get the local CSN and then send it to the
shard nodes. Because we do not know whether each shard node's CSN is
bigger or smaller than the master node's, we must keep some dead tuples
at all times to support snapshot import at any moment.

But if we make a small change to the Clock-SI model, we need not use the
local CSN when the transaction starts; instead we ask every shard node
for its CSN (at which point each shard starts retaining dead tuples), and
the master node chooses the biggest CSN and sends it back to the shard
nodes.

This way we do not need to keep dead tuples all the time or manage a ring
buffer; we can hand the ball to the 'snapshot too old' feature. The
trade-off is that almost every shard node has to wait.

I will send a more detailed explanation in a few days.
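The modified Clock-SI handshake proposed here can be sketched as follows. This is a hypothetical illustration of the proposal, not existing code; the function names and the clock values are invented. The coordinator first collects each shard's current CSN, adopts the maximum as the global snapshot CSN, and each slower shard then simply waits until its clock catches up:

```python
# Hypothetical sketch of the modified Clock-SI handshake: ask every
# shard for its CSN, broadcast the maximum, slower shards wait.
def begin_global_snapshot(shard_csns):
    """shard_csns: the current CSN reported by each shard node."""
    snapshot_csn = max(shard_csns)   # no shard's clock is ahead of this
    return snapshot_csn

def shard_wait_needed(local_clock, snapshot_csn):
    """How far a shard's clock must advance before it can serve the
    snapshot without looking into its own future."""
    return max(0, snapshot_csn - local_clock)

# three shards whose clocks disagree; the coordinator adopts the largest
csn = begin_global_snapshot([1040, 998, 1012])
```

Because the chosen CSN is the maximum, every shard only ever waits forward in time, never needs tuple versions from before it was asked, which is what lets dead-tuple retention start at the handshake instead of running permanently.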

>4. The current version of the patch is not applied clearly with current
>master.
Maybe that is because of the PG13 release causing some conflicts; I will
rebase it.

---
Regards,

Highgo Software (Canada/China/Pakistan)

URL : http://www.highgo.ca/

EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


From: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Movead Li <movead(dot)li(at)highgo(dot)ca>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-07-03 03:24:22
Message-ID: [email protected]

On 7/2/20 7:31 PM, Movead Li wrote:
> Thanks for the remarks,
>
> >Some remarks on your patch:
> >1. The variable last_max_csn can be an atomic variable.
> Yes will consider.
>
> >2. GenerateCSN() routine: in the case than csn < csnState->last_max_csn
> >This is the case when someone changed the value of the system clock. I
> >think it is needed to write a WARNING to the log file. (May be we can do
> >synchronization with a time server.
> Yes good point, I will work out a way to report the warning, it should
> exist a
> report gap rather than report every time it generates CSN.
> If we really need a correct time? What's the inferiority if one node
> generate
> csn by monotonically increasing?
Changes in the time value can lead to bad effects, such as a stale snapshot.
Adjusting the time can be a kind of defense.
>
> >3. That about global snapshot xmin? In the pgpro version of the patch we
> >had GlobalSnapshotMapXmin() routine to maintain circular buffer of
> >oldestXmins for several seconds in past. This buffer allows to shift
> >oldestXmin in the past when backend is importing global transaction.
> >Otherwise old versions of tuples that were needed for this transaction
> >can be recycled by other processes (vacuum, HOT, etc).
> >How do you implement protection from local pruning? I saw
> >SNAP_DESYNC_COMPLAIN, but it is not used anywhere.
> I have researched your patch which is so great, in the patch only data
> out of 'global_snapshot_defer_time' can be vacuum, and it keep dead
> tuple even if no snapshot import at all,right?
>
> I am thanking about a way if we can start remain dead tuple just before
> we import a csn snapshot.
>
> Base on Clock-SI paper, we should get local CSN then send to shard nodes,
> because we do not known if the shard nodes' csn bigger or smaller then
> master node, so we should keep some dead tuple all the time to support
> snapshot import anytime.
>
> Then if we can do a small change to CLock-SI model, we do not use the
> local csn when transaction start, instead we touch every shard node for
> require their csn, and shard nodes start keep dead tuple, and master node
> choose the biggest csn to send to shard nodes.
>
> By the new way, we do not need to keep dead tuple all the time and do
> not need to manage a ring buf, we can give to ball to 'snapshot too old'
> feature. But for trade off, almost all shard node need wait.
> I will send more detail explain in few days.
I think that in a distributed system with many servers this can be a
bottleneck.
The main idea of the "deferred time" is to reduce interference between DML
queries under an intensive OLTP workload. This time can be reduced
if the bloating of the database prevails over the frequency of
transaction aborts.
>
>
> >4. The current version of the patch is not applied clearly with current
> >master.
> Maybe it's because of the release of PG13, it cause some conflict, I will
> rebase it.
Ok
>
> ---
> Regards,
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca <http://www.highgo.ca/>
> EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
>
>

--
regards,
Andrey Lepikhov
Postgres Professional


From: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
To: a(dot)lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, amit(dot)kapila16 <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-07-04 14:56:17
Message-ID: [email protected]

Hello Andrey

>> I have researched your patch which is so great, in the patch only data
>> out of 'global_snapshot_defer_time' can be vacuum, and it keep dead
>> tuple even if no snapshot import at all,right?
>>
>> I am thanking about a way if we can start remain dead tuple just before
>> we import a csn snapshot.
>>
>> Base on Clock-SI paper, we should get local CSN then send to shard nodes,
>> because we do not known if the shard nodes' csn bigger or smaller then
>> master node, so we should keep some dead tuple all the time to support
>> snapshot import anytime.
>>
>> Then if we can do a small change to CLock-SI model, we do not use the
>> local csn when transaction start, instead we touch every shard node for
>> require their csn, and shard nodes start keep dead tuple, and master node
>> choose the biggest csn to send to shard nodes.
>>
>> By the new way, we do not need to keep dead tuple all the time and do
>> not need to manage a ring buf, we can give to ball to 'snapshot too old'
>> feature. But for trade off, almost all shard node need wait.
>> I will send more detail explain in few days.
>I think, in the case of distributed system and many servers it can be
>bottleneck.
>Main idea of "deferred time" is to reduce interference between DML
>queries in the case of intensive OLTP workload. This time can be reduced
>if the bloationg of a database prevails over the frequency of
>transaction aborts.
OK, there may be a performance issue there. I have another question about Clock-SI.

For example, we have three nodes, shard1 (as master), shard2, shard3, where
(time of node1) > (time of node2) > (time of node3), and you can see a picture:
http://movead.gitee.io/picture/blog_img_bad/csn/clock_si_question.png
As far as I understand Clock-SI, the part left of the blue line will be set up as the snapshot if the master requires a snapshot at time t1. But in fact data A should be in the snapshot yet is not, and data B should be out of the snapshot yet is not.
Can this scenario appear in your original patch? Or is my understanding of Clock-SI wrong?


From: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "amit(dot)kapila16" <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-07-13 04:41:12
Message-ID: [email protected]

On 7/4/20 7:56 PM, movead(dot)li(at)highgo(dot)ca wrote:
>
>
> As far as I know about Clock-SI, left part of the blue line will
> setup as a snapshot
>
> if master require a snapshot at time t1. But in fact data A should
> in snapshot but
>
> not and data B should out of snapshot but not.
>
>
> If this scene may appear in your origin patch? Or something my
> understand about
>
> Clock-SI is wrong?
>
>
>

Sorry for the late answer.

I am not sure I fully understood your question, but still:
what real problem do you see here? Transaction t1 doesn't get the state of
shard2 until the time at the node hosting shard2 reaches t1's start time.
If the transaction that inserted B wants to know its position in time
relative to t1, it will generate a CSN, attach to node1, and will see
that t1 has not started yet.

Maybe you mean the case where someone with a faster data
channel can use knowledge from node1 to change the state at node2?
If so, I think it is not a problem; otherwise please explain your idea.

--
regards,
Andrey Lepikhov
Postgres Professional


From: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-07-13 06:46:23
Message-ID: [email protected]


>I have doubts that I fully understood your question, but still.
>What real problems do you see here? Transaction t1 doesn't get state of
>shard2 until time at node with shard2 won't reach start time of t1.
>If transaction, that inserted B wants to know about it position in time
>relatively to t1 it will generate CSN, attach to node1 and will see,
>that t1 is not started yet.

>Maybe you are saying about the case that someone who has a faster data
>channel can use the knowledge from node1 to change the state at node2?
>If so, i think it is not a problem, or you can explain your idea.
Sorry, I think this was my misunderstanding of Clock-SI. At first I expected
we could get an absolute snapshot, for example that B should not be included
in the snapshot because it happened after time t1. However, Clock-SI cannot
guarantee that, and no design guarantees that at all.

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-07-13 11:18:52
Message-ID: [email protected]

On 2020/06/19 14:54, Fujii Masao wrote:
>
>
> On 2020/06/19 13:36, movead(dot)li(at)highgo(dot)ca wrote:
>>
>>  >You mean that the last generated CSN needs to be WAL-logged because any smaller
>>> CSN than the last one should not be reused after crash recovery. Right?
>> Yes that's it.
>>
>>> If right, that WAL-logging seems not necessary because CSN mechanism assumes
>>> CSN is increased monotonically. IOW, even without that WAL-logging, CSN afer
>>> crash recovery must be larger than that before. No?
>>
>> CSN collected based on time of  system in this patch, but time is not reliable all the
>> time. And it designed for Global CSN(for sharding) where it may rely on CSN from
>> other node , which generated from other machine.
>>
>> So monotonically is not reliable and it need to keep it's largest CSN in wal in case
>> of crash.
>
> Thanks for the explanaion! Understood.

I have another question about this patch:

When checking each tuple's visibility, we always have to get the CSN
corresponding to XMIN or XMAX from the CSN SLRU. In past discussion,
there was the suggestion that CSN should be stored in the tuple header
or somewhere (like a hint bit) to avoid the overhead of very frequent
lookups in the CSN SLRU. I'm not sure of the conclusion of that discussion.
But this patch doesn't seem to adopt that idea. So did you confirm that
the performance overhead of CSN SLRU lookups is negligible?

Of course, I know that idea has a big issue: there is not enough space
to store the CSN in a tuple header if the CSN is 64 bits. If the CSN were
32 bits, we might be able to replace XMIN or XMAX with the corresponding
CSN. But that would mean struggling with one more wraparound issue
(CSN wraparound). So it's not easy to adopt that idea...

Sorry if this was already discussed and concluded...

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
To: "Fujii Masao" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-07-14 02:02:10
Message-ID: [email protected]


>When checking each tuple's visibility, we always have to get the CSN
>corresponding to XMIN or XMAX from the CSN SLRU. In past discussions,
>there was a suggestion that the CSN should be stored in the tuple header
>or somewhere else (like a hint bit) to avoid the overhead of very frequent
>CSN SLRU lookups. I'm not sure what the conclusion of that discussion was,
>but this patch doesn't seem to adopt that idea. So did you confirm that
>the performance overhead of CSN SLRU lookups is negligible?
This patch came from postgrespro's patch, which showed good performance.
I ran a simple test on the current patch and saw no performance decline.
Also, not every tuple visibility check needs a CSN SLRU lookup; only xids
greater than 'TransactionXmin' need it. Maybe we just haven't hit the cases
that cause bad performance, so it shows good performance for now.

>Of course, I know that idea has a big issue: there is not enough space
>to store the CSN in a tuple header if the CSN is 64 bits. If the CSN is
>32 bits, we might be able to replace XMIN or XMAX with the corresponding
>CSN, but that means we would have to struggle with one more wraparound
>issue (CSN wraparound). So it's not easy to adopt that idea...

>Sorry if this was already discussed and concluded...
I think your point about storing the CSN in the tuple header is an exciting
approach, but I have not seen that discussion. Could you point me to it?

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-07-14 02:42:50
Message-ID: [email protected]

On 2020/07/14 11:02, movead(dot)li(at)highgo(dot)ca wrote:
>
>>When checking each tuple's visibility, we always have to get the CSN
>>corresponding to XMIN or XMAX from the CSN SLRU. In past discussions,
>>there was a suggestion that the CSN should be stored in the tuple header
>>or somewhere else (like a hint bit) to avoid the overhead of very frequent
>>CSN SLRU lookups. I'm not sure what the conclusion of that discussion was,
>>but this patch doesn't seem to adopt that idea. So did you confirm that
>>the performance overhead of CSN SLRU lookups is negligible?
> This patch came from postgrespro's patch, which showed good performance.
> I ran a simple test on the current patch and saw no performance decline.

This is good news! When I read the past discussions about CSN, my impression
was that the performance overhead of CSN SLRU lookups might become one of
the show-stoppers for CSN. So I was worrying about this issue...

> Also, not every tuple visibility check needs a CSN SLRU lookup; only xids
> greater than 'TransactionXmin' need it. Maybe we just haven't hit the cases
> that cause bad performance, so it shows good performance for now.

Yes, we would need more tests in several cases.

>>Of course, I know that idea has a big issue: there is not enough space
>>to store the CSN in a tuple header if the CSN is 64 bits. If the CSN is
>>32 bits, we might be able to replace XMIN or XMAX with the corresponding
>>CSN, but that means we would have to struggle with one more wraparound
>>issue (CSN wraparound). So it's not easy to adopt that idea...
>
>>Sorry if this was already discussed and concluded...
> I think your point about storing the CSN in the tuple header is an exciting
> approach, but I have not seen that discussion. Could you point me to it?

Probably you can find the discussion by searching with the keywords
"CSN" and "hint bit". For example,

https://www.postgresql.org/message-id/CAPpHfdv7BMwGv=OfUg3S-jGVFKqHi79pR_ZK1Wsk-13oZ+cy5g@mail.gmail.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-07-15 09:06:14
Message-ID: [email protected]

On 7/13/20 11:46 AM, movead(dot)li(at)highgo(dot)ca wrote:

I have continued reviewing your patch. Some code improvements are in the attachment.

Questions:
* Why is csnSnapshotActive the only member of the CSNshapshotShared struct?
* The WriteAssignCSNXlogRec() routine: I didn't understand why you add 20
nanoseconds to the current CSN and write this into the WAL. To simplify our
communication, I rewrote this routine in accordance with my opinion (see the
patch in the attachment).

In general, how about adding your CSN WAL-writing machinery + TAP tests
to the patch from the thread [1] and working on it together?

[1]
https://www.postgresql.org/message-id/flat/07b2c899-4ed0-4c87-1327-23c750311248%40postgrespro.ru

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment Content-Type Size
0001-improvements.patch text/x-patch 4.4 KB

From: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-07-29 02:14:57
Message-ID: [email protected]


Currently, we are developing and testing global snapshots on the branch [1]
created by Andrey. I want to keep the latest patch set on this thread so that
hackers can easily follow every change in this area.

This version addresses a small point raised by Fujii Masao, namely that
WriteXidCsnXlogRec() should be called outside of spinlocks. It also adds
comments for CSNAddByNanosec(), plus other fine tuning.

[1]
https://github.com/danolivo/pgClockSI

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachment Content-Type Size
0002-Wal-for-csn.patch application/octet-stream 27.7 KB
0003-snapshot-switch.patch application/octet-stream 24.6 KB
0004-globale-snapshot-infrastructure.patch application/octet-stream 33.1 KB
0001-CSN-base-snapshot.patch application/octet-stream 47.5 KB

From: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: POC and rebased patch for CSN based snapshots
Date: 2020-08-10 01:52:59
Message-ID: [email protected]


I found an issue in the snapshot-switch part of the last patch set: the
xmin_for_csn value was wrong in the TransactionIdGetCSN() function. I now keep
xmin_for_csn in pg_control and add an UnclearCSN status for transaction ids.
New patches are attached.

Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Attachment Content-Type Size
0004-globale-snapshot-infrastructure.patch application/octet-stream 33.2 KB
0001-CSN-base-snapshot.patch application/octet-stream 47.5 KB
0002-Wal-for-csn.patch application/octet-stream 27.7 KB
0003-snapshot-switch.patch application/octet-stream 29.5 KB