Lists: | pgsql-hackers |
---|
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | generate syscache info automatically |
Date: | 2023-05-30 21:57:53 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
I want to report on my on-the-plane-to-PGCon project.
The idea was mentioned in [0]. genbki.pl already knows everything about
system catalog indexes. If we add a "please also make a syscache for
this one" flag to the catalog metadata, we can have genbki.pl produce
the tables in syscache.c and syscache.h automatically.
Aside from avoiding the cumbersome editing of those tables, I think this
layout is also conceptually cleaner, as you can more easily see which
system catalog indexes have syscaches and maybe ask questions about why
or why not.
As a possible follow-up, I have also started work on generating the
ObjectProperty structure in objectaddress.c. One of the things you need
for that is making genbki.pl aware of the syscache information. There
is some more work to be done there, but it's looking promising.
Attachment | Content-Type | Size |
---|---|---|
0001-Update-DECLARE_INDEX-documentation.patch | text/plain | 1.6 KB |
0002-Restructure-DECLARE_INDEX-arguments.patch | text/plain | 65.4 KB |
0003-genbki.pl-Factor-out-boilerplate-generation.patch | text/plain | 3.8 KB |
0004-Generate-syscache-info-from-catalog-files.patch | text/plain | 72.0 KB |
0005-WIP-Generate-ObjectProperty-from-catalog-files.patch | text/plain | 6.7 KB |
From: | Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2023-05-31 17:02:57 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Peter Eisentraut <peter(at)eisentraut(dot)org> writes:
> The idea was mentioned in [0]. genbki.pl already knows everything about
> system catalog indexes. If we add a "please also make a syscache for
> this one" flag to the catalog metadata, we can have genbki.pl produce
> the tables in syscache.c and syscache.h automatically.
+1 on this worthwhile reduction of manual work. Tangentially, it
reminded me of one of my least favourite parts of Catalog.pm, the
regexes in ParseHeader():
> diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
> index 84aaeb002a..a727d692b7 100644
> --- a/src/backend/catalog/Catalog.pm
> +++ b/src/backend/catalog/Catalog.pm
> @@ -110,7 +110,7 @@ sub ParseHeader
> };
> }
> elsif (
> - /^DECLARE_(UNIQUE_)?INDEX(_PKEY)?\(\s*(\w+),\s*(\d+),\s*(\w+),\s*(.+)\)/
> + /^DECLARE_(UNIQUE_)?INDEX(_PKEY)?\(\s*(\w+),\s*(\d+),\s*(\w+),\s*(\w+),\s*(.+)\)/
> )
> {
> push @{ $catalog{indexing} },
> @@ -120,7 +120,8 @@ sub ParseHeader
> index_name => $3,
> index_oid => $4,
> index_oid_macro => $5,
> - index_decl => $6
> + table_name => $6,
> + index_decl => $7
> };
> }
> elsif (/^DECLARE_OID_DEFINING_MACRO\(\s*(\w+),\s*(\d+)\)/)
Now that we require Perl 5.14, we could replace this parenthesis-
counting nightmare with named captures (introduced in Perl 5.10), which
would make the above change look like this instead (context expanded to
show the whole elsif block):
elsif (
/^DECLARE_(UNIQUE_)?INDEX(_PKEY)?\(\s*
(?<index_name>\w+),\s*
(?<index_oid>\d+),\s*
(?<index_oid_macro>\w+),\s*
+ (?<table_name>\w+),\s*
(?<index_decl>.+)
\)/x
)
{
push @{ $catalog{indexing} },
{
is_unique => $1 ? 1 : 0,
is_pkey => $2 ? 1 : 0,
%+,
};
}
For other patterns without the optional bits in the keyword, it becomes
even simpler, e.g.
if (/^DECLARE_TOAST\(\s*
(?<parent_table>\w+),\s*
(?<toast_oid>\d+),\s*
(?<toast_index_oid>\d+)\s*
\)/x
)
{
push @{ $catalog{toasting} }, {%+};
}
I'd be happy to submit a patch to do this for all the ParseHeader()
regexes (in a separate thread) if others agree this is an improvement.
- ilmari
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2023-05-31 21:24:01 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 31.05.23 13:02, Dagfinn Ilmari Mannsåker wrote:
> For other patterns without the optional bits in the keyword, it becomes
> even simpler, e.g.
>
> if (/^DECLARE_TOAST\(\s*
> (?<parent_table>\w+),\s*
> (?<toast_oid>\d+),\s*
> (?<toast_index_oid>\d+)\s*
> \)/x
> )
> {
> push @{ $catalog{toasting} }, {%+};
> }
>
>
> I'd be happy to submit a patch to do this for all the ParseHeader()
> regexes (in a separate thread) if others agree this is an improvement.
I would welcome such a patch.
From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2023-06-15 07:45:22 |
Message-ID: | CAFBsxsHMaZ9yR7p=JpzdC2ynKwkDb1PrEr=Q5M__6cmhbbt2iQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut <peter(at)eisentraut(dot)org>
wrote:
>
> I want to report on my on-the-plane-to-PGCon project.
>
> The idea was mentioned in [0]. genbki.pl already knows everything about
> system catalog indexes. If we add a "please also make a syscache for
> this one" flag to the catalog metadata, we can have genbki.pl produce
> the tables in syscache.c and syscache.h automatically.
>
> Aside from avoiding the cumbersome editing of those tables, I think this
> layout is also conceptually cleaner, as you can more easily see which
> system catalog indexes have syscaches and maybe ask questions about why
> or why not.
When this has come up before, one objection was that index declarations
shouldn't know about cache names and bucket sizes [1]. The second paragraph
above makes a reasonable case for that, however. I believe one alternative
idea was for a script to read the enum, which would look something like
this:
#define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid
enum SysCacheIdentifier
{
DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
...
};
...which would then look up the other info in the usual way from Catalog.pm.
> As a possible follow-up, I have also started work on generating the
> ObjectProperty structure in objectaddress.c. One of the things you need
> for that is making genbki.pl aware of the syscache information. There
> is some more work to be done there, but it's looking promising.
I haven't studied this, but it seems interesting.
One other possible improvement: syscache.c has a bunch of #include's, one
for each catalog with a cache, so there's still a bit of manual work in
adding a cache, and the current #include list is a bit cumbersome. Perhaps
it's worth it to have the script emit them as well?
I also wonder if at some point it will make sense to split off a separate
script(s) for some things that are unrelated to the bootstrap data.
genbki.pl is getting pretty large, and there are additional things that
could be done with syscaches, e.g. inlined eq/hash functions for cache
lookup [2].
[1] https://www.postgresql.org/message-id/[email protected]
[2]
https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de
--
John Naylor
EDB: http://www.enterprisedb.com
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2023-07-03 05:45:39 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Here is an updated patch set that adjusts for the recently introduced
named captures. I will address the other issues later. I think the
first few patches in the series can be considered nonetheless.
On 15.06.23 09:45, John Naylor wrote:
> On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut <peter(at)eisentraut(dot)org
> <mailto:peter(at)eisentraut(dot)org>> wrote:
> >
> > I want to report on my on-the-plane-to-PGCon project.
> >
> > The idea was mentioned in [0]. genbki.pl <http://genbki.pl> already
> knows everything about
> > system catalog indexes. If we add a "please also make a syscache for
> > this one" flag to the catalog metadata, we can have genbki.pl
> <http://genbki.pl> produce
> > the tables in syscache.c and syscache.h automatically.
> >
> > Aside from avoiding the cumbersome editing of those tables, I think this
> > layout is also conceptually cleaner, as you can more easily see which
> > system catalog indexes have syscaches and maybe ask questions about why
> > or why not.
>
> When this has come up before, one objection was that index declarations
> shouldn't know about cache names and bucket sizes [1]. The second
> paragraph above makes a reasonable case for that, however. I believe one
> alternative idea was for a script to read the enum, which would look
> something like this:
>
> #define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid
>
> enum SysCacheIdentifier
> {
> DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
> ...
> };
>
> ...which would then look up the other info in the usual way from Catalog.pm.
>
> > As a possible follow-up, I have also started work on generating the
> > ObjectProperty structure in objectaddress.c. One of the things you need
> > for that is making genbki.pl <http://genbki.pl> aware of the syscache
> information. There
> > is some more work to be done there, but it's looking promising.
>
> I haven't studied this, but it seems interesting.
>
> One other possible improvement: syscache.c has a bunch of #include's,
> one for each catalog with a cache, so there's still a bit of manual work
> in adding a cache, and the current #include list is a bit cumbersome.
> Perhaps it's worth it to have the script emit them as well?
>
> I also wonder if at some point it will make sense to split off a
> separate script(s) for some things that are unrelated to the bootstrap
> data. genbki.pl <http://genbki.pl> is getting pretty large, and there
> are additional things that could be done with syscaches, e.g. inlined
> eq/hash functions for cache lookup [2].
>
> [1] https://www.postgresql.org/message-id/[email protected]
> <https://www.postgresql.org/message-id/[email protected]>
> [2]
> https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de <https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de>
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Update-DECLARE_INDEX-documentation.patch | text/plain | 1.6 KB |
v2-0002-Restructure-DECLARE_INDEX-arguments.patch | text/plain | 65.1 KB |
v2-0003-genbki.pl-Factor-out-boilerplate-generation.patch | text/plain | 3.8 KB |
v2-0004-Generate-syscache-info-from-catalog-files.patch | text/plain | 71.7 KB |
v2-0005-Fill-in-more-of-ObjectProperty.patch | text/plain | 2.3 KB |
v2-0006-WIP-Generate-ObjectProperty-from-catalog-files.patch | text/plain | 6.7 KB |
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2023-08-24 14:03:29 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 03.07.23 07:45, Peter Eisentraut wrote:
> Here is an updated patch set that adjusts for the recently introduced
> named captures. I will address the other issues later. I think the
> first few patches in the series can be considered nonetheless.
I have committed the 0001 patch, which was really a (code comment) bug fix.
I think the patches 0002 and 0003 should be uncontroversial, so I'd like
to commit them if no one objects.
The remaining patches are still WIP.
> On 15.06.23 09:45, John Naylor wrote:
>> On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut <peter(at)eisentraut(dot)org
>> <mailto:peter(at)eisentraut(dot)org>> wrote:
>> >
>> > I want to report on my on-the-plane-to-PGCon project.
>> >
>> > The idea was mentioned in [0]. genbki.pl <http://genbki.pl> already
>> knows everything about
>> > system catalog indexes. If we add a "please also make a syscache for
>> > this one" flag to the catalog metadata, we can have genbki.pl
>> <http://genbki.pl> produce
>> > the tables in syscache.c and syscache.h automatically.
>> >
>> > Aside from avoiding the cumbersome editing of those tables, I think
>> this
>> > layout is also conceptually cleaner, as you can more easily see which
>> > system catalog indexes have syscaches and maybe ask questions about
>> why
>> > or why not.
>>
>> When this has come up before, one objection was that index
>> declarations shouldn't know about cache names and bucket sizes [1].
>> The second paragraph above makes a reasonable case for that, however.
>> I believe one alternative idea was for a script to read the enum,
>> which would look something like this:
>>
>> #define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid
>>
>> enum SysCacheIdentifier
>> {
>> DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
>> ...
>> };
>>
>> ...which would then look up the other info in the usual way from
>> Catalog.pm.
>>
>> > As a possible follow-up, I have also started work on generating the
>> > ObjectProperty structure in objectaddress.c. One of the things you
>> need
>> > for that is making genbki.pl <http://genbki.pl> aware of the
>> syscache information. There
>> > is some more work to be done there, but it's looking promising.
>>
>> I haven't studied this, but it seems interesting.
>>
>> One other possible improvement: syscache.c has a bunch of #include's,
>> one for each catalog with a cache, so there's still a bit of manual
>> work in adding a cache, and the current #include list is a bit
>> cumbersome. Perhaps it's worth it to have the script emit them as well?
>>
>> I also wonder if at some point it will make sense to split off a
>> separate script(s) for some things that are unrelated to the bootstrap
>> data. genbki.pl <http://genbki.pl> is getting pretty large, and there
>> are additional things that could be done with syscaches, e.g. inlined
>> eq/hash functions for cache lookup [2].
>>
>> [1]
>> https://www.postgresql.org/message-id/[email protected]
>> <https://www.postgresql.org/message-id/[email protected]>
>> [2]
>> https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de <https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de>
>>
>> --
>> John Naylor
>> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2023-08-31 11:23:00 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
I have committed 0002 and 0003, and also a small bug fix in the
ObjectProperty entry for "transforms".
I have also gotten the automatic generation of the ObjectProperty lookup
table working (with some warts).
Attached is an updated patch set.
One win here is that the ObjectProperty lookup now uses a hash function
instead of a linear search. So hopefully the performance is better (to
be confirmed) and it could now be used for things like [0].
There was some discussion about whether the catalog files are the right
place to keep syscache information. Personally, I would find that more
convenient. (Note that we also recently moved the definitions of
indexes and toast tables from files with the whole list into the various
catalog files.) But we can also keep it somewhere else. The important
thing is that Catalog.pm can find it somewhere in a structured form.
To finish up the ObjectProperty generation, we also need to store some
more data, namely the OBJECT_* mappings. We probably do not want to
store those in the catalog headers; that looks like a layering violation
to me.
So, there is some discussion to be had about where to put all this
across various use cases.
On 24.08.23 16:03, Peter Eisentraut wrote:
> On 03.07.23 07:45, Peter Eisentraut wrote:
>> Here is an updated patch set that adjusts for the recently introduced
>> named captures. I will address the other issues later. I think the
>> first few patches in the series can be considered nonetheless.
>
> I have committed the 0001 patch, which was really a (code comment) bug fix.
>
> I think the patches 0002 and 0003 should be uncontroversial, so I'd like
> to commit them if no one objects.
>
> The remaining patches are still WIP.
>
>
>> On 15.06.23 09:45, John Naylor wrote:
>>> On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut
>>> <peter(at)eisentraut(dot)org <mailto:peter(at)eisentraut(dot)org>> wrote:
>>> >
>>> > I want to report on my on-the-plane-to-PGCon project.
>>> >
>>> > The idea was mentioned in [0]. genbki.pl <http://genbki.pl>
>>> already knows everything about
>>> > system catalog indexes. If we add a "please also make a syscache for
>>> > this one" flag to the catalog metadata, we can have genbki.pl
>>> <http://genbki.pl> produce
>>> > the tables in syscache.c and syscache.h automatically.
>>> >
>>> > Aside from avoiding the cumbersome editing of those tables, I
>>> think this
>>> > layout is also conceptually cleaner, as you can more easily see which
>>> > system catalog indexes have syscaches and maybe ask questions
>>> about why
>>> > or why not.
>>>
>>> When this has come up before, one objection was that index
>>> declarations shouldn't know about cache names and bucket sizes [1].
>>> The second paragraph above makes a reasonable case for that, however.
>>> I believe one alternative idea was for a script to read the enum,
>>> which would look something like this:
>>>
>>> #define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid
>>>
>>> enum SysCacheIdentifier
>>> {
>>> DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
>>> ...
>>> };
>>>
>>> ...which would then look up the other info in the usual way from
>>> Catalog.pm.
>>>
>>> > As a possible follow-up, I have also started work on generating the
>>> > ObjectProperty structure in objectaddress.c. One of the things
>>> you need
>>> > for that is making genbki.pl <http://genbki.pl> aware of the
>>> syscache information. There
>>> > is some more work to be done there, but it's looking promising.
>>>
>>> I haven't studied this, but it seems interesting.
>>>
>>> One other possible improvement: syscache.c has a bunch of #include's,
>>> one for each catalog with a cache, so there's still a bit of manual
>>> work in adding a cache, and the current #include list is a bit
>>> cumbersome. Perhaps it's worth it to have the script emit them as well?
>>>
>>> I also wonder if at some point it will make sense to split off a
>>> separate script(s) for some things that are unrelated to the
>>> bootstrap data. genbki.pl <http://genbki.pl> is getting pretty large,
>>> and there are additional things that could be done with syscaches,
>>> e.g. inlined eq/hash functions for cache lookup [2].
>>>
>>> [1]
>>> https://www.postgresql.org/message-id/[email protected]
>>> <https://www.postgresql.org/message-id/[email protected]>
>>> [2]
>>> https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de <https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de>
>>>
>>> --
>>> John Naylor
>>> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
>
>
>
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Fill-in-more-of-ObjectProperty.patch | text/plain | 2.5 KB |
v3-0002-Generate-syscache-info-from-catalog-files.patch | text/plain | 73.9 KB |
v3-0003-Generate-ObjectProperty-from-catalog-files.patch | text/plain | 14.1 KB |
From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2023-09-01 11:31:14 |
Message-ID: | CAFBsxsHJF5RVvS_3JkAssOo-hOsz74cMG-A2BFm9O5x3bWxS+g@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut <peter(at)eisentraut(dot)org>
wrote:
> Attached is an updated patch set.
>
> One win here is that the ObjectProperty lookup now uses a hash function
> instead of a linear search. So hopefully the performance is better (to
> be confirmed) and it could now be used for things like [0].
+ # XXX This one neither, but if I add it to @skip, PerfectHash will fail.
(???)
+ #FIXME: AttributeRelationId
I took a quick look at this, and attached is the least invasive way to get
it working for now, which is to bump the table size slightly. The comment
says doing this isn't reliable, but it happens to work in this case.
Playing around with the functions is hit-or-miss, and when that fails,
somehow the larger table saves the day.
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
quick-hack-fix-hash.patch.txt | text/plain | 1.6 KB |
From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2023-09-02 10:37:50 |
Message-ID: | CAFBsxsEyug2O=P=3BeOEAhofEyQ_fJoQ6fSmc8SEWvkYa=y+MQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
I wrote:
> + # XXX This one neither, but if I add it to @skip, PerfectHash will
fail. (???)
> + #FIXME: AttributeRelationId
>
> I took a quick look at this, and attached is the least invasive way to
get it working for now, which is to bump the table size slightly. The
comment says doing this isn't reliable, but it happens to work in this
case. Playing around with the functions is hit-or-miss, and when that
fails, somehow the larger table saves the day.
To while away a rainy day, I poked at this a bit more and found the input
is pathological with our current methods. Even with a large-ish exhaustive
search, the two success are strange in that they only succeeded by
accidentally bumping the table size up to where I got it to work before
(77):
With multipliers (5, 19), it recognizes that the initial table size (75) is
a multiple of 5, so increases the table size to 76, which is a multiple of
19, so it increases it again to 77 and succeeds.
Same with (3, 76): 75 is a multiple of 3, so up to 76, which of course
divides 76, so bumps it to 77 likewise.
Turning the loop into
a = (a ^ c) * 257;
b = (b ^ c) * X;
...seems to work very well.
In fact, now trying some powers-of-two for X before the primes works most
of the time with our inputs, even for some unicode normalization functions,
on the first seed iteration. That likely won't make any difference in
practice, but it's an interesting demo. I've attached these two draft ideas
as text.
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v301-0001-Use-XOR-for-combining-and-do-it-before-mixing.patch.txt | text/plain | 2.0 KB |
v301-0002-Use-powers-of-two-when-choosing-multipliers-for.patch.txt | text/plain | 2.5 KB |
From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2023-09-11 11:02:18 |
Message-ID: | CAFBsxsEY5tYGpJ3OPEtFd_cW-=DP5WXG3tsCo+f+mAoV+WU8hw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut <peter(at)eisentraut(dot)org>
wrote:
> Attached is an updated patch set.
> There was some discussion about whether the catalog files are the right
> place to keep syscache information. Personally, I would find that more
> convenient. (Note that we also recently moved the definitions of
> indexes and toast tables from files with the whole list into the various
> catalog files.) But we can also keep it somewhere else. The important
> thing is that Catalog.pm can find it somewhere in a structured form.
I don't have anything further to say on how to fit it together, but I'll go
ahead share some other comments from a quick read of v3-0003:
+ # XXX hardcoded exceptions
+ # extension doesn't belong to extnamespace
+ $attnum_namespace = undef if $catname eq 'pg_extension';
+ # pg_database owner is spelled datdba
+ $attnum_owner = "Anum_pg_database_datdba" if $catname eq 'pg_database';
These exceptions seem like true exceptions...
+ # XXX?
+ $name_syscache = "SUBSCRIPTIONNAME" if $catname eq 'pg_subscription';
+ # XXX?
+ $is_nsp_name_unique = 1 if $catname eq 'pg_collation';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_opclass';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_opfamily';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_subscription';
... but as the XXX conveys, these indicate a failure to do the right thing.
Trying to derive this info from the index declarations is currently
fragile. E.g. making one small change to this regex:
- if ($index->{index_decl} =~ /\(\w+name name_ops(,
\w+namespace oid_ops)?\)/)
+ if ($index->{index_decl} =~ /\w+name name_ops(,
\w+namespace oid_ops)?\)/)
...causes "is_nsp_name_unique" to flip from false to true, and/or sets
"name_catcache_id" where it wasn't before, for several entries. It's not
quite clear what the "rule" is intended to be, or whether it's robust going
forward.
That being the case, perhaps some effort should also be made to make it
easy to verify the output matches HEAD (or rather, v3-0001). This is now
difficult because of the C99 ".foo = bar" syntax, plus the alphabetical
ordering.
+foreach my $catname (@catnames)
+{
+ print $objectproperty_info_fh qq{#include "catalog/${catname}_d.h"\n};
+}
Assuming we have a better way to know which catalogs need
object properties, is there a todo to only #include those?
> To finish up the ObjectProperty generation, we also need to store some
> more data, namely the OBJECT_* mappings. We probably do not want to
> store those in the catalog headers; that looks like a layering violation
> to me.
Possibly, but it's also convenient and, one could argue, more
straightforward than storing syscache id and num_buckets in the index info.
One last thing: I did try to make the hash function use uint16 for the key
(to reduce loop iterations on nul bytes), and that didn't help, so we are
left with the ideas I mentioned earlier.
(not changing CF status, because nothing specific is really required to
change at the moment, some things up in the air)
--
John Naylor
EDB: http://www.enterprisedb.com
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
Subject: | Re: generate syscache info automatically |
Date: | 2023-11-01 21:13:03 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Here is a rebased patch set, along with a summary of the questions I
have about these patches:
v4-0001-Generate-syscache-info-from-catalog-files.patch
What's a good syntax to declare a syscache? Currently, I have, for example
-DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId,
pg_type, btree(oid oid_ops));
+DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId,
pg_type, btree(oid oid_ops), TYPEOID, 64);
I suppose a sensible alternative could be to leave the
DECLARE_..._INDEX... alone and make a separate statement, like
MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64);
That's at least visually easier, because some of those
DECLARE_... lines are getting pretty long.
I would like to keep those MAKE_SYSCACHE lines in the catalog files.
That just makes it easier to reference everything together.
v4-0002-Generate-ObjectProperty-from-catalog-files.patch
Several questions here:
* What's a good way to declare the mapping between catalog and object
type?
* How to select which catalogs have ObjectProperty entries generated?
* Where to declare class descriptions? Or just keep the current hack in
the patch.
* How to declare the purpose of a catalog column, like "this is the ACL
column for this catalog". This is currently done by name, but maybe
it should be more explicit.
* Question about how to pick the correct value for is_nsp_name_unique?
This second patch is clearly still WIP. I hope the first patch can be
finished soon, however.
I was also amused to find the original commit fa351d5a0db that
introduced ObjectProperty, which contains the following comment:
Performance isn't critical here, so there's no need to be smart
about how we do the search.
which I'm now trying to amend.
On 11.09.23 07:02, John Naylor wrote:
>
> On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut <peter(at)eisentraut(dot)org
> <mailto:peter(at)eisentraut(dot)org>> wrote:
>
> > Attached is an updated patch set.
>
> > There was some discussion about whether the catalog files are the right
> > place to keep syscache information. Personally, I would find that more
> > convenient. (Note that we also recently moved the definitions of
> > indexes and toast tables from files with the whole list into the various
> > catalog files.) But we can also keep it somewhere else. The important
> > thing is that Catalog.pm can find it somewhere in a structured form.
>
> I don't have anything further to say on how to fit it together, but I'll
> go ahead share some other comments from a quick read of v3-0003:
>
> + # XXX hardcoded exceptions
> + # extension doesn't belong to extnamespace
> + $attnum_namespace = undef if $catname eq 'pg_extension';
> + # pg_database owner is spelled datdba
> + $attnum_owner = "Anum_pg_database_datdba" if $catname eq 'pg_database';
>
> These exceptions seem like true exceptions...
>
> + # XXX?
> + $name_syscache = "SUBSCRIPTIONNAME" if $catname eq 'pg_subscription';
> + # XXX?
> + $is_nsp_name_unique = 1 if $catname eq 'pg_collation';
> + $is_nsp_name_unique = 1 if $catname eq 'pg_opclass';
> + $is_nsp_name_unique = 1 if $catname eq 'pg_opfamily';
> + $is_nsp_name_unique = 1 if $catname eq 'pg_subscription';
>
> ... but as the XXX conveys, these indicate a failure to do the right
> thing. Trying to derive this info from the index declarations is
> currently fragile. E.g. making one small change to this regex:
>
> - if ($index->{index_decl} =~ /\(\w+name name_ops(,
> \w+namespace oid_ops)?\)/)
> + if ($index->{index_decl} =~ /\w+name name_ops(,
> \w+namespace oid_ops)?\)/)
>
> ...causes "is_nsp_name_unique" to flip from false to true, and/or sets
> "name_catcache_id" where it wasn't before, for several entries. It's not
> quite clear what the "rule" is intended to be, or whether it's robust
> going forward.
>
> That being the case, perhaps some effort should also be made to make it
> easy to verify the output matches HEAD (or rather, v3-0001). This is now
> difficult because of the C99 ".foo = bar" syntax, plus the alphabetical
> ordering.
>
> +foreach my $catname (@catnames)
> +{
> + print $objectproperty_info_fh qq{#include "catalog/${catname}_d.h"\n};
> +}
>
> Assuming we have a better way to know which catalogs need
> object properties, is there a todo to only #include those?
>
> > To finish up the ObjectProperty generation, we also need to store some
> > more data, namely the OBJECT_* mappings. We probably do not want to
> > store those in the catalog headers; that looks like a layering violation
> > to me.
>
> Possibly, but it's also convenient and, one could argue, more
> straightforward than storing syscache id and num_buckets in the index info.
>
> One last thing: I did try to make the hash function use uint16 for the
> key (to reduce loop iterations on nul bytes), and that didn't help, so
> we are left with the ideas I mentioned earlier.
>
> (not changing CF status, because nothing specific is really required to
> change at the moment, some things up in the air)
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Generate-syscache-info-from-catalog-files.patch | text/plain | 74.0 KB |
v4-0002-Generate-ObjectProperty-from-catalog-files.patch | text/plain | 14.1 KB |
From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
Subject: | Re: generate syscache info automatically |
Date: | 2024-01-10 08:00:23 |
Message-ID: | CANWCAZaDbzRvGTOejr32gxNgWAD_Xs__YW2TfwVQjpRGGr-TGA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Nov 2, 2023 at 4:13 AM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> Here is a rebased patch set, along with a summary of the questions I
> have about these patches:
This is an excellent summary of the issues, thanks.
> v4-0001-Generate-syscache-info-from-catalog-files.patch
>
> What's a good syntax to declare a syscache? Currently, I have, for example
>
> -DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId,
> pg_type, btree(oid oid_ops));
> +DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId,
> pg_type, btree(oid oid_ops), TYPEOID, 64);
>
> I suppose a sensible alternative could be to leave the
> DECLARE_..._INDEX... alone and make a separate statement, like
>
> MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64);
>
> That's at least visually easier, because some of those
> DECLARE_... lines are getting pretty long.
Probably a good idea, and below I mention a third possible macro.
> I would like to keep those MAKE_SYSCACHE lines in the catalog files.
> That just makes it easier to reference everything together.
That seems fine. If we ever want to do something more invasive with
the syscaches, some of that can be copied/moved out into a separate
script, but what's there in 0001 is pretty small.
> v4-0002-Generate-ObjectProperty-from-catalog-files.patch
>
> Several questions here:
>
> * What's a good way to declare the mapping between catalog and object
> type?
Perhaps this idea:
> I suppose a sensible alternative could be to leave the
> DECLARE_..._INDEX... alone and make a separate statement, like
>
> MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64);
...could be used, so something like
DECLARE_OBJECT_INFO(OBJECT_ACCESS_METHOD);
> * How to select which catalogs have ObjectProperty entries generated?
Would the above work for that as well?
> * Where to declare class descriptions? Or just keep the current hack in
> the patch.
I don't have an opinion on this.
> * How to declare the purpose of a catalog column, like "this is the ACL
> column for this catalog". This is currently done by name, but maybe
> it should be more explicit.
Perhaps we can have additional column macros, like BKI_OBJ_ACL or
something, but that doesn't seem like it would result in less code.
> * Question about how to pick the correct value for is_nsp_name_unique?
How is that known now -- I don't mean in the patch, but in general?
Also, I get most of the hard-wired exceptions, but
+ # XXX?
+ $name_syscache = "SUBSCRIPTIONNAME" if $catname eq 'pg_subscription';
What is not right otherwise?
+ if ($index->{index_decl} eq 'btree(oid oid_ops)')
+ {
+ $oid_index = $index->{index_oid_macro};
+ $oid_syscache = $index->{syscache_name};
+ }
+ if ($index->{index_decl} =~ /\(\w+name name_ops(, \w+namespace oid_ops)?\)/)
+ {
+ $name_syscache = $index->{syscache_name};
+ $is_nsp_name_unique = 1 if $index->{is_unique};
+ }
The variables name_syscache and syscache_name are unfortunately close
to eachother.
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2024-01-17 12:46:02 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 10.01.24 09:00, John Naylor wrote:
>> I suppose a sensible alternative could be to leave the
>> DECLARE_..._INDEX... alone and make a separate statement, like
>>
>> MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64);
>>
>> That's at least visually easier, because some of those
>> DECLARE_... lines are getting pretty long.
>
> Probably a good idea, and below I mention a third possible macro.
I updated the patch to use this style (but I swapped the first two
arguments from my example, so that the thing being created is named first).
I also changed the names of the output files a bit to make them less
confusing. (I initially had some files named .c.h, which was weird, but
apparently necessary to avoid confusing the build system. But it's all
clearer now.)
Other than bugs and perhaps style opinions, I think the first patch is
pretty good now.
I haven't changed much in the second patch, other than to update it for
the code changes made in the first patch. It's still very much
WIP/preview at the moment.
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Generate-syscache-info-from-catalog-files.patch | text/plain | 64.4 KB |
v5-0002-WIP-Generate-ObjectProperty-from-catalog-files.patch | text/plain | 14.4 KB |
From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2024-01-19 05:28:28 |
Message-ID: | CANWCAZbvK_OVAfZPXxK8tey53CTnmwuxgJAbFEBWt57rAUN0Ag@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Jan 17, 2024 at 7:46 PM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> I updated the patch to use this style (but I swapped the first two
> arguments from my example, so that the thing being created is named first).
>
> I also changed the names of the output files a bit to make them less
> confusing. (I initially had some files named .c.h, which was weird, but
> apparently necessary to avoid confusing the build system. But it's all
> clearer now.)
>
> Other than bugs and perhaps style opinions, I think the first patch is
> pretty good now.
LGTM. The only style consideration that occurred to me just now was
MAKE_SYSCACHE, since it doesn't match the DECLARE_... macros. It seems
like the same thing from a code generation perspective. The other
macros refer to relations, so there is a difference, but maybe it
doesn't matter. I don't have a strong opinion.
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2024-01-19 14:03:37 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 19.01.24 06:28, John Naylor wrote:
> On Wed, Jan 17, 2024 at 7:46 PM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>>
>> I updated the patch to use this style (but I swapped the first two
>> arguments from my example, so that the thing being created is named first).
>>
>> I also changed the names of the output files a bit to make them less
>> confusing. (I initially had some files named .c.h, which was weird, but
>> apparently necessary to avoid confusing the build system. But it's all
>> clearer now.)
>>
>> Other than bugs and perhaps style opinions, I think the first patch is
>> pretty good now.
>
> LGTM. The only style consideration that occurred to me just now was
> MAKE_SYSCACHE, since it doesn't match the DECLARE_... macros. It seems
> like the same thing from a code generation perspective. The other
> macros refer to relations, so there is a difference, but maybe it
> doesn't matter. I don't have a strong opinion.
The DECLARE_* macros map to actual BKI commands ("declare toast",
"declare index"). So I wanted to use a different verb to distinguish
"generate code for this" from those BKI commands.
From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2024-01-22 00:33:13 |
Message-ID: | CANWCAZb9gdCEawdOO3UUN2rheqD4qMqMRvL4=trP-byAO1Oqiw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Jan 19, 2024 at 9:03 PM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> The DECLARE_* macros map to actual BKI commands ("declare toast",
> "declare index"). So I wanted to use a different verb to distinguish
> "generate code for this" from those BKI commands.
That makes sense to me.
From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: generate syscache info automatically |
Date: | 2024-01-23 18:58:30 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 22.01.24 01:33, John Naylor wrote:
> On Fri, Jan 19, 2024 at 9:03 PM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>>
>> The DECLARE_* macros map to actual BKI commands ("declare toast",
>> "declare index"). So I wanted to use a different verb to distinguish
>> "generate code for this" from those BKI commands.
>
> That makes sense to me.
I have committed the first patch, and the buildfarm hiccup seems to have
passed.
I'll close this commitfest entry now. I have enough feedback to work on
the ObjectProperty generation, but I'll make a new entry for that.