SQL/JSON functions vs. ECPG vs. STRING as a reserved word

Lists:	pgsql-hackers

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Michael Meskes <meskes(at)postgresql(dot)org>
Subject:	SQL/JSON functions vs. ECPG vs. STRING as a reserved word
Date:	2022-05-29 20:19:42
Message-ID:	[email protected]

Commit 1a36bc9db (SQL/JSON query functions) introduced STRING as a
type_func_name_keyword. As per the complaint at [1], this broke use
of "string" as a table name, column name, function parameter name, etc.
The restriction about column names, in particular, seems likely to
break boatloads of applications --- wouldn't you think that "string"
is a pretty likely column name? We have no cover to claim "the SQL
standard says so", either, because they list STRING as an unreserved
keyword.
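The effect of the keyword category can be sketched with a toy model. The category names and the three identifier productions below follow gram.y in simplified form; the two-entry keyword tables and the helper `usable_as` are hypothetical and exist only for illustration.

```python
# Toy model of where a keyword may double as an identifier.
# Keyword categories each production accepts besides plain identifiers
# (simplified from gram.y; the real grammar has more detail).
PRODUCTIONS = {
    "ColId": {"unreserved", "col_name"},  # table and column names
    "type_function_name": {"unreserved", "type_func_name"},
    "ColLabel": {"unreserved", "col_name", "type_func_name", "reserved"},
}


def usable_as(word, production, keywords):
    """Return True if `word` is accepted by `production` in this model."""
    category = keywords.get(word.lower())
    if category is None:
        return True  # not a keyword at all, so it is an ordinary identifier
    return category in PRODUCTIONS[production]


# Hypothetical, tiny keyword tables (the real list lives in kwlist.h).
after_commit = {"string": "type_func_name", "select": "reserved"}
after_fix = {"string": "unreserved", "select": "reserved"}

print(usable_as("string", "ColId", after_commit))  # False
print(usable_as("string", "ColId", after_fix))     # True
```

In this model a type_func_name keyword can still name a type or function, but it drops out of ColId, which is why a table or column called "string" stops parsing.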

This is trivial enough to fix so far as the core grammar is concerned;
it works to just change STRING to be an unreserved_keyword. However,
various ECPG tests fall over, so I surmise that somebody felt it was
okay to break potentially thousands of applications in order to avoid
touching ECPG. I do not think that's an acceptable trade-off.

I poked into it a bit and found the proximate cause: ECPG uses
ECPGColLabelCommon to represent user-chosen type names in some
places, and that production *does not include unreserved_keyword*.
(Sure enough, the failing test cases try to use "string" as a
type name in a variable declaration.) That's a pre-existing
bit of awfulness, and it's indeed pretty awful, because it means
that any time we add a new keyword --- even a fully unreserved one
--- we risk breaking somebody's ECPG application. We just hadn't
happened to add any keywords to date that conflicted with type names
used in the ECPG test suite.
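A rough sketch of why the ECPG tests fail once STRING becomes unreserved. Only the absence of unreserved_keyword from ECPGColLabelCommon is taken from the message above; the other members of the set, and the helper `var_type_accepts`, are assumptions made for illustration.

```python
# Assumed model of ECPGColLabelCommon: the keyword categories it accepts
# besides plain identifiers. The one grounded fact is that "unreserved"
# is NOT in this set; the listed members are guesses.
ECPG_COL_LABEL_COMMON = {"col_name", "type_func_name"}


def var_type_accepts(name, keywords):
    """Return True if `name` can be a user-chosen type name in this model."""
    category = keywords.get(name)
    return category is None or category in ECPG_COL_LABEL_COMMON


not_a_keyword = {}                                  # "string" is a plain identifier
as_type_func_name = {"string": "type_func_name"}    # state after commit 1a36bc9db
as_unreserved = {"string": "unreserved"}            # the straightforward core fix

print(var_type_accepts("string", not_a_keyword))      # True
print(var_type_accepts("string", as_type_func_name))  # True
print(var_type_accepts("string", as_unreserved))      # False
```

Under these assumptions the least restrictive keyword category is the one that breaks the declaration, which matches the observation that any newly added unreserved keyword is a risk for ECPG programs.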

I looked briefly at whether we could improve that situation.
Two of the four uses of ECPGColLabelCommon in ecpg.trailer can be
converted to the more general ECPGColLabel without difficulty,
but trying to change either of the uses in var_type results in
literally thousands of shift/reduce and reduce/reduce conflicts.
This seems to be because what follows ecpgstart can be either a general
SQL statement or an ECPGVarDeclaration (beginning with var_type),
and bison isn't smart enough to disambiguate. I have a feeling that
this situation could be improved with enough elbow grease, because
plpgsql manages to solve a closely-related problem in allowing its
assignment statements to coexist with general SQL statements. But
I don't have the interest to tackle it personally, and anyway any
fix would probably be more invasive than we want to put in post-beta.

I also wondered briefly about just changing the affected test cases,
reasoning that breaking some ECPG applications that use "string" is
less bad than breaking everybody else that uses "string". But type
"string" is apparently a standard ECPG and/or INFORMIX thing, so we
probably can't get away with that.

Hence, I propose the attached, which fixes the easily-fixable
ECPGColLabelCommon uses and adds a hard-wired special case for
STRING to handle the var_type usage.

More generally, I feel like we have a process problem: there needs to
be a higher bar to adding new fully- or even partially-reserved words.
I might've missed it, but I don't recall that there was any discussion
of the compatibility implications of this change.

regards, tom lane

[1] https://www.postgresql.org/message-id/PAXPR02MB760049C92DFC2D8B5E8B8F5AE3DA9%40PAXPR02MB7600.eurprd02.prod.outlook.com

Attachment	Content-Type	Size
make-STRING-unreserved-1.patch	text/x-diff	3.7 KB

From:	Michael Meskes <meskes(at)postgresql(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, Michael Meskes <meskes(at)postgresql(dot)org>
Subject:	Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word
Date:	2022-05-30 13:25:16
Message-ID:	[email protected]

> I looked briefly at whether we could improve that situation.
> Two of the four uses of ECPGColLabelCommon in ecpg.trailer can be
> converted to the more general ECPGColLabel without difficulty,
> but trying to change either of the uses in var_type results in
> literally thousands of shift/reduce and reduce/reduce conflicts.
> This seems to be because what follows ecpgstart can be either a general
> SQL statement or an ECPGVarDeclaration (beginning with var_type),
> and bison isn't smart enough to disambiguate. I have a feeling that
> this situation could be improved with enough elbow grease, because
> plpgsql manages to solve a closely-related problem in allowing its
> assignment statements to coexist with general SQL statements. But
> I don't have the interest to tackle it personally, and anyway any
> fix would probably be more invasive than we want to put in post-beta.

Right, the reason for all this is that the part after the "exec sql" could be
either language, SQL or C. It has been like this for all those years. I do not
claim that the solution we have is the best, it's only the best I could come up
with when I implemented it. If anyone has a better way, please be my guest.

> I also wondered briefly about just changing the affected test cases,
> reasoning that breaking some ECPG applications that use "string" is
> less bad than breaking everybody else that uses "string". But type
> "string" is apparently a standard ECPG and/or INFORMIX thing, so we
> probably can't get away with that.

IIRC STRING is a normal datatype for INFORMIX and thus is implemented in ECPG
for its compatibility.

> Hence, I propose the attached, which fixes the easily-fixable
> ECPGColLabelCommon uses and adds a hard-wired special case for
> STRING to handle the var_type usage.

I'm fine with this approach.

Thanks Tom.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Michael Meskes <meskes(at)postgresql(dot)org>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word
Date:	2022-05-30 21:20:15
Message-ID:	[email protected]

Michael Meskes <meskes(at)postgresql(dot)org> writes:
>> This seems to be because what follows ecpgstart can be either a general
>> SQL statement or an ECPGVarDeclaration (beginning with var_type),
>> and bison isn't smart enough to disambiguate. I have a feeling that
>> this situation could be improved with enough elbow grease, because
>> plpgsql manages to solve a closely-related problem in allowing its
>> assignment statements to coexist with general SQL statements.

> Right, the reason for all this is that the part after the "exec sql" could be
> either language, SQL or C. It has been like this for all those years. I do not
> claim that the solution we have is the best, it's only the best I could come up
> with when I implemented it. If anyone has a better way, please be my guest.

I pushed the proposed patch, but after continuing to think about
it I have an idea for a possible solution to the older problem.
I noticed that the problematic cases in var_type aren't really
interested in seeing any possible unreserved keyword: they care about
certain specific built-in type names (most of which are keywords
already) and about typedef names. Now, almost every C-parsing program
I've ever seen has to lex typedef names specially, so what if we made
ecpg do that too? After a couple of false starts, I came up with the
attached draft patch. The key ideas are:

1. In pgc.l, if an identifier is a typedef name, ignore any possible
keyword meaning and return it as an IDENT. (I'd originally supposed
that we'd want to return some new TYPEDEF token type, but that does
not seem to be necessary right now, and adding a new token type would
increase the patch footprint quite a bit.)

2. In the var_type production, forget about ECPGColLabel[Common]
and just handle the keywords we know we need, plus IDENT for the
typedef case. It turns out that we have to have duplicate coding
because most of these keywords are not keywords in C lexing mode,
so that they'll come through the IDENT path anyway when we're
in a C rather than SQL context. That seemed acceptable to me.
I thought about adding them all to the C keywords list but that
seemed likely to have undesirable side-effects, and again it'd
bloat the patch footprint.
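The lookup order described in the two points above can be sketched as a toy classifier. This is not the pgc.l code: the keyword sets are small stand-ins, `classify` is a hypothetical helper, and case handling is ignored by assuming all words are already lower case. The rule that C keywords are not overridden by typedef names comes from later in this message.

```python
# Stand-ins for ecpg's C keyword list and the SQL keyword list.
C_KEYWORDS = {"int", "long", "struct", "union"}
SQL_KEYWORDS = {"begin", "work", "string", "select"}


def classify(word, typedefs):
    """Return the token kind for `word` under the proposed lookup order."""
    if word in C_KEYWORDS:
        return "C_KEYWORD"    # typedef names do not override C keywords
    if word in typedefs:
        return "IDENT"        # typedef name: ignore any SQL keyword meaning
    if word in SQL_KEYWORDS:
        return "SQL_KEYWORD"
    return "IDENT"


print(classify("string", typedefs={"string"}))  # IDENT
print(classify("string", typedefs=set()))       # SQL_KEYWORD
print(classify("int", typedefs={"int"}))        # C_KEYWORD
```

Because a typedef name always comes back as IDENT, the var_type rule in this sketch only needs an IDENT alternative plus the specific built-in type keywords, rather than a general keyword production.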

This fix is not without downsides. Disabling recognition of
keywords that match typedefs means that, for example, if you
declare a typedef named "work" then ECPG will fail to parse
"EXEC SQL BEGIN WORK". So in a real sense this is just trading
one hazard for another. But there is an important difference:
with this, whether your ECPG program works depends only on what
typedef names and SQL commands are used in the program. If
it compiles today it'll still compile next year, whereas with
the present implementation the addition of some new unreserved
SQL keyword could break it. We'd have to document this change
for sure, and it wouldn't be something to back-patch, but it
seems like it might be acceptable from the users' standpoint.
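The trade-off in the paragraph above can be made concrete with a small model. Both functions are hypothetical simplifications, and "string" is used as the example of a keyword added in a later release.

```python
def old_scheme_accepts_typedef(name, unreserved_keywords):
    """Modeled current behaviour: a typedef name that is also an
    unreserved SQL keyword is rejected where a type name is expected."""
    return name not in unreserved_keywords


def new_scheme_accepts_statement(keywords_needed, typedefs):
    """Modeled proposed behaviour: a SQL statement fails only if a word
    it needs as a keyword is shadowed by one of the program's typedefs."""
    return not (set(keywords_needed) & set(typedefs))


keywords_this_year = {"begin", "work"}
keywords_next_year = keywords_this_year | {"string"}  # a new keyword appears

# Current scheme: the same program breaks when the keyword list grows.
print(old_scheme_accepts_typedef("string", keywords_this_year))  # True
print(old_scheme_accepts_typedef("string", keywords_next_year))  # False

# Proposed scheme: the outcome depends only on the program's own names.
print(new_scheme_accepts_statement(["begin", "work"], {"string"}))  # True
print(new_scheme_accepts_statement(["begin", "work"], {"work"}))    # False
```

Note that `new_scheme_accepts_statement` never consults the server's keyword list, which is the stability property being argued for: the "work" failure is visible to the programmer today rather than appearing after an upgrade.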

We could narrow (not eliminate) this hazard if we could get the
typedef lookup in pgc.l to happen only when we're about to parse
a var_type construct. But because of Bison's lookahead behavior,
that seems to be impossible, or at least undesirably messy
and fragile. But perhaps somebody else will see a way.

Anyway, this seems like too big a change to consider for v15,
so I'll stick this patch into the v16 CF queue. It's only
draft quality anyway --- lacks documentation changes and test
cases. There are also some coding points that could use review.
Notably, I made the typedef lookup override SQL keywords but
not C keywords; this is for consistency with the C-mode lookup
rules, but is it the right thing?

regards, tom lane

Attachment	Content-Type	Size
handle-unreserved-typedef-names-1.patch	text/x-diff	10.3 KB

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Michael Meskes <meskes(at)postgresql(dot)org>
Subject:	Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word
Date:	2022-05-31 15:09:23
Message-ID:	[email protected]

On 2022-05-29 Su 16:19, Tom Lane wrote:
> More generally, I feel like we have a process problem: there needs to
> be a higher bar to adding new fully- or even partially-reserved words.
> I might've missed it, but I don't recall that there was any discussion
> of the compatibility implications of this change.
>

Thanks for fixing this while I was away.

I did in fact raise the issue on 1 Feb, see
<https://postgr.es/m/[email protected]>,
but nobody responded that I recall. I guess I should have pushed the
discussion further.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Michael Meskes <meskes(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word
Date:	2022-07-03 08:01:27
Message-ID:	[email protected]

On Mon, May 30, 2022 at 05:20:15PM -0400, Tom Lane wrote:

[allow EXEC SQL TYPE unreserved_keyword IS ...]

> 1. In pgc.l, if an identifier is a typedef name, ignore any possible
> keyword meaning and return it as an IDENT. (I'd originally supposed
> that we'd want to return some new TYPEDEF token type, but that does
> not seem to be necessary right now, and adding a new token type would
> increase the patch footprint quite a bit.)
>
> 2. In the var_type production, forget about ECPGColLabel[Common]
> and just handle the keywords we know we need, plus IDENT for the
> typedef case. It turns out that we have to have duplicate coding
> because most of these keywords are not keywords in C lexing mode,
> so that they'll come through the IDENT path anyway when we're
> in a C rather than SQL context. That seemed acceptable to me.
> I thought about adding them all to the C keywords list but that
> seemed likely to have undesirable side-effects, and again it'd
> bloat the patch footprint.
>
> This fix is not without downsides. Disabling recognition of
> keywords that match typedefs means that, for example, if you
> declare a typedef named "work" then ECPG will fail to parse
> "EXEC SQL BEGIN WORK". So in a real sense this is just trading
> one hazard for another. But there is an important difference:
> with this, whether your ECPG program works depends only on what
> typedef names and SQL commands are used in the program. If
> it compiles today it'll still compile next year, whereas with
> the present implementation the addition of some new unreserved
> SQL keyword could break it. We'd have to document this change
> for sure, and it wouldn't be something to back-patch, but it
> seems like it might be acceptable from the users' standpoint.

I agree this change is more likely to please a user than to harm a user. The
user benefit is slim, but the patch is also slim.

> We could narrow (not eliminate) this hazard if we could get the
> typedef lookup in pgc.l to happen only when we're about to parse
> a var_type construct. But because of Bison's lookahead behavior,
> that seems to be impossible, or at least undesirably messy
> and fragile. But perhaps somebody else will see a way.

I don't, though I'm not much of a Bison wizard.

> Anyway, this seems like too big a change to consider for v15,
> so I'll stick this patch into the v16 CF queue. It's only
> draft quality anyway --- lacks documentation changes and test
> cases. There are also some coding points that could use review.
> Notably, I made the typedef lookup override SQL keywords but
> not C keywords; this is for consistency with the C-mode lookup
> rules, but is it the right thing?

That decision seems fine. ScanCKeywordLookup() covers just twenty-six
keywords, and that list hasn't changed since 2003. Moreover, most of them are
keywords of the C language itself, so allowing them would entail mangling
them in the generated C to avoid C compiler errors. Given the lack of
complaints, let's not go there.

I didn't locate any problems beyond the test and doc gaps that you mentioned,
so I've marked this Ready for Committer.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Michael Meskes <meskes(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word
Date:	2022-07-03 17:08:36
Message-ID:	[email protected]

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Mon, May 30, 2022 at 05:20:15PM -0400, Tom Lane wrote:
>> [allow EXEC SQL TYPE unreserved_keyword IS ...]

> I didn't locate any problems beyond the test and doc gaps that you mentioned,
> so I've marked this Ready for Committer.

Thanks! Here's a fleshed-out version with doc changes, plus adjustment
of preproc/type.pgc so that it exposes the existing problem. (No code
changes from v1.) I'll push this in a few days if there are no
objections.

regards, tom lane

Attachment	Content-Type	Size
handle-unreserved-typedef-names-2.patch	text/x-diff	21.1 KB