Support TZ format code in to_timestamp()

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Support TZ format code in to_timestamp()
Date: 2023-06-13 16:20:42
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

It's annoyed me for some time that to_timestamp() doesn't implement
the TZ format code that to_char() has. I finally got motivated to
do something about that after the complaint at [1] that jsonpath's
datetime() method can't read typical JSON.stringify() output like
"2023-05-22T03:09:37.825Z". We do already understand "Z" as a
time zone abbreviation for UTC; we just need to get formatting.c
to support this.

Hence, the attached patch teaches to_timestamp() to read time
zone abbreviations as well as HH and HH:MM numeric zone offsets
when TZ is specified. (We need to accept HH and HH:MM to be
sure that we can round-trip the output of to_char(), since its
TZ output code will fall back to one of those if it does not
know any abbreviation for the current zone.)

You might reasonably say that we should make it read time zone names
not only abbreviations. I tried to make that work, and realized that
it'd create a complete mess because tzload() is so lax about what it
will interpret as a POSIX-style timezone spec. I included an example
in the test cases below: I think that

to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS')

should work and read just "EST" as the TZ value, allowing the "24"
to be read as the SS value. But tzload() will happily eat all of
"ESTFOO24" as a POSIX zone spec.

We could conceivably refactor tzload() enough to match only tzdb zone
names in this context. But I'm very hesitant to do that, for a few
reasons:

* it would make localtime.c diverge significantly from the upstream
IANA source code;

* we only need to support zone abbreviations to ensure we can
round-trip the output of to_char();

* it's not clear to me that average users would understand why
to_timestamp() accepts some but not all zone names that are accepted
by the TimeZone GUC and timestamptz input. If we document it as
taking only timezone abbreviations, that does correspond to a
concept that's in the manual already.

So I think that the attached represents a reasonable and useful
compromise. I'll park this in the July commitfest.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/014A028B-5CE6-4FDF-AC24-426CA6FC9CEE%40mohiohio.com

Attachment Content-Type Size
v1-0001-Support-timezone-abbreviations-in-to_timestamp.patch text/x-diff 22.0 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Support TZ format code in to_timestamp()
Date: 2023-06-21 18:07:34
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 13, 2023 at 12:20:42PM -0400, Tom Lane wrote:
> It's annoyed me for some time that to_timestamp() doesn't implement
> the TZ format code that to_char() has. I finally got motivated to
> do something about that after the complaint at [1] that jsonpath's
> datetime() method can't read typical JSON.stringify() output like
> "2023-05-22T03:09:37.825Z". We do already understand "Z" as a
> time zone abbreviation for UTC; we just need to get formatting.c
> to support this.

I have to admit I tend to prefer actual time zone names like
"America/New_York" over acronyms or offset values. However, I can see
the dump/restore problem with such names.

Parenthetically, I often use airport codes that map to time zones in my
own calendar. I would argue that on a global scale airport codes are
actually more useful than abbreviations like EST, assuming you don't
need to designate whether daylight saving time was active, e.g. EST vs.
EDT.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.


From: David Steele <david(at)pgmasters(dot)net>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Support TZ format code in to_timestamp()
Date: 2023-06-21 18:52:44
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 6/21/23 20:07, Bruce Momjian wrote:
> On Tue, Jun 13, 2023 at 12:20:42PM -0400, Tom Lane wrote:
>> It's annoyed me for some time that to_timestamp() doesn't implement
>> the TZ format code that to_char() has. I finally got motivated to
>> do something about that after the complaint at [1] that jsonpath's
>> datetime() method can't read typical JSON.stringify() output like
>> "2023-05-22T03:09:37.825Z". We do already understand "Z" as a
>> time zone abbreviation for UTC; we just need to get formatting.c
>> to support this.
>
> I have to admit I tend to prefer actual time zone names like
> "America/New_York" over acronyms or offset values. However, I can see
> the dump/restore problem with such names.

I think the abbreviations are worse than useless -- dangerously
misleading even. I was converting a timestamp I had pulled from the
internet the other day in IST (India Standard Time) using Postres to
test some new code I was working on. I got a rather surprising result so
changed it to Asia/Kolkata and got what I expected.

Turns out IST is *also* Israel Standard Time and Irish Standard Time. I
think Postres gave me the conversion in Irish time. At any rate, it was
not offset by 30 minutes which was the dead giveaway.

Offsets are fine when you just need an absolute date to feed into
something like recovery and it doesn't much matter what timezone you
were in.

Z and UTC also seem fine since they are unambiguous as far as I know.

Regards,
-David


From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Support TZ format code in to_timestamp()
Date: 2024-01-22 01:57:02
Message-ID: CAHut+Ps4=tbPoSAjanbVFGpbSbS1NoN66i8V-uYQDWHv0sJX_w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review" [1], but there has
been no activity on this thread for 7+ months.

If nothing more is planned for this thread then it will be closed
("Returned with feedback") at the end of this CF.

======
[1] https://commitfest.postgresql.org/46/4362/

Kind Regards,
Peter Smith.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Support TZ format code in to_timestamp()
Date: 2024-01-22 02:10:17
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Peter Smith <smithpb2250(at)gmail(dot)com> writes:
> Hi, this patch was marked in CF as "Needs Review" [1], but there has
> been no activity on this thread for 7+ months.
> If nothing more is planned for this thread then it will be closed
> ("Returned with feedback") at the end of this CF.

I still think it would be a good idea, but I can't deny the lack
of other interest in it. Unless someone steps up to review,
let's close it.

(FTR, I don't agree with David's objections to the entire concept
of zone abbreviations. We're not going to remove support for them
everywhere else, so why shouldn't to_timestamp() handle them?)

regards, tom lane


From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Smith <smithpb2250(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: Support TZ format code in to_timestamp()
Date: 2024-01-22 15:25:39
Message-ID: CAJ7c6TOk7t_2Qg=QCDjiAe_bkgXzpmGDJdSLdFEAmOX3tkPmTQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

> > Hi, this patch was marked in CF as "Needs Review" [1], but there has
> > been no activity on this thread for 7+ months.
> > If nothing more is planned for this thread then it will be closed
> > ("Returned with feedback") at the end of this CF.
>
> I still think it would be a good idea, but I can't deny the lack
> of other interest in it. Unless someone steps up to review,
> let's close it.

I agree that it would be a good idea, and again I would like to
condemn the approach "since no one reviews it we are going to reject
it". A friendly reminder like "hey, this patch was waiting long
enough, maybe someone could take a look" would be more appropriate
IMO. I remember during previous commitfests some CF managers created a
list of patches that could use more attention. That was useful.

I will review the patch, but probably only tomorrow.

--
Best regards,
Aleksander Alekseev


From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Support TZ format code in to_timestamp()
Date: 2024-01-22 15:43:03
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

> On 22 Jan 2024, at 03:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I still think it would be a good idea, but I can't deny the lack
> of other interest in it. Unless someone steps up to review,
> let's close it.

Since I had this on my (ever-growing) TODO I re-prioritized today and took a
look at it since I think it's something we should support.

Nothing really sticks out and I was unable to poke any holes so I don't have
too much more to offer than a LGTM.

+ while (len > 0)
+ {
+ const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
+ zoneabbrevtbl->numabbrevs);

My immediate reaction was that we should stop at prefix lengths of two since I
could only think of abbreviations of two or more. Googling and reading found
that there are indeed one-letter timezones (Alpha, Bravo etc..). Not sure if
it's worth mentioning that in the comment to help other readers who aren't neck
deep in timezones?

+ /* FALL THRU */

Tiny nitpick, it looks a bit curious that we spell it FALL THRU here and "fall
through" a few cases up in the same switch. While we are quite inconsistent
across the tree, consistency within a file is preferrable (regardless of
which).

--
Daniel Gustafsson


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Support TZ format code in to_timestamp()
Date: 2024-01-22 16:25:14
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Daniel Gustafsson <daniel(at)yesql(dot)se> writes:
> On 22 Jan 2024, at 03:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> + while (len > 0)
> + {
> + const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
> + zoneabbrevtbl->numabbrevs);

> My immediate reaction was that we should stop at prefix lengths of two since I
> could only think of abbreviations of two or more. Googling and reading found
> that there are indeed one-letter timezones (Alpha, Bravo etc..). Not sure if
> it's worth mentioning that in the comment to help other readers who aren't neck
> deep in timezones?

The one I usually think of is "Z" for UTC; I wasn't actually aware
that there were any other single-letter abbrevs. But in any case
I don't see a reason for this code to be making such assumptions.

> + /* FALL THRU */

> Tiny nitpick, it looks a bit curious that we spell it FALL THRU here and "fall
> through" a few cases up in the same switch. While we are quite inconsistent
> across the tree, consistency within a file is preferrable (regardless of
> which).

Fair. I tend to shorten it, but I failed to notice that there was
nearby precedent for the other way.

regards, tom lane


From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Smith <smithpb2250(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: Support TZ format code in to_timestamp()
Date: 2024-01-23 13:26:48
Message-ID: CAJ7c6TOPsuzM0AwO1X_Fr077ZFYGi7Pg=TKEtB230f8n=yAr4w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

> Since I had this on my (ever-growing) TODO I re-prioritized today and took a
> look at it since I think it's something we should support.
>
> Nothing really sticks out and I was unable to poke any holes so I don't have
> too much more to offer than a LGTM.
>
> + while (len > 0)
> + {
> + const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
> + zoneabbrevtbl->numabbrevs);
>
> My immediate reaction was that we should stop at prefix lengths of two since I
> could only think of abbreviations of two or more. Googling and reading found
> that there are indeed one-letter timezones (Alpha, Bravo etc..). Not sure if
> it's worth mentioning that in the comment to help other readers who aren't neck
> deep in timezones?
>
> + /* FALL THRU */
>
> Tiny nitpick, it looks a bit curious that we spell it FALL THRU here and "fall
> through" a few cases up in the same switch. While we are quite inconsistent
> across the tree, consistency within a file is preferrable (regardless of
> which).

I reviewed the patch and tested it on MacOS and generally concur with
stated above. The only nitpick I have is the apparent lack of negative
tests for to_timestamp(), e.g. when the string doesn't match the
specified format.

--
Best regards,
Aleksander Alekseev


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Smith <smithpb2250(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: Support TZ format code in to_timestamp()
Date: 2024-01-23 22:33:34
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Aleksander Alekseev <aleksander(at)timescale(dot)com> writes:
> I reviewed the patch and tested it on MacOS and generally concur with
> stated above. The only nitpick I have is the apparent lack of negative
> tests for to_timestamp(), e.g. when the string doesn't match the
> specified format.

That's an excellent suggestion indeed, because when I tried

SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ'); -- error

I got

ERROR: invalid value "JU" for "TZ"
DETAIL: Value must be an integer.

which seems pretty off-point. In the attached I made it give an
error message about a bad zone abbreviation if the input starts
with a letter, but perhaps the dividing line between "probably
meant as a zone name" and "probably meant as numeric" should be
drawn differently?

Anyway, v2-0001 below is the previous patch rebased up to current
(only line numbers change), and then v2-0002 responds to your
and Daniel's review comments.

regards, tom lane

Attachment Content-Type Size
v2-0001-Support-timezone-abbreviations-in-to_timestamp.patch text/x-diff 22.0 KB
v2-0002-Respond-to-review-comments.patch text/x-diff 3.7 KB

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: Support TZ format code in to_timestamp()
Date: 2024-01-23 23:33:36
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

> On 23 Jan 2024, at 23:33, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Anyway, v2-0001 below is the previous patch rebased up to current
> (only line numbers change), and then v2-0002 responds to your
> and Daniel's review comments.

LGTM.

--
Daniel Gustafsson


From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Smith <smithpb2250(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: Support TZ format code in to_timestamp()
Date: 2024-01-24 12:49:51
Message-ID: CAJ7c6TOVPxAhqc6PyAtmT1=pnPjQTpx4oSXWtU=oOMW6cJnYYg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

> > Anyway, v2-0001 below is the previous patch rebased up to current
> > (only line numbers change), and then v2-0002 responds to your
> > and Daniel's review comments.
>
> LGTM.

```
+SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI
TZ'); -- error
+ERROR: invalid value "JUNK" for "TZ"
+DETAIL: Time zone abbreviation is not recognized.
+SELECT to_timestamp('2011-12-18 11:38 ...', 'YYYY-MM-DD HH12:MI TZ'); -- error
+ERROR: invalid value ".." for "TZ"
```

Shouldn't the second error display the full value "..." (three dots)
similarly to the previous one? Also I think we need at least one
negative test for OF.

Other than that v2 looks OK.

--
Best regards,
Aleksander Alekseev


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Smith <smithpb2250(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: Support TZ format code in to_timestamp()
Date: 2024-01-24 16:13:13
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Aleksander Alekseev <aleksander(at)timescale(dot)com> writes:
> +SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ'); -- error
> +ERROR: invalid value "JUNK" for "TZ"
> +DETAIL: Time zone abbreviation is not recognized.
> +SELECT to_timestamp('2011-12-18 11:38 ...', 'YYYY-MM-DD HH12:MI TZ'); -- error
> +ERROR: invalid value ".." for "TZ"

> Shouldn't the second error display the full value "..." (three dots)
> similarly to the previous one?

That's coming from the pre-existing OF code, which is looking for
a integer of at most two digits. I'm not especially inclined to
mess with that, and even if I were I'd think it should be a separate
patch.

> Also I think we need at least one
> negative test for OF.

OK.

regards, tom lane