Skip to content

PharData created from zip has incorrect timestamp #12532

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
danepowell opened this issue Oct 27, 2023 · 18 comments
Closed

PharData created from zip has incorrect timestamp #12532

danepowell opened this issue Oct 27, 2023 · 18 comments

Comments

@danepowell
Copy link

danepowell commented Oct 27, 2023

Description

The attached zip file contains one text file with a modified time of 1680284661:
test.zip

Running this code in a directory with the zip file on macOS:

<?php
$phar = new PharData('test.zip');
print($phar->getMTime());

Resulted in this output:

1680317060

But I expected this output instead:

1680284661

Some people also get 1680284660 as output. Note that 1680317060 and 1680284661 are exactly 9 hours and 1 second apart, making me think this could be a timezone issue and the 1 second might be a rounding error.

Actually, I would perhaps expect the mtime to be that of the zipfile itself, not of its contents. I don't think the behavior of setting the mtime based on contents is documented anywhere. Regardless, the mtime should have some correlation to a real value 😄

PHP Version

8.2.11 from Homebrew

Operating System

macOS 13.6 (2021 Macbook Pro, Apple M1 Max)

@nielsdos
Copy link
Member

I don't see the bug here?
The file inside the zip was modified at 31 March '23 19:44(GMT+2) or in UTC time: 17:44.
The time reported back by getMTime() is 31 March '23 19:44(GMT+2) or in UTC time: 17:44.
So they match.

Keep in mind that unix time is always the number of seconds since Jan 1 1970 in UTC.

@nielsdos
Copy link
Member

Ah wait, your code actually returns 1680284660 on Linux so it is correct for me, but you're getting 1680317060...
Looks like a bug in getMTime then? I'll check more closely later.

@nielsdos
Copy link
Member

The commit/issue you linked couldn't have introduced the issue, because prior to that commit the date was always in 1980.
I don't see how 1680317060 is possible, we also don't have apple-specific code. The bug does not manifest on Linux.
I don't have access to a macOS install, so someone else will have to debug this...

@devnexen
Copy link
Member

I get same result in both Linux/macOs (same as posted, brew 8.2.11), GMT+1 1680288260

@nielsdos
Copy link
Member

Thank you David.
@danepowell Please provide a full stand-alone reproducer. It's possible there's more going on as we can't reproduce it.

@theofidry
Copy link

theofidry commented Oct 27, 2023

maybe pointless but should we give each other's machine specs? Both of my results are fairly different too

result: 1680284660
----
PHP 8.2.1
----
MacBook Pro M1
macOS: 13.5.1 (22G90)

Running the same code in a docker image:

result: 1680291860
----
PHP 8.1.24
5.15.49-linuxkit

@devnexen
Copy link
Member

devnexen commented Oct 27, 2023

Not much to say, I have 8.2.11 for both Mac M1 and Linux Fedora 38 (no VM involved at all).

@danepowell
Copy link
Author

It sounds like @devnexen and @theofidry can both reproduce this in Linux. Their values are different than mine (1680288260, 1680291860 vs 1680317060), but still nowhere close to correct (1680284660 or 1680284661).

@danepowell
Copy link
Author

Here's a minimum reproducible example: https://github.com/danepowell/phar-bug

@devnexen
Copy link
Member

If I change my date/time to your location (San Diego ?) I get the same as you on mac M1 if I set to let's say Paris then I get the "correct" value. Does not seem to be an OS issue to me at all, get same on Linux.

@nielsdos
Copy link
Member

nielsdos commented Oct 27, 2023

(...) but still nowhere close to correct (1680284660 or 1680284661).

I misread devnexen's answer, so yeah he can repro it. And looks like it's exactly 3600 seconds between his and my output. Which probably means the original file was made in GMT+2 where I am in, which explains why I couldn't repro it...

@nielsdos
Copy link
Member

This happens because phar wrongly uses localtime_r instead of gmtime_r. At least I think so. I'll try a fix.

@theofidry
Copy link

Would there be a way to get around it in PHP?

@nielsdos
Copy link
Member

After having slept a night about it I figured out what is actually going on.

First of all: The zip timestamp is not stored as a unix timestamp, but as a local time DOS-style timestamp. So there's literally the bytes representing 19:44 local time in the zip. The seconds part of that timestamp is zero, because DOS-style timestamps can only store even seconds (so it is rounded down to 0). This explains why you expect 1680284661 but I get 1680284660.

There is no timezone information in the zip, so it is treating 19:44 as local time. As unix time is UTC but the reader's timezone may not be UTC, we're all getting different results based on the timezone we're in. It only works properly if the timezone of the zip creator is equal to that of the zip reader. This is why I couldn't reproduce the issue at first: the zip was made in GMT+2 and I am in GMT+2.

The actual UTC timestamp is in the zip though! There's an extension to the zip file format that allows storing the unix timestamp. PHP does not support this however, which results in the wonky behaviour. The following patch adds support for this and appears to fix the problem you reported.
@theofidry @devnexen Could you please try this out and reason if it makes sense? 🙂

The patch applies to branch PHP-8.1.

diff --git a/ext/phar/pharzip.h b/ext/phar/pharzip.h
index 2daca5b339..1d5c07cc1c 100644
--- a/ext/phar/pharzip.h
+++ b/ext/phar/pharzip.h
@@ -146,6 +146,13 @@ typedef struct _phar_zip_unix3 {
 /* (var.)        variable    symbolic link filename */
 } phar_zip_unix3;
 
+/* See https://libzip.org/specifications/extrafld.txt */
+typedef struct _phar_zip_unix_time {
+	phar_zip_extra_field_header header;
+	char flags; /* flags							1 byte */
+	char time[4]; /* time in standard Unix format 	4 bytes	*/
+} phar_zip_unix_time;
+
 typedef struct _phar_zip_central_dir_file {
 	char signature[4];            /* central file header signature   4 bytes  (0x02014b50) */
 	char madeby[2];               /* version made by                 2 bytes */
diff --git a/ext/phar/zip.c b/ext/phar/zip.c
index 675cb7f616..2c6df53034 100644
--- a/ext/phar/zip.c
+++ b/ext/phar/zip.c
@@ -44,6 +44,7 @@ static int phar_zip_process_extra(php_stream *fp, phar_entry_info *entry, uint16
 	union {
 		phar_zip_extra_field_header header;
 		phar_zip_unix3 unix3;
+		phar_zip_unix_time time;
 	} h;
 	size_t read;
 
@@ -52,6 +53,34 @@ static int phar_zip_process_extra(php_stream *fp, phar_entry_info *entry, uint16
 			return FAILURE;
 		}
 
+		if (h.header.tag[0] == 'U' && h.header.tag[1] == 'T') {
+			/* Unix timestamp header found.
+			 * The flags field indicates which timestamp fields are present.
+			 * The size with a timestamp is at least 5 (== 9 - tag size) bytes, but may be larger.
+			 * We only store the modification time in the entry, so only read that.
+			 */
+			uint16_t header_size = PHAR_GET_16(h.header.size);
+			if (header_size >= 5) {
+				read = php_stream_read(fp, &h.time.flags, 5);
+				if (read != 5) {
+					return FAILURE;
+				}
+				if (h.time.flags & (1 << 0)) {
+					/* Modification time set */
+					entry->timestamp = PHAR_GET_32(h.time.time);
+				}
+
+				len -= header_size + 4;
+
+				/* Consume remaining bytes */
+				if (header_size != read) {
+					php_stream_seek(fp, header_size - read, SEEK_CUR);
+				}
+				continue;
+			}
+			/* Fallthrough to next if to skip header */
+		}
+
 		if (h.header.tag[0] != 'n' || h.header.tag[1] != 'u') {
 			/* skip to next header */
 			php_stream_seek(fp, PHAR_GET_16(h.header.size), SEEK_CUR);

The following test script:

<?php
date_default_timezone_set('Europe/Amsterdam'); // GMT+2 so the output matches what I expect from my file browser (19:44)
$phar = new PharData('test.zip');
echo $phar->getMTime(), "\n";
echo $phar->getFileInfo()->getMTime(), "\n";
echo date('Y-m-d H:i:s', $phar->getMTime()), "\n";
echo date('Y-m-d H:i:s', $phar->getCTime()), "\n";
echo date('Y-m-d H:i:s', $phar->getATime()), "\n";

Now produces:

1680284661
1680284661
2023-03-31 19:44:21
2023-03-31 19:44:21
2023-03-31 19:44:21

which is what I expect. You may need to change your timezone in the first line of the PHP test script to get the proper local time representation though.

@devnexen
Copy link
Member

works for me.

@danepowell
Copy link
Author

Very impressive sleuthing! Thanks for getting to the bottom of this.

I wonder if the DOS timestamp is an artifact of how this zip/txt file was created or if that's pretty standard? It's part of a broken test fixture box-project/box#1120 so I'm just trying to figure out if we can work around this prior to a PHP bugfix release.

@nielsdos
Copy link
Member

I wonder if the DOS timestamp is an artifact of how this zip/txt file was created or if that's pretty standard?

This is the standard way of storing date & time in zip files. The unix timestamp is actually an extension. The zip file format is quite old.

I'm just trying to figure out if we can work around this prior to a PHP bugfix release.

You can work around it by calling putenv("TZ=GMT-2"); prior to your test, and restore that environment variable afterwards to whatever it was before changing it. This should result in the timestamp 1680284660.

@danepowell
Copy link
Author

Thanks again, that worked indeed!

nielsdos added a commit to nielsdos/php-src that referenced this issue Oct 28, 2023
The datetime stored in the DOS time fields, which is what zip standard
uses, is local time without a timezone. There's an extension to the zip
file format since '97 that allows storing a unix timestamp (in UTC) in
the header for both the central directory and the local entries.
This patch adds support for this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants