Skip to content

RFC: Make unserialize() emit a warning for trailing bytes #9630

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 1, 2023

Conversation

TimWolla
Copy link
Member

@TimWolla TimWolla commented Sep 28, 2022

@@ -1406,6 +1406,11 @@ PHPAPI void php_unserialize_with_options(zval *return_value, const char *buf, co
zval_ptr_dtor(return_value);
}
RETVAL_FALSE;
} else if ((char*)p < buf + buf_len) {
Copy link
Contributor

@TysonAndre TysonAndre Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert this to an else { then a standalone if and always execute the level+refcounted checks that were executed before? I think we still want to copy retval to return_value right now when unserialize triggers another call to unserialize (e.g. class implementation of ::__serialize) calling \unserialize() with the serialization of an array with extra data), and dump the unserialized result

Add a test that this works properly with that - I expect it's broken due to the lack of ZVAL_COPY(return_value, retval)

	if (BG(unserialize).level > 1) {
		retval = var_tmp_var(&var_hash);
	} else {
		retval = return_value;
	}
} else {
    if (BG(unserialize).level > 1) {
		ZVAL_COPY(return_value, retval);
	} else if (Z_REFCOUNTED_P(return_value)) {
		zend_refcounted *ref = Z_COUNTED_P(return_value);
		gc_check_possible_root(ref);
	}
} /* End the new else */

Are you planning to remove this in a subsequent PHP release? E_DEPRECATED is another option, though it might be better to wait until we know what types of setups are effected (e.g. is anything storing data with trailing null bytes, are any extensions calling it with a length off by one, etc)

EDIT: I hadn't caught up on https://externals.io/message/118566 - To me, E_WARNING makes some sense for existing unserialize() failures because it warns that unserialize is not returning data (though it'll start throwing later), but for this, E_DEPRECATED might make sense for something that works in 8.x and older but will stop working in 9

  • I'd went with E_WARNING in igbinary partly because I didn't know how many users would be affected, but it seems like virtually none

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert this to an else { then a standalone if and always execute the level+refcounted checks that were executed before?

Indeed. The diff looked fishy to me, but I could not pinpoint what was wrong with it. You were right in that it breaks nested unserialize() calls (though only for Serializable).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to remove this in a subsequent PHP release?

I'm not currently planning anything specifically. I came across this, while working with the code and I wasted to keep track of it, so I created a PR.

To me, E_WARNING makes some sense for existing unserialize() failures because it warns that unserialize is not returning data (though it'll start throwing later), but for this, E_DEPRECATED might make sense for something that works in 8.x and older but will stop working in 9

unserialize() will also emit E_WARNING when it successfully returns data, but the data may be malformed:

<?php

echo file_get_contents(__FILE__), PHP_EOL, "====================", PHP_EOL;

var_dump(unserialize('i:111111111111111111111111111111111;'));

====================

Warning: unserialize(): Numerical result out of range in /home/timwolla/Schreibtisch/php-src/test.php on line 5
int(9223372036854775807)

I believe this is similar: unserialize is able to make sense of the input, but not all of the input. Thus the operation might be “lossy”. E_WARNING seems fitting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. The diff looked fishy to me, but I could not pinpoint what was wrong with it. You were right in that it breaks nested unserialize() calls (though only for Serializable).

That looks right - I wasn't familiar with the purpose of BG(serialize_lock). I took another look at the implementation and it's deliberately incremented before calling __wakeup, __sleep, __serialize, __unserialize, autoloading, and unserialize_callback_func from the optional ini setting (e.g. everything except for the legacy Serializable, I'd believe)

Looking into it just to see if any additional tests would be needed for this functionality in php 9.0 - the existing tests should be fine

So it'd probably be safe to remove serialize_lock in php 9.0 if the Serializable functionality is removed for internal classes, which the rfc mentioned we planned to do
https://wiki.php.net/rfc/phase_out_serializable

In PHP 9.0 the Serializable interface will be removed and unserialize() will reject payloads using the C serialization format. Code needing to support both PHP < 7.4 and PHP >= 9.0 may polyfill the Serializable interface, though it will have no effect on serialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is similar: unserialize is able to make sense of the input, but not all of the input. Thus the operation might be “lossy”. E_WARNING seems fitting.

Oh, I missed that this is still returning the data (I was distracted by the bug previously failing to set the return_value). In that case, I'd agree with making it an E_WARNING

@TysonAndre
Copy link
Contributor

TysonAndre commented Oct 7, 2022

For reference, I don't see any issues reported for igbinary/igbinary#223 and igbinary/igbinary#64 when I discovered/fixed the same bug in https://www.php.net/igbinary_unserialize , which would have also had applications using it

(I didn't notice that PHP's unserialize() had the same bug)

EDIT: There was one question for "received more data to unserialize than expected" total, but that was caused by calling fopen with 'rb' (binary mode) and non-utf8 data getting corrupted - https://github.com/igbinary/igbinary/issues/355

Copy link
Contributor

@TysonAndre TysonAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks correct and warning about this makes sense to me.

This would also let users know if the data had been corrupted silently (only in a tiny number of edge cases) or was manually being built outside of serialize() in an incorrect way
(much, much less likely in php's unserialize() compared to the bug in igbinary_unserialize() I mentioned fixing earlier. Similar issues are possible but far less likely since the serialize() text format means that valid utf-8 strings in serialized data will result in utf-8 data serialize() output coincidentally being valid utf-8 for most but not all classes (excluding user-defined or internal classes that serialize to binary data, e.g. along the lines of pack()/unpack()), as well as the redundancy of the text format making most forms of corruption typically result in missing ;/} end markers or having the wrong length)

} else {
if ((char*)p < buf + buf_len) {
if (!EG(exception)) {
php_error_docref(NULL, E_WARNING, "Extra data starting at offset " ZEND_LONG_FMT " of %zd bytes",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making a note of this since I forgot how this works, and not many people are familiar with how the unserializer works:

If this were to start throwing an exception instead in the next php major version, this would call zval_ptr_dtor to free the allocated values instead of falling through, then RETVAL_FALSE, like the above branch does right now.

The implementation of zval_ptr_dtor calls gc_check_possible_root(), so if that's done, we'll properly collect reference cycles when the gc runs if this throws.

@TimWolla
Copy link
Member Author

TimWolla commented Oct 8, 2022

Thank you for the review. As this likely (?) requires an RFC, I'll create a separate one, once https://wiki.php.net/rfc/improve_unserialize_error_handling was voted on, so that it's clear how unserialize will handle errors going forward before adding new error cases.

@TimWolla TimWolla force-pushed the unserialize-extra-data branch from 1251416 to d22e247 Compare January 12, 2023 20:08
@TimWolla TimWolla force-pushed the unserialize-extra-data branch from d22e247 to 2b7489b Compare March 19, 2023 21:16
@TimWolla
Copy link
Member Author

The RFC was accepted.

@TimWolla TimWolla requested review from Girgias and iluuu1994 April 27, 2023 07:35
@TimWolla TimWolla changed the title Unserialize: Warn if extra data is appended to the serialized string RFC: Make unserialize() emit a warning for trailing bytes Apr 27, 2023
@TimWolla TimWolla added the RFC label Apr 27, 2023
Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although @TysonAndre note looks interesting and maybe should be put somewhere else with more visibility

@TimWolla TimWolla merged commit bf727cf into php:master May 1, 2023
@TimWolla TimWolla deleted the unserialize-extra-data branch May 1, 2023 17:06
@TimWolla
Copy link
Member Author

TimWolla commented May 1, 2023

Now merged, thank you for the reviews.


maybe should be put somewhere else with more visibility

@Girgias Please do so. I don't know where such a place would be 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants