-
Notifications
You must be signed in to change notification settings - Fork 7.8k
RFC: Make unserialize() emit a warning for trailing bytes #9630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ext/standard/var.c
Outdated
@@ -1406,6 +1406,11 @@ PHPAPI void php_unserialize_with_options(zval *return_value, const char *buf, co | |||
zval_ptr_dtor(return_value); | |||
} | |||
RETVAL_FALSE; | |||
} else if ((char*)p < buf + buf_len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert this to an else {
then a standalone if and always execute the level+refcounted checks that were executed before? I think we still want to copy retval to return_value right now when unserialize triggers another call to unserialize (e.g. class implementation of ::__serialize
) calling \unserialize()
with the serialization of an array with extra data), and dump the unserialized result
Add a test that this works properly with that - I expect it's broken due to the lack of ZVAL_COPY(return_value, retval)
if (BG(unserialize).level > 1) {
retval = var_tmp_var(&var_hash);
} else {
retval = return_value;
}
} else {
if (BG(unserialize).level > 1) {
ZVAL_COPY(return_value, retval);
} else if (Z_REFCOUNTED_P(return_value)) {
zend_refcounted *ref = Z_COUNTED_P(return_value);
gc_check_possible_root(ref);
}
} /* End the new else */
Are you planning to remove this in a subsequent PHP release? E_DEPRECATED is another option, though it might be better to wait until we know what types of setups are effected (e.g. is anything storing data with trailing null bytes, are any extensions calling it with a length off by one, etc)
EDIT: I hadn't caught up on https://externals.io/message/118566 - To me, E_WARNING makes some sense for existing unserialize() failures because it warns that unserialize is not returning data (though it'll start throwing later), but for this, E_DEPRECATED might make sense for something that works in 8.x and older but will stop working in 9
- I'd went with E_WARNING in igbinary partly because I didn't know how many users would be affected, but it seems like virtually none
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert this to an else { then a standalone if and always execute the level+refcounted checks that were executed before?
Indeed. The diff looked fishy to me, but I could not pinpoint what was wrong with it. You were right in that it breaks nested unserialize()
calls (though only for Serializable
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to remove this in a subsequent PHP release?
I'm not currently planning anything specifically. I came across this, while working with the code and I wasted to keep track of it, so I created a PR.
To me, E_WARNING makes some sense for existing unserialize() failures because it warns that unserialize is not returning data (though it'll start throwing later), but for this, E_DEPRECATED might make sense for something that works in 8.x and older but will stop working in 9
unserialize()
will also emit E_WARNING when it successfully returns data, but the data may be malformed:
<?php
echo file_get_contents(__FILE__), PHP_EOL, "====================", PHP_EOL;
var_dump(unserialize('i:111111111111111111111111111111111;'));
====================
Warning: unserialize(): Numerical result out of range in /home/timwolla/Schreibtisch/php-src/test.php on line 5
int(9223372036854775807)
I believe this is similar: unserialize is able to make sense of the input, but not all of the input. Thus the operation might be “lossy”. E_WARNING seems fitting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. The diff looked fishy to me, but I could not pinpoint what was wrong with it. You were right in that it breaks nested unserialize() calls (though only for Serializable).
That looks right - I wasn't familiar with the purpose of BG(serialize_lock)
. I took another look at the implementation and it's deliberately incremented before calling __wakeup
, __sleep
, __serialize
, __unserialize
, autoloading, and unserialize_callback_func from the optional ini setting (e.g. everything except for the legacy Serializable, I'd believe)
Looking into it just to see if any additional tests would be needed for this functionality in php 9.0 - the existing tests should be fine
So it'd probably be safe to remove serialize_lock
in php 9.0 if the Serializable functionality is removed for internal classes, which the rfc mentioned we planned to do
https://wiki.php.net/rfc/phase_out_serializable
In PHP 9.0 the Serializable interface will be removed and unserialize() will reject payloads using the C serialization format. Code needing to support both PHP < 7.4 and PHP >= 9.0 may polyfill the Serializable interface, though it will have no effect on serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is similar: unserialize is able to make sense of the input, but not all of the input. Thus the operation might be “lossy”. E_WARNING seems fitting.
Oh, I missed that this is still returning the data (I was distracted by the bug previously failing to set the return_value). In that case, I'd agree with making it an E_WARNING
For reference, I don't see any issues reported for igbinary/igbinary#223 and igbinary/igbinary#64 when I discovered/fixed the same bug in https://www.php.net/igbinary_unserialize , which would have also had applications using it (I didn't notice that PHP's unserialize() had the same bug) EDIT: There was one question for "received more data to unserialize than expected" total, but that was caused by calling fopen with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks correct and warning about this makes sense to me.
This would also let users know if the data had been corrupted silently (only in a tiny number of edge cases) or was manually being built outside of serialize() in an incorrect way
(much, much less likely in php's unserialize() compared to the bug in igbinary_unserialize() I mentioned fixing earlier. Similar issues are possible but far less likely since the serialize() text format means that valid utf-8 strings in serialized data will result in utf-8 data serialize() output coincidentally being valid utf-8 for most but not all classes (excluding user-defined or internal classes that serialize to binary data, e.g. along the lines of pack()/unpack()), as well as the redundancy of the text format making most forms of corruption typically result in missing ;
/}
end markers or having the wrong length)
} else { | ||
if ((char*)p < buf + buf_len) { | ||
if (!EG(exception)) { | ||
php_error_docref(NULL, E_WARNING, "Extra data starting at offset " ZEND_LONG_FMT " of %zd bytes", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making a note of this since I forgot how this works, and not many people are familiar with how the unserializer works:
If this were to start throwing an exception instead in the next php major version, this would call zval_ptr_dtor to free the allocated values instead of falling through, then RETVAL_FALSE, like the above branch does right now.
The implementation of zval_ptr_dtor
calls gc_check_possible_root()
, so if that's done, we'll properly collect reference cycles when the gc runs if this throws.
Thank you for the review. As this likely (?) requires an RFC, I'll create a separate one, once https://wiki.php.net/rfc/improve_unserialize_error_handling was voted on, so that it's clear how unserialize will handle errors going forward before adding new error cases. |
1251416
to
d22e247
Compare
d22e247
to
2b7489b
Compare
The RFC was accepted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, although @TysonAndre note looks interesting and maybe should be put somewhere else with more visibility
Now merged, thank you for the reviews.
@Girgias Please do so. I don't know where such a place would be 😃 |
RFC: https://wiki.php.net/rfc/unserialize_warn_on_trailing_data
Found while working on #9425 and #9629