Skip to content

Fix GH-10627: mb_convert_encoding crashes PHP on Windows #10628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

nielsdos
Copy link
Member

Fixes GH-10627

The php_mb_convert_encoding() function can return NULL on error, but this case was not handled, which led to a NULL pointer dereference and hence a crash.

Fixes phpGH-10627

The php_mb_convert_encoding() function can return NULL on error, but
this case was not handled, which led to a NULL pointer dereference and
hence a crash.
var_dump(mb_convert_encoding($data, 'UTF-8', 'auto'));
$data = [$str => 'abc', 'abc' => 'def'];
var_dump(mb_convert_encoding($data, 'UTF-8', 'auto'));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reproduced --enable-debug on WSL on Ubuntu and it is seems C NULL problem. But 3v4l looks works fine and result is wrong because maybe without --enable-debug option.
@alexdowad what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That is interesting. I'm not sure of the reason right now.

This bug occurs if 'auto' is passed as the desired encoding, and mbstring is not able to detect the encoding of the string.

I might need to check under which circumstances that can happen. In any case, this bug fix is definitely needed.

@alexdowad
Copy link
Contributor

@nielsdos This looks good. Thanks very much.

This patch will definitely not apply cleanly on master. You will need to fix up merge conflicts when merging down into master.

It will be appreciated if, when fixing the merge conflicts, you make sure that all calls to php_mb_convert_encoding are safe.

@Girgias Girgias closed this in ed0c0df Feb 20, 2023
@Girgias
Copy link
Member

Girgias commented Feb 20, 2023

@alexdowad Can you check that the behaviour on master makes sense? As mbstring seems to be able to decode this now without any issues and not sure if this is meant to be the right behaviour or not.

@alexdowad
Copy link
Contributor

Dear @Girgias, I did look at the code on master and definitely there are a couple of places where the value of php_mb_convert_encoding is used without a NULL check.

If @nielsdos wants to merge into master using an --ours merge so that nothing is changed on master, I can submit another PR to fix the same bug on master.

@nielsdos
Copy link
Member Author

@alexdowad Thank you for the review.
Just to be clear (because I feel like there's some confusion here): I don't have merge access, so I can't do merge-fixups.

@alexdowad
Copy link
Contributor

Ah, sorry, I see. You already merged down into master.

First thing I notice is that @MaxKellermann's recent fix was wiped out during the merge.

@nielsdos
Copy link
Member Author

I'm not sure who the "you" refers to. Just to be clear: I did not merge it. 😅

@alexdowad
Copy link
Contributor

"You" is @Girgias in this case.

@Girgias
Copy link
Member

Girgias commented Feb 20, 2023

Ah, sorry, I see. You already merged down into master.

First thing I notice is that @MaxKellermann's recent fix was wiped out during the merge.

Oh FFS, I didn't triple check everything as I was already working through the merge conflict with that...

@alexdowad
Copy link
Contributor

@Girgias I think the test case also doesn't actually exercise the code which it is intended to exercise on master.

The NULL checks which you added are needed if 1) mb_convert_encoding is passed an array as input and 2) it contains strings which are not valid in any of the candidate encodings. (Passing 'auto' as a candidate encoding is equivalent to including the entire list of default text encodings for the selected mb_language.)

Hmm... I really wonder why that test case 'tripped' the bug even on PHP 8.1... there might be something else going on there which I should look into. It doesn't look to me right now like it should. But definitely the bug was a real bug.

To exercise the bug fix, we need to provide strings which are not valid in any of the provided text encodings (and there needs to be more than one).

Using a string like "\x80" and then UTF-8, UTF-16 as possible encodings would do the trick.

@alexdowad
Copy link
Contributor

As for behavior of the fix on master...

What this means is that if someone passes an array to mb_convert_encoding, with multiple candidate encodings, and any of the strings inside the array is invalid in all of them, then mb_convert_encoding will just leave those strings alone and won't touch them.

This seems like a reasonable choice to me, but we should try to get it documented in the PHP manual at some point.

@Girgias
Copy link
Member

Girgias commented Feb 20, 2023

You're the encoding and MBstring expert here, so I'll let you deep dive to figure what's going on :D

I can always review whatever PR you come up with.

RE: Docs: I think the mbstring docs probably need an overhaul and documenting those quirks makes sense, although somewhat suboptimal.

@alexdowad
Copy link
Contributor

@Girgias Fair enough, I'll submit some improvements soon.

@alexdowad
Copy link
Contributor

Well, I was wrong about @MaxKellerman's fix getting wiped out by the merge. It's just fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants