-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Request #66024 - mb_chr() and mb_ord() #1100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…_illegal_substchar)
I would like to merge this to master. Any comments on this? |
I merged. Added a little optimization for encoding checks and white space fix. Test cases are added. Thank you for PR!! |
@yohgaki Please squash commits before merging. |
@nikic Sorry. I was blindly following qa.php.net merge instruction :( |
…grekas) This PR was merged into the 1.2-dev branch. Discussion ---------- Polyfills for mb_chr(), mb_ord() and mb_scrub() As implemented in upcoming php 7.2, see: - php/php-src#1099 - php/php-src#1100 Commits ------- 624adfc Polyfills for mb_chr(), mb_ord() and mb_scrub()
…grekas) This PR was merged into the 1.2-dev branch. Discussion ---------- Polyfills for mb_chr(), mb_ord() and mb_scrub() As implemented in upcoming php 7.2, see: - php/php-src#1099 - php/php-src#1100 Commits ------- 624adfc Polyfills for mb_chr(), mb_ord() and mb_scrub()
I am concerned about the semantics of the functions as implemented. If I understand it correctly, the behavior is heavily dependent on the used encoding:
This seems like a very confusing API to me. It should do one or the other. Either always take a Unicode codepoint and encode it, or always give you the big-endian binary encoding. But not either one depending on the encoding. (Of course the first behavior is much more useful, because that's the part that requires knowledge of codepoint mappings in different encodings. The latter is just chr() called in a loop.) It looks like I'm not the only one who is confused by this, because the symfony/polyfill implementation referenced above (symfony/polyfill-mbstring@53ad9fa) implements this function by interpreting the input always as a Unicode codepoint. If we can reach a consensus on this (quickly) that would be great. Otherwise I will motion for a revert of these functions in PHP 7.2 and go through the RFC process for 7.3, so the behavior can be discussed in more detail. |
In light of #1094 (the substitute_character is always given as a Unicode codepoint regardless of encoding -- that is we have a precedent on how we handle codepoint values in mbstring), I have switched the mb_chr/mb_ord implementation to always accept/return Unicode codepoints in 41e9ba6. While at it, I've also changed the return value for an invalid codepoint in mb_chr() to false instead of the substitution character in e53162a. |
This pull request add mb_chr and mb_ord.