In the previous commit, the branch of mb_strlen which implemented the
function using the mblen_table (when one was available) was removed.
This made mb_strlen faster for just about every legacy text encoding
which had an mblen_table... except for CP936, which became much slower.

This indicated that our decoding filter for CP936 was slow. I checked
and found that iterating over the PUA table was a major bottleneck.
After optimizing away that bottleneck (a sketch of the kind of change
involved follows the benchmark results below), benchmarks for text
encoding conversion speed were as follows (old time vs new time):
CP936, short - to UTF-8 - faster by 10.44% (0.0003 vs 0.0003)
CP936, short - to UTF-16BE - faster by 11.45% (0.0003 vs 0.0003)
CP936, medium - to UTF-8 - faster by 139.09% (0.0012 vs 0.0005)
CP936, medium - to UTF-16BE - faster by 140.34% (0.0013 vs 0.0005)
CP936, long - to UTF-16BE - faster by 215.88% (0.0538 vs 0.0170)
CP936, long - to UTF-8 - faster by 232.41% (0.0528 vs 0.0159)
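For illustration, here is a minimal sketch of the kind of lookup the
optimization targets. It is not the actual php-src code: the struct
layout, the names (pua_range, pua_ranges, pua_lookup_linear,
pua_lookup_binary), and the sample table entries are all hypothetical,
and the real change may have removed the scan differently (for example,
by a bounds pre-check or by computing the mapping directly). The point
is only that a linear scan over a range table on every decoded
character costs O(n) per lookup, while keeping the (non-overlapping)
ranges sorted allows an O(log n) binary search:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical PUA range table: each entry maps a contiguous run of
     * CP936 codes onto a contiguous run of Private Use Area codepoints.
     * Field names and sample entries are illustrative, not php-src data. */
    typedef struct {
        uint16_t from; /* first CP936 code in the run */
        uint16_t to;   /* last CP936 code in the run */
        uint16_t pua;  /* PUA codepoint corresponding to `from` */
    } pua_range;

    /* Must be sorted by `from`; runs do not overlap. */
    static const pua_range pua_ranges[] = {
        {0xA2AB, 0xA2B0, 0xE766},
        {0xA640, 0xA67E, 0xE815},
        {0xD7FA, 0xD7FE, 0xE810},
    };
    #define PUA_RANGE_COUNT (sizeof(pua_ranges) / sizeof(pua_ranges[0]))

    /* Slow shape: scan every range for every decoded character. */
    static uint32_t pua_lookup_linear(uint16_t code)
    {
        for (size_t i = 0; i < PUA_RANGE_COUNT; i++) {
            if (code >= pua_ranges[i].from && code <= pua_ranges[i].to)
                return pua_ranges[i].pua + (code - pua_ranges[i].from);
        }
        return 0; /* no PUA mapping */
    }

    /* Faster shape: binary search over the sorted, non-overlapping runs. */
    static uint32_t pua_lookup_binary(uint16_t code)
    {
        size_t lo = 0, hi = PUA_RANGE_COUNT;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (code > pua_ranges[mid].to)
                lo = mid + 1;
            else if (code < pua_ranges[mid].from)
                hi = mid;
            else
                return pua_ranges[mid].pua + (code - pua_ranges[mid].from);
        }
        return 0;
    }

    int main(void)
    {
        /* Sanity check: both lookups agree on every 16-bit code. */
        for (uint32_t c = 0; c <= 0xFFFF; c++) {
            if (pua_lookup_linear((uint16_t)c) != pua_lookup_binary((uint16_t)c)) {
                printf("mismatch at 0x%04X\n", c);
                return 1;
            }
        }
        printf("linear and binary lookups agree\n");
        return 0;
    }

Even cheaper, a pre-check against the table's overall minimum and
maximum codes lets the common, non-PUA case skip the lookup entirely.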
These figures understate how much faster the CP936 decoder is now,
since the conversion benchmarks measure not only the speed of decoding
CP936, but also that of re-encoding the resulting codepoints as UTF-8
or UTF-16.
For functions like mb_strlen, which only need to decode the text and
not re-encode it, the performance gain is much larger; the sketch below
contrasts the two kinds of consumers.
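To make the distinction concrete, here is a hedged sketch of the two
pipelines. decode_next, convert_to_utf8, and count_codepoints are
hypothetical stand-ins, not the mbstring filter API; decode_next
handles plain ASCII only so the example stays self-contained:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical decoder stand-in: a real filter (like the CP936 one)
     * maps multi-byte sequences to codepoints; this one handles plain
     * ASCII only, so the sketch stays self-contained. */
    static size_t decode_next(const uint8_t *in, size_t len, uint32_t *cp)
    {
        if (len == 0) return 0;
        *cp = in[0];
        return 1; /* bytes consumed */
    }

    /* Conversion-style consumer: every decoded codepoint is also
     * re-encoded (here as UTF-8), so a conversion benchmark pays for
     * decode + encode per character. (No bounds checks; sketch only.) */
    static size_t convert_to_utf8(const uint8_t *in, size_t len, uint8_t *out)
    {
        size_t written = 0, used;
        uint32_t cp;
        while ((used = decode_next(in, len, &cp)) > 0) {
            in += used; len -= used;
            if (cp < 0x80) {
                out[written++] = (uint8_t)cp;
            } else if (cp < 0x800) {
                out[written++] = (uint8_t)(0xC0 | (cp >> 6));
                out[written++] = (uint8_t)(0x80 | (cp & 0x3F));
            } /* 3- and 4-byte forms omitted */
        }
        return written;
    }

    /* Length-style consumer (mb_strlen-like): codepoints are only
     * counted, so the decoder's speed is the whole cost. */
    static size_t count_codepoints(const uint8_t *in, size_t len)
    {
        size_t n = 0, used;
        uint32_t cp;
        while ((used = decode_next(in, len, &cp)) > 0) {
            in += used; len -= used; n++;
        }
        return n;
    }

    int main(void)
    {
        const uint8_t text[] = "example";
        uint8_t out[32];
        printf("%zu codepoints, %zu UTF-8 bytes\n",
               count_codepoints(text, sizeof(text) - 1),
               convert_to_utf8(text, sizeof(text) - 1, out));
        return 0;
    }

A conversion benchmark runs both halves of the pipeline per character,
while a length count runs only the decoder, so a decoder speedup shows
up undiluted there.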