-
Notifications
You must be signed in to change notification settings - Fork 7.8k
ctype_alnum 5 times slower in PHP 8.1 or greater #11997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Strangely, it was already way slower in 8.0.30 on my system. In 8.1.0 there is additional slowdown that seems to come from the indirect call when the code was refactored from macro into function: dc80ea7 If I apply the following patch, the ctype_isalnum becomes almost twice as fast as the regex solution. Probably because it avoid C library call overhead: diff --git a/ext/ctype/ctype.c b/ext/ctype/ctype.c
index 939959bc09..a0a09f0cf1 100644
--- a/ext/ctype/ctype.c
+++ b/ext/ctype/ctype.c
@@ -61,7 +61,7 @@ static PHP_MINFO_FUNCTION(ctype)
}
/* }}} */
-static void ctype_impl(
+static zend_always_inline void ctype_impl(
INTERNAL_FUNCTION_PARAMETERS, int (*iswhat)(int), bool allow_digits, bool allow_minus) {
zval *c;
@@ -99,10 +99,15 @@ static void ctype_impl(
}
}
+static zend_always_inline int php_isalnum(int c)
+{
+ return (c >= '0' && c <= '9') || ((unsigned int) c | 32) - 'a' < 26;
+}
+
/* {{{ Checks for alphanumeric character(s) */
PHP_FUNCTION(ctype_alnum)
{
- ctype_impl(INTERNAL_FUNCTION_PARAM_PASSTHRU, isalnum, 1, 0);
+ ctype_impl(INTERNAL_FUNCTION_PARAM_PASSTHRU, php_isalnum, 1, 0);
}
/* }}} */
Could even be further improved by moving the deprecated code section into a separate no-inline function such that the code cache pressure doesn't increase too much. We could make our own version for each of the iswhatever functions, but idk if we want this? Maybe some other people can voice there opinion here :-) EDIT: this may be undesirable because they depend on the locale, even for the isalnum case... |
Meh, I don't really mind. However, a large part of the complexity of those functions is related to needing to deal with integers. When this is removed and a proper string ZPP check can be used, it might make more sense to just inline everything instead of passing by a common generic implementation. |
According to callgrind, the most time is spent in the isalnum call and the if check. The overhead of the C library call is quite high. That being said, hand-rolling our own iswhatever functions would give the biggest speedup, but would also introduce complexity because we actually have to take into account the current C locale... @joanhey If you compiled PHP yourself, are you able to test what the performance difference is on your system when you add |
I maybe should go forward with actually deprecating |
@Girgias Probably we can just refer to the |
I don't think it's simple, but is probably worthwhile. Also |
Note that on musl's libc, these functions don't take into account locale. You can see the sources here: https://git.musl-libc.org/cgit/musl/plain/src/ctype/ though I've copied relevant pieces here: int isalnum(int c)
{
return isalpha(c) || isdigit(c);
}
int isalpha(int c)
{
return ((unsigned)c|32)-'a' < 26;
}
int isdigit(int c)
{
return (unsigned)c-'0' < 10;
} Of course, their locale support is... well I'll just quote them:
Just adding this as more background information when making a decision. |
LLVM's libc implementation also does not support locales here, which is expected there since they don't support locales yet. Honestly, I'm not certain locale support is even desirable for these newer libc implementations. The only system which is not totally bonkers broken is UTF-8, and ctypes are definitely not that, so... |
I took a look at this again. Sample patch: https://gist.github.com/nielsdos/0faacfefb4844be0cf107d775c5d0ac7 It's worth noting that glibc & musl ctype implementation comes in two variants: function-based and macro-based. e.g. we have an isalnum macro and an isalnum function. Currently, the function-based version is used because ctype function are passed as a function pointer to OP's example code is around ~0.02s faster on my system (glibc-based) after this patch. So it's still much slower than OP claims it was or in comparison to pcre. Mainly we got rid of some call overhead and can do the check inline. Thinking about it, the benchmark code that compares pcre with ctype isn't even fair because the pcre code doesn't take into account the locale... However, on Alpine there is indeed a much higher win and I get an enormous speedup, the time for ctype is about half the time for pcre. This is practically the same result as OP got for PHP 8.0, so I suspect OP is on Alpine or another musl-based system. On those systems the sample patch is more worthwhile. Any thoughts? |
So it's basically reverting dc80ea7, which would explain the difference in speed from 8.0 to 8.1. I don't really care, even if I don't really like macros, but this case has always been pretty readable, even in macro form. I wonder if it still doesn't make sense to change the behaviour to not depend on the C locale, as we have done with other functions recently (strtolower coming to mind). |
Currently, a common function is used where a function pointer gets passed to check the character class type. If we instead use a macro, the macro version of these character class type checkers can be used. While this gives only a minor speed-up for glibc-based systems, on Alpine this gives a multi-facor speed-up This is essentially a manual revert of dc80ea7. Full discussion in phpGH-11997.
Indeed.
Yeah I'm quite neutral too, but I guess it doesn't hurt, and it does get rid of a regression.
Good question. I don't know what the actual user expectation is of these functions. I think I'd expect them to behave like how they would in C (i.e. respecting the locale) because they borrowed the c header's name as a prefix. |
I don't know what the user expectation is either. Maybe something to add to php-task repo (as a function that depends on the locale, which might make sense to try and list everything that does to see what the scope of the impact would be about removing it) |
Description
The following code:
Resulted in this output:
A lot slower in 8.1 and 8.2
But I expected this output instead:
The same that in 8.0 or smaller.
Here start the performance degradation:

https://3v4l.org/fjSaZ
https://onlinephp.io/code/e3b9ddc370fdb0f00ff40447eadc24def0dd1abe
PD: I didn't test other ctype_* functions
PHP Version
8.1
Operating System
No response
The text was updated successfully, but these errors were encountered: