-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Implement frameless internal function calls #12461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea and the implementation is mostly good.
I don't like the compile time restrictions. I think it shouldn't be a big problem to get rid of them.
I would also recommend to add frameless handlers for trivial and offten used functions: trim(), ord(), chr(), strtolower() functions.
60101e6
to
bb24c28
Compare
As requested by Dmitry in GH-12461 for easier reviewing.
bb24c28
to
9708901
Compare
6db666c
to
4dfd4df
Compare
75c9f74
to
fc6452a
Compare
Do these frameless functions influence how stack limit works (e.g. max callstack size exceeded)? |
@staabm This doesn't significantly affect stack limits. Recursion with VM reentry is subject to Cs stack overflow. Userspace recursion doesn't have a stack limit, as much as a memory limit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see major problems.
I would prefer some better naming against "frameless" and "flf".
JIT support shouldn't be a big problem. See comments.
In case of observer you may disable compilation to ZEND_FRAMLESS_ICALL_*
.
dfc04c0
to
76ac618
Compare
I wonder whether the reverse way should be done for RCN marking: instead of marking what args may increase the refcount, mark what args will never increase the refcount and assume RCN otherwise. That's safer in case the implementer forgets. OTOH it's more boilerplate. Just something to think about. |
@nielsdos I agree. I will change this so that omitting the flags sets it to |
Huh, it worked yesterday. It complains about the distro version being 22.04.3, but that is the same version as yesterday... |
@nielsdos It gave back a 403 just 10 minutes ago. I suppose they have (or had) server issues. |
@@ -979,20 +971,41 @@ ZEND_FUNCTION(property_exists) | |||
} | |||
RETURN_FALSE; | |||
} | |||
|
|||
/* {{{ Checks if the object or class has a property */ | |||
ZEND_FUNCTION(property_exists) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce repetition, is it realistic to write a macro like:
#define ZEND_FUNCTION_CALL_FRAMELESS(name, minArgs, maxArgs) \
ZEND_FUNCTION(name) { \
uint32_t args = ZEND_NUM_ARGS(); \
if (args > maxArgs || args < minArgs) { /* handle arg count mismatch */ } \
switch (args) { \
/* here, do some macro magic to only compile the cases between min and maxargs, to avoid undefined function references */ \
case 0: ZEND_FRAMELESS_FUNCTION_NAME(name, 0)(return_value); break; \
case 1: ZEND_FRAMELESS_FUNCTION_NAME(name, 1)(return_value, EX_VAR(0)); break; \
case 2: ZEND_FRAMELESS_FUNCTION_NAME(name, 2)(return_value, EX_VAR(0), EX_VAR(1)); break; \
case 3: ZEND_FRAMELESS_FUNCTION_NAME(name, 3)(return_value, EX_VAR(0), EX_VAR(1), EX_VAR(2)); break; \
} \
}
Making it more effortless and straightforward. Also probably avoids compiling the body of the function twice (not requiring an extra inline function too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the review comments.
It looks like my idea with cache slot sharing doesn't make sense. In most cases after I like the idea of the patch and don't see technical problems, but unfortunately I don't see any visible speed improvement. Do you see? Symfony demo looks even a bit slower for me. This may be caused by unrelated code locality changes, but also because of the hot code size increase (PHP becomes more than 64K larger). I tried to inline I noticed, the patch somehow enables more Also JMP_FRAMELESS switching may lead to code explosion. <?php
namespace T;
function foo($a, $b) {
return strpos(trim($a . "foo"), $b);
} php -d opcache.opt_debug_level=0x20000 test.php
So, I'm not sure if this should be merged or not. |
No worries.
I don't know for sure, but it might be related to the fact that
In terms of speed this implements all the related ideas I had. I saw better results without I saw a big improvement for frameless function calls in tight, hot loops. This might be good for some algorithms, but is admittedly more synthetic. I will try to give some concrete numbers here too.
Oof, I agree. Nesting frameless calls is bad. I haven't considered that. @dstogov How do you measure real performance? I usually use |
It may make sense to investigate this and enable optimization for ven more cases.
yes. I think the improvement may be achieved through reducing the code size.
yes. please do.
Nothing is wrong. This is a rare case, and I'm not sure if implementation may be improved to generate code for N2 cals instead of NN.
For now I run just a single thread |
Disabling TurboBoost helps. Conveniently, this doesn't require a restart.
I also used |
Ok, here are the results.
From this benchmark, it does seem that the version without the shared cache slot, and without specializations offers the best trade-offs. The improvement isn't huge but it's measurable. Most significantly, it seems the code is faster in all variants, including with non-fqn function calls (i.e. JMP_FRAMELESS). bench.php was performed with more iterations for higher accuracy. As for hot loops I tested this by running sebastianbergmann/diff on a 5 000 line file, but without sebastianbergmann/diff#118, and by implementing a specialized version of
As expected, the difference is bigger when frameless calls may be used in hot loops. Just to clarify, @dstogov Do you think this is enough to justify the added complexity? |
33e41e7
to
4e73214
Compare
I don't have a string opinion, but I don't object. I see - this improves performance! @bwoebi @derickr this may make troubles to debuggers/profiles and extending this with support for observer API may remove all the gain. Will you need any special support? |
It doesn't seem that this patch introduces any breaking changes to Xdebug. At least, all the tests pass (although ABI changed, so I had segfaults before I recompiled Xdebug). |
Let land this. |
4e73214
to
b183904
Compare
b183904
to
8a9d933
Compare
Opcodes grow exponentially for nested calls.
go forward |
@dstogov Great! Thank you for all your reviews! |
This PR adds struct forward declarations, but a supermajority of voters said that it should not be "allowed to forward-declare structs/unions/typedefs": https://wiki.php.net/rfc/include_cleanup |
Secondary votes only matter when the primary vote is accepted, otherwise they take no effect. |
That is formally correct, but it ignores the will of the PHP community, even if it is not formally binding. |
I suppose you refer to this part of the contributing guidelines?
You're right that it doesn't list anything other than tests, fixes and RFCs. However, "welcomes" doesn't mean "allows only". |
That, and: "New features require an RFC" (README.md). This PR clearly implements a new feature.
If you look at just CONTRIBUTING.md, you can argue that, because the wording there is indeed weak; but it was argued differently when it was about my code submissions (which, unlike this PR, did not implement a new feature; it did not even add any any actual code; thus the stronger wording in README.md did not even apply to me). |
The wording in README seems to be inconsistent with the actual applied practices and should be corrected. An RFC is required when it's a non-self-contained feature or when there is opposition against the change. Although I admit that is subjective. E.g. looking at the mailing list you can every so often see an email asking if anyone opposes a feature, usually followed by two weeks. When someone raises an issue then they definitely have to go through the RFC process even if it's a small self-contained feature. The most recent source talking about RFC-requiring vs non-RFC-requiring features is this: https://wiki.php.net/rfc/release_cycle_update |
But the (strong) wording in README is consistent with the (weak) wording in CONTRIBUTING. If README needs corrections, then CONTRIBUTING needs corrections as well, which is something I also tried (to allow PRs exactly like this one without an RFC!), but failed. My real point here (before you dragged me into lawyering) was: when I submitted struct forward declarations, the whole idea was rejected, there was a supermajority vote against it, and my already-merged commits were even reverted (by one guy, without a vote). Now this PR does essentially the same, and surprisingly the exact same guy approves it. That is not just subjective (which would be okay), but also inconsistent and arbitrary. We can agree or disagree on coding styles, that's okay, that's personal taste. But if, after a long discussion, a certain coding style is ultimately rejected (and even reverted), it should be rejected for everybody. It was not rejected here, that's why I complained. |
I agree both places need their wordings adjusted.
I think this happened when I first started contributing, or at least before I started following discussions closely, so I don't know/remember all the details here. I can only say that the secondary votes of your RFC specified how to implement the include cleanup, and as part of that struct forward declarations were rejected. However, secondary votes only matter if primary votes are accepted. Note that even if the primary vote was accepted, my understanding is that it would only apply to include cleanup, so forward declarations would've only been disallowed for include cleanup in that case.
I can't comment on this part though.
The include cleanup commits were reverted because you were forced to go through the RFC process after some developers didn't like it. Although the RFC process isn't ideal for these cases because it's a purely technical change instead of a user-facing change. That said, coding style should be enforced consistently yes. |
@@ -38,6 +38,7 @@ struct _zend_call_info { | |||
bool send_unpack; /* Parameters passed by SEND_UNPACK or SEND_ARRAY */ | |||
bool named_args; /* Function has named arguments */ | |||
bool is_prototype; /* An overridden child method may be called */ | |||
bool is_frameless; /* A frameless function sends arguments through operands */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really mean operands here and not opcodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do mean operands.
Coercion. Coercion happens in-place. However, since args to frameless functions aren't copied coercing them in-place can escape the function boundary (for VAR|CONST). We should either copy these values in the handler for these op types, or handle the coercion+freeing in the functions.Edit: This was only relevant to string for the current frameless handlers. It was solved by creating a temporary zval that may be used if the parameter is not a string. After the function, the zval must be cleaned up if used, which unfortunately requires manual handling in the frameless handler.We probably need observer supportFrameless calls are disabled when there's an observer extension for now