Min/max optimization for int/floats #11194
Conversation
Looks good to me. Just a small suggested change to improve performance, which seems to help even if doubles and longs are not mixed.
Benchmarks (using 1/10th of the iterations from the script in the original issue, because I'm on a slow laptop).
Baseline
Time (mean ± σ): 783.6 ms ± 9.3 ms [User: 781.5 ms, System: 2.3 ms]
Range (min … max): 771.2 ms … 796.5 ms 10 runs
Your patch:
Time (mean ± σ): 676.8 ms ± 12.8 ms [User: 674.4 ms, System: 1.9 ms]
Range (min … max): 665.0 ms … 699.1 ms 10 runs
Your patch + my suggested change
Time (mean ± σ): 659.9 ms ± 6.5 ms [User: 655.9 ms, System: 3.3 ms]
Range (min … max): 651.5 ms … 669.4 ms 10 runs
My generic zend_compare improvement
Time (mean ± σ): 716.6 ms ± 7.1 ms [User: 714.0 ms, System: 2.1 ms]
Range (min … max): 706.8 ms … 727.3 ms 10 runs
Changes since my review look good.
	}
} else if (Z_TYPE(args[i]) == IS_DOUBLE) {
	min_dval = (double) min_lval;
	goto double_compare;
Is this really safe? As currently written, when a double is observed, all comparisons are done using double, which might compare different longs as the same value (see the sketch below): https://3v4l.org/EgniH/rfc#vgit.master_jit
The original issue is primarily because of tracing overhead; IMHO a micro-optimization of <10% is not much help versus the much worse code.
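For illustration, a standalone C sketch of that concern (the values are chosen for this example and are not taken from the linked 3v4l snippet): two distinct 64-bit integers near PHP_INT_MAX map to the same double, so a comparison carried out in double precision cannot tell them apart.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int64_t a = INT64_MAX;      /* 9223372036854775807, PHP_INT_MAX on 64-bit builds */
    int64_t b = INT64_MAX - 1;  /* a different integer */

    /* Both values round to 9223372036854775808.0 when converted to double,
       so a double-based comparison considers them equal. */
    printf("a != b: %d\n", a != b);                                   /* prints 1 */
    printf("(double)a == (double)b: %d\n", (double)a == (double)b);   /* prints 1 */
    return 0;
}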
@mvorisek Note that the zend_compare function (which was previously always used) also casts to a double. I created a test case for min: https://3v4l.org/ZsosL and it behaves the same with and without this patch. It is indeed true that there is a loss of precision when comparing very big doubles, but that is already the case now; AFAIK there is no change in behaviour.
Or did you test this patch and find an issue? If so, please provide a reproducer.
The original issue is primarily because of tracing overhead; IMHO a micro-optimization of <10% is not much help versus the much worse code.
It turns out that this actually improves the real-world diff workload from the original issue report a lot. The benchmarks I did were very isolated, but it's important to look at the real-world workload. Also, there are already a couple of functions which have a fast path for longs & doubles, IIRC.
https://github.com/php/php-src/blob/24771fb08b/ext/standard/array.c#L1267 will produce a bad result when there are doubles between large integers. I did not test it, but originally there was no jump to the double-only comparison.
actually improves the real-world diff workload
the issue (in the diff lib) will still be present, as the main slowdown is due to tracing overhead, not due to slow php-src
https://github.com/php/php-src/blob/24771fb08b/ext/standard/array.c#L1267 will produce a bad result when there are doubles between large integers. I did not test it, but originally there was no jump to the double-only comparison.
I see what you mean. Here's a test with a behaviour difference:
<?php
var_dump(PHP_INT_MAX*3);
var_dump(min(PHP_INT_MAX*2, PHP_INT_MAX, PHP_INT_MAX-1));
// int(9223372036854775806) previously
// int(9223372036854775807) with this patch
This can probably be worked around. I would prefer a solution that doesn't restart the loop to prevent a performance impact. I also checked with my generic zend_compare improvement and it doesn't suffer from that problem, so we still have that as a possibility.
the issue (in the diff lib) will still be present, as the main slowdown is due to tracing overhead, not due to slow php-src
Yes, but I didn't say it fully got rid of all the overhead, I said this is a big improvement.
See #11194 (comment)
To fix the issue that was found, you can apply this patch:
diff --git a/ext/standard/array.c b/ext/standard/array.c
index 136dfda34c..f7546bff34 100644
--- a/ext/standard/array.c
+++ b/ext/standard/array.c
@@ -1268,12 +1268,8 @@ PHP_FUNCTION(min)
min_dval = Z_DVAL(args[i]);
min = &args[i];
}
- } else if (Z_TYPE(args[i]) == IS_LONG) {
- if (min_dval > (double)Z_LVAL(args[i])) {
- min_dval = (double)Z_LVAL(args[i]);
- min = &args[i];
- }
} else {
+ /* Cannot do a fast-path for long+double mix, because of precision loss for longs >= 2**52 */
goto generic_compare;
}
}
@@ -1355,12 +1351,8 @@ PHP_FUNCTION(max)
max_dval = Z_DVAL(args[i]);
max = &args[i];
}
- } else if (Z_TYPE(args[i]) == IS_LONG) {
- if (max_dval < (double)Z_LVAL(args[i])) {
- max_dval = (double)Z_LVAL(args[i]);
- max = &args[i];
- }
} else {
+ /* Cannot do a fast-path for long+double mix, because of precision loss for longs >= 2**52 */
goto generic_compare;
}
}
I benchmarked it and found that there are no regressions w.r.t. before this PR for mixed longs & doubles, but of course also no improvements, because we have to skip the fast path to avoid the aforementioned issue. This still gives us the benefit for long-only and double-only inputs (which are likely very common).
Can't we just guard this by a
Force-pushed from 24771fb to fac58cd.
It looks weird, but casting twice is necessary, because otherwise C does a double-double comparison instead of checking if the long lost precision. As for your suggestion about the switch cases: zend_compare currently is a bunch of switch cases. In my alternative zend_compare optimisation I grouped the most common ones together, so it's kinda like what you suggest. The performance was a bit worse than this PR.
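As an illustration of why the two casts matter, a minimal standalone C sketch (plain C integer types rather than the zval-based php-src code; the value is chosen for this example): with a single cast, the integer operand on the right-hand side is implicitly promoted to double as well, so the comparison always succeeds, whereas casting back to an integer round-trips the value and exposes the precision loss.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int64_t lval = (INT64_C(1) << 60) + 1; /* not exactly representable as a double */

    /* Single cast: lval on the right is also converted to double by the usual
       arithmetic conversions, so two identical doubles are compared: always true. */
    printf("%d\n", (double)lval == lval);          /* prints 1 */

    /* Cast there and back: the round trip changes the value, so the comparison
       correctly reports that precision was lost. */
    printf("%d\n", (int64_t)(double)lval == lval); /* prints 0 */
    return 0;
}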
Maybe, I'm not sure what the best solution is. There's an extra cost in comparing and casting, so some benchmarking would need to be done, I guess.
This is standard procedure, and we do this to check that a PHP
FWIW I benched the current version of the PR and it's roughly the same performance as without the extra checks. I do have one question though:
} else if (Z_TYPE(args[i]) == IS_DOUBLE && ((zend_long)(double) max_lval == max_lval)) {
	max_dval = (double) max_lval;
	goto double_compare;
}
I don't think you need the extra check here, because if the check fails, then zend_compare will be used, which does casting internally anyway. (Same for the min case.)
As a PHP developer, I expect uniform performance. AFAICT a single-loop implementation can have about the same performance, be simpler, and handle all types more consistently. The main overhead is probably coming from
Well, it depends on what you want to unify. This patch brings the performance of
This is not what the benchmarks are saying, though. It of course also depends on how far we would like to take an optimisation, i.e. at what point the difference between two optimisation approaches no longer matters.
Yep this is true. In fact I explored this here: nielsdos#6. Specialising min/max for float & int seems to get a little more performance than my approach, but ofc it depends on how far we want to go with the optimisations. That's not to say that a more general optimisation cannot be applied on top of this PR.
Overhead is significant when
One example of a function where we have a dedicated opcode is strlen(), so it certainly is possible. But there's a caveat here: namespaces can override internal functions. For example:
<?php
namespace Foo;

function x() {
    return strlen("abc");
}
Relevant 3v4l: https://3v4l.org/RisbL/vld
As for the opcodes: it would indeed help performance, but we must be careful adding new opcodes. Every opcode we add puts more pressure on the instruction cache for the VM. Although caches nowadays are growing bigger and bigger, on shared infrastructure (e.g. shared hosting) this will still be something to look out for.
Sigh.. Although
AFAIK opcodes can be removed without affecting BC, so there is no problem with changing our minds in the future. Edit: I hope this fallback to the global namespace will be deprecated someday.
This discussion reminds me of the 7 year old PR #1679. Perhaps, with all the improvements done on the VM, the relative improvement of a change like the one described in that PR might be more significant nowadays. And - obviously - it will also fare well with such small ICALLs, where the most important overhead is setting up the frame and copying arguments. With further possibilities for eliding execute_data on internal leaf functions etc., I don't really see opcodes as the solution; it's more that our function calls just have too much overhead. I definitely approve of this optimization to min and max though, just saying that with this change we pretty much reach the ceiling of what can be done for such small functions.
		max_lval = Z_LVAL(args[i]);
		max = &args[i];
	}
} else if (Z_TYPE(args[i]) == IS_DOUBLE && (zend_dval_to_lval((double) max_lval) == max_lval)) {
Might be good to add a comment here, as it took me a little bit to figure out what zend_dval_to_lval((double) max_lval) == max_lval is for. IIUC it's a long-to-double overflow check, right?
No, it's to check whether an int, when converted to a float, can be represented exactly as a float or loses precision (so for integers larger than 52 bits, IIRC).
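For reference, a small C sketch of that boundary (assuming IEEE 754 doubles, which have 53 significant bits; the helper name is made up for this example): 2^53 survives the int → double → int round trip, while 2^53 + 1 does not.

#include <stdint.h>
#include <stdio.h>

/* Returns 1 if the integer survives an int -> double -> int round trip,
   i.e. it is exactly representable as a double. */
static int fits_in_double(int64_t v) {
    return (int64_t)(double)v == v;
}

int main(void) {
    printf("%d\n", fits_in_double(INT64_C(1) << 53));       /* prints 1 */
    printf("%d\n", fits_in_double((INT64_C(1) << 53) + 1)); /* prints 0 */
    return 0;
}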
var_dump(max(
	PHP_INT_MIN*2, PHP_INT_MIN, PHP_INT_MIN+1)
);
NIT: Couldn't all of those be on a single line? Btw, this one also has slightly inconsistent formatting compared to the ones above...
I'd like to add that while I enjoy performance improvements, this one is a very small improvement for a special case with very specific arguments. The added extra complexity is probably not really worth it. In hot code like the one mentioned in the original issue, developers can inline the condition and skip
This PR doesn't improve the performance of this scenario. So unless we can find a way to improve the performance of these functions in general, I don't think it's worth the additional complexity for a couple of % speed improvement in rare circumstances.
Force-pushed from d534496 to e59383e.
I just have some silly nits. Other than that it looks good and does indeed improve the performance. The performance improvement isn't that big though, but might still be valuable.
I tried to break it but didn't find edge cases. Also works under UBSAN.
ext/standard/array.c
		max = &args[i];
	}
} else if (Z_TYPE(args[i]) == IS_DOUBLE && (zend_dval_to_lval((double) max_lval) == max_lval)) {
	/* if max_level can be exactly represented as a float, go to float dedicated code */
You wrote max_level instead of max_lval. Also same float remark.
Co-authored-by: Niels Dossche <[email protected]>
Force-pushed from 6abdcdd to af45e84.
Thanks to everyone involved. <3
First commit is just a cleanup commit to fix indentation and use uint32_t instead of int for variadic arguments and the size of an array. The second commit actually performs the optimization; not sure if this is the best approach, however.