Min/max optimization for int/floats #11194

Girgias · 2023-05-06T10:21:51Z

First commit is just a cleanup commit to fix indentation and use uint32_t instead of int for variadic arguments and the size of an array.

The second commit performs the actual optimization, though I'm not sure this is the best approach.

nielsdos

Looks good to me. Just a small suggested change to improve performance, which seems to help even if doubles and longs are not mixed.

Benchmarks (using 1/10th of the iterations from the script in the original issue, because I'm on a slow laptop).

Baseline
  Time (mean ± σ):     783.6 ms ±   9.3 ms    [User: 781.5 ms, System: 2.3 ms]
  Range (min … max):   771.2 ms … 796.5 ms    10 runs

Your patch:
  Time (mean ± σ):     676.8 ms ±  12.8 ms    [User: 674.4 ms, System: 1.9 ms]
  Range (min … max):   665.0 ms … 699.1 ms    10 runs

Your patch + my suggested change
  Time (mean ± σ):     659.9 ms ±   6.5 ms    [User: 655.9 ms, System: 3.3 ms]
  Range (min … max):   651.5 ms … 669.4 ms    10 runs

My generic zend_compare improvement
  Time (mean ± σ):     716.6 ms ±   7.1 ms    [User: 714.0 ms, System: 2.1 ms]
  Range (min … max):   706.8 ms … 727.3 ms    10 runs

ext/standard/array.c

nielsdos · 2023-05-06T20:17:53Z

Changes since my review look good.

mvorisek · 2023-05-07T06:36:59Z

ext/standard/array.c

+					}
+				} else if (Z_TYPE(args[i]) == IS_DOUBLE) {
+					min_dval = (double) min_lval;
+					goto double_compare;


Is this really safe? As currently written, once a double is observed, all comparisons are done using doubles, which might compare different longs as equal: https://3v4l.org/EgniH/rfc#vgit.master_jit

The original issue is primarily caused by tracing overhead; IMHO a micro-optimization of <10% is not very helpful given the much worse code.

@mvorisek Note that the zend_compare function (which was previously always used) also casts to a double. I created a test case for min: https://3v4l.org/ZsosL and this behaves the same with this patch and without this patch. It is indeed true that there is a loss of precision when comparing very big doubles, but this is already the case now. There is afaik no change in behaviour.

Or did you test this patch and find an issue? If so, please provide the reproducer.

> The original issue is primarily caused by tracing overhead; IMHO a micro-optimization of <10% is not very helpful given the much worse code.

It turns out that this actually improves the diff real-world workload from the original issue report a lot. The benchmarks I did were only very isolated but it's important to look at the real-world workload. Also there are already a couple of functions which have a fast-path for longs & doubles iirc.

https://github.com/php/php-src/blob/24771fb08b/ext/standard/array.c#L1267 will produce a bad result when there are doubles between large integers. I did not test it, but originally there was no jump to a double-only comparison.

> actually improves the diff real-world workload

The issue (in the diff lib) will still be present, as the main slowdown is due to tracing overhead, not to slow php-src.

> https://github.com/php/php-src/blob/24771fb08b/ext/standard/array.c#L1267 will produce a bad result when there are doubles between large integers. I did not test it, but originally there was no jump to a double-only comparison.

I see what you mean. Here's a test with a behaviour difference:

<?php
var_dump(PHP_INT_MAX*3);
var_dump(min(PHP_INT_MAX*2, PHP_INT_MAX, PHP_INT_MAX-1));
// int(9223372036854775806) previously
// int(9223372036854775807) with this patch

This can probably be worked around. I would prefer a solution that doesn't restart the loop to prevent a performance impact. I also checked with my generic zend_compare improvement and it doesn't suffer from that problem, so we still have that as a possibility.

> The issue (in the diff lib) will still be present, as the main slowdown is due to tracing overhead, not to slow php-src.

Yes, but I didn't say it fully got rid of all the overhead, I said this is a big improvement.
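The behaviour difference reported above comes down to two distinct 64-bit integers collapsing to the same double. A minimal C sketch of that effect follows; `min_via_double` and `min_exact` are hypothetical helpers written for illustration, not code from the patch, and the sketch assumes IEEE 754 doubles with round-to-nearest.

```c
#include <stdint.h>

/* Hypothetical helper: picks the smaller of two 64-bit integers by comparing
 * them as doubles, the way a double-only loop would. On a tie in double
 * space the first argument is kept. */
static int64_t min_via_double(int64_t a, int64_t b)
{
    return ((double)b < (double)a) ? b : a;
}

/* Exact integer comparison, for reference. */
static int64_t min_exact(int64_t a, int64_t b)
{
    return (b < a) ? b : a;
}
```

Under those assumptions, both `INT64_MAX` and `INT64_MAX - 1` convert to 2^63 as a double, so `min_via_double(INT64_MAX, INT64_MAX - 1)` keeps the first argument, while `min_exact` returns `INT64_MAX - 1`.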

nielsdos

See #11194 (comment)

nielsdos · 2023-05-07T12:30:22Z

To fix the found issue you can apply this patch:

diff --git a/ext/standard/array.c b/ext/standard/array.c
index 136dfda34c..f7546bff34 100644
--- a/ext/standard/array.c
+++ b/ext/standard/array.c
@@ -1268,12 +1268,8 @@ PHP_FUNCTION(min)
 						min_dval = Z_DVAL(args[i]);
 						min = &args[i];
 					}
-				} else if (Z_TYPE(args[i]) == IS_LONG) {
-					if (min_dval > (double)Z_LVAL(args[i])) {
-						min_dval = (double)Z_LVAL(args[i]);
-						min = &args[i];
-					}
 				} else {
+					/* Cannot do a fast-path for long+double mix, because of precision loss for longs >= 2**52 */
 					goto generic_compare;
 				}
 			}
@@ -1355,12 +1351,8 @@ PHP_FUNCTION(max)
 						max_dval = Z_DVAL(args[i]);
 						max = &args[i];
 					}
-				} else if (Z_TYPE(args[i]) == IS_LONG) {
-					if (max_dval < (double)Z_LVAL(args[i])) {
-						max_dval = (double)Z_LVAL(args[i]);
-						max = &args[i];
-					}
 				} else {
+					/* Cannot do a fast-path for long+double mix, because of precision loss for longs >= 2**52 */
 					goto generic_compare;
 				}
 			}

I benchmarked it and found that there are no regressions w.r.t. before this PR for mixed longs&doubles, but also no improvements of course because we can't do the fast path to avoid the aforementioned issue. This will still allow us to have the benefit for long-only and double-only inputs (which are likely very common).

Only longs:
  './sapi/cli/php_patched test.php' ran
    1.18 ± 0.03 times faster than './sapi/cli/php test.php'

Only doubles:
  './sapi/cli/php_patched test.php' ran
    1.19 ± 0.12 times faster than './sapi/cli/php test.php'

Mixed long & double:
  './sapi/cli/php_patched test.php' ran
    1.02 ± 0.01 times faster than './sapi/cli/php test.php'
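The shape of the suggested patch (a fast loop while every argument has the same type as the first, and a generic comparison as soon as types are mixed) can be sketched in plain C. This is only an illustration under stated assumptions: the `value` type, the `from_long`/`from_double` constructors and `generic_compare` are hypothetical stand-ins for zvals and zend_compare, and `n` is assumed to be at least 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged value, standing in for a zval that is a long or a double. */
typedef enum { V_LONG, V_DOUBLE } vtype;
typedef struct {
    vtype type;
    union { int64_t l; double d; } u;
} value;

static value from_long(int64_t l)  { value v; v.type = V_LONG;   v.u.l = l; return v; }
static value from_double(double d) { value v; v.type = V_DOUBLE; v.u.d = d; return v; }

/* Stand-in for the generic slow path: returns <0, 0 or >0.
 * long/long is compared exactly, every other pair is compared as doubles. */
static int generic_compare(const value *a, const value *b)
{
    if (a->type == V_LONG && b->type == V_LONG) {
        return (a->u.l > b->u.l) - (a->u.l < b->u.l);
    }
    double x = (a->type == V_LONG) ? (double)a->u.l : a->u.d;
    double y = (b->type == V_LONG) ? (double)b->u.l : b->u.d;
    return (x > y) - (x < y);
}

/* Returns a pointer to the smallest element of args[0..n-1]; requires n >= 1. */
static const value *min_value(const value *args, size_t n)
{
    const value *min = &args[0];
    size_t i = 1;

    if (min->type == V_LONG) {
        /* Fast path: long-only prefix, exact integer comparison. */
        for (; i < n && args[i].type == V_LONG; i++) {
            if (args[i].u.l < min->u.l) min = &args[i];
        }
    } else {
        /* Fast path: double-only prefix. */
        for (; i < n && args[i].type == V_DOUBLE; i++) {
            if (args[i].u.d < min->u.d) min = &args[i];
        }
    }

    /* First type mismatch: finish with the generic comparison,
     * so no long+double fast path is taken. */
    for (; i < n; i++) {
        if (generic_compare(&args[i], min) < 0) min = &args[i];
    }
    return min;
}
```

With the inputs from the reproducer (a double equal to 2^64, then `INT64_MAX`, then `INT64_MAX - 1`), this sketch leaves the fast path at the second argument and returns the element holding `INT64_MAX - 1`.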

Girgias · 2023-05-07T13:56:13Z

Can't we just guard this with && ( (zend_long)(double) lval == lval), since using such large integers seems unlikely? Either way, I will add the test case.

mvorisek · 2023-05-07T14:22:04Z

&& ( (zend_long)(double) lval == lval) seems super hacky. Comparison between double and long is supported natively by C. Did you evaluate the performance of a single loop with a switch with 5 cases (one for each combination, with no casts at all, plus a generic case)?

nielsdos · 2023-05-07T15:19:38Z

> && ( (zend_long)(double) lval == lval) seems super hacky. Comparison between double and long is supported natively by C. Did you evaluate the performance of a single loop with a switch with 5 cases (one for each combination, with no casts at all, plus a generic case)?

It looks weird, but casting twice is necessary, because otherwise C does a double-double comparison instead of checking if the long lost precision.

As for your suggestion about the switch cases: zend_compare currently is a bunch of switch cases. In my alternative zend_compare optimisation I grouped the most common ones together, so it's kinda like what you suggest. The performance was a bit worse than this PR.

> Can't we just guard this with && ( (zend_long)(double) lval == lval), since using such large integers seems unlikely? Either way, I will add the test case.

Maybe, I'm not sure what is the best solution. There's an extra cost in comparing and casting, so some benchmarking would need to be done I guess.
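To illustrate why the double cast is needed, here is a small C sketch. `naive_check` and `round_trips` are hypothetical names, and the explicit range test before converting back is an assumption of this sketch (it avoids converting an out-of-range double to an integer, which is undefined behaviour in C); it is not the exact expression used in the PR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Single cast: the integer operand is itself converted to double before the
 * comparison, so both sides are the same double and this is always true. */
static bool naive_check(int64_t v)
{
    return (double)v == v;
}

/* Round trip: true only if v survives long -> double -> long unchanged. */
static bool round_trips(int64_t v)
{
    double d = (double)v;

    /* If v rounded up to 2^63, it cannot be converted back to int64_t,
     * and it has lost precision anyway. */
    if (d >= 9223372036854775808.0) {
        return false;
    }
    return (int64_t)d == v;
}
```

Assuming IEEE 754 doubles, `naive_check(INT64_MAX)` is true even though precision was lost, whereas `round_trips(INT64_MAX)` is false and `round_trips(42)` is true.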

Girgias · 2023-05-07T17:05:23Z

> && ( (zend_long)(double) lval == lval) seems super hacky

This is standard procedure, and we do this to check that a PHP float (C double) is compatible with a PHP int (C zend_long).

nielsdos · 2023-05-07T17:30:10Z

FWIW I benched the current version of the PR and it's roughly the same performance as without the extra checks.

I do have one question though:

} else if (Z_TYPE(args[i]) == IS_DOUBLE && ((zend_long)(double) max_lval == max_lval)) {
	max_dval = (double) max_lval;
	goto double_compare;
}

I don't think you need the extra check here, because if the check fails, then zend_compare will be used which does casting internally anyway. (Same for the min case).
So I think we can remove the check here?

mvorisek · 2023-05-07T18:16:24Z

> > && ( (zend_long)(double) lval == lval) seems super hacky

> This is standard procedure, and we do this to check that a PHP float (C double) is compatible with a PHP int (C zend_long).

As a PHP developer, I expect unified performance. AFAICT a one-loop implementation can have about the same performance, be simpler, and handle all types more consistently. The main overhead probably comes from the zend_compare call plus zend_compare's more complex implementation.

nielsdos · 2023-05-07T18:24:41Z

> As a PHP developer, I expect unified performance.

Well, it depends on what you want to unify. This patch brings the performance of $min=min($a,$b); closer to if ($a < $b) $min=$a;else $min=$b; . So in a way this does unify it. Isn't it a good thing that some functions get a fast path for the most common use cases?

> AFAICT a one-loop implementation can have about the same performance, be simpler, and handle all types more consistently.

This is not what the benchmarks are saying though. It of course also depends on how far we would like to take an optimisation, i.e. at what point the difference between two optimisation approaches no longer matters.

> The main overhead probably comes from the zend_compare call plus zend_compare's more complex implementation.

Yep this is true. In fact I explored this here: nielsdos#6. Specialising min/max for float&int seems to get a little more performance than my approach, but ofc it depends on how far we want to go with the optimisations. That's not to say that a more general optimisation cannot be applied on top of this PR.

KapitanOczywisty · 2023-05-07T20:01:28Z

> > The main overhead probably comes from the zend_compare call plus zend_compare's more complex implementation.

> Yep this is true. In fact I explored this here: nielsdos#6. Specialising min/max for float&int seems to get a little more performance than my approach, but ofc it depends on how far we want to go with the optimisations. That's not to say that a more general optimisation cannot be applied on top of this PR.

Overhead is significant when min/max is called many, many times; this often means a pattern similar to: while { $min = min($min, $element); }. As this is a rather common use (e.g. in the original issue), maybe there could be a special opcode for min/max with exactly 2 arguments?

nielsdos · 2023-05-07T20:09:07Z

> > > The main overhead probably comes from the zend_compare call plus zend_compare's more complex implementation.

> > Yep this is true. In fact I explored this here: nielsdos#6. Specialising min/max for float&int seems to get a little more performance than my approach, but ofc it depends on how far we want to go with the optimisations. That's not to say that a more general optimisation cannot be applied on top of this PR.

> Overhead is significant when min/max is called many, many times; this often means a pattern similar to: while { $min = min($min, $element); }. As this is a rather common use (e.g. in the original issue), maybe there could be a special opcode for min/max with exactly 2 arguments?

One example of a function where we have a dedicated opcode is strlen(), so it certainly is possible. But there's a caveat here: namespaces can override internal functions. For example:

<?php
namespace Foo;
function x() {
  return strlen("abc");
}

Relevant 3v4l: https://3v4l.org/RisbL/vld
As we can see, the optimizer cannot replace the function call with the opcode, because there might be a Foo\strlen() function. If we however replace strlen with \strlen, then the optimisation happens.
In any case, that doesn't really prevent us from implementing an opcode for min/max, I just wanted to point out that there's no automatic gain, unless the code prefixes the call with a backslash (or uses the global one with a use statement).

As for the opcodes: it would indeed help performance, but we must be careful adding new opcodes. Every opcode we add puts more pressure on the instruction cache for the VM. Although caches nowadays are growing bigger and bigger, on shared infrastructure (e.g. shared hosting) this will still be something to look out for.
There's also the question: if we add an opcode for min/max, what function is next? We can't just add an opcode for every small internal function. So some extra thought needs to be put into this if we would go this route.

KapitanOczywisty · 2023-05-07T20:21:33Z

> But there's a caveat here: namespaces can override internal functions.

Sigh.. Although use function max or \max are also pretty common.

> There's also the question: if we add an opcode for min/max, what function is next?

AFAIK opcodes can be removed without affecting BC, so changing our minds in the future is not a problem.

Edit: I hope this fallback to global namespace will be deprecated someday.

bwoebi · 2023-05-07T21:07:55Z

This discussion reminds me of the 7 year old PR #1679. Perhaps, with all the improvements done on the VM, the relative improvement of a change like described in that PR might be more important nowadays.

And - obviously - it will also fare well with such small ICALLs, where the most important overhead is setting up the frame and copying arguments. With further possibilities for eliding execute_data on internal leaf functions etc.

I don't really see opcodes as the solution, more that our function calls just have too much overhead.

I definitely approve of this optimization to min and max though, just saying that with this change we pretty much reach the ceiling of what can be done for such small functions.

bukka · 2023-05-08T13:34:39Z

ext/standard/array.c

+						max_lval = Z_LVAL(args[i]);
+						max = &args[i];
+					}
+				} else if (Z_TYPE(args[i]) == IS_DOUBLE && (zend_dval_to_lval((double) max_lval) == max_lval)) {


Might be good to add some comment here as it took me a little bit to figure out what zend_dval_to_lval((double) max_lval) == max_lval is for. IIUC it's for long to double overflow check, right?

No, it checks whether an int converted to float can still be represented exactly as a float, or whether it loses precision (so for integers higher than 52 bits IIRC).
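As a rough illustration of where that precision limit sits, the C sketch below reads the significand width from `<float.h>`. It assumes IEEE 754 binary64 doubles (radix 2), where the significand has 53 bits: 52 stored plus one implicit. `double_precision_bits` and `within_exact_range` are hypothetical helpers, not part of php-src.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of significand bits in a double; 53 for IEEE 754 binary64. */
static int double_precision_bits(void)
{
    return DBL_MANT_DIG;
}

/* Hypothetical helper: true if |v| <= 2^DBL_MANT_DIG, the range in which
 * every integer has an exact double representation. This is a sufficient
 * condition only: some larger integers (e.g. powers of two) are also exact. */
static bool within_exact_range(int64_t v)
{
    const int64_t limit = INT64_C(1) << DBL_MANT_DIG; /* 2^53 under the assumption above */
    return v >= -limit && v <= limit;
}
```

Under that assumption, 2^53 is the last point up to which every integer is exact; 2^53 + 1 is the first positive integer that a double cannot hold.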

bukka · 2023-05-08T13:34:46Z

ext/standard/tests/array/max_int_float_optimisation.phpt

+var_dump(max(
+    PHP_INT_MIN*2, PHP_INT_MIN, PHP_INT_MIN+1)
+);


NIT: Couldn't all of those be on a single line? Btw, this one's formatting is also a bit inconsistent with the ones above...

kamil-tekiela · 2023-05-19T14:52:15Z

I'd like to add that while I enjoy performance improvements, this one is a very small improvement for a special case with very specific arguments. The added extra complexity is probably not really worth it. In hot code like the one mentioned in the original issue, developers can inline the condition and skip max() altogether. Additionally, the biggest performance issue happens when users pass an array instead of variadics.

time ./sapi/cli/php -r '$b = 1; $c = 2; for ($i = 0; $i < 430000000; ++$i) { $a = max([$b, $c]); }'

This PR doesn't improve the performance of this scenario. So unless we can find a way to improve the performance of these functions in general, I don't think it's worth the additional complexity for a couple of % speed improvements in rare circumstances.

nielsdos

I just have some silly nits. Other than that it looks good and does indeed improve the performance. The performance improvement isn't that big though, but might still be valuable.
I tried to break it but didn't find edge cases. Also works under UBSAN.

ext/standard/array.c

nielsdos · 2023-06-01T19:40:11Z

ext/standard/array.c

+						max = &args[i];
+					}
+				} else if (Z_TYPE(args[i]) == IS_DOUBLE && (zend_dval_to_lval((double) max_lval) == max_lval)) {
+					/* if max_level can be exactly represented as a float, go to float dedicated code */


You wrote max_level instead of max_lval. Also same float remark.

ext/standard/array.c

Co-authored-by: Niels Dossche <[email protected]>

staabm · 2023-06-02T09:37:59Z

Thanks to everyone involved. <3.

Girgias requested a review from nielsdos May 6, 2023 10:21

github-actions bot added the Extension: standard label May 6, 2023

Girgias linked an issue May 6, 2023 that may be closed by this pull request

improve max() performance #11192

Closed

nielsdos approved these changes May 6, 2023

Girgias requested a review from bukka as a code owner May 6, 2023 12:45

mvorisek reviewed May 7, 2023

nielsdos requested changes May 7, 2023

Girgias force-pushed the min-max-optimization branch from 24771fb to fac58cd Compare May 7, 2023 14:16

bukka reviewed May 8, 2023

Girgias force-pushed the min-max-optimization branch from d534496 to e59383e Compare May 31, 2023 10:28

Girgias requested a review from nielsdos May 31, 2023 11:45

nielsdos approved these changes Jun 1, 2023

Girgias and others added 3 commits June 2, 2023 10:25

ext/standard/array.c: Optimize min/max functions for int/float

4ea2ccd

Use zend_compare

a08f5a3

Co-authored-by: Niels Dossche <[email protected]>

Cleanup + max()

fb609dc

Girgias added 5 commits June 2, 2023 10:25

Fix test + int losing precision with float fix

b7a11f8

Elide UB

1dff2be

Reformat tests

6edcc6b

[skip ci] add comments

2a8940d

[skip ci] Comments fix

af45e84

Girgias force-pushed the min-max-optimization branch from 6abdcdd to af45e84 Compare June 2, 2023 09:27

Girgias merged commit 1540245 into php:master Jun 2, 2023

Girgias deleted the min-max-optimization branch June 2, 2023 09:27
