Skip to content

Min/max optimization for int/floats #11194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 2, 2023
Merged

Conversation

Girgias
Copy link
Member

@Girgias Girgias commented May 6, 2023

First commit is just a cleanup commit to fix indentation and use uint32_t instead of int for variadic arguments and the size of an array.

The second commit actually performs the optimization, not sure if this is the best approach however.

@Girgias Girgias requested a review from nielsdos May 6, 2023 10:21
@Girgias Girgias linked an issue May 6, 2023 that may be closed by this pull request
Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Just a small suggested change to improve performance, which seems to help even if doubles and longs are not mixed.

Benchmarks (using 1/10th of the iterations from the script in the original issue, because I'm on a slow laptop).

Baseline
  Time (mean ± σ):     783.6 ms ±   9.3 ms    [User: 781.5 ms, System: 2.3 ms]
  Range (min … max):   771.2 ms … 796.5 ms    10 runs

Your patch:
  Time (mean ± σ):     676.8 ms ±  12.8 ms    [User: 674.4 ms, System: 1.9 ms]
  Range (min … max):   665.0 ms … 699.1 ms    10 runs

Your patch + my suggested change
  Time (mean ± σ):     659.9 ms ±   6.5 ms    [User: 655.9 ms, System: 3.3 ms]
  Range (min … max):   651.5 ms … 669.4 ms    10 runs

My generic zend_compare improvement
  Time (mean ± σ):     716.6 ms ±   7.1 ms    [User: 714.0 ms, System: 2.1 ms]
  Range (min … max):   706.8 ms … 727.3 ms    10 runs

@Girgias Girgias requested a review from bukka as a code owner May 6, 2023 12:45
@nielsdos
Copy link
Member

nielsdos commented May 6, 2023

Changes since my review look good.

}
} else if (Z_TYPE(args[i]) == IS_DOUBLE) {
min_dval = (double) min_lval;
goto double_compare;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really safe? As currently written, when a double is observed, all comparions are done using double, which might compare different longs as the same value: https://3v4l.org/EgniH/rfc#vgit.master_jit

The original issue is primary because of tracing overhead, IMHO microoptimization for <10% is not much helpful vs. the much worse code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mvorisek Note that the zend_compare function (which was previously always used) also casts to a double. I created a test case for min: https://3v4l.org/ZsosL and this behaves the same with this patch and without this patch. It is indeed true that there is a loss of precision when comparing very big doubles, but this is already the case now. There is afaik no change in behaviour.

Or did you test this patch and found an issue? If so, please provide the reproducer.

The original issue is primary because of tracing overhead, IMHO microoptimization for <10% is not much helpful vs. the much worse code.

It turns out that this actually improves the diff real-world workload from the original issue report a lot. The benchmarks I did were only very isolated but it's important to look at the real-world workload. Also there are already a couple of functions which have a fast-path for longs & doubles iirc.

Copy link
Contributor

@mvorisek mvorisek May 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/php/php-src/blob/24771fb08b/ext/standard/array.c#L1267 will produce a bad result when there are doubles between large integers, I did not test it, but originally there was no jump to double cmp only

actually improves the diff real-world workload

the issue (in the diff lib) will be still present, at the main slowdown is due tracing overhead, not due slow php-src

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/php/php-src/blob/24771fb08b/ext/standard/array.c#L1267 will produce a bad result when there are doubles between large integers, I did not tested it, but originally there was no jump to double cmp only

I see what you mean. Here's a test with a behaviour difference:

<?php

var_dump(PHP_INT_MAX*3);
var_dump(min(PHP_INT_MAX*2, PHP_INT_MAX, PHP_INT_MAX-1));
// int(9223372036854775806) previously
// int(9223372036854775807) with this patch

This can probably be worked around. I would prefer a solution that doesn't restart the loop to prevent a performance impact. I also checked with my generic zend_compare improvement and it doesn't suffer from that problem, so we still have that as a possibility.

the issue (in the diff lib) will be still present, at the main slowdown is due tracing overhead, not due slow php-src

Yes, but I didn't say it fully got rid of all the overhead, I said this is a big improvement.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nielsdos
Copy link
Member

nielsdos commented May 7, 2023

To fix the found issue you can apply this patch:

diff --git a/ext/standard/array.c b/ext/standard/array.c
index 136dfda34c..f7546bff34 100644
--- a/ext/standard/array.c
+++ b/ext/standard/array.c
@@ -1268,12 +1268,8 @@ PHP_FUNCTION(min)
 						min_dval = Z_DVAL(args[i]);
 						min = &args[i];
 					}
-				} else if (Z_TYPE(args[i]) == IS_LONG) {
-					if (min_dval > (double)Z_LVAL(args[i])) {
-						min_dval = (double)Z_LVAL(args[i]);
-						min = &args[i];
-					}
 				} else {
+					/* Cannot do a fast-path for long+double mix, because of precision loss for longs >= 2**52 */
 					goto generic_compare;
 				}
 			}
@@ -1355,12 +1351,8 @@ PHP_FUNCTION(max)
 						max_dval = Z_DVAL(args[i]);
 						max = &args[i];
 					}
-				} else if (Z_TYPE(args[i]) == IS_LONG) {
-					if (max_dval < (double)Z_LVAL(args[i])) {
-						max_dval = (double)Z_LVAL(args[i]);
-						max = &args[i];
-					}
 				} else {
+					/* Cannot do a fast-path for long+double mix, because of precision loss for longs >= 2**52 */
 					goto generic_compare;
 				}
 			}

I benchmarked it and found that there are no regressions w.r.t. before this PR for mixed longs&doubles, but also no improvements of course because we can't do the fast path to avoid the aforementioned issue. This will still allow us to have the benefit for long-only and double-only inputs (which are likely very common).

Only longs:
  './sapi/cli/php_patched test.php' ran
    1.18 ± 0.03 times faster than './sapi/cli/php test.php'

Only doubles:
  './sapi/cli/php_patched test.php' ran
    1.19 ± 0.12 times faster than './sapi/cli/php test.php'

Mixed long & double:
  './sapi/cli/php_patched test.php' ran
    1.02 ± 0.01 times faster than './sapi/cli/php test.php'

@Girgias
Copy link
Member Author

Girgias commented May 7, 2023

Can't we just guard this by a && ( (zend_long)(double) lval == lval) ? as using such high integers seems unlikely. But I will add the test case in any regards

@Girgias Girgias force-pushed the min-max-optimization branch from 24771fb to fac58cd Compare May 7, 2023 14:16
@mvorisek
Copy link
Contributor

mvorisek commented May 7, 2023

&& ( (zend_long)(double) lval == lval) seems super hacky - comparison between double <> long is supported natively by C, did you evaluated a performance with one loop with switch with 5 cases - each combination - with no casts at all + generic case?

@nielsdos
Copy link
Member

nielsdos commented May 7, 2023

&& ( (zend_long)(double) lval == lval) seems super hacky - comparison between double <> long is supported natively by C, did you evaluated a performance with one loop with switch with 5 cases - each combination - with no casts at all + generic case?

It looks weird, but casting twice is necessary, because otherwise C does a double-double comparison instead of checking if the long lost precision.

As for your suggestion about the switch cases: zend_compare currently is a bunch of switch cases. In my alternative zend_compare optimisation I grouped the most common ones together, so it's kinda like what you suggest. The performance was a bit worse than this PR.

Can't we just guard this by a && ( (zend_long)(double) lval == lval) ? as using such high integers seems unlikely. But I will add the test case in any regards

Maybe, I'm not sure what is the best solution. There's an extra cost in comparing and casting, so some benchmarking would need to be done I guess.

@Girgias
Copy link
Member Author

Girgias commented May 7, 2023

&& ( (zend_long)(double) lval == lval) seems super hacky

This is standard procedure, and we do this to check that a PHP float (C double) is compatible with a PHP int (C zend_long).

@nielsdos
Copy link
Member

nielsdos commented May 7, 2023

FWIW I benched the current version of the PR and it's roughly the same performance as without the extra checks.

I do have one question though:

} else if (Z_TYPE(args[i]) == IS_DOUBLE && ((zend_long)(double) max_lval == max_lval)) {
	max_dval = (double) max_lval;
	goto double_compare;
}

I don't think you need the extra check here, because if the check fails, then zend_compare will be used which does casting internally anyway. (Same for the min case).
So I think we can remove the check here?

@mvorisek
Copy link
Contributor

mvorisek commented May 7, 2023

&& ( (zend_long)(double) lval == lval) seems super hacky

This is standard procedure, and we do this to check that a PHP float (C double) is compatible with a PHP int (C zend_long).

As php developer, I expect unified perfomance. AFAICT one loop impl. can have about the same perf, be simpler and handle all types more consistently. The main overhead is probably comming from zend_compare call + zend_compare more complex impl.

@nielsdos
Copy link
Member

nielsdos commented May 7, 2023

As php developer, I expect unified perfomance.

Well, it depends on what you want to unify. This patch brings the performance of $min=min($a,$b); closer to if ($a < $b) $min=$a;else $min=$b; . So in a way this does unify it. Isn't it a good thing that some functions get a fast path for the most common use cases?

AFAICT one loop impl. can have about the same perf, be simpler and handle all types more consistently.

This is not what the benchmarks are saying though. It of course also depends on how far we would like to take an optimisation, i.e. when does an the difference between two optimisation approaches not matter anymore.

The main overhead is probably comming from zend_compare call + zend_compare more complex impl.

Yep this is true. In fact I explored this here: nielsdos#6. Specialising min/max for float&int seems to get a little more performance than my approach, but ofc it depends on how far we want to go with the optimisations. That's not to say that a more general optimisation cannot be applied on top of this PR.

@KapitanOczywisty
Copy link

The main overhead is probably comming from zend_compare call + zend_compare more complex impl.

Yep this is true. In fact I explored this here: nielsdos#6. Specialising min/max for float&int seems to get a little more performance than my approach, but ofc it depends on how far we want to go with the optimisations. That's not to say that a more general optimisation cannot be applied on top of this PR.

Overhead is significant when min/max is called many, many times, this often means pattern similar to: while { $min = min($min, $element); }. As this is rather common use (e.g. in the original issue), maybe there could be a special opcode for min/max with exactly 2 arguments?

@nielsdos
Copy link
Member

nielsdos commented May 7, 2023

The main overhead is probably comming from zend_compare call + zend_compare more complex impl.

Yep this is true. In fact I explored this here: nielsdos#6. Specialising min/max for float&int seems to get a little more performance than my approach, but ofc it depends on how far we want to go with the optimisations. That's not to say that a more general optimisation cannot be applied on top of this PR.

Overhead is significant when min/max is called many, many times, this often means pattern similar to: while { $min = min($min, $element); }. As this is rather common use (e.g. in the original issue), maybe there could be a special opcode for min/max with exactly 2 arguments?

One example of a function where we have a dedicated opcode is strlen(), so it certainly is possible. But there's a caveat here: namespaces can override internal functions. For example:

<?php
namespace Foo;
function x() {
  return strlen("abc");
}

Relevant 3v4l: https://3v4l.org/RisbL/vld
As we can see, the optimizer cannot replace the function call with the opcode, because there might be a Foo\strlen() function. If we however replace strlen with \strlen, then the optimisation happens.
In any case, that doesn't really prevent us from implementing an opcode for min/max, I just wanted to point out that there's no automatic gain, unless the code prefixes the call with a backslash (or uses the global one with a use statement).

As for the opcodes: it would indeed help performance, but we must be careful adding new opcodes. Every opcode we add puts more pressure on the instruction cache for the VM. Although caches nowadays are growing bigger and bigger, on shared infrastructure (e.g. shared hosting) this will still be something to look out for.
There's also the question: if we add an opcode for min/max, what function is next? We can't just add an opcode for every small internal function. So some extra thought needs to be put into this if we would go this route.

@KapitanOczywisty
Copy link

KapitanOczywisty commented May 7, 2023

But there's a caveat here: namespaces can override internal functions.

Sigh.. Although use function max or \max are also pretty common.

There's also the question: if we add an opcode for min/max, what function is next?

AFIK opcodes can be removed without affecting BC, so there is no problem to change mind in the future.

Edit: I hope this fallback to global namespace will be deprecated someday.

@bwoebi
Copy link
Member

bwoebi commented May 7, 2023

This discussion reminds me of the 7 year old PR #1679. Perhaps, with all the improvements done on the VM, the relative improvement of a change like described in that PR might be more important nowadays.

And - obviously - it will also fare well with such small ICALLs, where the most important overhead is setting up the frame and copying arguments. With further possibilities for eliding execute_data on internal leaf functions etc.

I don't really see opcodes as the solution, more that our function calls just have too much overhead.

I definitely approve of this optimization to min and max though, just saying that we pretty much reach the ceiling with this change what can be done for such small functions.

max_lval = Z_LVAL(args[i]);
max = &args[i];
}
} else if (Z_TYPE(args[i]) == IS_DOUBLE && (zend_dval_to_lval((double) max_lval) == max_lval)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to add some comment here as it took me a little bit to figure out what zend_dval_to_lval((double) max_lval) == max_lval is for. IIUC it's for long to double overflow check, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's to check that when an int converted to float it can be represented exactly as a float or it loses precision (so for integers higher than 52 bits IIRC)

Comment on lines 65 to 67
var_dump(max(
PHP_INT_MIN*2, PHP_INT_MIN, PHP_INT_MIN+1)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Couldn't all of those be on a single line. Btw this one is a bit inconsistent formatting from the above ones as well...

@kamil-tekiela
Copy link
Member

I'd like to add that while I enjoy performance improvements, this one is a very small improvement for a special case with very specific arguments. The added extra complexity is probably not really worth it. In hot code like the one mentioned in the original issue, developers can inline the condition and skip max() altogether. Additionally, the biggest performance issue happens when users pass an array instead of variadics.

time ./sapi/cli/php -r '$b = 1; $c = 2; for ($i = 0; $i < 430000000; ++$i) { $a = max([$b, $c]); }'

This PR doesn't improve the performance of this scenario. So unless we can find a way to improve the performance of these functions in general, I don't think it's worth the additional complexity for a couple of % speed improvements in rare circumstances.

@Girgias Girgias force-pushed the min-max-optimization branch from d534496 to e59383e Compare May 31, 2023 10:28
@Girgias Girgias requested a review from nielsdos May 31, 2023 11:45
Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have some silly nits. Other than that it looks good and does indeed improve the performance. The performance improvement isn't that big though, but might still be valuable.
I tried to break it but didn't find edge cases. Also works under UBSAN.

max = &args[i];
}
} else if (Z_TYPE(args[i]) == IS_DOUBLE && (zend_dval_to_lval((double) max_lval) == max_lval)) {
/* if max_level can be exactly represented as a float, go to float dedicated code */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wrote max_level instead of max_lval. Also same float remark.

@Girgias Girgias force-pushed the min-max-optimization branch from 6abdcdd to af45e84 Compare June 2, 2023 09:27
@Girgias Girgias merged commit 1540245 into php:master Jun 2, 2023
@Girgias Girgias deleted the min-max-optimization branch June 2, 2023 09:27
@staabm
Copy link
Contributor

staabm commented Jun 2, 2023

Thanks to everyone involved. <3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

improve max() performance
8 participants