Skip to content

Optimize match(true) #18423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

TimWolla
Copy link
Member

Resolves #18411

@TimWolla TimWolla requested a review from iluuu1994 April 24, 2025 17:45
@TimWolla TimWolla requested a review from dstogov as a code owner April 24, 2025 17:45
@TimWolla TimWolla force-pushed the optimize-match-true branch 2 times, most recently from aace896 to d14138a Compare April 24, 2025 18:18
@iluuu1994
Copy link
Member

Of note: The optimizer is cautious with top-level code. Moving the entire code to a function gives:

test:
     ; (lines=1, args=0, vars=0, tmps=0)
     ; (after optimizer)
     ; /home/ilutov/Developer/php-src/Zend/tests/gh18411.php:3-13
0000 RETURN string("fr")

That said, the optimizations are still lacking when $text is unknown.

Details

test:
     ; (lines=23, args=1, vars=2, tmps=3)
     ; (after optimizer)
     ; /home/ilutov/Developer/php-src/Zend/tests/gh18411.php:3-12
0000 CV0($text) = RECV 1
0001 T2 = FRAMELESS_ICALL_2(str_contains) CV0($text) string("Welcome")
0002 T3 = BOOL T2
0003 T2 = IS_IDENTICAL T3 bool(true)
0004 JMPNZ T2 0018
0005 T4 = FRAMELESS_ICALL_2(str_contains) CV0($text) string("Hello")
0006 T3 = BOOL T4
0007 T2 = IS_IDENTICAL T3 bool(true)
0008 JMPNZ T2 0018
0009 T4 = FRAMELESS_ICALL_2(str_contains) CV0($text) string("Bienvenue")
0010 T3 = BOOL T4
0011 T2 = IS_IDENTICAL T3 bool(true)
0012 JMPNZ T2 0020
0013 T4 = FRAMELESS_ICALL_2(str_contains) CV0($text) string("Bonjour")
0014 T3 = BOOL T4
0015 T2 = IS_IDENTICAL T3 bool(true)
0016 JMPNZ T2 0020
0017 MATCH_ERROR bool(true)
0018 T2 = QM_ASSIGN string("en")
0019 JMP 0021
0020 T2 = QM_ASSIGN string("fr")
0021 CV1($result) = QM_ASSIGN T2
0022 RETURN CV1($result)

I wonder if this is better implemented as a DFA and SCCP pass. Namely, BOOL should be removed in DFA when the operand is already a boolean (DCE may also be an option, but it doesn't handle special cases at this point, so it may be better to keep it generic). SCCP may propagate the result for CHECK_TYPE if the inferred type matches the checked type, and may then be turned into a NOP or FREE. This may work for more use-cases.

@TimWolla
Copy link
Member Author

That said, the optimizations are still lacking when $text is unknown.

I'm not sure if I follow. Is the code snippet from master or this branch? For me, my PR also successfully optimizes:

<?php

$text = 'Bienvenue chez nous';
function foo($text) {
    $result = match (true) {
        !!preg_match('/Welcome/', $text), !!preg_match('/Hello/', $text) => 'en',
        !!preg_match('/Bienvenue/', $text), !!preg_match('/Bonjour/', $text) => 'fr',
        default => 'other',
    };

    var_dump($result);
}

foo($text);

Both the cast, as well as the IS_IDENTICAL disappears there, leaving only the call and JMPNZ.

@iluuu1994
Copy link
Member

I'm not sure if I follow. Is the code snippet from master or this branch? For me, my PR also successfully optimizes:

I was referring to the behavior before your patch.

@staabm
Copy link
Contributor

staabm commented Apr 25, 2025

do we expect this patch to improve performance? if so, how much?

@edorian
Copy link
Member

edorian commented Apr 25, 2025

Quick test on my MacBook:

cat bench-match-true.php

<?php

function foo($text) {
    $result = match (true) {
        !!preg_match('/Welcome/', $text), !!preg_match('/Hello/', $text) => 'en',
        !!preg_match('/Bienvenue/', $text), !!preg_match('/Bonjour/', $text) => 'fr',
        default => 'other',
    };

    return $result;
}

$i = 1_000_000;
$text = 'Bienvenue chez nous';
while($i--) {
	foo($text);
}

hyperfine -L version master,optimize-match-true './{version}/target/bin/php -d zend_extension=$(pwd)/{version}/modules/opcache.so -d opcache.enable_cli=1 bench-match-true.php'

Benchmark 1: ./master/target/bin/php -d zend_extension=$(pwd)/master/modules/opcache.so -d opcache.enable_cli=1 bench-match-true.php
  Time (mean ± σ):     121.6 ms ±   4.5 ms    [User: 118.4 ms, System: 1.7 ms]
  Range (min … max):   112.9 ms … 132.1 ms    22 runs

Benchmark 2: ./optimize-match-true/target/bin/php -d zend_extension=$(pwd)/optimize-match-true/modules/opcache.so -d opcache.enable_cli=1 bench-match-true.php
  Time (mean ± σ):     103.7 ms ±   4.0 ms    [User: 100.6 ms, System: 1.7 ms]
  Range (min … max):   100.6 ms … 118.5 ms    24 runs

Summary
  ./optimize-match-true/target/bin/php -d zend_extension=$(pwd)/optimize-match-true/modules/opcache.so -d opcache.enable_cli=1 bench-match-true.php ran
    1.17 ± 0.06 times faster than ./master/target/bin/php -d zend_extension=$(pwd)/master/modules/opcache.so -d opcache.enable_cli=1 bench-match-true.php

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimization looks right to me.

The only possible problem, is the case when the eliminated temporary variable might be used several times. (e.g. in context of ?? operator).

@iluuu1994 could you please check this.
If you don't see problems, I would support this.

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only possible problem, is the case when the eliminated temporary variable might be used several times. (e.g. in context of ?? operator).

This would indeed be a problem for:

T2 = BOOL T1
T3 = CASE T2 ; Anything in keeps_op1_alive(), essentially.
T4 = TYPE_CHECK T2 false

; After

T4 = BOOL_NOT T1
T3 = CASE T2 ; T2 is no longer declared
NOP

However, I don't believe such code can currently be emitted by the compiler. CASE, CASE_STRICT, SWITCH are emitted for switch/match in a very specific pattern, e.g. multiple CASE ops in sequence, followed by a single FREE instruction for the subject. There's no way to make the terminating instruction something else like a TYPE_CHECK. Thus, I think this is correct.

This optimization is already happening in the compiler for explicit `===`
expressions, but not for `match()`, which also compiles to `IS_IDENTICAL`.
@TimWolla TimWolla force-pushed the optimize-match-true branch from d14138a to 8ffcee2 Compare April 28, 2025 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Merge T2 = BOOL T1 and T3 = IS_IDENTICAL T2 bool(true) into T3 = BOOL T1
5 participants