PCRE regular expressions with JIT enabled gives different result #11956

Closed
antman3351 opened this issue Aug 13, 2023 · 4 comments

@antman3351

Description

Hi,
The following regex, preg_match( '/<(\w+)[\s\w\-]+ id="S44_i89ew">/', '<br><div id="S44_i89ew">', $matches );, gives different results in PHP >= 8.1 depending on whether PCRE JIT is enabled. (Note: I simplified the regex to isolate the problem, which is why it doesn't make much sense.)

I had this problem with PHP 7.3, where I found a bug report that stated the problem was in the PCRE library and not in PHP.
I've recently migrated the code that uses the regex to PHP 8.2 and found the issue is still there, so I tested it on 3v4l.org: it seems to be fixed in PHP <= 8.0 and broken again from 8.1 onward.

I reported the issue on the PCRE2 GitHub, but the author doesn't think it is caused by any change in the PCRE library: PCRE2Project/pcre2#261 (comment)

Code example: https://3v4l.org/189jh

Thanks,
Antonio
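
For readers who cannot open the 3v4l link, the comparison described above can be sketched roughly as follows. This is not the script behind the link. The helper `matchWithJit` is a hypothetical name, and the sketch assumes `pcre.jit` can be toggled with `ini_set()` at runtime. Because PHP caches compiled patterns, the two runs use different delimiters (`/` and `#`) around the same regex body so that neither run reuses a pattern compiled under the other setting; that precaution is also an assumption about how the cache behaves.

```php
<?php
// Subject and regex body taken from the report; only the delimiters differ below.
$subject = '<br><div id="S44_i89ew">';
$body    = '<(\w+)[\s\w\-]+ id="S44_i89ew">';

/**
 * Hypothetical helper: run preg_match() with pcre.jit forced on or off
 * and return both the match count and the captured groups.
 */
function matchWithJit(string $pattern, string $subject, bool $jit): array
{
    ini_set('pcre.jit', $jit ? '1' : '0');
    $matches = [];
    $count = preg_match($pattern, $subject, $matches);

    return ['count' => $count, 'matches' => $matches];
}

// Same regex body, different delimiters, so each run compiles its own pattern.
$withoutJit = matchWithJit('/' . $body . '/', $subject, false);
$withJit    = matchWithJit('#' . $body . '#', $subject, true);

var_dump($withoutJit, $withJit);
echo $withJit === $withoutJit ? "same result\n" : "JIT and non-JIT differ\n";
```

On an affected build the last line should report a difference; on a build without JIT support both runs go through the interpreter and will always agree.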

PHP Version

8.2.7

Operating System

No response

@nielsdos
Member

I can reproduce this locally. I'll run a bisect. This will also reveal if the problem is in PHP or in PCRE.

@nielsdos
Member

nielsdos commented Aug 13, 2023

Bisect finished.

It points to 5d42900 and 79755ca.
This is indeed a pcre2lib problem in 10.37. Given that the pcre2lib maintainer says it does not reproduce in their testing on 10.42, I'd say the solution here is updating PHP's bundled pcre2lib to the latest version.

As a temporary workaround you can configure & build with the flag to use your system's pcre2lib, and hope that version is high enough to not contain the bug.
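
To see which pcre2lib a given PHP build is actually running against, the built-in `PCRE_VERSION` constant can be inspected. The snippet below is only a sketch: PCRE2 releases are numbered 10.x, and the threshold `10.42` is my reading of the release the upstream maintainer tested, so treat it as an assumption. A lower version number does not prove the bug is present, since individual fixes can be backported.

```php
<?php
// PCRE_VERSION is a string such as "10.42 2022-12-11": version first, then release date.
$pcreVersion = explode(' ', PCRE_VERSION)[0];

// Assumption: 10.42 is the release on which the upstream maintainer could not reproduce the bug.
$assumedGoodVersion = '10.42';

$atLeastAssumed = version_compare($pcreVersion, $assumedGoodVersion, '>=');

printf("pcre2lib in use: %s (pcre.jit=%s)\n", $pcreVersion, ini_get('pcre.jit'));
echo $atLeastAssumed
    ? "Version is at or above the assumed good release.\n"
    : "Version is older than the assumed good release; run the reproduction to be sure.\n";
```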

@hormus

hormus commented Aug 21, 2023

Or for compatibility

$equal = preg_quote('=');
$pattern = '/<(\\w+)[\\s\\w\\-]*id' . $equal . '"S44_i89ew">/';

// or for other regex
$equal = preg_quote('=');
$pattern = '/<(\\w+)[\\s\\w\\-]*id' . $equal . '"S44_123">/';
That is, in the pattern /<(\w+)[\s\w\-]+ id="S44_i89ew">/, use * instead of + and drop the white space before id,
for the subject <br><div id="S44_i89ew">
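
As a quick check of the suggested variant, here is a minimal sketch that runs it against the subject from the report. It only exercises the alternative pattern and makes no claim about JIT behaviour.

```php
<?php
$subject = '<br><div id="S44_i89ew">';

// Alternative pattern from the comment above: '*' instead of '+', no space before "id".
$equal   = preg_quote('=');
$pattern = '/<(\\w+)[\\s\\w\\-]*id' . $equal . '"S44_i89ew">/';

$found = preg_match($pattern, $subject, $matches);
var_dump($found, $matches);
```

Traced by hand, this variant captures `div` in group 1. The original pattern, when matched correctly, has to give up the `v` so that `[\s\w\-]+` can consume at least one character before the single space, and therefore captures `di`. The two patterns are not drop-in equivalents.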

@antman3351
Author

> Or for compatibility
>
> $equal = preg_quote('=');
> $pattern = '/<(\\w+)[\\s\\w\\-]*id' . $equal . '"S44_i89ew">/';
>
> // or for other regex
> $equal = preg_quote('=');
> $pattern = '/<(\\w+)[\\s\\w\\-]*id' . $equal . '"S44_123">/';
> Pattern /<(\w+)[\s\w\-]+ id="S44_i89ew">/ not + but * and without white space id
> for subject  <br><div id="S44_i89ew">

Your regex doesn't cause the issue where there is a difference between JIT and non JIT.

@nielsdos nielsdos self-assigned this Sep 2, 2023
nielsdos added a commit to nielsdos/php-src that referenced this issue Sep 2, 2023
…erent result

The code in the attached test used to work correctly in PHP 8.0, but not
in 8.1+. This is because PHP 8.1+ uses a more modern version of pcre2
than PHP 8.0, and that pcre2 version has a regression.

While upgrading pcre2lib seems to be only done for the master branch, it
is possible to backport upstream fixes to stable branches. This has
already been done in the past for JIT regressions [1], so it is not
unprecedented.

We backport the upstream pcre2 fix [2].

[1] php@788a701e222
[2] PCRE2Project/pcre2#135
nielsdos added a commit that referenced this issue Sep 18, 2023
* PHP-8.1:
  Fix GH-11956: PCRE regular expressions with JIT enabled gives different result
nielsdos added a commit that referenced this issue Sep 18, 2023
* PHP-8.2:
  Fix GH-11956: PCRE regular expressions with JIT enabled gives different result
nielsdos added a commit that referenced this issue Sep 18, 2023
* PHP-8.3:
  Fix GH-11956: PCRE regular expressions with JIT enabled gives different result