Commit ac33c51

PATCH: [perl 127537] /\W/ regression with UTF-8
This bug is apparently uncommon in the field, as I was the one who discovered it. It requires a UTF-8 pattern containing a complemented POSIX class, like \W or \S, in an inverted bracketed character class, like [^\Wfoo], in a pattern that also has a synthetic start class generated for it by the regex optimizer. The fix is trivial.
1 parent ce54a8b commit ac33c51

3 files changed: +14 -2 lines changed


pod/perldelta.pod (+8)

@@ -577,6 +577,14 @@ specification is very close to one of the 14 legal POSIX classes. (See
 L<perlrecharclass/POSIX Character Classes>.)
 [perl #8904]
 
+=item *
+
+Certain regex patterns involving a complemented posix class in an
+inverted bracketed character class, and matching something else
+optionally would improperly fail to match. An example of one that could
+fail is C</qr/_?[^\Wbar]\x{100}/>. This has been fixed.
+[perl #127537]
+
 =back
 
 =head1 Known Problems

regcomp.c (+4 -2)

@@ -1420,8 +1420,10 @@ S_get_ANYOF_cp_list_for_ssc(pTHX_ const RExC_state_t *pRExC_state,
     }
 
     /* If this can match all upper Latin1 code points, have to add them
-     * as well */
-    if (OP(node) == ANYOFD
+     * as well. But don't add them if inverting, as when that gets done below,
+     * it would exclude all these characters, including the ones it shouldn't
+     * that were added just above */
+    if (! (ANYOF_FLAGS(node) & ANYOF_INVERT) && OP(node) == ANYOFD
         && (ANYOF_FLAGS(node) & ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER))
     {
         _invlist_union(invlist, PL_UpperLatin1, &invlist);
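
The reasoning in the new comment can be illustrated with a small model. This is only a sketch: plain Python sets stand in for Perl's inversion lists, the universe is truncated to 0x200 code points, and every name (`ssc_code_points`, `skip_when_inverting`, and so on) is hypothetical rather than taken from regcomp.c. It shows why adding all upper Latin1 code points before the inversion step removes every one of them from the result, including U+00AA, which the new test expects to match.

```python
# Simplified model only: Python sets stand in for Perl's inversion lists,
# and all names here are hypothetical, not taken from regcomp.c.
UNIVERSE = set(range(0x200))            # small stand-in for "all code points"
UPPER_LATIN1 = set(range(0x80, 0x100))  # code points 0x80..0xFF


def ssc_code_points(listed, invert, matches_all_upper_latin1, skip_when_inverting):
    """Return the code points the class is treated as matching."""
    cps = set(listed)
    add_upper = matches_all_upper_latin1
    if skip_when_inverting and invert:
        add_upper = False               # models the guard added by this commit
    if add_upper:
        cps |= UPPER_LATIN1
    if invert:
        cps = UNIVERSE - cps            # inversion happens last
    return cps


# Toy stand-in for the body of [^\W_0-9] before inversion: a few non-word
# characters plus "_" and the digits. U+00AA is a word character, so it is absent.
listed = {ord(" "), ord("!"), 0xA0, ord("_")} | set(range(ord("0"), ord("9") + 1))

before = ssc_code_points(listed, True, True, skip_when_inverting=False)
after = ssc_code_points(listed, True, True, skip_when_inverting=True)

print(0xAA in before)  # False: the union followed by inversion dropped U+00AA
print(0xAA in after)   # True: skipping the union lets inversion keep U+00AA
```

In this model the characters that were explicitly listed (such as U+00A0) stay excluded either way; only the blanket upper Latin1 addition changes the outcome.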

t/re/re_tests (+2)

@@ -1615,6 +1615,8 @@ a(.)\4294967298 ab\o{42}94967298 ya $1 b \d not converted to native; \o{} is
 ^m?(\d)(.*)\1$ 5b5 y $1 5
 ^m?(\d)(.*)\1$ aba n - -
 
+^_?[^\W_0-9]\w\z \xAA\x{100} y $& \xAA\x{100} [perl #127537]
+
 # 17F is 'Long s'; This makes sure the a's in /aa can be separate
 /s/ai \x{17F} y $& \x{17F}
 /s/aia \x{17F} n - -
