author | Jeremy Evans <[email protected]> | 2022-06-06 13:50:03 -0700 |
---|---|---|
committer | GitHub <[email protected]> | 2022-06-06 13:50:03 -0700 |
commit | ec3542229b29ec93062e9d90e877ea29d3c19472 (patch) | |
tree | 1f96f5d38b24e9605bccf074765a23c6a6e018a8 /test/ruby/test_regexp.rb | |
parent | c85d1cda86d75ee2c3f7b42f22c543409cb5a186 (diff) |
Ignore invalid escapes in regexp comments
Invalid escapes are handled at multiple levels. The first level
is in parse.y, so skip invalid unicode escape checks for regexps
in parse.y.
Make rb_reg_preprocess and unescape_nonascii accept the regexp
options. In unescape_nonascii, if the regexp is an extended
regexp, when "#" is encountered, ignore all characters until the
end of line or end of regexp.
Unfortunately, in extended regexps, you can use "#" as a non-comment
character inside a character class, so also parse "[" and "]"
specially for extended regexps, and only skip comments if "#" is
not inside a character class. Handle nested character classes as well.
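
The distinction described above can be seen from plain Ruby. This is a small illustrative sketch rather than part of the commit: the variable names are made up for the example, and the comments deliberately contain no escapes so the snippet does not depend on this fix being present.

```ruby
# In extended (/x) mode, "#" outside a character class starts a comment
# that runs to the end of the line.
with_comment = /
  foo  # trailing comment, not part of the pattern
  bar
/x

# Inside a character class, "#" is an ordinary character.
in_class = /
  f[#o]o  # the class matches o or a hash sign
  bar
/x

# Nested classes: the "#" after the POSIX bracket is still inside the outer class.
nested = /
  f[[:alnum:]#]o
  bar
/x

p with_comment.match?('foobar')  # => true
p in_class.match?('f#obar')      # => true
p nested.match?('f#obar')        # => true
```

Because "#" can mean either thing, a preprocessing pass that wants to skip comments has to track whether it is currently inside a character class.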
This issue doesn't just affect extended regexps; it also affects
"(?#" comments inside all regexps. So for those comments, scan
until the trailing ")" and ignore the content inside.
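
The scanning rules described so far can be modelled in a few lines of Ruby. This is only a rough sketch under stated assumptions: `strip_regexp_comments` is a hypothetical helper invented for illustration, and the actual change is made in C inside unescape_nonascii, which differs in detail (for example, this sketch ends a "(?#" comment at the first ")" and does not deal with multibyte characters).

```ruby
# Hypothetical helper, not part of Ruby: returns +src+ with comments removed.
def strip_regexp_comments(src, extended: false)
  out = +''
  depth = 0 # current character-class nesting level
  i = 0
  while i < src.length
    ch = src[i]
    if ch == '\\'
      # Copy an escape pair verbatim so an escaped "#" or "[" is never special.
      out << src[i, 2]
      i += 2
    elsif ch == '['
      depth += 1
      out << ch
      i += 1
    elsif ch == ']'
      depth -= 1 if depth > 0
      out << ch
      i += 1
    elsif ch == '#' && extended && depth.zero?
      # Extended comment: skip to the end of the line or of the pattern.
      i += 1 while i < src.length && src[i] != "\n"
    elsif depth.zero? && src[i, 3] == '(?#'
      # Inline comment: skip through the closing ")" (simplified).
      close = src.index(')', i)
      i = close ? close + 1 : src.length
    else
      out << ch
      i += 1
    end
  end
  out
end

inline   = strip_regexp_comments('f(?# \\M-ca)oobar')
extended = strip_regexp_comments("f[#o]o # \\M-ca\nbar", extended: true)

p inline                                                  # => "foobar"
p Regexp.new(extended, Regexp::EXTENDED).match?('foobar') # => true
```

Once the comment text is gone, the invalid escape it contained never reaches the escape checks, which is the effect the commit is after.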
I'm not sure if there are other corner cases not handled. A
better fix would be to redesign the regexp parser so that it
unescapes during parsing instead of before parsing, so you already
know the current parsing state.
Fixes [Bug #18294]
Co-authored-by: Nobuyoshi Nakada <[email protected]>
Notes:
Merged: https://github.com/ruby/ruby/pull/5721
Merged-By: jeremyevans <[email protected]>
Diffstat (limited to 'test/ruby/test_regexp.rb')
-rw-r--r-- | test/ruby/test_regexp.rb | 53 |
1 files changed, 53 insertions, 0 deletions
diff --git a/test/ruby/test_regexp.rb b/test/ruby/test_regexp.rb
index 84687c5380..71d56ad027 100644
--- a/test/ruby/test_regexp.rb
+++ b/test/ruby/test_regexp.rb
@@ -91,6 +91,59 @@ class TestRegexp < Test::Unit::TestCase
     assert_warn('', '[ruby-core:82328] [Bug #13798]') {re.to_s}
   end
 
+  def test_extended_comment_invalid_escape_bug_18294
+    assert_separately([], <<-RUBY)
+      re = / C:\\\\[a-z]{5} # e.g. C:\\users /x
+      assert_match(re, 'C:\\users')
+      assert_not_match(re, 'C:\\user')
+
+      re = /
+        foo # \\M-ca
+        bar
+      /x
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /
+        f[#o]o # \\M-ca
+        bar
+      /x
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /
+        f[[:alnum:]#]o # \\M-ca
+        bar
+      /x
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /
+        f(?# \\M-ca)oo # \\M-ca
+        bar
+      /x
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /f(?# \\M-ca)oobar/
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /[-(?# fca)]oobar/
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+
+      re = /f(?# ca\0\\M-ca)oobar/
+      assert_match(re, 'foobar')
+      assert_not_match(re, 'foobaz')
+    RUBY
+
+    assert_raise(SyntaxError) {eval "/\\users/x"}
+    assert_raise(SyntaxError) {eval "/[\\users]/x"}
+    assert_raise(SyntaxError) {eval "/(?<\\users)/x"}
+    assert_raise(SyntaxError) {eval "/# \\users/"}
+  end
+
   def test_union
     assert_equal :ok, begin
       Regexp.union(