From: jiri.marsik@...
Date: 2020-11-23T18:03:50+00:00
Subject: [ruby-core:101030] [Ruby master Bug#17341] Unsound quantifier reduction with nested quantifiers

Issue #17341 has been reported by jirkamarsik (Jirka Marsik).

----------------------------------------
Bug #17341: Unsound quantifier reduction with nested quantifiers
https://bugs.ruby-lang.org/issues/17341

* Author: jirkamarsik (Jirka Marsik)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
The rules for reducing nested quantifiers can produce quantifiers whose semantics differ from those of the original quantifiers. This can then lead to the regular expression matching different strings.

```
irb(main):001:0> /(?:a+?)*/.match('aa')
(irb):1: warning: nested repeat operator '+?' and '*' was replaced with '+? and ?' in regular expression: /(?:a+?)*/
=> #<MatchData "a">
irb(main):002:0> /(a+?)*/.match('aa')
=> #<MatchData "aa" 1:"a">
```

In the above, we can see that inserting a capture group between the two quantifiers prevents quantifier reduction from occurring, and we get a regexp that matches the whole input. If we let quantifier reduction happen, the resulting regexp matches only the first character.

I think quantifier reduction should not change the behavior of a regexp, since it is just an optimization.

I found the quantifier reduction rules in `ReduceTypeTable` in `regparse.c`. I haven't checked them all, but the ones that replace two quantifiers with two other quantifiers caught my eye.
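A minimal Ruby sketch of the semantic difference, assuming the warning's `'+? and ?'` means the outer `*` is rewritten to `?` (so `/(?:a+?)*/` is treated as if it were `/(?:a+?)?/`). Both comparison patterns use a capture group, which, as shown in the irb session, keeps the reduction from being applied to them. The last part runs the reported pattern itself; its result depends on whether a given Ruby build still performs this reduction, so it is printed rather than relied on.

```ruby
# Input from the report.
INPUT = 'aa'

# Unreduced semantics: the greedy '*' repeats the lazy 'a+?' as many
# times as it can, so two one-character iterations consume "aa".
unreduced = /(a+?)*/.match(INPUT)

# Assumed reduced form (our reading of the '+? and ?' warning): the outer
# quantifier allows at most one iteration, and the lazy 'a+?' stops after
# a single "a" because nothing after it forces a longer match.
reduced = /(a+?)?/.match(INPUT)

puts "unreduced: #{unreduced[0].inspect}" # prints: unreduced: "aa"
puts "reduced:   #{reduced[0].inspect}"   # prints: reduced:   "a"

# The non-capturing pattern from the report. On ruby 2.7.2 it matched
# only "a"; a build without the unsound reduction should match "aa".
reported = Regexp.new('(?:a+?)*').match(INPUT)
puts "reported:  #{reported[0].inspect}"
puts "differs from unreduced form: #{reported[0] != unreduced[0]}"
```

Comparing against capture-group versions, rather than hard-coding an expected result for `/(?:a+?)*/`, keeps the check meaningful on both affected and unaffected Ruby versions.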