From: "brixen (Brian Ford)" <brixen@...>
Date: 2012-12-16T03:13:57+09:00
Subject: [ruby-core:50919] [ruby-trunk - Bug #7566] Escape (\u{}) forms in Regexp literals


Issue #7566 has been updated by brixen (Brian Ford).


I'd argue that's a malformed Regexp and "round-tripping" shouldn't be expected to work.

sasha:rubinius brian$ irb
1.9.3p327 :001 > re = /[\\\u{5d}]/
 => /[\\\u{5d}]/ 
1.9.3p327 :002 > re2 = Regexp.new re
 => /[\\\u{5d}]/ 
1.9.3p327 :003 > re3 = Regexp.new re.source
 => /[\\\u{5d}]/ 
1.9.3p327 :004 > "ab]c" =~ re
 => 2 
1.9.3p327 :005 > "ab]c" =~ re2
 => 2 
1.9.3p327 :006 > "ab]c" =~ re3
 => 2 

Because the source is stored with its escape sequences intact, and because 7-bit-clean source is encoded as US-ASCII even when it uses \u escapes, the underlying Oniguruma data must be maintained separately, and the string potentially has to be unescaped on every match. At least, that is my best understanding of the MRI source code. AFAIK, this behavior is not defined anywhere.
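
To illustrate the point about the stored source, here is a minimal sketch. The variable names are only for illustration, and the two encodings are printed for inspection rather than asserted, since I am not certain they are the same on every version:

```ruby
re = /[\\\u{5d}]/

# The stored source still contains the escape sequence as typed.
source = re.source
has_escape = source.include?('\u{5d}')   # single quotes: the literal text \u{5d}

# The compiled pattern nevertheless matches the character the escape denotes.
match_index = ("ab]c" =~ re)

puts source
puts has_escape
puts match_index

# Printed for inspection only; values may vary by Ruby version.
puts re.encoding
puts source.encoding
```

If the source text and the compiled pattern were the same thing, the escape would not survive in re.source while the match still succeeds on "]".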

Thanks,
Brian
----------------------------------------
Bug #7566: Escape (\u{}) forms in Regexp literals
https://bugs.ruby-lang.org/issues/7566#change-34772

Author: brixen (Brian Ford)
Status: Open
Priority: Normal
Assignee: 
Category: core
Target version: 2.0.0
ruby -v: ruby 1.9.3p327 (2012-11-10 revision 37606) [x86_64-darwin10.8.0]


Why are \u{} escape sequences in Regexp literals not converted to bytes like they are in String literals?
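
A short sketch of the difference being asked about, assuming nothing beyond core String and Regexp (the variable names are only for illustration):

```ruby
str = "\u{5d}"   # String literal: the escape is converted to the character
re  = /\u{5d}/   # Regexp literal: the escape text is kept in the source

str_bytes = str.bytes    # bytes of the converted character
re_source = re.source    # the escape sequence as written

puts str
puts str_bytes.inspect
puts re_source
```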

https://gist.github.com/4290155

Thanks,
Brian


-- 
http://bugs.ruby-lang.org/