[ruby/prism] Better handle regexp in the parser translator

Turns out, it was already almost correct. If you disregard \c and \M style escapes, only a single character is allowed to be escaped in a regex, so most tests passed already.

There was also a mistake where the wrong value was constructed for the AST; this is now fixed. One test fails because of this fix, but I'm fairly sure it is because of a parser bug. For `/\“/`, the backslash is supposed to be removed because `“` is a multibyte character. Honestly, though, I don't entirely understand all the rules.

Fixes more than half of the remaining AST differences for RuboCop tests.

https://github.com/ruby/prism/commit/e1c75f304b
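The escaping behaviour described in this message can be illustrated in plain Ruby. This is a minimal sketch, not Prism's implementation. The first part uses only core Ruby (string literals and `Regexp#source`) to show that double-quoted strings resolve escapes while regexp literals keep them. `drop_multibyte_escapes` is a hypothetical helper that encodes the rule as this message states it: drop the backslash before a multibyte character and keep it otherwise. It assumes each escape covers exactly one character and ignores `\c`/`\M` style escapes.

```ruby
# Double-quoted strings resolve escapes when the literal is parsed:
# "\t" becomes a real tab and an unknown escape like "\d" drops its backslash.
string_tab = "a\tb"
string_unknown = "\d"

# Regexp literals keep escape sequences in their source, so the backslash
# survives and is interpreted later by the regexp engine.
regexp_tab = /a\tb/.source
regexp_digit = /\d/.source

# Hypothetical helper (not part of Prism or the parser gem). It applies the
# rule stated in the commit message: a backslash before a multibyte character
# is dropped; every other escape keeps its backslash. Each escape is assumed
# to cover exactly one character (\c and \M style escapes are not handled).
def drop_multibyte_escapes(source)
  source.gsub(/\\(.)/m) do
    char = Regexp.last_match(1)
    char.bytesize > 1 ? char : "\\#{char}"
  end
end

p string_tab.length                # => 3 (a real tab character)
p string_unknown                   # => "d"
p regexp_tab                       # => "a\\tb"
p regexp_digit                     # => "\\d"
p drop_multibyte_escapes("\\“")    # => "“"
p drop_multibyte_escapes("\\d\\/") # => "\\d\\/"
```

Because `gsub` consumes a backslash together with the character after it, an escaped backslash (`\\`) is kept intact rather than being treated as the start of a new escape.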
author: Earlopain <[email protected]> 2025-01-14 20:20:05 +0100
committer: Kevin Newton <[email protected]> 2025-03-18 13:36:53 -0400
commit: 5d138f2b436dc84b1efed86ac3328e67638887cb (patch)
tree: f4e85d9728801d53d6d10099216720ac5c7309a1
parent: 177adf6fa543663334bfb8918b356b4771e5ff1a (diff)
2 files changed, 15 insertions, 0 deletions
diff --git a/lib/prism/translation/parser/lexer.rb b/lib/prism/translation/parser/lexer.rb
index a2d4e84a13..687b4f6043 100644
--- a/lib/prism/translation/parser/lexer.rb
+++ b/lib/prism/translation/parser/lexer.rb
@@ -825,6 +825,11 @@ module Prism
           quote == "/" || quote.start_with?("%r")
         end
 
+        # Regexp allow interpolation but are handled differently during unescaping
+        def regexp?(quote)
+          quote == "/" || quote.start_with?("%r")
+        end
+
         # Determine if the string is part of a %-style array.
         def percent_array?(quote)
           quote.start_with?("%w", "%W", "%i", "%I")
diff --git a/test/prism/ruby/parser_test.rb b/test/prism/ruby/parser_test.rb
index 4a6ba17aa7..b5cc45b824 100644
--- a/test/prism/ruby/parser_test.rb
+++ b/test/prism/ruby/parser_test.rb
@@ -74,6 +74,7 @@ module Prism
 
       # Contains an escaped multibyte character. This is supposed to drop to backslash
       "seattlerb/regexp_escape_extended.txt",
+<<<<<<< HEAD
 
       # https://github.com/whitequark/parser/issues/1020
       # These contain consecutive \r characters, followed by \n. Prism only receives
@@ -82,12 +83,21 @@ module Prism
       "seattlerb/heredoc_with_extra_carriage_returns_windows.txt",
       "seattlerb/heredoc_with_only_carriage_returns_windows.txt",
       "seattlerb/heredoc_with_only_carriage_returns.txt",
+=======
+>>>>>>> e1c75f304b (Better handle regexp in the parser translator)
     ]
 
     # These files are either failing to parse or failing to translate, so we'll
     # skip them for now.
     skip_all = skip_incorrect | [
       "unescaping.txt",
+<<<<<<< HEAD
+=======
+      "seattlerb/heredoc_with_extra_carriage_returns_windows.txt",
+      "seattlerb/heredoc_with_only_carriage_returns_windows.txt",
+      "seattlerb/heredoc_with_only_carriage_returns.txt",
+      "seattlerb/pctW_lineno.txt",
+>>>>>>> e1c75f304b (Better handle regexp in the parser translator)
       "seattlerb/regexp_esc_C_slash.txt",
     ]
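The `regexp?` predicate in `lexer.rb`, like the existing `percent_array?`, classifies a literal by its opening delimiter. The sketch below copies both predicates verbatim so they run outside the lexer class. It adds a hypothetical `unescape_strategy` dispatcher, which is not part of Prism, to show how such a check can route a token to regexp-specific unescaping. It assumes `quote` holds the literal's opening delimiter text (for example `"/"` or `"%r{"`), as the `start_with?` checks suggest.

```ruby
# Copied from the lexer diff; defined at top level here so they run standalone.
def regexp?(quote)
  quote == "/" || quote.start_with?("%r")
end

def percent_array?(quote)
  quote.start_with?("%w", "%W", "%i", "%I")
end

# Hypothetical dispatcher (not in Prism): picks an unescaping strategy from
# the opening delimiter of a string-like token.
def unescape_strategy(quote)
  if regexp?(quote)
    :regexp        # keep escapes; the regexp engine interprets them later
  elsif percent_array?(quote)
    :percent_array # %w/%W/%i/%I literals get their own handling
  else
    :string        # everything else; per-quote string rules are not modeled
  end
end

["/", "%r{", "%w[", "%I(", "\"", "'"].each do |quote|
  puts "#{quote.inspect} -> #{unescape_strategy(quote)}"
end
```

Checking `regexp?` first matters only in principle here, since no delimiter satisfies both predicates; it mirrors the idea that regexps need distinct unescaping even though they allow interpolation like double-quoted strings.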