From: "Eregon (Benoit Daloze) via ruby-core"
Date: 2024-07-26T11:09:46+00:00
Subject: [ruby-core:118693] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3

Issue #20652 has been updated by Eregon (Benoit Daloze).

FWIW, what TruffleRuby does for this is to store `$~` as a frame-local thread-local variable, but thread-local only if more than 1 thread has been seen, otherwise it's stored directly in the frame:
https://github.com/oracle/truffleruby/blob/3cd422433deebe3fa664f8c4540811c42ca02e93/src/main/java/org/truffleruby/language/threadlocal/ThreadAndFrameLocalStorage.java

I'm not sure how it works on CRuby, but if `$~` is stored directly in the frame, then threads might see a different `$~` than they expect, which could lead to very subtle bugs.

I don't really like a Regexp flag for this because a Regexp might be used in different contexts, and some usages might want `$~` and some might not.

I think in general a good fix to simplify this and avoid this kind of race would be to store `$~` in the caller frame (even if that's a block's frame) but not higher.
In this case it would be stored in the `lambda`'s frame and not outside.
That's also quite a bit faster.
Of course it would be somewhat incompatible, but how much code uses `$~` outside a block when the Regexp call is made inside a block?
We could warn that such code should not rely on that for a release or so, before changing it.

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109229

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------
I recently upgraded from ruby 2.7.7 to 3.3.1 and noticed that the GC load increased.
When I used the allocation profiler to investigate, I found that memory allocation from gsub had increased.
The problem was code like this:

```ruby
s = "foo "
s.gsub(/ (\s+)/) { " #{' ' * Regexp.last_match(1).length}" }
```

When I compared the results of heap-profiler between 2.7.7 and 3.3.1, I found that the number of MatchData objects had increased.
https://gist.github.com/orisano/98792dee260106e9b6fcb45bbabeb1e6

https://github.com/ruby/ruby/commit/abc0304cb28cb9dcc3476993bc487884c139fd11
I discovered that the cause is this commit, which stopped reusing backref to avoid race conditions.
Is there a way to reuse backref while still avoiding race conditions?

--
https://bugs.ruby-lang.org/
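
For illustration of the compatibility question raised in the comment (code that reads `$~` outside a block when the match happens inside the block), here is a minimal sketch of how this behaves on current CRuby as far as I understand it. The method names are hypothetical and only there for the example; it does not show the proposed change, only the existing scoping it would affect.

```ruby
# Hypothetical helper names; only String#=~ and $~ are real Ruby API.

# A match performed inside a block currently writes the $~ of the
# enclosing method, so the method can read it after the block returns.
def last_match_after_block
  %w[foo-bar].each { |s| s =~ /(\w+)-(\w+)/ }
  $~ && $~[2]
end

# A match performed inside a callee stays in the callee's frame.
def match_in_callee
  "foo-bar" =~ /(\w+)-(\w+)/
  nil
end

# The caller's own $~ is untouched by the callee's match.
def last_match_after_call
  match_in_callee
  $~
end

p last_match_after_block # => "bar"
p last_match_after_call  # => nil
```

Under the suggestion of storing `$~` in the block's own frame, code shaped like `last_match_after_block` is the kind that would presumably stop seeing the match, which is why a deprecation warning for a release was suggested.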