From: samuel@...
Date: 2018-07-08T11:35:22+00:00
Subject: [ruby-core:87880] [Ruby trunk Bug#14900] Extra allocation in String#byteslice

Issue #14900 has been updated by ioquatix (Samuel Williams).

I played around with my assumptions here.

By far the worst from a memory point of view was `slice!`, which, given a 5 MB string, produces 7.5 MB of allocations. The equivalent sequence of `byteslice` calls as above allocates only 2.5 MB.

Here were my comparisons:

```
string = nil # declared outside the blocks so the later blocks can see it

measure_memory("Initial allocation") do
  string = "a" * 5*1024*1024
  string.freeze
end # => 5.0 MB

measure_memory("Byteslice from start to middle") do
  # Why does this need to allocate memory? Surely it can share the original allocation?
  x = string.byteslice(0, string.bytesize / 2)
end # => 2.5 MB

measure_memory("Byteslice from middle to end") do
  string.byteslice(string.bytesize / 2, string.bytesize)
end # => 0.0 MB

measure_memory("Slice! from start to middle") do
  string.dup.slice!(0, string.bytesize / 2) # dup doesn't make any difference to size of allocations
end # => 7.5 MB

measure_memory("Byte slice into two halves") do
  head = string.byteslice(0, string.bytesize / 2)
  remainder = string.byteslice(string.bytesize / 2, string.bytesize)
end # => 2.5 MB
```

(examples are also here: https://github.com/socketry/async-io/blob/master/examples/allocations/byteslice.rb)

In the best case, the last example should be able to reuse the source string entirely, but Ruby doesn't seem capable of doing that yet.

Perhaps a specific implementation of `byteslice!` could address this use case with zero allocations?
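The `measure_memory` helper used in the comparisons is not shown in this message. A minimal sketch of one way such a helper could be written, assuming it reports how much `ObjectSpace.memsize_of_all(String)` grows across the block while GC is disabled. Only the name `measure_memory` comes from the message; the body is a guess for illustration and may differ from the code in the linked example:

```ruby
require "objspace"

# Hypothetical helper: only the name `measure_memory` appears in the message;
# this implementation is an assumption for illustration.
def measure_memory(label)
  GC.start    # collect garbage first so leftovers do not skew the baseline
  GC.disable  # keep the collector from freeing strings mid-measurement
  before = ObjectSpace.memsize_of_all(String)
  yield
  delta = ObjectSpace.memsize_of_all(String) - before
  puts format("%s: %.1f MB", label, delta / (1024.0 * 1024.0))
  delta       # growth in bytes, so callers can compare runs
ensure
  GC.enable
end

string = ("a" * 5 * 1024 * 1024).freeze

measure_memory("Byteslice from start to middle") do
  string.byteslice(0, string.bytesize / 2)
end
```

Because `ObjectSpace.memsize_of_all` counts a string's heap buffer only for the object that owns it, a slice that shares the source buffer should show up as close to zero here, while a copied slice shows up at its full size. The exact figures depend on the Ruby version.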
----------------------------------------
Bug #14900: Extra allocation in String#byteslice
https://bugs.ruby-lang.org/issues/14900#change-72893

* Author: janko (Janko Marohnić)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
When executing `String#byteslice` with a range, I noticed that sometimes the original string is allocated again. When I run the following script:

~~~ ruby
require "objspace"

string = "a" * 100_000

GC.start
GC.disable
generation = GC.count

ObjectSpace.trace_object_allocations do
  string.byteslice(50_000..-1)

  ObjectSpace.each_object(String) do |string|
    p string.bytesize if ObjectSpace.allocation_generation(string) == generation
  end
end
~~~

it outputs

~~~
50000
100000
6
5
~~~

The one with 50000 bytes is the result of `String#byteslice`, but the one with 100000 bytes is the duplicated original string. I expected only the result of `String#byteslice` to be amongst new allocations.

If instead of the last 50000 bytes I slice the *first* 50000 bytes, the extra duplication doesn't occur.

~~~ ruby
# ...
string.byteslice(0, 50_000)
# ...
~~~

~~~
50000
5
~~~

It's definitely ok if the implementation of `String#byteslice` allocates extra strings as part of the implementation, but it would be nice if they were deallocated before returning the result.

EDIT: It seems that `String#slice` has the same issue.

--
https://bugs.ruby-lang.org/