From: "tenderlovemaking (Aaron Patterson) via ruby-core"
Date: 2025-04-09T23:15:05+00:00
Subject: [ruby-core:121615] [Ruby Feature#21254] Inlining Class#new

Issue #21254 has been updated by tenderlovemaking (Aaron Patterson).

tenderlovemaking (Aaron Patterson) wrote in #note-7:

> I made a patch for it [here](https://github.com/ruby/ruby/commit/b8e37fd5cc588d05576b24c13f94c54409b2a9db), but I haven't tested it in CI yet.

@jhawthorn pointed out a problem with this patch that I hadn't thought about. Consider this code:

```ruby
class A
  private_class_method :new

  def self.make
    new # fast path
  end
end

A.make
A.make
```

When we try to [look up the `new` method](https://github.com/ruby/ruby/commit/b8e37fd5cc588d05576b24c13f94c54409b2a9db#diff-f8c174347e6ea8889b5036064a1ff4fe5e7c53a821befa9bdc5ccbf17800a649R2352) it will fill out the inline cache with the ZSUPER entry. But since the ZSUPER entry [won't pass the `rb_class_new_instance_pass_kw` check](https://github.com/ruby/ruby/commit/b8e37fd5cc588d05576b24c13f94c54409b2a9db#diff-f8c174347e6ea8889b5036064a1ff4fe5e7c53a821befa9bdc5ccbf17800a649R2359), we'll end up [looking up the `new` method from the superclass](https://github.com/ruby/ruby/commit/b8e37fd5cc588d05576b24c13f94c54409b2a9db#diff-f8c174347e6ea8889b5036064a1ff4fe5e7c53a821befa9bdc5ccbf17800a649R2364), which will fill out the inline cache with the method from the superclass. On the next call to `A.make`, it will miss the cache again because the receiver is `A`, basically repeating the above steps. In this case the inline cache will keep ping-ponging between the ZSUPER method and the superclass's method, never hitting.

Maybe we can figure out a way to do this in the future, but I'm not sure if it's a good idea right now. This particular case might actually be better handled by the JIT than the interpreter.
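
To make the ping-pong concrete, here is a toy model of a one-entry inline cache. It is only a sketch, not the VM's actual data structures: `InlineCache`, `METHODS`, `SUPERCLASS`, and `call_new` are hypothetical names made up for illustration, and the cache is keyed on a class name string rather than on a real class and method entry.

```ruby
# Toy one-entry inline cache (hypothetical model, not the real VM structure).
InlineCache = Struct.new(:klass, :entry, :hits, :misses) do
  # Return the cached entry for `klass`, refilling the cache on a miss.
  def fetch(klass)
    if self.klass == klass
      self.hits += 1
    else
      self.misses += 1
      self.klass = klass
      self.entry = yield
    end
    entry
  end
end

# Hypothetical method tables: "A" only holds a visibility-changing ZSUPER
# entry for `new`; the default implementation lives on its superclass.
METHODS    = { "A" => { new: :zsuper }, "Object" => { new: :default_new } }
SUPERCLASS = { "A" => "Object" }

# Mimics the two lookups described above, both going through the same cache.
def call_new(cache, receiver_class)
  entry = cache.fetch(receiver_class) { METHODS[receiver_class][:new] }
  if entry == :zsuper
    # The ZSUPER entry fails the "is this the default new?" check, so we
    # look again from the superclass, overwriting the cache.
    parent = SUPERCLASS[receiver_class]
    entry = cache.fetch(parent) { METHODS[parent][:new] }
  end
  entry
end

zsuper_site = InlineCache.new(nil, nil, 0, 0)
5.times { call_new(zsuper_site, "A") }
puts "A:      hits=#{zsuper_site.hits} misses=#{zsuper_site.misses}"
# prints "A:      hits=0 misses=10"

plain_site = InlineCache.new(nil, nil, 0, 0)
5.times { call_new(plain_site, "Object") }
puts "Object: hits=#{plain_site.hits} misses=#{plain_site.misses}"
# prints "Object: hits=4 misses=1"
```

In this model the call site for `A` overwrites its cache twice per call and so never hits, while a receiver whose own `new` entry is already the default misses once and hits afterwards.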

----------------------------------------
Feature #21254: Inlining Class#new
https://bugs.ruby-lang.org/issues/21254#change-112667

* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the `Class#new` method. In order to support inlining this method, we would like to introduce a new YARV instruction `opt_new`. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a "slow path" for calling a method.

`Class#new` especially benefits from inlining for two reasons:

1. Calling `initialize` directly means we don't need to allocate a temporary hash for keyword arguments
2. We are able to use an inline cache when calling the `initialize` method

The patch can be found [here](https://github.com/ruby/ruby/pull/13080), but please find implementation details below.

## Implementation Details

This patch modifies the compiler to emit special instructions when it sees a callsite that uses "new". Before this patch, calling `Object.new` would result in bytecode like this:

```
ruby --dump=insns -e'Object.new'
== disasm: #@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path ( 1)[Li]
0002 opt_send_without_block
0004 leave
```

With this patch, the bytecode looks like this:

```
./ruby --dump=insns -e'Object.new'
== disasm: #@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new , 11
0007 opt_send_without_block
0009 jump 14
0011 opt_send_without_block
0013 swap
0014 pop
0015 leave
```

The new `opt_new` instruction checks whether or not the `new` implementation is the default "allocator" implementation. If it is the default allocator, then the instruction will allocate the object and call `initialize`, passing parameters to `initialize` but not to `new`.
If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.

## Performance Improvements

This patch improves performance of all allocations that use the normal "new" method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with the inlining patch):

A simple `Object.new` in a hot loop improves by about 24%:

```
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
  Time (mean ± σ):     436.6 ms ±   3.3 ms    [User: 432.3 ms, System: 3.8 ms]
  Range (min … max):   430.5 ms … 442.6 ms    10 runs

Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
  Time (mean ± σ):     351.1 ms ±   3.6 ms    [User: 347.4 ms, System: 3.3 ms]
  Range (min … max):   343.9 ms … 357.4 ms    10 runs

Summary
  ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
    1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
```

Using a single keyword argument is improved by about 72%:

```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
  Time (mean ± σ):      1.082 s ±  0.007 s    [User: 1.074 s, System: 0.008 s]
  Range (min … max):    1.071 s …  1.091 s    10 runs

Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
  Time (mean ± σ):     627.6 ms ±   4.8 ms    [User: 622.6 ms, System: 4.5 ms]
  Range (min … max):   622.1 ms … 637.2 ms    10 runs

Summary
  ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
    1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```

The performance increase depends on the number and type of parameters passed to `initialize`. For example, an `initialize` method that takes 3 parameters can see a speed improvement of ~3x:

```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
  def initialize a:, b:, c:
  end
end

i = 0
while i < 10_000_000
  Foo.new(a: 1, b: 2, c: 3)
  Foo.new(a: 1, b: 2, c: 3)
  Foo.new(a: 1, b: 2, c: 3)
  i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
  Time (mean ± σ):      3.700 s ±  0.033 s    [User: 3.681 s, System: 0.018 s]
  Range (min … max):    3.636 s …  3.751 s    10 runs

Benchmark 2: ./ruby --disable-gems test.rb
  Time (mean ± σ):      1.182 s ±  0.013 s    [User: 1.173 s, System: 0.008 s]
  Range (min … max):    1.165 s …  1.203 s    10 runs

Summary
  ./ruby --disable-gems test.rb ran
    3.13 ± 0.04 times faster than ruby --disable-gems test.rb
```

One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling "through" the C implementation of `Class#new`:

```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
  def initialize a:, b:, c:
  end
end

def allocs
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end

test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
```

## Memory Increase

Of course this patch is not "free". Inlining the method call adds extra YARV instructions.
We estimate this patch increases `new` call sites by about 112 bytes (656 - 544 in the output below):

```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"

class Foo
  def initialize
  end
end

def test
  Foo.new
end

puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
```

We've tested this in Shopify's monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memsize by about 3.8 MB (roughly a 0.5% increase in ISEQ size):

```
irb(main):001> 737191972 - 733354388
=> 3837584
```

However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:

```
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
```

The total heap size as reported by `memsize` increases by only about 1 MB:

```
irb(main):001> 3981075617 - 3979926505
=> 1149112
```

## Changes to `caller`

This patch changes `caller` reporting in the `initialize` method:

```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"

class Foo
  def initialize
    puts caller
  end
end

def test
  Foo.new
end

test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
```

As you can see in the above output, the `Class#new` frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require [changes to ERB for emitting warnings](https://github.com/ruby/ruby/pull/13080/files#diff-7624f95f521b3333de8c687d70c2574aa31616cebf9504d8bcf673865fbf6ecdR475-R486).

That said, eliminating the frame also has the side effect of making some of our allocation tracing tools a little more useful:

```
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"

class Foo
  def test
    Object.new
  end
end

ObjectSpace.trace_object_allocations do
  obj = Foo.new.test
  puts ObjectSpace.allocation_class_path(obj)
  puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
```

Before inlining, `ObjectSpace` would report the allocation class path and method id as `Class#new`, which isn't very helpful. With the inlining patch, we can see that the object is allocated in `Foo#test`.

## Summary

I think the overall memory increase is modest, and the change to `caller` is acceptable, especially given the performance increase this patch provides.

--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/