From: samuel@...
Date: 2019-07-16T02:44:58+00:00
Subject: [ruby-core:93802] [Ruby master Feature#15997] Improve performance of fiber creation by using pool allocation strategy.

Issue #15997 has been updated by ioquatix (Samuel Williams).

Here is a comparison on Linux:

```
/home/samuel/.rvm/rubies/ruby-2.6.3/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/home/samuel/.rvm/rubies/ruby-2.6.3/bin/ruby --disable=gems -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
            $(find ../benchmark -maxdepth 1 -name '*vm2_fiber*.yml' -o -name '*vm2_fiber*.rb' | sort)
Calculating -------------------------------------
                     compare-ruby  built-ruby
  vm2_fiber_allocate     123.108k    171.839k i/s -    100.000k times in 0.812295s 0.581938s
     vm2_fiber_count       2.548k     82.950k i/s -    100.000k times in 39.248735s 1.205547s
     vm2_fiber_reuse      158.703     953.842 i/s -     200.000 times in 1.260218s 0.209678s
    vm2_fiber_switch      10.127M     13.016M i/s -     20.000M times in 1.974979s 1.536628s

Comparison:
                  vm2_fiber_allocate
          built-ruby:    171839.5 i/s
        compare-ruby:    123108.0 i/s - 1.40x slower

                     vm2_fiber_count
          built-ruby:     82949.9 i/s
        compare-ruby:      2547.9 i/s - 32.56x slower

                     vm2_fiber_reuse
          built-ruby:       953.8 i/s
        compare-ruby:       158.7 i/s - 6.01x slower

                    vm2_fiber_switch
          built-ruby:  13015509.6 i/s
        compare-ruby:  10126692.4 i/s - 1.29x slower
```

With `#define FIBER_POOL_ALLOCATION_FREE`:

```
Calculating -------------------------------------
                     compare-ruby  built-ruby
  vm2_fiber_allocate     123.144k    170.006k i/s -    100.000k times in 0.812060s 0.588216s
     vm2_fiber_count       2.528k     76.265k i/s -    100.000k times in 39.560078s 1.311221s
     vm2_fiber_reuse      149.002     446.903 i/s -     200.000 times in 1.342268s 0.447525s
    vm2_fiber_switch      10.112M     13.104M i/s -     20.000M times in 1.977840s 1.526270s

Comparison:
                  vm2_fiber_allocate
          built-ruby:    170005.5 i/s
        compare-ruby:    123143.6 i/s - 1.38x slower

                     vm2_fiber_count
          built-ruby:     76264.8 i/s
        compare-ruby:      2527.8 i/s - 30.17x slower

                     vm2_fiber_reuse
          built-ruby:       446.9 i/s
        compare-ruby:       149.0 i/s - 3.00x slower

                    vm2_fiber_switch
          built-ruby:  13103837.6 i/s
        compare-ruby:  10112039.4 i/s - 1.30x slower
```

One unexpected benefit is that, due to better and simpler stack management, `vm2_fiber_switch` became 30% faster too.

----------------------------------------
Feature #15997: Improve performance of fiber creation by using pool allocation strategy.
https://bugs.ruby-lang.org/issues/15997#change-79668

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
----------------------------------------
https://github.com/ruby/ruby/pull/2224

This PR improves the performance of fiber allocation and reuse by implementing a better stack cache.

The fiber pool manages a singly linked list of fiber pool allocations. Each fiber pool allocation contains one or more stacks (typically more, e.g. 512). It uses an exponential (2^N) allocation strategy, starting at 8 initial stacks; subsequent allocations are 8, 16, 32, etc.

```
//
// base = +-------------------------------+-----------------------+  +
//        |VM Stack       |VM Stack       |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Machine Stack  |Machine Stack  |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |     .  .  .  .        |  |  size
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        |               |               |                       |  |
//        +-------------------------------+                       |  |
//        |Guard Page     |Guard Page     |                       |  |
//        +-------------------------------+-----------------------+  v
//
//        +------------------------------------------------------->
//
//                                  count
//
```

The performance improvement depends on usage:

```
Calculating -------------------------------------
                     compare-ruby  built-ruby
  vm2_fiber_allocate     132.900k    180.852k i/s -    100.000k times in 0.752447s 0.552939s
     vm2_fiber_count       5.317k    110.724k i/s -    100.000k times in 18.806479s 0.903145s
     vm2_fiber_reuse      160.128     347.663 i/s -     200.000 times in 1.249003s 0.575269s
    vm2_fiber_switch      13.429M     13.490M i/s -     20.000M times in 1.489303s 1.482549s

Comparison:
                  vm2_fiber_allocate
          built-ruby:    180851.6 i/s
        compare-ruby:    132899.7 i/s - 1.36x slower

                     vm2_fiber_count
          built-ruby:    110724.3 i/s
        compare-ruby:      5317.3 i/s - 20.82x slower

                     vm2_fiber_reuse
          built-ruby:       347.7 i/s
        compare-ruby:       160.1 i/s - 2.17x slower

                    vm2_fiber_switch
          built-ruby:  13490282.4 i/s
        compare-ruby:  13429100.0 i/s - 1.00x slower
```

This test was run on a Linux server with 64GB memory and a 4-core Xeon (Intel Xeon CPU E3-1240 v6 @ 3.70GHz). "compare-ruby" is `master`, and "built-ruby" is `master+fiber-pool`.

Additionally, we conservatively use `madvise(free)` to avoid swap space usage for unused fiber stacks. However, if this requirement is removed, we can get a 6x - 10x performance improvement in the `vm2_fiber_reuse` benchmark. There are some options to deal with this (e.g. moving it to `GC.compact`), but as this is still a net win, I'd like to merge this PR as is.
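
The pool growth described above (8 initial stacks, then 8, 16, 32, etc.) can be modelled in a few lines of Ruby. This is only a minimal sketch, not the C code from the PR: the class name `FiberPoolModel`, its methods, and the rule that each new allocation is as large as the pool's current capacity are assumptions chosen to reproduce the stated sequence. It counts stacks only and maps no memory.

```ruby
# Hypothetical model of the fiber pool growth schedule; names are illustrative.
class FiberPoolModel
  INITIAL_COUNT = 8 # initial number of stacks, as stated in the description

  attr_reader :allocations

  def initialize
    @allocations = [] # stack count of each allocation, oldest first
    @vacant = 0       # stacks allocated but not handed out
    @used = 0         # stacks currently held by fibers
  end

  # Hand out one stack, growing the pool only when no vacant stack is left.
  def acquire
    expand if @vacant.zero?
    @vacant -= 1
    @used += 1
  end

  # Return a stack to the pool so that a later acquire can reuse it.
  def release
    raise "no stack in use" if @used.zero?
    @used -= 1
    @vacant += 1
  end

  # Total number of stacks across all allocations.
  def capacity
    @allocations.sum
  end

  private

  # Assumption: each new allocation matches the current capacity,
  # which yields the sequence 8, 8, 16, 32, ...
  def expand
    count = @allocations.empty? ? INITIAL_COUNT : capacity
    @allocations << count
    @vacant += count
  end
end

pool = FiberPoolModel.new
100.times { pool.acquire }
p pool.allocations # => [8, 8, 16, 32, 64]
p pool.capacity    # => 128
```

In this model, 100 live fibers need only five allocations, and releasing and re-acquiring stacks never grows the pool, which is the kind of reuse the `vm2_fiber_reuse` benchmark is meant to exercise.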