Age | Commit message (Collapse) | Author |
|
Notes:
Merged-By: k0kubun <[email protected]>
|
|
Previously YARV bytecode implemented constant caching by having a pair
of instructions, opt_getinlinecache and opt_setinlinecache, wrapping a
series of getconstant calls (with putobject providing supporting
arguments).
This commit replaces that pattern with a new instruction,
opt_getconstant_path, handling both getting/setting the inline cache and
fetching the constant on a cache miss.
This is implemented by storing the full constant path as a
null-terminated array of IDs inside of the IC structure. idNULL is used
to signal an absolute constant reference.
$ ./miniruby --dump=insns -e '::Foo::Bar::Baz'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
0000 opt_getconstant_path <ic:0 ::Foo::Bar::Baz> ( 1)[Li]
0002 leave
The motivation for this is that we had increasingly found the need to
disassemble the instructions between the opt_getinlinecache and
opt_setinlinecache in order to determine the constant we are fetching,
or otherwise store metadata.
This disassembly was done:
* In opt_setinlinecache, to register the IC against the constant names
it is using for granular invalidation.
* In rb_iseq_free, to unregister the IC from the invalidation table.
* In YJIT to find the position of a opt_getinlinecache instruction to
invalidate it when the cache is populated
* In YJIT to register the constant names being used for invalidation.
With this change we no longe need disassemly for these (in fact
rb_iseq_each is now unused), as the list of constant names being
referenced is held in the IC. This should also make it possible to make
more optimizations in the future.
This may also reduce the size of iseqs, as previously each segment
required 32 bytes (on 64-bit platforms) for each constant segment. This
implementation only stores one ID per-segment.
There should be no significant performance change between this and the
previous implementation. Previously opt_getinlinecache was a "leaf"
instruction, but it included a jump (almost always to a separate cache
line). Now opt_getconstant_path is a non-leaf (it may
raise/autoload/call const_missing) but it does not jump. These seem to
even out.
Notes:
Merged: https://github.com/ruby/ruby/pull/6187
|
|
I'm planning to introduce mjit_compiler.rb, and I want to make this
consistent with it. Consistency with compile.c doesn't seem important
for MJIT anyway.
|
|
on interruption.
The cancellation code was originally written for leave insn, but re-entering
opt_invokebuiltin_delegate_leave insn on a cancellation is not safe, because
a builtin function is executed twice.
|
|
Make the code a bit modern and consistent with some other places.
|
|
* Inline getconstant on JIT
* Support USE_MJIT=0
Notes:
Merged-By: k0kubun <[email protected]>
|
|
and fix inconsistent indentation in mjit_compile.inc.erb
|
|
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' --repeat-count=12 --alternate --output=all benchmark.yml
before --jit: ruby 3.0.0dev (2020-11-27T06:41:15Z master 8ce1711c25) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-11-27T08:36:02Z master 2c592126b9) +JIT [x86_64-linux]
last_commit=Cache access to reg_cfp->self on JIT
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 82.40522392468650 82.66023870551237 fps
82.67998539899482 83.08660305312587
85.51280693947453 87.09311989553235
86.32925337181406 87.16115255191410
87.35617494926235 87.30699391518075
87.91865339426212 88.47590342996875
88.11573661006648 88.64778616696353
88.16060826662158 88.67015079203991
88.21639244865058 89.19630739497482
88.47241577897603 89.23443637947730
89.37087287229809 89.57052723997015
89.46969964699964 89.97803363889025
```
|
|
Instead of doubling the invokebuiltin logic here and there, use the same
insns.def definition for both MJIT/non-JIT situations.
Notes:
Merged: https://github.com/ruby/ruby/pull/3305
|
|
Noticed that struct rb_builtin_function is a purely compile-time
constant. MJIT can eliminate some runtime calculations by statically
generate dedicated C code generator for each builtin functions.
Notes:
Merged: https://github.com/ruby/ruby/pull/3305
|
|
for opt_* insns.
opt_eq handles rb_obj_equal inside opt_eq, and all other cfunc is
handled by opt_send_without_block. Therefore we can't decide which insn
should be generated by checking whether it's cfunc cc or not.
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-26T05:21:43Z master 9dbc2294a6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-26T06:30:18Z master 75cece1b0b) +JIT [x86_64-linux]
last_commit=Decide JIT-ed insn based on cached cfunc
Calculating -------------------------------------
before --jit after --jit
mjit_nil?(1) 73.878M 74.021M i/s - 40.000M times in 0.541432s 0.540391s
mjit_not(1) 72.635M 74.601M i/s - 40.000M times in 0.550702s 0.536187s
mjit_eq(1, nil) 7.331M 7.445M i/s - 8.000M times in 1.091211s 1.074596s
mjit_eq(nil, 1) 49.450M 64.711M i/s - 8.000M times in 0.161781s 0.123627s
Comparison:
mjit_nil?(1)
after --jit: 74020528.4 i/s
before --jit: 73878185.9 i/s - 1.00x slower
mjit_not(1)
after --jit: 74600882.0 i/s
before --jit: 72634507.6 i/s - 1.03x slower
mjit_eq(1, nil)
after --jit: 7444657.4 i/s
before --jit: 7331304.3 i/s - 1.02x slower
mjit_eq(nil, 1)
after --jit: 64710790.6 i/s
before --jit: 49449507.4 i/s - 1.31x slower
```
|
|
only for opt_nil_p and opt_not.
While vm_method_cfunc_is is used for opt_eq too, many fast paths of it
don't call it. So if it's populated, it should generate opt_send,
regardless of cfunc or not. And again, opt_neq isn't relevant due to the
difference in operands.
So opt_nil_p and opt_not are the only variants using vm_method_cfunc_is
like they use.
```
$ benchmark-driver -v --rbenv 'before2 --jit::ruby --jit;before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before2 --jit: ruby 2.8.0dev (2020-06-22T08:37:37Z master 3238641750) +JIT [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-06-23T01:01:24Z master 9ce2066209) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-23T06:58:37Z master 17e9df3157) +JIT [x86_64-linux]
last_commit=Avoid generating opt_send with cfunc cc with JIT
Calculating -------------------------------------
before2 --jit before --jit after --jit
mjit_nil?(1) 54.204M 75.536M 75.031M i/s - 40.000M times in 0.737947s 0.529548s 0.533110s
mjit_not(1) 53.822M 70.921M 71.920M i/s - 40.000M times in 0.743195s 0.564007s 0.556171s
mjit_eq(1, nil) 7.367M 6.496M 7.331M i/s - 8.000M times in 1.085882s 1.231470s 1.091327s
Comparison:
mjit_nil?(1)
before --jit: 75536059.3 i/s
after --jit: 75031409.4 i/s - 1.01x slower
before2 --jit: 54204431.6 i/s - 1.39x slower
mjit_not(1)
after --jit: 71920324.1 i/s
before --jit: 70921063.1 i/s - 1.01x slower
before2 --jit: 53821697.6 i/s - 1.34x slower
mjit_eq(1, nil)
before2 --jit: 7367280.0 i/s
after --jit: 7330527.4 i/s - 1.01x slower
before --jit: 6496302.8 i/s - 1.13x slower
```
|
|
because opt_nil/opt_not/opt_eq populates cc even when it doesn't
fallback to opt_send_without_block because of vm_method_cfunc_is.
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-22T08:11:24Z master d231b8f95b) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-22T08:53:27Z master e1125879ed) +JIT [x86_64-linux]
last_commit=Compile opt_send for opt_* only when cc has ISeq
Calculating -------------------------------------
before --jit after --jit
mjit_nil?(1) 54.106M 73.693M i/s - 40.000M times in 0.739288s 0.542795s
mjit_not(1) 53.398M 74.477M i/s - 40.000M times in 0.749090s 0.537075s
mjit_eq(1, nil) 7.427M 6.497M i/s - 8.000M times in 1.077136s 1.231326s
Comparison:
mjit_nil?(1)
after --jit: 73692594.3 i/s
before --jit: 54106108.4 i/s - 1.36x slower
mjit_not(1)
after --jit: 74477487.9 i/s
before --jit: 53398125.0 i/s - 1.39x slower
mjit_eq(1, nil)
before --jit: 7427105.9 i/s
after --jit: 6497063.0 i/s - 1.14x slower
```
Actually opt_eq becomes slower by this. Maybe it's indeed using
opt_send_without_block, but I'll approach that one in another commit.
|
|
[Feature #15589]
Notes:
Merged-By: k0kubun <[email protected]>
|
|
* Remove obsoleted opt_call_c_function insn
* Keep opt_call_c_function with DEFINE_INSN_IF
Notes:
Merged-By: k0kubun <[email protected]>
|
|
|
|
Even if local stack optimization is not used and values are written to
VM stack, the stack pointer itself may not be moved properly. So this
should be always moved on JIT cancellation.
By the way it's hard to write a test for this because if we try to
generate an interrupt, it will be a method call and it consumes the
interrupt by itself on popping a frame.
|
|
to eliminate sp / pc moves by cancelling JIT execution on interrupts.
$ benchmark-driver benchmark.yml -v --rbenv 'before --jit;after --jit' --repeat-count=12 --output=all
before --jit: ruby 2.8.0dev (2020-04-01T03:48:56Z master 5a81562dfe) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-01T04:58:01Z master 39beb26a27) +JIT [x86_64-linux]
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 75.06409603894944 76.06422026555558 fps
75.12025067279242 78.48161731616810
77.42020273492177 79.78958240950033
79.07253675128945 79.88645902325614
79.99179109732327 80.33743931749331
80.07633091008627 80.53790081529166
80.15450942667547 80.99048270668010
80.48372803283709 81.70497146081003
80.57410149187352 82.79494539467382
81.80449157081202 82.85797792223954
82.24629397834902 83.00603891515506
82.63708148686703 83.23221006969828
$ benchmark-driver -v --rbenv 'before;before --jit;after --jit' benchmark/mjit_leave.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-01T03:48:56Z master 5a81562dfe) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-04-01T03:48:56Z master 5a81562dfe) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-04-01T04:58:01Z master 39beb26a27) +JIT [x86_64-linux]
Calculating -------------------------------------
before before --jit after --jit
mjit_leave 106.656M 82.786M 91.635M i/s - 200.000M times in 1.875183s 2.415881s 2.182569s
Comparison:
mjit_leave
before: 106656239.9 i/s
after --jit: 91635143.7 i/s - 1.16x slower
before --jit: 82785537.2 i/s - 1.29x slower
|
|
OpenBSD RubyCI has failed with SEGV since 4bcd5981e80d3e1852c8723741a0069779464128.
https://rubyci.org/logs/rubyci.s3.amazonaws.com/openbsd-current/ruby-master/log/20200312T223005Z.fail.html.gz
This was because `status->cc_entries` could be stale after `realloc` call
for inlined iseqs.
|
|
jit_unit to avoid marking wrong cc entries when inlined iseq is compiled
multiple times, resolving the TODO added by daf7c48d88.
This obviates pseudo jit_unit in inlined iseq introduced by 7ec2359374
and fixes memory leak of the adhoc unit.
|
|
Fixed a TODO in b9007b6c548f91e88fd3f2ffa23de740431fa969
|
|
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
Notes:
Merged: https://github.com/ruby/ruby/pull/2888
|
|
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.
Combining the structures also saves 8 bytes per call site as the current
layout uses separate two pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.
This change improves the Optcarrot benchmark at least 3%. For more
details, see attached bugs.ruby-lang.org ticket.
Complications:
- A new instruction attribute `comptime_sp_inc` is introduced to
calculate SP increase at compile time without using call caches. At
compile time, a `TS_CALLDATA` operand points to a call info struct, but
at runtime, the same operand points to a call data struct. Instruction
that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
[Misc #16258]
Notes:
Merged: https://github.com/ruby/ruby/pull/2564
|
|
|
|
for ISeq including only leaf and no-handles_sp insns except leaf.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67574 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67554 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67548 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
for opt_aref with inline cache to minimize the possibility of JIT cancel.
Also opt_aset and opt_mod are added for the targets.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67547 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
and functions to clarify the intention and make sure it's not used in a
surprising way (like using 2, 3, ... other than 0, 1 even while it seems
to be a boolean).
This is a retry of r66775. It included some typos...
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66778 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
These settings are now covered by .dir-locals.el.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66584 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
mjit_compile.inc.erb: ditto
common.mk: update dependency for the rename from getivar.erb
=== Optcarrot benchmark ===
```
$ benchmark-driver benchmark.yml --rbenv '2.0.0::2.0.0-p648 --disable-gems;before::before --disable-gems;before+JIT::before --disable-gems --jit;after::after --disable-gems;after+JIT::after --disable-gems --jit' -v --repeat-count 24
2.0.0: ruby 2.0.0p648 (2015-12-16 revision 53162) [x86_64-linux]
before: ruby 2.6.0dev (2018-10-14 trunk 65074) [x86_64-linux]
before+JIT: ruby 2.6.0dev (2018-10-14 trunk 65074) +JIT [x86_64-linux]
after: ruby 2.6.0dev (2018-10-14 trunk 65074) [x86_64-linux]
after+JIT: ruby 2.6.0dev (2018-10-14 trunk 65074) +JIT [x86_64-linux]
Calculating -------------------------------------
2.0.0 before before+JIT after after+JIT
Optcarrot Lan_Master.nes 34.434 53.125 84.782 53.321 86.812 fps
Comparison:
Optcarrot Lan_Master.nes
after+JIT: 86.8 fps
before+JIT: 84.8 fps - 1.02x slower
after: 53.3 fps - 1.63x slower
before: 53.1 fps - 1.63x slower
2.0.0: 34.4 fps - 2.52x slower
```
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65076 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
by inlining index (and serial to invalidate that) and simplifying the
branch by using JIT cancellation.
mjit_compile.inc.erb: use the above file
mjit_compile.c: copy USE_IC_FOR_IVAR definition. will move this to
another shared file later.
common.mk: add new dependency
test/ruby/test_jit.rb: cover this case
=== Optcarrot benchmark ===
```
$ benchmark-driver benchmark.yml --rbenv '2.0.0::2.0.0-p648;before::before --disable-gems;before+JIT::before --disable-gems --jit;after::after --disable-gems;after+JIT::after --disable-gems --jit' -v --repeat-count 24
2.0.0: ruby 2.0.0p648 (2015-12-16 revision 53162) [x86_64-linux]
before: ruby 2.6.0dev (2018-10-14 trunk 65072) [x86_64-linux]
before+JIT: ruby 2.6.0dev (2018-10-14 trunk 65072) +JIT [x86_64-linux]
after: ruby 2.6.0dev (2018-10-14 trunk 65072) [x86_64-linux]
last_commit=_mjit_compile_getivar.erb: optimize IC-hit getivar
after+JIT: ruby 2.6.0dev (2018-10-14 trunk 65072) +JIT [x86_64-linux]
last_commit=_mjit_compile_getivar.erb: optimize IC-hit getivar
Calculating -------------------------------------
2.0.0 before before+JIT after after+JIT
Optcarrot Lan_Master.nes 36.065 53.896 71.565 53.856 84.747 fps
Comparison:
Optcarrot Lan_Master.nes
after+JIT: 84.7 fps
before+JIT: 71.6 fps - 1.18x slower
before: 53.9 fps - 1.57x slower
after: 53.9 fps - 1.57x slower
2.0.0: 36.1 fps - 2.35x slower
```
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65073 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
This retries r62655, which was reverted at r63863 for r63763.
tool/ruby_vm/views/_mjit_compile_insn.erb: revert the revert.
tool/ruby_vm/views/_mjit_compile_insn_body.erb: ditto.
tool/ruby_vm/views/_mjit_compile_pc_and_sp.erb: ditto.
tool/ruby_vm/views/_mjit_compile_send.erb: ditto.
tool/ruby_vm/views/mjit_compile.inc.erb: ditto.
tool/ruby_vm/views/_insn_entry.erb: revert half of r63763. The commit
was originally reverted since changing pc motion was bad for tracing,
but changing sp motion was totally fine. For JIT, I wanna resurrect
the sp motion change in r62051.
tool/ruby_vm/models/bare_instructions.rb: ditto.
insns.def: ditto.
vm_insnhelper.c: ditto.
vm_insnhelper.h: ditto.
* benchmark
$ benchmark-driver benchmark.yml --rbenv 'before;after;before --jit;after --jit' --repeat-count 12 -v
before: ruby 2.6.0dev (2018-07-19 trunk 63998) [x86_64-linux]
after: ruby 2.6.0dev (2018-07-19 add-sp 63998) [x86_64-linux]
last_commit=mjit_compile.c: reduce sp motion on JIT
before --jit: ruby 2.6.0dev (2018-07-19 trunk 63998) +JIT [x86_64-linux]
after --jit: ruby 2.6.0dev (2018-07-19 add-sp 63998) +JIT [x86_64-linux]
last_commit=mjit_compile.c: reduce sp motion on JIT
Calculating -------------------------------------
before after before --jit after --jit
Optcarrot Lan_Master.nes 51.354 50.238 70.010 72.139 fps
Comparison:
Optcarrot Lan_Master.nes
after --jit: 72.1 fps
before --jit: 70.0 fps - 1.03x slower
before: 51.4 fps - 1.40x slower
after: 50.2 fps - 1.44x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63999 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
Due to trunk-mjit CI failures:
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1130097
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1130196
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63991 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
This optimization was reverted on r63863, but this commit resurrects the
optimization to skip some sp motions on JIT execution.
tool/ruby_vm/views/_mjit_compile_insn_body.erb: ditto
tool/ruby_vm/views/_mjit_compile_insn.erb: ditto
insns.def: resurrect handles_frame as handles_stack, which was deleted
on r63763.
tool/ruby_vm/models/bare_instructions.rb: ditto
vm_insnhelper.c: prevent moving sp outside insns.def to allow modifying
it by JIT.
* Optcarrot benchmark
$ benchmark-driver benchmark.yml --rbenv 'before --jit;after --jit' --repeat-count 12 -v
before --jit: ruby 2.6.0dev (2018-07-17 trunk 63987) +JIT [x86_64-linux]
after --jit: ruby 2.6.0dev (2018-07-17 local-stack 63987) +JIT [x86_64-linux]
last_commit=mjit_compile.c: resurrect local variable stack
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 70.518 72.144 fps
Comparison:
Optcarrot Lan_Master.nes
after --jit: 72.1 fps
before --jit: 70.5 fps - 1.02x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63988 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
r63655 was tightly coupled to handle_frames and some assumptions seems
to have been broken by r63763.
To partially resolve Bug#14892, this reverts the optimization for now. I
want to make MJIT CI happy first and then I'll probably retry r63655 by
partially reverting r63763 for sp changes.
The skipped test is not fixed yet.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63863 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
It never becomes `dispatched: true` with the current code.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63751 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
between branches.
mjit_compile.inc.erb: move the compiled_for_pos reference to
mjit_compile.c
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63331 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
on JIT compilation. r63092 was risky without this check.
mjit_compile.c: update comment about stack consistency check
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63329 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
if catch_except_p is FALSE. If catch_except_p is TRUE, stack values
should be on VM's stack when exception is thrown and the JIT-ed frame
is re-executed by VM's exception handler. If it's FALSE, the JIT-ed
frame won't be re-executed and don't need to keep values on VM's stack.
Using local variables allows us to reduce cfp->sp motion. Moving cfp->sp
is needed only for insns whose handles_frame? is false. So it improves
performance.
_mjit_compile_insn.erb: Prepare `stack_size` variable for GET_SP,
STACK_ADDR_FROM_TOP, TOPN macros. Share pc and sp motion partial view.
Use cancel handler created in mjit_compile.c.
_mjit_compile_send.erb: ditto. Also, when iseq->body->catch_except_p is
TRUE, this stops to call mjit_exec directly. I described the reason in
vm_insnhelper.h's comment for EXEC_EC_CFP.
_mjit_compile_pc_and_sp.erb: Shared logic for moving sp and pc. As you
can see from thsi file, when status->local_stack_p is TRUE and
insn.handles_frame? is false, moving sp is skipped. But if
insn.handles_frame? is true, values should be rolled back to VM's stack.
common.mk: add dependency for the file
_mjit_compile_insn_body.erb: Set sp value before canceling JIT on
DISPATCH_ORIGINAL_INSN. Replace GET_SP, STACK_ADDR_FROM_TOP, TOPN macros
for the case ocal_stack_p is TRUE and insn.handles_frame? is false.
In that case, values are not available on VM's stack and those macros
should be replaced.
mjit_compile.inc.erb: updated comments of macros which are supported by
JIT compiler. All references to `cfp->sp` should be replaced and thus
INC_SP, SET_SV, PUSH are no longer supported for now, because they are
not used now.
vm_exec.h: moved EXEC_EC_CFP definition to vm_insnhelper.h because it's
tighly coupled to CALL_METHOD.
vm_insnhelper.h: Have revised EXEC_EC_CFP definition moved from vm_exec.h.
Now it triggers mjit_exec for VM, and has the guard for catch_except_p
on JIT-ed code. See comments for details. CALL_METHOD delegates
triggering mjit_exec to EXEC_EC_CFP.
insns.def: Stopped using EXEC_EC_CFP for the case we don't want to
trigger mjit_exec. Those insns (defineclass, opt_call_c_function) are
not supported by JIT and it's safe to use RESTORE_REGS(), NEXT_INSN().
expandarray is changed to pass GET_SP() to replace the macro in
_mjit_compile_insn_body.erb.
vm_insnhelper.c: change to take sp for the above reason.
[close https://github.com/ruby/ruby/pull/1828]
This patch resurrects the performance which was attached in
[Feature #14235].
* Benchmark
Optcarrot (with configuration for benchmark_driver.gem)
https://github.com/benchmark-driver/optcarrot
$ benchmark-driver benchmark.yml --verbose 1 --rbenv 'before;before+JIT::before,--jit;after;after+JIT::after,--jit' --repeat-count 10
before: ruby 2.6.0dev (2018-03-04 trunk 62652) [x86_64-linux]
before+JIT: ruby 2.6.0dev (2018-03-04 trunk 62652) +JIT [x86_64-linux]
after: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) [x86_64-linux]
last_commit=mjit_compile.c: use local variables for stack
after+JIT: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) +JIT [x86_64-linux]
last_commit=mjit_compile.c: use local variables for stack
Calculating -------------------------------------
before before+JIT after after+JIT
optcarrot 53.552 59.680 53.697 63.358 fps
Comparison:
optcarrot
after+JIT: 63.4 fps
before+JIT: 59.7 fps - 1.06x slower
after: 53.7 fps - 1.18x slower
before: 53.6 fps - 1.18x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62655 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
to count up total calls properly. Some places (especially CALL_METHOD)
invoke mjit_exec twice for one method call. It would be problematic when
debugging, or possibly it would result in a wrong profiling result.
This commit doesn't have impact for performance:
* Optcarrot benchmark
** before
fps: 59.37757770848619
fps: 56.49998488958699
fps: 59.07900362739362
fps: 58.924749807695996
fps: 57.667905665594894
fps: 57.540021018385254
fps: 59.5518055679647
fps: 55.93831555148311
fps: 57.82685112863262
fps: 59.22391754481736
checksum: 59662
** after
fps: 58.461881158098194
fps: 59.32685183081354
fps: 54.11334310279802
fps: 59.2281560439788
fps: 58.60495705318312
fps: 55.696478648491045
fps: 58.49003452654724
fps: 58.387771929393224
fps: 59.24156772816439
fps: 56.68804731968107
checksum: 59662
* Discourse
Your Results: (note for timings- percentile is first, duration is second in millisecs)
** before (without JIT)
categories_admin:
50: 16
75: 17
90: 24
99: 37
home_admin:
50: 20
75: 20
90: 24
99: 42
topic_admin:
50: 16
75: 16
90: 18
99: 28
categories:
50: 36
75: 37
90: 45
99: 68
home:
50: 38
75: 40
90: 53
99: 92
topic:
50: 14
75: 15
90: 17
99: 26
** after (without JIT)
categories_admin:
50: 16
75: 16
90: 24
99: 36
home_admin:
50: 19
75: 20
90: 23
99: 41
topic_admin:
50: 16
75: 16
90: 19
99: 33
categories:
50: 35
75: 36
90: 44
99: 61
home:
50: 38
75: 40
90: 52
99: 101
topic:
50: 14
75: 15
90: 15
99: 24
** before (with JIT)
categories_admin:
50: 19
75: 23
90: 29
99: 44
home_admin:
50: 24
75: 26
90: 32
99: 46
topic_admin:
50: 20
75: 22
90: 27
99: 44
categories:
50: 41
75: 43
90: 51
99: 66
home:
50: 46
75: 49
90: 56
99: 68
topic:
50: 18
75: 19
90: 22
99: 31
** after (with JIT)
categories_admin:
50: 18
75: 21
90: 28
99: 42
home_admin:
50: 23
75: 25
90: 31
99: 51
topic_admin:
50: 19
75: 20
90: 24
99: 31
categories:
50: 41
75: 44
90: 52
99: 69
home:
50: 45
75: 48
90: 61
99: 88
topic:
50: 19
75: 20
90: 24
99: 33
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62641 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
with opt_send_without_block insn if call cache has valid ISeq.
If the receiver is not optimized target of opt_key (i.e. Hash or Array),
it triggers JIT cancel and it would be slow.
This change allows JIT to drop the check for Hash/Array and continue to
execute JIT even if the receiver is not Hash or Array.
See the following benchmark results. It's not improved so much, but it
would be effective when we achieve Ruby method inlining in
_mjit_compile_send.erb.
* Micro benchmark
Given the following bench.rb,
```
class HashWithIndifferentAccess < Hash
def []=(key, value)
super(key.to_s, value)
end
def [](key)
super(key.to_s)
end
end
indhash = HashWithIndifferentAccess.new
indhash[:foo] = 'bar'
key = 'foo'
100000000.times do
indhash[key]
end
```
** before
```
$ time ./ruby --disable-gems --jit-verbose=1 /tmp/bench.rb
JIT success (31.4ms): block in <main>@/tmp/bench.rb:15 -> /tmp/_ruby_mjit_p18206u0.c
JIT success (669.3ms): []@/tmp/bench.rb:6 -> /tmp/_ruby_mjit_p18206u1.c
Successful MJIT finish
./ruby --disable-gems --jit-verbose=1 /tmp/bench.rb 12.21s user 0.04s system 107% cpu 11.394 total
```
** after
```
$ time ./ruby --disable-gems --jit-verbose=1 /tmp/bench.rb
JIT success (41.0ms): block in <main>@/tmp/bench.rb:15 -> /tmp/_ruby_mjit_p17293u0.c
JIT success (679.0ms): []@/tmp/bench.rb:6 -> /tmp/_ruby_mjit_p17293u1.c
Successful MJIT finish
./ruby --disable-gems --jit-verbose=1 /tmp/bench.rb 11.54s user 0.06s system 108% cpu 10.726 total
```
The execution time is shortened.
* optcarrot benchmark
Optcarrot has no room to be improved by this change. Almost nothing is changed.
fps: 59.54 (before) -> 59.51 (after)
* discourse benchmark
I expected this to be improved a little, but it isn't too.
** before (JIT)
```
categories_admin:
50: 12
75: 13
90: 14
99: 22
home_admin:
50: 12
75: 13
90: 16
99: 22
topic_admin:
50: 12
75: 13
90: 15
99: 21
categories:
50: 18
75: 19
90: 23
99: 27
home:
50: 3
75: 4
90: 4
99: 12
topic:
50: 11
75: 11
90: 14
99: 20
```
** after (JIT)
```
categories_admin:
50: 12
75: 12
90: 16
99: 24
home_admin:
50: 12
75: 12
90: 14
99: 21
topic_admin:
50: 12
75: 13
90: 16
99: 21
categories:
50: 17
75: 18
90: 23
99: 32
home:
50: 3
75: 4
90: 4
99: 10
topic:
50: 11
75: 12
90: 13
99: 20
```
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62398 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
mjit_compile.inc.erb: show unsupported insn name on --jit-verbose=1 too.
Also, removed osboleted workaround. Now some insn-related functions are
declared with MAYBE_UNUSED.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62382 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
and style of generated code.
I've used 2-space indentation at first but at some moment I started to
use insns.def contents for generated code. So the 4-space indentation
was introduced. But it does no longer make sense.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62255 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
which has been developed by Takashi Kokubun <takashikkbn@gmail> as
YARV-MJIT. Many of its bugs are fixed by wanabe <[email protected]>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only it's perfectly safe
with JIT disabled because it does not replace VM instructions unlike
MJIT, but also with JIT enabled it stably runs Ruby applications
including Rails applications.
I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids to include some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alis to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <[email protected]>
Contributor: wanabe <[email protected]>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than clang. The benchmark may be worse
in macOS. Following benchmark result is provided by gcc w/ Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark reslts
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 17
75: 18
90: 22
99: 29
home_admin:
50: 21
75: 21
90: 27
99: 40
topic_admin:
50: 17
75: 18
90: 22
99: 32
categories:
50: 35
75: 41
90: 43
99: 77
home:
50: 39
75: 46
90: 49
99: 95
topic:
50: 46
75: 52
90: 56
99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 19
75: 21
90: 25
99: 33
home_admin:
50: 24
75: 26
90: 30
99: 35
topic_admin:
50: 19
75: 20
90: 25
99: 30
categories:
50: 40
75: 44
90: 48
99: 76
home:
50: 42
75: 48
90: 51
99: 89
topic:
50: 49
75: 55
90: 58
99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|