Age | Commit message (Collapse) | Author |
|
This commit implements Objects on Variable Width Allocation. This allows
Objects with more ivars to be embedded (i.e. contents directly follow the
object header) which improves performance through better cache locality.
Notes:
Merged: https://github.com/ruby/ruby/pull/6117
|
|
mjit_compile.c should be able to access this more easily.
Notes:
Merged: https://github.com/ruby/ruby/pull/6140
|
|
This commit adds a bitfield to the iseq body that stores offsets inside
the iseq buffer that contain values we need to mark. We can use this
bitfield to mark objects instead of disassembling the instructions.
This commit also groups inline storage entries and adds a counter for
each entry. This allows us to iterate and mark each entry without
disassembling instructions
Since we have a bitfield and grouped inline caches, we can mark all
VALUE objects associated with instructions without actually
disassembling the instructions at mark time.
[Feature #18875] [ruby-core:109042]
Notes:
Merged: https://github.com/ruby/ruby/pull/6053
|
|
|
|
|
|
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
Notes:
Merged: https://github.com/ruby/ruby/pull/5698
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/4951
|
|
|
|
* Improve perfomance for Integer#size method [Feature #17135]
* re-run ci
* Let MJIT frame skip work for Integer#size
Co-authored-by: Takashi Kokubun <[email protected]>
Notes:
Merged-By: k0kubun <[email protected]>
|
|
It's been a way too much amount of ifdefs.
|
|
[Bug #17584]
Notes:
Merged-By: k0kubun <[email protected]>
|
|
Make the code a bit modern and consistent with some other places.
|
|
* Inline getconstant on JIT
* Support USE_MJIT=0
Notes:
Merged-By: k0kubun <[email protected]>
|
|
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' --repeat-count=12 --alternate --output=all benchmark.yml
before --jit: ruby 3.0.0dev (2020-11-27T06:41:15Z master 8ce1711c25) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-11-27T08:36:02Z master 2c592126b9) +JIT [x86_64-linux]
last_commit=Cache access to reg_cfp->self on JIT
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 82.40522392468650 82.66023870551237 fps
82.67998539899482 83.08660305312587
85.51280693947453 87.09311989553235
86.32925337181406 87.16115255191410
87.35617494926235 87.30699391518075
87.91865339426212 88.47590342996875
88.11573661006648 88.64778616696353
88.16060826662158 88.67015079203991
88.21639244865058 89.19630739497482
88.47241577897603 89.23443637947730
89.37087287229809 89.57052723997015
89.46969964699964 89.97803363889025
```
|
|
This reverts commit 4d2c8edca69884a41d2f843d36023e3decdb9872.
Unfortunately this seems to cause several issues:
https://github.com/ruby/ruby/runs/1462188376?check_suite_focus=true
http://ci.rvm.jp/results/trunk-mjit-wait@phosphorus-docker/3272802
|
|
Performance is probably improved?
$ benchmark-driver -v --rbenv 'before --jit;after --jit' --repeat-count=12 --alternate --output=all benchmark.yml
before --jit: ruby 3.0.0dev (2020-11-27T04:37:47Z master 69e77e81dc) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-11-27T05:28:19Z master df6b05c6dd) +JIT [x86_64-linux]
last_commit=Set VM_FRAME_FLAG_FINISH at once
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 80.89292998533379 82.19497327502751 fps
80.93130641142331 85.13943315260148
81.06214830270119 87.43757879797808
82.29172808453910 87.89942441487113
84.61206450455929 87.91309779491075
85.44545883567997 87.98026086648694
86.02923132404449 88.03081060383973
86.07411817365879 88.14650206137341
86.34348799602836 88.32791633649961
87.90257338977324 88.57599644892220
88.58006509876580 88.67426384743277
89.26611118140011 88.81669430874207
This should have no bad impact on VM because this function is ALWAYS_INLINE.
|
|
to define USE_MJIT.
|
|
Thanks to Ractor (https://github.com/ruby/ruby/pull/2888 and https://github.com/ruby/ruby/pull/3662),
inline caches support parallel access now.
Notes:
Merged-By: k0kubun <[email protected]>
|
|
iv_index_tbl manages instance variable indexes (ID -> index).
This data structure should be synchronized with other ractors
so introduce some VM locks.
This patch also introduced atomic ivar cache used by
set/getinlinecache instructions. To make updating ivar cache (IVC),
we changed iv_index_tbl data structure to manage (ID -> entry)
and an entry points serial and index. IVC points to this entry so
that cache update becomes atomically.
Notes:
Merged: https://github.com/ruby/ruby/pull/3662
|
|
iseq_inline_iv_cache_entry's index is also size_t.
%"PRIuSIZE" seems to print warnings against st_index_t in some environments.
|
|
when an ISeq has multiple ivar accesses.
Notes:
Merged-By: k0kubun <[email protected]>
|
|
and add a debug log
|
|
for opt_* insns.
opt_eq handles rb_obj_equal inside opt_eq, and all other cfunc is
handled by opt_send_without_block. Therefore we can't decide which insn
should be generated by checking whether it's cfunc cc or not.
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-26T05:21:43Z master 9dbc2294a6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-26T06:30:18Z master 75cece1b0b) +JIT [x86_64-linux]
last_commit=Decide JIT-ed insn based on cached cfunc
Calculating -------------------------------------
before --jit after --jit
mjit_nil?(1) 73.878M 74.021M i/s - 40.000M times in 0.541432s 0.540391s
mjit_not(1) 72.635M 74.601M i/s - 40.000M times in 0.550702s 0.536187s
mjit_eq(1, nil) 7.331M 7.445M i/s - 8.000M times in 1.091211s 1.074596s
mjit_eq(nil, 1) 49.450M 64.711M i/s - 8.000M times in 0.161781s 0.123627s
Comparison:
mjit_nil?(1)
after --jit: 74020528.4 i/s
before --jit: 73878185.9 i/s - 1.00x slower
mjit_not(1)
after --jit: 74600882.0 i/s
before --jit: 72634507.6 i/s - 1.03x slower
mjit_eq(1, nil)
after --jit: 7444657.4 i/s
before --jit: 7331304.3 i/s - 1.02x slower
mjit_eq(nil, 1)
after --jit: 64710790.6 i/s
before --jit: 49449507.4 i/s - 1.31x slower
```
|
|
only for opt_nil_p and opt_not.
While vm_method_cfunc_is is used for opt_eq too, many fast paths of it
don't call it. So if it's populated, it should generate opt_send,
regardless of cfunc or not. And again, opt_neq isn't relevant due to the
difference in operands.
So opt_nil_p and opt_not are the only variants using vm_method_cfunc_is
like they use.
```
$ benchmark-driver -v --rbenv 'before2 --jit::ruby --jit;before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before2 --jit: ruby 2.8.0dev (2020-06-22T08:37:37Z master 3238641750) +JIT [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-06-23T01:01:24Z master 9ce2066209) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-23T06:58:37Z master 17e9df3157) +JIT [x86_64-linux]
last_commit=Avoid generating opt_send with cfunc cc with JIT
Calculating -------------------------------------
before2 --jit before --jit after --jit
mjit_nil?(1) 54.204M 75.536M 75.031M i/s - 40.000M times in 0.737947s 0.529548s 0.533110s
mjit_not(1) 53.822M 70.921M 71.920M i/s - 40.000M times in 0.743195s 0.564007s 0.556171s
mjit_eq(1, nil) 7.367M 6.496M 7.331M i/s - 8.000M times in 1.085882s 1.231470s 1.091327s
Comparison:
mjit_nil?(1)
before --jit: 75536059.3 i/s
after --jit: 75031409.4 i/s - 1.01x slower
before2 --jit: 54204431.6 i/s - 1.39x slower
mjit_not(1)
after --jit: 71920324.1 i/s
before --jit: 70921063.1 i/s - 1.01x slower
before2 --jit: 53821697.6 i/s - 1.34x slower
mjit_eq(1, nil)
before2 --jit: 7367280.0 i/s
after --jit: 7330527.4 i/s - 1.01x slower
before --jit: 6496302.8 i/s - 1.13x slower
```
|
|
because opt_nil/opt_not/opt_eq populates cc even when it doesn't
fallback to opt_send_without_block because of vm_method_cfunc_is.
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-22T08:11:24Z master d231b8f95b) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-22T08:53:27Z master e1125879ed) +JIT [x86_64-linux]
last_commit=Compile opt_send for opt_* only when cc has ISeq
Calculating -------------------------------------
before --jit after --jit
mjit_nil?(1) 54.106M 73.693M i/s - 40.000M times in 0.739288s 0.542795s
mjit_not(1) 53.398M 74.477M i/s - 40.000M times in 0.749090s 0.537075s
mjit_eq(1, nil) 7.427M 6.497M i/s - 8.000M times in 1.077136s 1.231326s
Comparison:
mjit_nil?(1)
after --jit: 73692594.3 i/s
before --jit: 54106108.4 i/s - 1.36x slower
mjit_not(1)
after --jit: 74477487.9 i/s
before --jit: 53398125.0 i/s - 1.39x slower
mjit_eq(1, nil)
before --jit: 7427105.9 i/s
after --jit: 6497063.0 i/s - 1.14x slower
```
Actually opt_eq becomes slower by this. Maybe it's indeed using
opt_send_without_block, but I'll approach that one in another commit.
|
|
[Feature #15589]
Notes:
Merged-By: k0kubun <[email protected]>
|
|
by calling combined functions specialized for each cancel type.
I'm hoping to improve locality of hot code, but this patch's impact should
be insignificant.
|
|
|
|
by inlining only hot path.
=== mame/optcarrot ===
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark.yml --repeat-count=24 --output=all
before --jit: ruby 2.8.0dev (2020-05-18T05:21:31Z master 0e5a58b6bf) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-05-18T06:12:04Z master 0e3d71a8d1) +JIT [x86_64-linux]
last_commit=Reduce code size for rb_class_of
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 71.62880463568773 70.95730063273503 fps
71.73973684273152 71.98447841929851
75.03923801841310 75.54262519509039
75.16300287174957 77.64029272984344
75.16834828625935 78.67861469580785
75.17670723726911 78.81879353707393
75.67637908020630 79.18188850392886
76.19843953215396 79.66484891814478
77.28166716118808 79.80278072861037
77.38509903325165 80.05859292679696
78.12693418455953 80.34624804808006
78.73654441746730 80.66326571254345
79.25387513454415 80.69760605740196
79.44137881689524 81.32053489212245
79.50497657368358 81.50250852553751
79.62401328582868 82.27544931834611
79.79178811723664 82.67455264522741
81.20275352937418 82.93857260493297
81.57027048640776 83.15019118788184
81.63373188649095 83.20728816044721
81.93420437766426 83.25027576772972
82.05716136357167 83.27072145898173
82.21070805525066 83.36008265822194
82.56924063784872 83.36112268888493
=== benchmark-driver/sinatra ===
[rps]
before: 13143.49 rps
after: 13505.70 rps
[inlined rb_class_of size]
before: 11.5K
after: 3.8K
(calculated by `dwarftree --die inlined_subroutine --flat --merge --show-size`)
|
|
To fix build failures.
Notes:
Merged: https://github.com/ruby/ruby/pull/3079
|
|
This shall fix compile errors.
Notes:
Merged: https://github.com/ruby/ruby/pull/3079
|
|
Now this one is actually not in use because we override entire leave
definition for JIT.
|
|
Trying to debug errors like
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/2921397
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/2894526
|
|
|
|
|
|
to child compile_status
|
|
I'm trying to make it possible to include all JIT-ed code in a single C
file. This is needed to guarantee uniqueness of all function names
|
|
Split ruby.h
Notes:
Merged-By: shyouhei <[email protected]>
|
|
JIT support of dd723771c11.
$ benchmark-driver -v --rbenv 'before;before --jit;after --jit' benchmark/mjit_exivar.yml --repeat-count=4
before: ruby 2.8.0dev (2020-03-30T12:32:26Z master e5db3da9d3) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-03-30T12:32:26Z master e5db3da9d3) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-03-31T05:57:24Z mjit-exivar 128625baec) +JIT [x86_64-linux]
Calculating -------------------------------------
before before --jit after --jit
mjit_exivar 57.944M 53.579M 54.471M i/s - 200.000M times in 3.451588s 3.732772s 3.671687s
Comparison:
mjit_exivar
before: 57944345.1 i/s
after --jit: 54470876.7 i/s - 1.06x slower
before --jit: 53579483.4 i/s - 1.08x slower
|
|
OpenBSD RubyCI has failed with SEGV since 4bcd5981e80d3e1852c8723741a0069779464128.
https://rubyci.org/logs/rubyci.s3.amazonaws.com/openbsd-current/ruby-master/log/20200312T223005Z.fail.html.gz
This was because `status->cc_entries` could be stale after `realloc` call
for inlined iseqs.
|
|
jit_unit to avoid marking wrong cc entries when inlined iseq is compiled
multiple times, resolving the TODO added by daf7c48d88.
This obviates pseudo jit_unit in inlined iseq introduced by 7ec2359374
and fixes memory leak of the adhoc unit.
|
|
Fixed a TODO in b9007b6c548f91e88fd3f2ffa23de740431fa969
|
|
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
Notes:
Merged: https://github.com/ruby/ruby/pull/2888
|
|
Now, rb_call_info contains how to call the method with tuple of
(mid, orig_argc, flags, kwarg). Most of cases, kwarg == NULL and
mid+argc+flags only requires 64bits. So this patch packed
rb_call_info to VALUE (1 word) on such cases. If we can not
represent it in VALUE, then use imemo_callinfo which contains
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
is temporary removed because cd->ci should be marked.
Notes:
Merged: https://github.com/ruby/ruby/pull/2888
|
|
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead. This would significantly
speed up incremental builds.
We take the following inclusion order in this changeset:
1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very
first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.
Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
Notes:
Merged: https://github.com/ruby/ruby/pull/2711
|
|
https://ci.appveyor.com/project/ruby/ruby/builds/29001248/job/ye80bsrmewdgw294
|
|
Support loading builtin features written in Ruby, which implement
with C builtin functions.
[Feature #16254]
Several features:
(1) Load .rb file at boottime with native binary.
Now, prelude.rb is loaded at boottime. However, this file is contained
into the interpreter as a text format and we need to compile it.
This patch contains a feature to load from binary format.
(2) __builtin_func() in Ruby call func() written in C.
In Ruby file, we can write `__builtin_func()` like method call.
However this is not a method call, but special syntax to call
a function `func()` written in C. C functions should be defined
in a file (same compile unit) which load this .rb file.
Functions (`func` in above example) should be defined with
(a) 1st parameter: rb_execution_context_t *ec
(b) rest parameters (0 to 15).
(c) VALUE return type.
This is very similar requirements for functions used by
rb_define_method(), however `rb_execution_context_t *ec`
is new requirement.
(3) automatic C code generation from .rb files.
tool/mk_builtin_loader.rb creates a C code to load .rb files
needed by miniruby and ruby command. This script is run by
BASERUBY, so *.rb should be written in BASERUBY compatbile
syntax. This script load a .rb file and find all of __builtin_
prefix method calls, and generate a part of C code to export
functions.
tool/mk_builtin_binary.rb creates a C code which contains
binary compiled Ruby files needed by ruby command.
Notes:
Merged: https://github.com/ruby/ruby/pull/2655
|
|
Prior to this changeset, majority of inline cache mishits resulted
into the same method entry when rb_callable_method_entry() resolves
a method search. Let's not call the function at the first place on
such situations.
In doing so we extend the struct rb_call_cache from 44 bytes (in
case of 64 bit machine) to 64 bytes, and fill the gap with
secondary class serial(s). Call cache's class serials now behavies
as a LRU cache.
Calculating -------------------------------------
ours 2.7 2.6
vm2_poly_same_method 2.339M 1.744M 1.369M i/s - 6.000M times in 2.565086s 3.441329s 4.381386s
Comparison:
vm2_poly_same_method
ours: 2339103.0 i/s
2.7: 1743512.3 i/s - 1.34x slower
2.6: 1369429.8 i/s - 1.71x slower
Notes:
Merged: https://github.com/ruby/ruby/pull/2583
|
|
To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. In the worst case, the CPU needs to read
4 cache lines to complete a method call. Putting the two structures
together reduces the maximum number of cache line reads to 2.
Combining the structures also saves 8 bytes per call site as the current
layout uses separate two pointers for the call info and the call cache.
This saves about 2 MiB on Discourse.
This change improves the Optcarrot benchmark at least 3%. For more
details, see attached bugs.ruby-lang.org ticket.
Complications:
- A new instruction attribute `comptime_sp_inc` is introduced to
calculate SP increase at compile time without using call caches. At
compile time, a `TS_CALLDATA` operand points to a call info struct, but
at runtime, the same operand points to a call data struct. Instruction
that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
[Misc #16258]
Notes:
Merged: https://github.com/ruby/ruby/pull/2564
|
|
This partially reverts commit 712a66b0741605f5b2db670a292b9bb352f8a716.
The previous fix made CI strange like:
http://ci.rvm.jp/results/trunk-vm-asserts@silicon-docker/2124178
Let me just downgrade the behavior for now and deal with it later.
[Bug #15971]
|