Age | Commit message (Collapse) | Author |
|
rb_funcall* (rb_funcall(), rb_funcallv(), ...) functions invoke a
Ruby method on a given receiver. Ruby 2.7 introduced an inline method
cache in a static memory area. However, Ruby 3.0 reimplemented the
method cache data structures and the inline cache was removed.
Without the inline cache, rb_funcall* searches for the method every time.
In most cases the per-Class Method Cache (pCMC) helps, but
pCMC requires VM-wide locking, which hurts performance in
multi-Ractor execution, especially when all Ractors call methods
with rb_funcall*.
This patch introduces the Global Call-Cache Cache Table (gccct) for
rb_funcall*. Call-Cache was introduced in Ruby 3.0 to manage
method cache entries atomically, and gccct enables method caching
without VM-wide locking. This table solves the performance issue
in multi-Ractor execution.
[Bug #17497]
Ruby-level method invocation does not use gccct because it has
inline-method-cache and the table size is limited. Basically
rb_funcall* is not used frequently, so 1023 entries can be enough.
We will revisit the table size if it is not enough.
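The idea of a fixed-size table whose slots are replaced with a single pointer store can be sketched as follows. This is a minimal illustration, not the code from this patch: the types, the hash function, and the names gccct_index, gccct_lookup and gccct_store are all hypothetical, and only the table size of 1023 comes from the message above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for VM types; not CRuby's definitions. */
typedef uintptr_t klass_t;    /* receiver's class */
typedef uintptr_t method_id;  /* method name */

struct call_cache {
    klass_t klass;
    method_id mid;
    const void *method;  /* resolved method, opaque in this sketch */
    int invalidated;     /* non-zero once the method has been redefined */
};

#define GCCCT_SIZE 1023  /* table size mentioned in the commit message */

/* Each slot is one pointer, so a slot can be replaced with one atomic store. */
static _Atomic(const struct call_cache *) gccct[GCCCT_SIZE];

/* Assumed hash: mix class and method id, then reduce to a slot index. */
static size_t gccct_index(klass_t klass, method_id mid)
{
    return (size_t)(((klass >> 3) ^ mid) % GCCCT_SIZE);
}

/* Return the cached entry for (klass, mid), or NULL on a miss. */
static const struct call_cache *gccct_lookup(klass_t klass, method_id mid)
{
    const struct call_cache *cc =
        atomic_load_explicit(&gccct[gccct_index(klass, mid)], memory_order_acquire);
    if (cc != NULL && cc->klass == klass && cc->mid == mid && !cc->invalidated) {
        return cc;
    }
    return NULL;  /* the caller falls back to a full method search */
}

/* Publish an entry, overwriting whatever occupied the slot before. */
static void gccct_store(const struct call_cache *cc)
{
    assert(cc != NULL);
    atomic_store_explicit(&gccct[gccct_index(cc->klass, cc->mid)], cc, memory_order_release);
}
```

A colliding entry simply evicts the previous one, so a miss costs one full method search and nothing more. The sketch does not cover how entries are kept alive or reclaimed.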
Notes:
Merged: https://github.com/ruby/ruby/pull/4129
|
|
vm_cc_invalidated_p() returns false when the cme is *NOT*
invalidated.
Notes:
Merged: https://github.com/ruby/ruby/pull/4091
|
|
If a cc is invalidated, it should be released from the iseq.
Notes:
Merged: https://github.com/ruby/ruby/pull/4030
|
|
`cd` is passed from method call functions to method invocation
functions, but `cd` can be modified by other Ractors simultaneously,
so this is a thread-safety issue.
To solve this issue, this patch stores `ci` and the found `cc` in
`calling` and stops passing `cd`.
Notes:
Merged: https://github.com/ruby/ruby/pull/3903
|
|
|
|
It was a wrong idea to assume CIs are always embedded.
Notes:
Merged: https://github.com/ruby/ruby/pull/3179
|
|
This changeset reduces the generated binary of rb_vm_call0 from 281
bytes to 211 bytes on my machine. Should reduce GC pressure as well.
Notes:
Merged: https://github.com/ruby/ruby/pull/3179
|
|
This changeset reduces the generated binary of rb_equal_opt from 129 bytes
to 17 bytes on my machine, according to nm(1).
Notes:
Merged: https://github.com/ruby/ruby/pull/3179
|
|
CIs are created on the fly, which increases GC pressure. However, they
include no references to other objects, and those on-the-fly CIs tend to
be short-lived. Why not skip allocating them? To do so we need to add
a flag denoting that the CI object does not reside inside objspace.
Notes:
Merged: https://github.com/ruby/ruby/pull/3179
|
|
Forgot to commit a staged change.
|
|
According to the MSVC manual (*1), cl.exe can skip including a header
file when it:
- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.
GCC has a similar trick (*2), but it is stricter (e.g. there
must be _no tokens_ outside of #ifndef...#endif).
Sun C lacked #pragma once for a looong time. Oracle Developer Studio
12.5 finally implemented it, but we cannot assume such a recent version.
This changeset modifies header files so that each of them includes
strictly one #ifndef...#endif. I believe this is the most portable way
to trigger compiler optimizations. [Bug #16770]
*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html
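A header that follows this rule might look like the sketch below. The guard macro EXAMPLE_UTIL_H and the function example_clamp_index are hypothetical names used only for illustration. The point is that the file starts with #ifndef and that nothing except comments appears after the closing #endif.

```c
#ifndef EXAMPLE_UTIL_H  /* hypothetical guard name */
#define EXAMPLE_UTIL_H

/* Even #include lines live inside the guard, so no tokens remain outside it. */
#include <assert.h>
#include <stddef.h>

/* Clamp index i into the range [0, n-1]; n must be non-zero. */
static inline size_t example_clamp_index(size_t i, size_t n)
{
    assert(n > 0);
    return i < n ? i : n - 1;
}

#endif /* EXAMPLE_UTIL_H */
```

With this shape a compiler that remembers guard macros can skip reopening the file on a second #include, and no compiler-specific pragma is needed.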
Notes:
Merged: https://github.com/ruby/ruby/pull/3023
|
|
Accessing past the end of an array is technically UB. Use C99 flexible
array member instead to avoid the UB and simplify allocation size
calculation.
See also: DCL38-C in the SEI CERT C Coding Standard
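As a minimal sketch of the technique, the struct below uses a flexible array member in place of the older one-element-array idiom. struct int_list and int_list_new are hypothetical names, not structures from the Ruby source tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, for illustration only. */
struct int_list {
    size_t len;
    int items[];  /* C99 flexible array member; must be the last member */
};

/* Allocate a list holding a copy of src[0..len-1]; returns NULL on failure. */
static struct int_list *int_list_new(const int *src, size_t len)
{
    assert(src != NULL || len == 0);

    /* sizeof(*list) does not count items[], so the size is a plain sum.
     * Overflow of the multiplication is not checked in this sketch. */
    struct int_list *list = malloc(sizeof(*list) + len * sizeof(list->items[0]));
    if (list == NULL) {
        return NULL;
    }
    list->len = len;
    if (len > 0) {
        memcpy(list->items, src, len * sizeof(list->items[0]));
    }
    return list;
}
```

With an `int items[1]` member the size calculation would need a `len - 1` correction, and indexing beyond element 0 would read past the declared array bounds.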
Notes:
Merged: https://github.com/ruby/ruby/pull/3017
Merged-By: XrXr
|
|
Split ruby.h
Notes:
Merged-By: shyouhei
|
|
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side. Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.
This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side. Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side. On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash. If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it. So this commit
should make it so that a maximum of a single hash is allocated
during method calls.
To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable. If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.
In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.
On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled. Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not. Then, unless the callinfo flag is set,
the hash needs to be duplicated. The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again. If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can include duplicating the
positional splat array if one was passed. CALLER_SETUP_ARG
and a couple of other places need to be modified to handle
similar issues for other types of calls.
This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.
Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:
    def kw(a: 1) a end
    def kws(**kw) kw end
    h = {a: 1}

    kw(a: 1)        # about same
    kw(**h)         # 2.37x faster
    kws(a: 1)       # 1.30x faster
    kws(**h)        # 2.19x faster
    kw(a: 1, **h)   # 1.03x slower
    kw(**h, **h)    # about same
    kws(a: 1, **h)  # 1.16x faster
    kws(**h, **h)   # 1.14x faster
Notes:
Merged: https://github.com/ruby/ruby/pull/2945
|
|
```
$ benchmark-driver benchmark.yml -v --rbenv 'before --jit;after --jit' --repeat-count=12 --output=all
before --jit: ruby 2.8.0dev (2020-03-11T07:43:12Z master e89ebdcb87) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-03-11T07:54:18Z master 143776a0da) +JIT [x86_64-linux]
Calculating -------------------------------------
                               before --jit         after --jit
Optcarrot Lan_Master.nes  73.86976729561439   77.20184819316513 fps
                          74.46997176460742   78.43493030231805
                          77.59686308754307   78.55714131655935
                          78.53693921126656   79.08984255596820
                          80.10158944910573   79.17751731838183
                          80.12254974411167   79.60853122429181
                          80.28678655204945   79.74674066871896
                          80.38690681095379   79.90624544440300
                          80.79223498756919   80.57881084206193
                          80.82857188422419   80.70677614429169
                          81.06447745878245   81.03868541295149
                          81.21620802278490   82.16354660940607
```
|
|
vm_cc_fill() fills CC information into stack-allocated memory, which
is not cleared beforehand, so we need to clear CC->aux explicitly.
|
|
|
|
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
  * Make the call-cache (CC) an RVALUE (GC target object) and allocate a
    new CC on cache miss.
  * This technique allows race-free access from parallel processing
    elements, like RCU.
(2) Introduce per-Class method cache (pCMC)
  * Instead of a fixed-size global method cache (GMC), pCMC allows a
    flexible cache size.
  * Caching CCs reduces CC allocation and allows sharing a CC's fast path
    between call sites with the same call-info (CI).
(3) Invalidate an inline method cache by invalidating the corresponding
    method entries (MEs)
  * Instead of using class serials, we set an "invalidated" flag on the
    method entry itself to represent cache invalidation.
  * Compared with using class serials, the impact of method modification
    (add/overwrite/delete) is small.
    * Updating class serials invalidates all method caches of the class
      and its sub-classes.
    * The proposed approach invalidates the method cache of only one ME.
See [Feature #16614] for more details.
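Idea (3) can be sketched with a flag on the method entry that every cache pointing at that entry checks before use. The types and the functions cc_valid_p and redefine_method below are hypothetical simplifications, not the structures introduced by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified method entry. */
struct method_entry {
    const char *name;
    bool invalidated;  /* set once this definition has been replaced */
};

/* Hypothetical, simplified call cache: it only remembers the entry it found. */
struct call_cache {
    const struct method_entry *me;
};

/* A cache is usable as long as the entry it points at is still current. */
static bool cc_valid_p(const struct call_cache *cc)
{
    return cc->me != NULL && !cc->me->invalidated;
}

/* Replace the definition stored in *slot. Only caches that point at the
 * old entry become stale; caches for other methods are untouched. */
static void redefine_method(struct method_entry **slot, struct method_entry *new_me)
{
    assert(slot != NULL && new_me != NULL);
    if (*slot != NULL) {
        (*slot)->invalidated = true;
    }
    *slot = new_me;
}
```

A class-serial scheme would instead bump one counter for the class, which makes every cache for that class and its sub-classes miss on the next call.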
Notes:
Merged: https://github.com/ruby/ruby/pull/2888
|
|
Currently, rb_call_info describes how to call a method as a tuple of
(mid, orig_argc, flags, kwarg). In most cases kwarg == NULL and
mid+argc+flags need only 64 bits, so this patch packs
rb_call_info into a VALUE (1 word) in such cases. If it cannot be
represented in a VALUE, imemo_callinfo is used instead, which contains
the conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
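Packing the tuple into one word can be sketched as below. The bit layout (32 bits for mid, 16 for flags, 15 for argc, and a low tag bit that distinguishes a packed value from an aligned pointer) is an assumption chosen for this illustration, and the ci_* names are hypothetical. The accessors only mirror the role of vm_ci_mid() and friends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, high to low bits: [ mid:32 | flag:16 | argc:15 | tag:1 ] */
#define CI_TAG        UINT64_C(1)
#define CI_ARGC_SHIFT 1
#define CI_ARGC_MASK  UINT64_C(0x7fff)
#define CI_FLAG_SHIFT 16
#define CI_FLAG_MASK  UINT64_C(0xffff)
#define CI_MID_SHIFT  32
#define CI_MID_MASK   UINT64_C(0xffffffff)

typedef uint64_t packed_ci;

/* Packing is possible only without kwarg and when every field fits. */
static bool ci_packable_p(uint64_t mid, uint32_t flag, uint32_t argc, const void *kwarg)
{
    return kwarg == NULL && mid <= CI_MID_MASK &&
           flag <= CI_FLAG_MASK && argc <= CI_ARGC_MASK;
}

static packed_ci ci_pack(uint64_t mid, uint32_t flag, uint32_t argc)
{
    assert(ci_packable_p(mid, flag, argc, NULL));
    return (mid << CI_MID_SHIFT) | ((uint64_t)flag << CI_FLAG_SHIFT) |
           ((uint64_t)argc << CI_ARGC_SHIFT) | CI_TAG;
}

/* The low bit is set for packed values; an aligned pointer has it clear. */
static bool ci_packed_p(packed_ci ci) { return (ci & CI_TAG) != 0; }

static uint64_t ci_mid(packed_ci ci)  { return (ci >> CI_MID_SHIFT) & CI_MID_MASK; }
static uint32_t ci_flag(packed_ci ci) { return (uint32_t)((ci >> CI_FLAG_SHIFT) & CI_FLAG_MASK); }
static uint32_t ci_argc(packed_ci ci) { return (uint32_t)((ci >> CI_ARGC_SHIFT) & CI_ARGC_MASK); }
```

When ci_packable_p() is false, for example because kwarg is present, the caller would fall back to a heap-allocated callinfo and store its pointer in the same word.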
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
are temporarily removed because cd->ci should be marked.
Notes:
Merged: https://github.com/ruby/ruby/pull/2888
|