summaryrefslogtreecommitdiff
path: root/yjit/src
AgeCommit message (Collapse)Author
2022-12-09YJIT: Split send_iseq_complex_callee exit reasons (#6895)Takashi Kokubun
Notes: Merged-By: k0kubun <[email protected]>
2022-12-09YJIT: implement `getconstant` YARV instruction (#6884)Maxime Chevalier-Boisvert
* YJIT: implement getconstant YARV instruction * Constant id is not a pointer * Stack operands must be read after jit_prepare_routine_call Co-authored-by: Takashi Kokubun <[email protected]> Notes: Merged-By: k0kubun <[email protected]>
2022-12-08YJIT: Upgrade bindgen to stabilize and reduce outputAlan Wu
The new version has an option to merge everything into a big `extern "C"` block and it's nicer. More importantly, this upgrade fixes an issue where Ubuntu with Clang 12 and macOS with Clang 14 gave a one line diff for `rb_shape_t`. It was slightly annoying because we use macOS locally. Notes: Merged: https://github.com/ruby/ruby/pull/6887
2022-12-08YJIT: Drop Copy trait from Context (#6889)Takashi Kokubun
Notes: Merged-By: maximecb <[email protected]>
2022-12-08YJIT: implement opt_newarray_min YARV instruction (#6888)Maxime Chevalier-Boisvert
Notes: Merged-By: maximecb <[email protected]>
2022-12-08Introduce `IO.new(..., path:)` and promote `File#path` to `IO#path`. (#6867)Samuel Williams
Notes: Merged-By: ioquatix <[email protected]>
2022-12-06Set max_iv_count (used for object shapes) based on inline cachesJemma Issroff
With this change, we're storing the iv name on an inline cache on setinstancevariable instructions. This allows us to check the inline cache to count instance variables set in initialize and give us an estimate of iv capacity for an object. For the purpose of estimating the number of instance variables required for an object, we're assuming that all initialize methods will call `super`. This change allows us to estimate the number of instance variables required without disassembling instruction sequences. Co-Authored-By: Aaron Patterson <[email protected]> Notes: Merged: https://github.com/ruby/ruby/pull/6870
2022-12-06Introduce BOP_CMP for optimized comparisonDaniel Colson
Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup to determine whether `<=>` was overridden. The result of the lookup was cached, but only for the duration of the specific method that initialized the cmp_opt_data cache structure. With this method lookup, `[x,y].max` is slower than doing `x > y ? x : y` even though there's an optimized instruction for "new array max". (John noticed somebody a proposed micro-optimization based on this fact in https://github.com/mastodon/mastodon/pull/19903.) ```rb a, b = 1, 2 Benchmark.ips do |bm| bm.report('conditional') { a > b ? a : b } bm.report('method') { [a, b].max } bm.compare! end ``` Before: ``` Comparison: conditional: 22603733.2 i/s method: 19820412.7 i/s - 1.14x (± 0.00) slower ``` This commit replaces the method lookup with a new CMP basic op, which gives the examples above equivalent performance. After: ``` Comparison: method: 24022466.5 i/s conditional: 23851094.2 i/s - same-ish: difference falls within error ``` Relevant benchmarks show an improvement to Array#max and Array#min when not using the optimized newarray_max instruction as well. They are noticeably faster for small arrays with the relevant types, and the same or maybe a touch faster on larger arrays. ``` $ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_min $ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_max ``` The benchmarks added in this commit also look generally improved. Co-authored-by: John Hawthorn <[email protected]>
2022-12-06Move BOP macros to separate fileDaniel Colson
This commit moves ruby_basic_operators and the unredefined macros out of vm_core.h and into basic_operators.h so that we can use them more broadly in places where we currently use a method look up via `rb_method_basic_definition_p` (e.g. object.c, numeric.c, complex.c, enum.c, but also in internal/compar.h after introducing BOP_CMP and elsewhere if we introduce more BOPs) The most controversial part of this change is probably moving redefined_flag out of rb_vm_t. [vm_opt_method_def_table and vm_opt_mid_table](https://github.com/ruby/ruby/blob/9da2a5204f32a4f2ce135fddde2abb6e07d647e9/vm.c) are not part of rb_vm_t either, and I think this fits well with those. But more significantly it seems to result in one fewer instruction. For example: Before: ``` (lldb) disassemble -n vm_opt_str_freeze miniruby`vm_exec_core: miniruby[0x10028233e] <+14558>: movq 0x11a86b(%rip), %rax ; ruby_current_vm_ptr miniruby[0x100282345] <+14565>: testb $0x4, 0x242c(%rax) ``` After: ``` (lldb) disassemble -n vm_opt_str_freeze ruby`vm_exec_core: ruby[0x100280ebe] <+14510>: testb $0x4, 0x120147(%rip) ; ruby_vm_redefined_flag + 43 ``` Co-authored-by: John Hawthorn <[email protected]>
2022-12-05YJIT: Remove --yjit-code-page-size (#6865)Alan Wu
Certain code page sizes don't work and can cause crashes, so having this value available as a command-line option is a bit dangerous. Remove it and turn it into a constant instead. Notes: Merged-By: maximecb <[email protected]>
2022-12-05YJIT: Extract SHAPE_ID_NUM_BITS into a constant (#6863)Jemma Issroff
Notes: Merged-By: k0kubun <[email protected]>
2022-12-02Remove unused rb_shape_flag_shift and rb_shape_flag_maskJemma Issroff
Notes: Merged: https://github.com/ruby/ruby/pull/6767
2022-12-02Fixed yjit bindings rb_gc_write_barrierJemma Issroff
Notes: Merged: https://github.com/ruby/ruby/pull/6767
2022-12-02Extracted rb_shape_id_offsetJemma Issroff
Notes: Merged: https://github.com/ruby/ruby/pull/6767
2022-12-02Update yjit/src/codegen.rsMaxime Chevalier-Boisvert
Notes: Merged: https://github.com/ruby/ruby/pull/6767
2022-12-02make flag clearing betterAaron Patterson
Notes: Merged: https://github.com/ruby/ruby/pull/6767
2022-12-02only generate wb when we really need toAaron Patterson
Notes: Merged: https://github.com/ruby/ruby/pull/6767
2022-12-02bail on compilation if the comptime receiver is frozenAaron Patterson
Notes: Merged: https://github.com/ruby/ruby/pull/6767
2022-12-02do not fire the wb when writing immediatesAaron Patterson
Notes: Merged: https://github.com/ruby/ruby/pull/6767
2022-12-02implement IV writesAaron Patterson
Notes: Merged: https://github.com/ruby/ruby/pull/6767
2022-12-02YJIT: Make case-when optimization respect === redefinition (#6846)Alan Wu
* YJIT: Make case-when optimization respect === redefinition Even when a fixnum key is in the dispatch hash, if there is a case such that its basic operations for === is redefined, we need to fall back to checking each case like the interpreter. Semantically we're always checking each case by calling === in order, it's just that this is not observable when basic operations are intact. When all the keys are fixnums, though, we can do the optimization we're doing right now. Check for this condition. * Update yjit/src/cruby_bindings.inc.rs Co-authored-by: Takashi Kokubun <[email protected]> Co-authored-by: Takashi Kokubun <[email protected]> Notes: Merged-By: maximecb <[email protected]>
2022-12-02YJIT: Change the default --yjit-call-threshold to 30 (#6850)Takashi Kokubun
Notes: Merged-By: maximecb <[email protected]>
2022-12-01YJIT: Respect destination num_bits on STUR (#6848)Takashi Kokubun
Notes: Merged-By: k0kubun <[email protected]>
2022-12-01YJIT: Reorder branches for Fixnum opt_case_dispatch (#6841)Takashi Kokubun
* YJIT: Reorder branches for Fixnum opt_case_dispatch Co-authored-by: Maxime Chevalier-Boisvert <[email protected]> Co-authored-by: Alan Wu <[email protected]> * YJIT: Don't support too large values Co-authored-by: Maxime Chevalier-Boisvert <[email protected]> Co-authored-by: Alan Wu <[email protected]> Notes: Merged-By: maximecb <[email protected]>
2022-12-01YJIT: fix 32 and 16 bit register store (#6840)Jemma Issroff
* Fix 32 and 16 bit register store in YJIT Co-Authored-By: Takashi Kokubun <[email protected]> * Remove an unnecessary diff * Reuse an rm_num_bits result * Use u16::MAX instead * Update the link Co-authored-by: Alan Wu <[email protected]> * Just use sturh for 16 bits Co-authored-by: Takashi Kokubun <[email protected]> Co-authored-by: Alan Wu <[email protected]> Notes: Merged-By: maximecb <[email protected]>
2022-11-30YJIT: Optimize rb_int_equal (#6838)Takashi Kokubun
Notes: Merged-By: maximecb <[email protected]>
2022-11-30YJIT: add new counters for deferred compilation and queued blocks (#6837)Maxime Chevalier-Boisvert
Notes: Merged-By: maximecb <[email protected]>
2022-11-30YJIT: Deallocate `struct Block` to plug memory leaksAlan Wu
Previously we essentially never freed block even after invalidation. Their reference count never reached zero for a couple of reasons: 1. `Branch::block` formed a cycle with the block holding the branch 2. Strong count on a branch that has ever contained a stub never reached 0 because we increment the `.clone()` call for `BranchRef::into_raw()` didn't have a matching decrement. It's not safe to immediately deallocate blocks during invalidation since `branch_stub_hit()` can end up running with a branch pointer from an invalidated branch. To plug the leaks, we wait until code GC or global invalidation and deallocate the blocks for iseqs that are definitely not running. Notes: Merged: https://github.com/ruby/ruby/pull/6833
2022-11-30YJIT: Deallocate when assumptions tables are emptyAlan Wu
When we run global invalidation for TracePoints or code GC, we clear out all blocks in our assumptions table but we don't deallocate the backing buffers. Let's reclaim some memory during these rare events. Notes: Merged: https://github.com/ruby/ruby/pull/6833
2022-11-30YJIT: Fix IseqPayload::pages memory bloatAlan Wu
HashSet::clear() doesn't deallocate the backing buffer and shrink the capacity. Replace with a 0-capcity set instead so we reclaim some memory each code GC. Notes: Merged: https://github.com/ruby/ruby/pull/6833
2022-11-29YJIT: Skip checking interrupt_mask (#6825)Takashi Kokubun
Notes: Merged-By: maximecb <[email protected]>
2022-11-27MJIT: Use a String buffer in builtin compilersTakashi Kokubun
instead of FILE*. Using C.fprintf is slower than String manipulation on memory. I'm going to change the way MJIT writes files, and this is a prerequisite for it.
2022-11-24YJIT: rename `InsnOpnd` => `YARVOpnd` (#6801)Maxime Chevalier-Boisvert
Rename InsnOpnd => YARVOpnd Make it more clear this refers to YARV insn/vm operands rather than backend IR, x86 or ARM insn operands. Notes: Merged-By: maximecb <[email protected]>
2022-11-23YJIT: Use a Box for branch targets to save memoryAlan Wu
We frequently make branches that only have one target but we used to always allocate space for two branch targets. This patch moves all the information a branch target has into a struct and refer to them using Option<Box<BranchTarget>>, this way when the second branch target is not present it only takes 8 bytes. Retained heap size on railsbench went from 16.17 MiB to 14.57 MiB, a ratio of about 1.1. Notes: Merged: https://github.com/ruby/ruby/pull/6799
2022-11-23YJIT: Simplify Insn::CCall to obviate Target::FunPtr (#6793)Takashi Kokubun
Notes: Merged-By: maximecb <[email protected]>
2022-11-23YJIT: Use NonNull pointer for CodePtr (#6792)Takashi Kokubun
Notes: Merged-By: maximecb <[email protected]>
2022-11-23YJIT: Stop passing target1 to gen_return_branchTakashi Kokubun
Notes: Merged: https://github.com/ruby/ruby/pull/6794
2022-11-23YJIT: Simplify code for RB_SPECIAL_CONST_P (#6795)Takashi Kokubun
Notes: Merged-By: maximecb <[email protected]>
2022-11-23Fix YJIT backend to account for unsigned int immediates (#6789)Jemma Issroff
YJIT: x86_64: Fix cmp with number where sign bit is set Before this commit, we were unconditionally treating unsigned ints as signed ints when counting the number of bits required for representing the immediate in machine code. When the size of the immediate matches the size of the other operand, no sign extension happens, so this was incorrect. `asm.cmp(opnd64, 0x8000_0000)` panicked even though it's encodable as `CMP r/m32, imm32`. Large shape ids were impacted by this issue. Co-Authored-By: Aaron Patterson <[email protected]> Co-Authored-By: Alan Wu <[email protected]> Co-authored-by: Aaron Patterson <[email protected]> Co-authored-by: Alan Wu <[email protected]> Notes: Merged-By: maximecb <[email protected]>
2022-11-22YJIT: Skip padding jumps to side exits on Arm (#6790)Takashi Kokubun
YJIT: Skip padding jumps to side exits Co-authored-by: Maxime Chevalier-Boisvert <[email protected]> Co-authored-by: Alan Wu <[email protected]> Co-authored-by: Maxime Chevalier-Boisvert <[email protected]> Co-authored-by: Alan Wu <[email protected]> Notes: Merged-By: maximecb <[email protected]>
2022-11-18YJIT: Improve the failure message on enlarging a branch (#6769)Takashi Kokubun
Notes: Merged-By: k0kubun <[email protected]>
2022-11-1832 bit comparison on shape idAaron Patterson
This commit changes the shape id comparisons to use a 32 bit comparison rather than 64 bit. That means we don't need to load the shape id to a register on x86 machines. Given the following program: ```ruby class Foo def initialize @foo = 1 @bar = 1 end def read [@foo, @bar] end end foo = Foo.new foo.read foo.read foo.read foo.read foo.read puts RubyVM::YJIT.disasm(Foo.instance_method(:read)) ``` The machine code we generated _before_ this change is like this: ``` == BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ====================== # getinstancevariable 0x559a18623023: mov rax, qword ptr [r13 + 0x18] # guard object is heap 0x559a18623027: test al, 7 0x559a1862302a: jne 0x559a1862502d 0x559a18623030: cmp rax, 4 0x559a18623034: jbe 0x559a1862502d # guard shape, embedded, and T_OBJECT 0x559a1862303a: mov rcx, qword ptr [rax] 0x559a1862303d: movabs r11, 0xffff00000000201f 0x559a18623047: and rcx, r11 0x559a1862304a: movabs r11, 0xb000000002001 0x559a18623054: cmp rcx, r11 0x559a18623057: jne 0x559a18625046 0x559a1862305d: mov rax, qword ptr [rax + 0x18] 0x559a18623061: mov qword ptr [rbx], rax == BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes ======================= == BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ====================== # gen_direct_jmp: fallthrough # getinstancevariable # regenerate_branch # getinstancevariable # regenerate_branch 0x559a18623064: mov rax, qword ptr [r13 + 0x18] # guard shape, embedded, and T_OBJECT 0x559a18623068: mov rcx, qword ptr [rax] 0x559a1862306b: movabs r11, 0xffff00000000201f 0x559a18623075: and rcx, r11 0x559a18623078: movabs r11, 0xb000000002001 0x559a18623082: cmp rcx, r11 0x559a18623085: jne 0x559a18625099 0x559a1862308b: mov rax, qword ptr [rax + 0x20] 0x559a1862308f: mov qword ptr [rbx + 8], rax ``` After this change, it's like this: ``` == BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ====================== # getinstancevariable 0x5560c986d023: mov rax, qword ptr [r13 + 0x18] # guard object is heap 0x5560c986d027: test al, 7 0x5560c986d02a: jne 0x5560c986f02d 0x5560c986d030: cmp rax, 4 0x5560c986d034: jbe 0x5560c986f02d # guard shape 0x5560c986d03a: cmp word ptr [rax + 6], 0x19 0x5560c986d03f: jne 0x5560c986f046 0x5560c986d045: mov rax, qword ptr [rax + 0x10] 0x5560c986d049: mov qword ptr [rbx], rax == BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes ======================= == BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ====================== # gen_direct_jmp: fallthrough # getinstancevariable # regenerate_branch # getinstancevariable # regenerate_branch 0x5560c986d04c: mov rax, qword ptr [r13 + 0x18] # guard shape 0x5560c986d050: cmp word ptr [rax + 6], 0x19 0x5560c986d055: jne 0x5560c986f099 0x5560c986d05b: mov rax, qword ptr [rax + 0x18] 0x5560c986d05f: mov qword ptr [rbx + 8], rax ``` The first ivar read is a bit more complex, but the second ivar read is much simpler. I think eventually we could teach the context about the shape, then emit only one shape guard. Notes: Merged: https://github.com/ruby/ruby/pull/6737
2022-11-17Fix bug involving .send and overwritten methods. (#6752)Jimmy Miller
@casperisfine reporting a bug in this gist https://gist.github.com/casperisfine/d59e297fba38eb3905a3d7152b9e9350 After investigating I found it was caused by a combination of send and a c_func that we have overwritten in the JIT. For send calls, we need to do some stack manipulation before making the call. Because of the way exits works, we need to do that stack manipulation at the last possible moment. In this case, we weren't doing that stack manipulation at all. Unfortunately, with how the code is structured there isn't a great place to do that stack manipulation for our overridden C funcs. Each overridden C func can return a boolean stating that it shouldn't be used. We would need to do the stack manipulation after all of those checks are done. We could pass a lambda(?) or separate out the logic for "can I run this override" from "now generate the code for it". Since we are coming up on a release, I went with the path of least resistence and just decided to not use these overrides if we are in a send call. We definitely should revist this in the future. Notes: Merged-By: maximecb <[email protected]>
2022-11-16YJIT: Shrink version lists after mutation (#6749)Takashi Kokubun
Notes: Merged-By: k0kubun <[email protected]>
2022-11-16YJIT: Pack BlockId and CodePtr (#6748)Takashi Kokubun
Notes: Merged-By: k0kubun <[email protected]>
2022-11-16YJIT: Add compiled_branch_count stats (#6746)Takashi Kokubun
Notes: Merged-By: k0kubun <[email protected]>
2022-11-16YJIT: Stop wrapping CmePtr with CmeDependency (#6747)Takashi Kokubun
* YJIT: Stop wrapping CmePtr with CmeDependency * YJIT: Fix an outdated comment [ci skip] Notes: Merged-By: k0kubun <[email protected]>
2022-11-16YJIT: Shrink the vectors of Block after mutation (#6739)Takashi Kokubun
Notes: Merged-By: k0kubun <[email protected]>
2022-11-15YJIT: Always encode Opnd::Value in 64 bits on x86_64 for GC offsets (#6733)Takashi Kokubun
* YJIT: Always encode Opnd::Value in 64 bits on x86_64 for GC offsets Co-authored-by: Alan Wu <[email protected]> * Introduce heap_object_p * Leave original mov intact * Remove unneeded branches * Add a test for movabs Co-authored-by: Alan Wu <[email protected]> Notes: Merged-By: k0kubun <[email protected]>
2022-11-15YJIT: Include actual memory region size in stats (#6736)Takashi Kokubun
Notes: Merged-By: k0kubun <[email protected]>