path: root/ruby_parser.c
Age | Commit message | Author
2025-02-15Remove rb_enc_associate for ParserS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/12715
2025-02-02Remove rb_exc_raise for ParserS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/12688
2025-01-28Remove rb_usascii_encoding for ParserS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/12643
2025-01-20Remove rb_obj_as_string for ParserS-H-GAMELINKS
The Ruby parser does not use rb_obj_as_string, so the obj_as_string property can be removed from the universal parser. Notes: Merged: https://github.com/ruby/ruby/pull/12603
2025-01-06Remove SYM2ID for ParserS-H-GAMELINKS
The Ruby parser does not use SYM2ID, so the sym2id property can be removed from the universal parser. Notes: Merged: https://github.com/ruby/ruby/pull/12507
2025-01-02Remove rb_ary_push for parserS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/12493
2024-12-19[Bug #20969] Pass `assignable` from ripperNobuyoshi Nakada
For the universal parser, the `rb_reg_named_capture_assign_iter_impl` function is shared between the parser and ripper. However, the `parser_params` struct differs partially between the two, and the `assignable` function depends on that part indirectly. Notes: Merged: https://github.com/ruby/ruby/pull/12400
2024-10-25Remove rb_ary_new for parserS-H-GAMELINKS
rb_ary_new function was not used by the parser and could be removed. Notes: Merged: https://github.com/ruby/ruby/pull/11734
2024-09-28Remove dependency on `RSTRING_END` from parserNobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/11713
2024-09-27Reduce `is_ascii_string` function dependency for parserS-H-GAMELINKS
Changed to use `rb_parser_is_ascii_string` function instead of `is_ascii_string` function Notes: Merged: https://github.com/ruby/ruby/pull/11698
2024-09-26Remove rb_str_cat for parserS-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/11672
2024-09-22Reuse dedent_string function in rb_ruby_ripper_dedent_string functionS-H-GAMELINKS
This change reduces the universal parser's dependency on the Ruby C API. It reuses the dedent_string function in the rb_ruby_ripper_dedent_string function and removes the parser's dependencies on rb_str_modify and rb_str_set_len. Notes: Merged: https://github.com/ruby/ruby/pull/11658
2024-08-27Remove `enc_coderange_broken` field from `struct rb_parser_config_struct`yui-knk
It has not been used since fcc55dc2261b4c61da711c10a5476d05d4391eca. Notes: Merged: https://github.com/ruby/ruby/pull/11464
2024-06-02Remove unused functions from struct rb_parser_config_structS-H-GAMELINKS
StringValueCStr is not used in parse.y.
2024-05-31Revert 528c4501f46fbe1e06028d673a777ef124d29829Yusuke Endoh
Recently, `TestRubyLiteral#test_float` fails randomly.

```
  1) Error:
TestRubyLiteral#test_float:
ArgumentError: SyntaxError#path changed: "(eval at /home/chkbuild/chkbuild/tmp/build/20240527T050036Z/ruby/test/ruby/test_literal.rb:642)"->"(eval at /home/chkbuild/chkbuild/tmp/build/20240527T050036Z/ruby/test/ruby/test_literal.rb:642)"
```

https://rubyci.s3.amazonaws.com/s390x/ruby-master/log/20240527T050036Z.fail.html.gz

According to Launchable, the first failure was on Apr 30. This is just when 528c4501f46fbe1e06028d673a777ef124d29829 was committed. I don't know if the change is really the cause, but I want to revert it once to see if the random failure disappears.
2024-05-28Precompute embedded string literals hash codeJean Boussier
With embedded strings we often have some space left in the slot, which we can use to store the string Hash code.

It's probably only worth it for string literals, as they are the ones likely to be used as hash keys.

We chose to store the Hash code right after the string terminator as to make it easy/fast to compute, and not require one more union in RString.

```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|
```

Co-Authored-By: Étienne Barrié <[email protected]>
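To give a feel for the kind of lookup the benchmark rows above exercise, here is a minimal Ruby sketch that times hash lookups keyed by a frozen string literal and by a symbol. The benchmark script itself is not shown in the commit message, so the helper name `time_lookups`, the iteration count, and the table contents are all assumptions made for illustration. Absolute timings will vary by machine and Ruby build.

```ruby
# Hypothetical helper (not from the commit): time `iterations` lookups of
# `key` in `table` and return the elapsed wall-clock seconds as a Float.
def time_lookups(table, key, iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { table[key] }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

iterations = 1_000_000

# A short frozen string used as a hash key; looking it up needs the
# string's hash code, which is what the commit aims to make cheaper.
string_key = "name".freeze
string_seconds = time_lookups({ string_key => 1 }, string_key, iterations)

# Symbols serve as the baseline, as in the `symbol` row of the table.
symbol_seconds = time_lookups({ name: 1 }, :name, iterations)

puts format("string literal key: %.3fs", string_seconds)
puts format("symbol key:         %.3fs", symbol_seconds)
```

Comparing the two timings on a Ruby built before and after this commit would be one rough way to observe the effect, though a dedicated benchmark harness gives more reliable numbers.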
2024-05-13Constify encoding type in universal parserNobuyoshi Nakada
Fixed warning about discarding modifiers.

```
../src/ruby_parser.c:677:48: warning: passing 'rb_encoding *' (aka 'const struct OnigEncodingTypeST *') to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  677 |     ast = rb_parser_compile(p, gets, ptr, len, enc, input, line);
      |                                                ^~~
../src/internal/parse.h:58:128: note: passing argument to parameter 'fname_enc' here
   58 | rb_ast_t *rb_parser_compile(rb_parser_t *p, rb_parser_lex_gets_func *gets, const char *fname_ptr, long fname_len, rb_encoding *fname_enc, rb_parser_input_data input, int line);
      |                                                                                                                                ^
```
2024-05-04Change return value of `gets` function to be `rb_parser_string_t *` instead of `VALUE`yui-knk
This change reduces parser's dependency on ruby object.
2024-05-03Rename `vast` to `ast_value`yui-knk
There is an English word "vast". This commit renames the variable to a clearer name to avoid confusion.
2024-05-02Fix memory leak of `rb_ast_t` in parserNobuyoshi Nakada
Do not allocate `rb_ast_t` in `ast_alloc` to avoid memory leak.

For example:

    10.times do
      100_000.times do
        eval("")
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    17568
    20960
    24096
    27808
    31008
    34160
    37312
    40464
    43568
    46816

After:

    14432
    14448
    14496
    14576
    14592
    15072
    15072
    15072
    15072
    15088
2024-05-02Revert "Fix memory leak of rb_ast_t in parser"yui-knk
This reverts commit e3bfd25bd2202a172d7709e9a2f7b65b523a132d.

> "Allocate then wrap" is bad, because the "wrapping" itself can fail.

See: https://github.com/ruby/ruby/pull/10618#pullrequestreview-2019294349
2024-04-30Use `rb_parser_string_t *` as `ruby_sourcefile_string`yui-knk
This reduces dependency on VALUE.
2024-04-29Fix memory leak of rb_ast_t in parserPeter Zhu
ast_alloc uses TypedData_Make_Struct, which allocates a rb_ast_t. But it is overwritten when we set the DATA_PTR so the original memory is leaked.

For example:

    10.times do
      100_000.times do
        eval("")
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    17328
    20752
    23664
    28400
    30656
    34224
    37424
    40784
    43328
    46656

After:

    14320
    14320
    14320
    14320
    14320
    14320
    14320
    14336
    14336
    14336
2024-04-29Fix memory leak in ruby_parserPeter Zhu
For example:

    10.times do
      100_000.times do
        eval("")
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    19872
    26480
    32848
    39504
    45904
    52672
    59200
    65760
    72128
    78496

After:

    17328
    20752
    23664
    28400
    30656
    34224
    37424
    40784
    43328
    46656
2024-04-28Remove `ast_new` field from `struct rb_parser_config_struct`yui-knk
`ast_new` can be embedded into `rb_ast_new`.
2024-04-28[Universal parser] Improve AST structureHASUMI Hitoshi
This patch moves `ast->node_buffer->config` to `ast->config` aiming to improve readability and maintainability of the source.

## Background

We could not add the `config` field to the `rb_ast_t *` due to the five-word restriction of the IMEMO object. But it is now doable by merging https://github.com/ruby/ruby/pull/10618

## About assigning `&rb_global_parser_config` to `ast->config` in `ast_alloc()`

The approach of not setting `ast->config` in `ast_alloc()` means that the client, CRuby in this scenario, that directly calls `ast_alloc()` will be responsible for releasing it if a resource that is passed to AST needs to be released. However, we have put on hold whether we can guarantee the above so far, thus, this patch looks like that.

```
// ruby_parser.c
static VALUE
ast_alloc(void)
{
    rb_ast_t *ast;
    VALUE vast = TypedData_Make_Struct(0, rb_ast_t, &ast_data_type, ast);
#ifdef UNIVERSAL_PARSER
    ast = (rb_ast_t *)DATA_PTR(vast);
    ast->config = &rb_global_parser_config;
#endif
    return vast;
}
```
2024-04-27Add line_count field to rb_ast_body_tHASUMI Hitoshi
This patch adds an `int line_count` field to the `rb_ast_body_t` structure. In exchange, we no longer cast `script_lines` to Fixnum.

## Background

Ref https://github.com/ruby/ruby/pull/10618

In the PR above, we have decoupled IMEMO from `rb_ast_t`. This means we could lift the five-words-restriction of the structure that forced us to unionize `rb_ast_t *` and `FIXNUM` in one field.

## Relating refactor

- Remove the second parameter of `rb_ruby_ast_new()` function

## Attention

I will remove the code that assigns -1 to line_count in `rb_binding_add_dynavars()` of vm.c, because I don't think it is necessary. But I will make another PR for this so that we can atomically revert in case I was wrong (see the comment on the code).
2024-04-26Set `SCRIPT_LINES__` outside of parseryui-knk
Parser should not depend on functions defined in "ruby_parser.c".
2024-04-26[Universal parser] Decouple IMEMO from rb_ast_tHASUMI Hitoshi
This patch removes the `VALUE flags` member from the `rb_ast_t` structure making `rb_ast_t` no longer an IMEMO object.

## Background

We are trying to make the Ruby parser generated from parse.y a universal parser that can be used by other implementations such as mruby. To achieve this, it is necessary to exclude VALUE and IMEMO from parse.y, AST, and NODE.

## Summary (file by file)

- `rubyparser.h`
  - Remove the `VALUE flags` member from `rb_ast_t`
- `ruby_parser.c` and `internal/ruby_parser.h`
  - Use TypedData_Make_Struct VALUE which wraps `rb_ast_t` in `ast_alloc()` so that GC can manage it
    - You can retrieve `rb_ast_t` from the VALUE by `rb_ruby_ast_data_get()`
  - Change the return type of `rb_parser_compile_XXXX()` functions from `rb_ast_t *` to `VALUE`
  - rb_ruby_ast_new() which internally calls `ast_alloc()` is to create VALUE vast outside ruby_parser.c
- `iseq.c` and `vm_core.h`
  - Amend the first parameter of `rb_iseq_new_XXXX()` functions from `rb_ast_body_t *` to `VALUE`
  - This keeps the VALUE of AST on the machine stack to prevent being removed by GC
- `ast.c`
  - Almost all change is replacement `rb_ast_t *ast` with `VALUE vast` (sorry for the big diff)
  - Fix `node_memsize()`
    - Now it includes `rb_ast_local_table_link`, `tokens` and script_lines
- `compile.c`, `load.c`, `node.c`, `parse.y`, `proc.c`, `ruby.c`, `template/prelude.c.tmpl`, `vm.c` and `vm_eval.c`
  - Follow-up due to the above changes
- `imemo.{c|h}`
  - If an object with `imemo_ast` appears, considers it a bug

Co-authored-by: Nobuyoshi Nakada <[email protected]>
2024-04-24Remove unused functions from `struct rb_parser_config_struct`yui-knk
2024-04-23Remove unused functions from `struct rb_parser_config_struct`yui-knk
2024-04-23Move encoding object conversion outside of parseryui-knk
Reduce the parser's dependence on `VALUE` and `rb_enc_from_encoding`.
2024-04-23Remove unused functions from `struct rb_parser_config_struct`yui-knk
2024-04-23Remove unused functions from `struct rb_parser_config_struct`yui-knk
2024-04-23Refactor parser compile functionsyui-knk
Refactor parser compile functions to reduce the dependence on ruby functions. This commit includes these changes:

1. Refactor `gets`, `input` and `gets_` of `parser_params`

   Parser needs two different data structures to get the next line: a function (`gets`) and input data (`input`). However, `gets_` is used for both a function (`call`) and input data (`ptr`). `call` is used for managing a general callback function when `rb_ruby_parser_compile_generic` is used. `ptr` is used for managing the current pointer on a String when `parser_compile_string` is used. This commit changes the parser to use only `gets` and `input`, then removes `gets_`.

2. Move parser_compile functions and `gets` functions from parse.y to ruby_parser.c

   This change reduces the dependence on ruby functions from parser.

3. Change ruby_parser and ripper to take care of `VALUE input` GC mark

   Move the responsibility of calling `rb_gc_mark` for `VALUE input` from parser to ruby_parser and ripper. `input` is an arbitrary data pointer from the viewpoint of parser.

4. Introduce rb_parser_compile_array function

   Caller of `rb_parser_compile_generic` needs to take care about GC because ruby_parser doesn’t know about the detail of `lex_gets` and `input`. Introduce `rb_parser_compile_array` to reduce the complexity of ast.c.
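The split between a line-fetching function (`gets`) and opaque input data (`input`) can be modeled in a few lines of Ruby. This is only an illustrative model of the design described above, not the C implementation: the method names `string_gets`, `array_gets` and `read_all_lines`, and the Hash layout used for `input`, are hypothetical.

```ruby
# Hypothetical `gets` for a single String: `input` is a Hash holding the
# source text and the current read position (the role `ptr` used to play).
def string_gets(input)
  source = input[:source]
  pos = input[:pos]
  return nil if pos >= source.length

  newline = source.index("\n", pos)
  stop = newline ? newline + 1 : source.length
  input[:pos] = stop
  source[pos...stop]
end

# Hypothetical `gets` for an Array of lines: `input` tracks the next index.
def array_gets(input)
  line = input[:lines][input[:index]]
  input[:index] += 1 if line
  line
end

# Stand-in for the parser: it only knows "call gets with input until nil"
# and treats `input` as opaque data.
def read_all_lines(gets, input)
  lines = []
  while (line = gets.call(input))
    lines << line
  end
  lines
end

from_string = read_all_lines(method(:string_gets), { source: "a = 1\nb = 2", pos: 0 })
from_array  = read_all_lines(method(:array_gets), { lines: ["a = 1\n", "b = 2"], index: 0 })

puts from_string.inspect
puts from_array.inspect
```

Because the consumer never inspects `input`, whoever supplies it is the natural owner of its lifetime, which mirrors the reasoning in item 3 for moving the GC marking of `VALUE input` out of the parser.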
2024-04-21Remove unused functions from struct `rb_parser_config_struct`S-H-GAMELINKS
2024-04-20Remove unused functionyui-knk
2024-04-20Remove unused functions from `struct rb_parser_config_struct`yui-knk
2024-04-20Parser and universal parser share wrapper functionsyui-knk
2024-04-16Remove unused functions from `struct rb_parser_config_struct`yui-knk
2024-04-15[Universal parser] DeVALUE of p->debug_lines and ast->body.script_linesHASUMI Hitoshi
This patch is part of universal parser work.

## Summary

- Decouple VALUE from members below:
  - `(struct parser_params *)->debug_lines`
  - `(rb_ast_t *)->body.script_lines`
- Instead, they are now `rb_parser_ary_t *`
  - They can also be a `(VALUE)FIXNUM` as before to hold line count
- `ISEQ_BODY(iseq)->variable.script_lines` remains VALUE
  - In order to do this,
    - Add `VALUE script_lines` param to `rb_iseq_new_with_opt()`
    - Introduce `rb_parser_build_script_lines_from()` to convert `rb_parser_ary_t *` into `VALUE`

## Other details

- Extend `rb_parser_ary_t *`. It previously could only store `rb_parser_ast_token *`, now can store script_lines, too
- Change tactics of building the top-level `SCRIPT_LINES__` in `yycompile0()`
  - Before: While parsing, each line of the script is added to `SCRIPT_LINES__[path]`
  - After: After `yyparse(p)`, `SCRIPT_LINES__[path]` will be built from `p->debug_lines`
- Remove the second parameter of `rb_parser_set_script_lines()` to make it simple
- Introduce `script_lines_free()` to be called from `rb_ast_free()` because the GC no longer takes care of the script_lines
- Introduce `rb_parser_string_deep_copy()` in parse.y to maintain script_lines when `rb_ruby_parser_free()` called
  - With regard to this, please see *Future tasks* below

## Future tasks

- Decouple IMEMO from `rb_ast_t *`
  - This lifts the five-members-restriction of Ruby object,
  - So we will be able to move the ownership of the `lex.string_buffer` from parser to AST
  - Then we remove `rb_parser_string_deep_copy()` to make the whole thing simple
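From the Ruby side, the script lines kept on an AST can be observed through `RubyVM::AbstractSyntaxTree`. The sketch below assumes CRuby 3.1 or later, where `parse` accepts `keep_script_lines:` and nodes respond to `script_lines`. It shows only the Ruby-visible result and makes no claim about which internal conversion function produces it. The sample source text is made up.

```ruby
source = "a = 1\nb = a + 2\n"

# keep_script_lines: true asks the parser to retain the source lines on the AST.
root = RubyVM::AbstractSyntaxTree.parse(source, keep_script_lines: true)

# On the Ruby side the kept lines surface as an ordinary Array of Strings.
lines = root.script_lines
puts lines.inspect
```

`RubyVM::AbstractSyntaxTree` is a CRuby-specific, experimental API, so this sketch is not expected to work on other Ruby implementations.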
2024-04-14Remove unused functions from `struct rb_parser_config_struct`yui-knk
2024-04-11compile.c: use rb_enc_interned_str to reduce allocationsJean Boussier
The `rb_fstring(rb_enc_str_new())` pattern is inefficient because:

- It passes a mutable string to `rb_fstring` so if it has to be interned it will first be duped.
- If an equivalent interned string already exists, we allocated the string for nothing.

With `rb_enc_interned_str` we either directly get the pre-existing string with 0 allocations, or efficiently directly intern the one we create without first duping it.
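String interning itself can be observed from Ruby through `String#-@`, which returns a frozen, deduplicated string. The sketch below only demonstrates the interning behavior at the Ruby level, as a loose analogue. It does not exercise `rb_enc_interned_str` directly, and the sample strings are arbitrary.

```ruby
# Build two equal strings at runtime so that they start out as distinct objects.
first  = "ruby_" + "parser"
second = "ruby_" + "parser"
puts first.equal?(second)

# String#-@ returns a frozen, deduplicated string, so equal contents
# resolve to one shared object.
interned_first  = -first
interned_second = -second
puts interned_first.equal?(interned_second)
puts interned_first.frozen?
```

Note that the Ruby-level path still starts from a freshly built mutable string each time, which is the kind of throwaway allocation the commit describes the C-level `rb_enc_interned_str` call as avoiding when an interned copy already exists.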
2024-04-11Remove unused function from `struct rb_parser_config_struct`yui-knk
2024-04-09Remove unused function from `struct rb_parser_config_struct`yui-knk
2024-04-06Remove unused function from `struct rb_parser_config_struct`yui-knk
2024-04-06Remove unused functions from `struct rb_parser_config_struct`S-H-GAMELINKS
2024-04-05Remove unused functions from `struct rb_parser_config_struct`yui-knk
2024-04-04Remove unused function from `struct rb_parser_config_struct`yui-knk
2024-04-04Separate SCRIPT_LINES__ from ast.cHASUMI Hitoshi
This patch suggests relocating the code dealing with `SCRIPT_LINES__` from ast.c to ruby_parser.c.

## Background

- I guess `AbstractSyntaxTree.of` method used to use `SCRIPT_LINES__` internally for some reason before
- However, now it appears `SCRIPT_LINES__` is no longer used meaningfully by the method
- As evidence of this, (and as my patch shows,) removing the function call of `rb_script_lines_for()` from `ast_s_of()` does not affect the result of `test/ruby/test_ast.rb`

Given the above, I think two possibilities can be considered:

- (A) `AbstractSyntaxTree.of` no longer needs `SCRIPT_LINES__` (I pick this)
- (B) We lack a test case of `AbstractSyntaxTree.of` that needs to use `SCRIPT_LINES__`

## Besides,

The current implementation causes strange behavior:

```console
ruby -e"SCRIPT_LINES__ = {__FILE__ => []}; puts RubyVM::AbstractSyntaxTree.of(->{ 1 + 2 }, keep_script_lines: true).script_lines"
=> `-e:1:in '<main>': undefined method 'script_lines' for nil (NoMethodError)`
```

I think this is a bug because `AbstractSyntaxTree.of` is not supposed to return `nil` even in this case.

This happens due to ast.c's dependence on `SCRIPT_LINES__`. At the end of `ast_s_of()`, `node_find()` obviously cannot find the target child node, because it doesn't make sense to look for a node corresponding to the parameter of `AbstractSyntaxTree.of` in the AST made from the value of `{__FILE__ => []}`.

## Solution

Since I think it is enough for `SCRIPT_LINES__` to be referred to only by ruby.c, I chose possibility "(A)" and wrote this patch, which moves `rb_script_lines_for()` from ast.c to ruby_parser.c.

As a result:

- `ast_s_of()` function no longer looks up `SCRIPT_LINES__`
- Even so, this patched code passes the existing tests
- The strange behavior above no longer happens (I also added a test for it)

Please correct me if I miss something🙏
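For readers unfamiliar with `SCRIPT_LINES__`: when that constant is defined as a Hash, CRuby records the source lines of each file it subsequently loads under the file's path. The following minimal sketch shows that Ruby-visible behavior under some assumptions: the temporary file name and contents are made up, and the lookup matches on basename in case the stored key is a normalized form of the path.

```ruby
require "tempfile"

# SCRIPT_LINES__ must be a Hash defined on Object before the file is loaded.
# const_set is used only so the sketch also works when pasted inside a method.
Object.const_set(:SCRIPT_LINES__, {}) unless Object.const_defined?(:SCRIPT_LINES__)

loaded_lines = nil
Tempfile.create(["script_lines_demo", ".rb"]) do |file|
  file.write("x = 1\ny = x + 2\n")
  file.flush
  load file.path

  # Find the entry recorded for the file we just loaded.
  _path, loaded_lines = SCRIPT_LINES__.find do |path, _lines|
    File.basename(path) == File.basename(file.path)
  end
end

puts loaded_lines.inspect
```

This only shows the constant being populated on load. The commit above concerns which C file consults the constant, and after it `RubyVM::AbstractSyntaxTree.of` no longer looks `SCRIPT_LINES__` up at all.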