Age | Commit message (Collapse) | Author |
|
Here `nb` should never be NULL. If it were, the following
`nb->buffer_list` would be strange.
A follow-up to ddd8da4b6ba3dfcca21ca710e7cef2fa3b9632d7
Notes:
Merged: https://github.com/ruby/ruby/pull/12208
|
|
Change UNDEF Node to hold their items to keep the original grammar
structure.
For example:
```
undef a, b
```
Before:
```
@ NODE_BLOCK (id: 4, line: 1, location: (1,6)-(1,10))*
+- nd_head (1):
| @ NODE_UNDEF (id: 1, line: 1, location: (1,6)-(1,7))
| +- nd_undef:
| @ NODE_SYM (id: 0, line: 1, location: (1,6)-(1,7))
| +- string: :a
+- nd_head (2):
@ NODE_UNDEF (id: 3, line: 1, location: (1,9)-(1,10))
+- nd_undef:
@ NODE_SYM (id: 2, line: 1, location: (1,9)-(1,10))
+- string: :b
```
After:
```
@ NODE_UNDEF (id: 1, line: 1, location: (1,6)-(1,10))*
+- nd_undefs:
+- length: 2
+- element (0):
| @ NODE_SYM (id: 0, line: 1, location: (1,6)-(1,7))
| +- string: :a
+- element (1):
@ NODE_SYM (id: 2, line: 1, location: (1,9)-(1,10))
+- string: :b
```
Notes:
Merged: https://github.com/ruby/ruby/pull/11213
|
|
|
|
`ast_new` can be embedded into `rb_ast_new`.
|
|
This patch moves `ast->node_buffer->config` to `ast->config` aiming to improve readability and maintainability of the source.
## Background
We could not add the `config` field to the `rb_ast_t *` due to the five-word restriction of the IMEMO object.
But it is now doable by merging https://github.com/ruby/ruby/pull/10618
## About assigning `&rb_global_parser_config` to `ast->config` in `ast_alloc()`
The approach of not setting `ast->config` in `ast_alloc()` means that the client, CRuby in this scenario, that directly calls `ast_alloc()` will be responsible for releasing it if a resource that is passed to AST needs to be released.
However, we have put on hold whether we can guarantee the above so far, thus, this patch looks like that.
```
// ruby_parser.c
static VALUE
ast_alloc(void)
{
rb_ast_t *ast;
VALUE vast = TypedData_Make_Struct(0, rb_ast_t, &ast_data_type, ast);
#ifdef UNIVERSAL_PARSER
ast = (rb_ast_t *)DATA_PTR(vast);
ast->config = &rb_global_parser_config;
#endif
return vast;
}
```
|
|
This patch adds `int line_count` field to `rb_ast_body_t` structure.
Instead, we no longer cast `script_lines` to Fixnum.
## Background
Ref https://github.com/ruby/ruby/pull/10618
In the PR above, we have decoupled IMEMO from `rb_ast_t`.
This means we could lift the five-words-restriction of the structure
that forced us to unionize `rb_ast_t *` and `FIXNUM` in one field.
## Relating refactor
- Remove the second parameter of `rb_ruby_ast_new()` function
## Attention
I will remove a code that assigns -1 to line_count, in `rb_binding_add_dynavars()`
of vm.c, because I don't think it is necessary.
But I will make another PR for this so that we can atomically revert
in case I was wrong (See the comment on the code)
|
|
This patch removes the `VALUE flags` member from the `rb_ast_t` structure making `rb_ast_t` no longer an IMEMO object.
## Background
We are trying to make the Ruby parser generated from parse.y a universal parser that can be used by other implementations such as mruby.
To achieve this, it is necessary to exclude VALUE and IMEMO from parse.y, AST, and NODE.
## Summary (file by file)
- `rubyparser.h`
- Remove the `VALUE flags` member from `rb_ast_t`
- `ruby_parser.c` and `internal/ruby_parser.h`
- Use TypedData_Make_Struct VALUE which wraps `rb_ast_t` `in ast_alloc()` so that GC can manage it
- You can retrieve `rb_ast_t` from the VALUE by `rb_ruby_ast_data_get()`
- Change the return type of `rb_parser_compile_XXXX()` functions from `rb_ast_t *` to `VALUE`
- rb_ruby_ast_new() which internally `calls ast_alloc()` is to create VALUE vast outside ruby_parser.c
- `iseq.c` and `vm_core.h`
- Amend the first parameter of `rb_iseq_new_XXXX()` functions from `rb_ast_body_t *` to `VALUE`
- This keeps the VALUE of AST on the machine stack to prevent being removed by GC
- `ast.c`
- Almost all change is replacement `rb_ast_t *ast` with `VALUE vast` (sorry for the big diff)
- Fix `node_memsize()`
- Now it includes `rb_ast_local_table_link`, `tokens` and script_lines
- `compile.c`, `load.c`, `node.c`, `parse.y`, `proc.c`, `ruby.c`, `template/prelude.c.tmpl`, `vm.c` and `vm_eval.c`
- Follow-up due to the above changes
- `imemo.{c|h}`
- If an object with `imemo_ast` appears, considers it a bug
Co-authored-by: Nobuyoshi Nakada <[email protected]>
|
|
|
|
|
|
This patch is part of universal parser work.
## Summary
- Decouple VALUE from members below:
- `(struct parser_params *)->debug_lines`
- `(rb_ast_t *)->body.script_lines`
- Instead, they are now `rb_parser_ary_t *`
- They can also be a `(VALUE)FIXNUM` as before to hold line count
- `ISEQ_BODY(iseq)->variable.script_lines` remains VALUE
- In order to do this,
- Add `VALUE script_lines` param to `rb_iseq_new_with_opt()`
- Introduce `rb_parser_build_script_lines_from()` to convert `rb_parser_ary_t *` into `VALUE`
## Other details
- Extend `rb_parser_ary_t *`. It previously could only store `rb_parser_ast_token *`, now can store script_lines, too
- Change tactics of building the top-level `SCRIPT_LINES__` in `yycompile0()`
- Before: While parsing, each line of the script is added to `SCRIPT_LINES__[path]`
- After: After `yyparse(p)`, `SCRIPT_LINES__[path]` will be built from `p->debug_lines`
- Remove the second parameter of `rb_parser_set_script_lines()` to make it simple
- Introduce `script_lines_free()` to be called from `rb_ast_free()` because the GC no longer takes care of the script_lines
- Introduce `rb_parser_string_deep_copy()` in parse.y to maintain script_lines when `rb_ruby_parser_free()` called
- With regard to this, please see *Future tasks* below
## Future tasks
- Decouple IMEMO from `rb_ast_t *`
- This lifts the five-members-restriction of Ruby object,
- So we will be able to move the ownership of the `lex.string_buffer` from parser to AST
- Then we remove `rb_parser_string_deep_copy()` to make the whole thing simple
|
|
T_TYPES was needed once Ripper jumbled NODEs and other type
objects. However such hack was already removed.
Therefore don't need to set T_TYPES of NODE.
|
|
`nodetype_markable_p` always returns `false` then
`rb_ast_node_type_change` never calls `rb_bug`.
|
|
All types of Node are managed by `node_buffer_list_t unmarkable`
therefore merge them into `node_buffer_list_t buffer_list`.
|
|
|
|
|
|
- Introduce `rb_parser_ary_t` structure to partly eliminate RArray from parse.y
- In this patch, `parser_params->tokens` and `parser_params->ast->node_buffer->tokens` are now `rb_parser_ary_t *`
- Instead, `ast_node_all_tokens()` internally creates a Ruby Array object from the `rb_parser_ary_t`
- Also, delete `rb_ast_tokens()` and `rb_ast_set_tokens()` in node.c
- Implement `rb_parser_str_escape()`
- This is a port of the `rb_str_escape()` function in string.c
- `rb_parser_str_escape()` does not depend on `VALUE` (RString)
- Instead, it uses `rb_parser_stirng_t *`
- This function works when --dump=y option passed
- Because WIP of the universal parser, similar functions like `rb_parser_tokens_free()` exist in both node.c and parse.y. Refactoring them may be needed in some way in the future
- Although we considered redesigning the structure: `ast->node_buffer->tokens` into `ast->tokens`, we leave it as it is because `rb_ast_t` is an imemo. (We will address it in the future)
|
|
Rather than exposing that an imemo has a flag and four fields, this
changes the implementation to only expose one field (the klass) and
fills the rest with 0. The type will have to fill in the values themselves.
|
|
|
|
|
|
Introduce another semantic value stack for Ripper so that
Ripper can manage both Node and Ruby Object separately.
This rearchitectutre of Ripper solves these issues.
Therefore adding test cases for them.
* [Bug 10436] https://bugs.ruby-lang.org/issues/10436
* [Bug 18988] https://bugs.ruby-lang.org/issues/18988
* [Bug 20055] https://bugs.ruby-lang.org/issues/20055
Checked the differences of `Ripper.sexp` for files under `/test/ruby`
are only on test_pattern_matching.rb.
The differences comes from the differences between
`new_hash_pattern_tail` functions between parser and Ripper.
Ripper `new_hash_pattern_tail` didn’t call `assignable` then
`kw_rest_arg` wasn’t marked as local variable.
This is also fixed by this commit.
```
--- a/./tmp/before/test_pattern_matching.rb
+++ b/./tmp/after/test_pattern_matching.rb
@@ -3607,7 +3607,7 @@
[:in,
[:hshptn, nil, [], [:var_field, [:@ident, “a”, [984, 13]]]],
[[:binary,
- [:vcall, [:@ident, “a”, [985, 10]]],
+ [:var_ref, [:@ident, “a”, [985, 10]]],
:==,
[:hash, nil]]],
nil]]],
@@ -3662,7 +3662,7 @@
[:in,
[:hshptn, nil, [], [:var_field, [:@ident, “a”, [993, 13]]]],
[[:binary,
- [:vcall, [:@ident, “a”, [994, 10]]],
+ [:var_ref, [:@ident, “a”, [994, 10]]],
:==,
[:hash,
[:assoclist_from_args,
@@ -3813,7 +3813,7 @@
[:command,
[:@ident, “raise”, [1022, 10]],
[:args_add_block,
- [[:vcall, [:@ident, “b”, [1022, 16]]]],
+ [[:var_ref, [:@ident, “b”, [1022, 16]]]],
false]]],
[:else, [[:var_ref, [:@kw, “true”, [1024, 10]]]]]]]],
nil,
@@ -3876,7 +3876,7 @@
[:@int, “0”, [1033, 15]]],
:“&&“,
[:binary,
- [:vcall, [:@ident, “b”, [1033, 20]]],
+ [:var_ref, [:@ident, “b”, [1033, 20]]],
:==,
[:hash, nil]]]],
nil]]],
@@ -3946,7 +3946,7 @@
[:@int, “0”, [1042, 15]]],
:“&&“,
[:binary,
- [:vcall, [:@ident, “b”, [1042, 20]]],
+ [:var_ref, [:@ident, “b”, [1042, 20]]],
:==,
[:hash,
[:assoclist_from_args,
@@ -5206,7 +5206,7 @@
[[:assoc_new,
[:@label, “c:“, [1352, 22]],
[:@int, “0”, [1352, 25]]]]]],
- [:vcall, [:@ident, “r”, [1352, 29]]]],
+ [:var_ref, [:@ident, “r”, [1352, 29]]]],
false]]],
[:binary,
[:call,
@@ -5299,7 +5299,7 @@
[:assoc_new,
[:@label, “c:“, [1367, 34]],
[:@int, “0”, [1367, 37]]]]]],
- [:vcall, [:@ident, “r”, [1367, 41]]]],
+ [:var_ref, [:@ident, “r”, [1367, 41]]]],
false]]],
[:binary,
[:call,
@@ -5931,7 +5931,7 @@
[:in,
[:hshptn, nil, [], [:var_field, [:@ident, “r”, [1533, 11]]]],
[[:binary,
- [:vcall, [:@ident, “r”, [1534, 8]]],
+ [:var_ref, [:@ident, “r”, [1534, 8]]],
:==,
[:hash,
[:assoclist_from_args,
```
|
|
|
|
String nodes holds ruby string object on `VALUE nd_lit`.
This commit changes it to `struct rb_parser_string *string`
to reduce dependency on ruby object.
Sometimes these strings are concatenated with other string
therefore string concatenate functions are needed.
|
|
|
|
It's allocated outside of parser then no need to track
reference count in rb_parser_config.
|
|
|
|
`:sym` was managed by `NODE_LIT` with `Symbol` object.
This commit introduces `NODE_SYM` so that
1. Symbol literal is detectable from AST Node
2. Reduce dependency on ruby object
|
|
|
|
`__FILE__` was managed by `NODE_STR` with `String` object.
This commit introduces `NODE_FILE` and `struct rb_parser_string` so that
1. `__FILE__` is detectable from AST Node
2. Reduce dependency ruby object
|
|
|
|
|
|
|
|
Reproduction script:
```
require "ripper"
10.times do
20_000.times do
Ripper.parse("")
end
puts `ps -o rss= -p #{$$}`
end
```
Before:
```
28032
34432
40704
47232
53632
60032
66432
72832
79232
85632
```
After:
```
21760
21760
21760
21760
21760
21760
21760
21760
21760
21760
```
|
|
All kind of AST nodes use same struct RNode, which has u1, u2, u3 union members
for holding different kind of data.
This has two problems.
1. Low flexibility of data structure
Some nodes, for example NODE_TRUE, don’t use u1, u2, u3. On the other hand,
NODE_OP_ASGN2 needs more than three union members. However they use same
structure definition, need to allocate three union members for NODE_TRUE and
need to separate NODE_OP_ASGN2 into another node.
This change removes the restriction so make it possible to
change data structure by each node type.
2. No compile time check for union member access
It’s developer’s responsibility for using correct member for each node type when it’s union.
This change clarifies which node has which type of fields and enables compile time check.
This commit also changes node_buffer_elem_struct buf management to handle
different size data with alignment.
|
|
NODE_ARGS, NODE_ARYPTN, NODE_FNDPTN manage memory of their
structure by imemo tmpbuf Object.
However rb_ast_struct has reference to NODE. Then these
memory can be freed directly when rb_ast_struct is freed.
This commit reduces parser's dependency on CRuby functions.
|
|
This commit reduces dependency to CRuby object.
Notes:
Merged: https://github.com/ruby/ruby/pull/7950
|
|
Introduce Universal Parser mode for the parser.
This commit includes these changes:
* Introduce `UNIVERSAL_PARSER` macro. All of CRuby related functions
are passed via `struct rb_parser_config_struct` when this macro is enabled.
* Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
Notes:
Merged: https://github.com/ruby/ruby/pull/7927
|
|
98637d421dbe8bcf86cc2effae5e26bb96a6a4da changes the name of
the function. However this function is exported as global,
then change the name to origin one for keeping compatibility.
Notes:
Merged: https://github.com/ruby/ruby/pull/7852
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7844
|
|
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.
This commit adds these methods.
* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.
[Feature #19070]
Impacts on memory usage and performance are below:
Memory usage:
```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb
# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb
# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```
Performance:
```
$ cat ../ast_keep_tokens.yml
prelude: |
src = <<~SRC
module M
class C
def m1(a, b)
1 + a + b
end
end
end
SRC
benchmark:
without_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
with_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..
| |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens | 21.659k| 21.303k|
| | 1.02x| -|
|with_keep_tokens | 6.220k| 5.691k|
| | 1.09x| -|
```
Notes:
Merged: https://github.com/ruby/ruby/pull/6770
|
|
By this change, syntax error is recovered smaller units.
In the case below, "DEFN :bar" is same level with "CLASS :Foo"
now.
```
module Z
class Foo
foo.
end
def bar
end
end
```
[Feature #19013]
Notes:
Merged: https://github.com/ruby/ruby/pull/6512
|
|
In some causes node_id might have been left uninitialized leading to
undefined behavior on access. So always set it to -1, so we have *some*
valid value in there.
Notes:
Merged: https://github.com/ruby/ruby/pull/6202
|
|
[Misc #18891]
Notes:
Merged: https://github.com/ruby/ruby/pull/6094
|
|
This `NODE` type was used in pre-YARV implementation, to improve
the performance of assignment to dynamic local variable defined at
the innermost scope. It has no longer any actual difference with
`NODE_DASGN`, except for the node dump.
Notes:
Merged: https://github.com/ruby/ruby/pull/5251
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/5091
Merged-By: nobu <[email protected]>
|
|
The implementation of a local variable tables was represented as `ID*`,
but it was very hacky: the first element is not an ID but the size of
the table, and, the last element is (sometimes) a link to the next local
table only when the id tables are a linked list.
This change converts the hacky implementation to a normal struct.
Notes:
Merged: https://github.com/ruby/ruby/pull/5136
|
|
|
|
|
|
|
|
This option makes the parser keep the original source as an array of
the original code lines. This feature exploits the mechanism of
`SCRIPT_LINES__` but records only the specified code that is passed to
RubyVM::AST.of or .parse, instead of recording all parsed program texts.
Notes:
Merged: https://github.com/ruby/ruby/pull/4581
|
|
to make imemo_ast WB-protected again. Only the test is kept.
Notes:
Merged: https://github.com/ruby/ruby/pull/4419
|